Amazon S3
version LATEST
Set up
Source
To get set up with the Amazon S3 connector, you will need the following:
- Region, e.g. eu-west-1
- AWS Access Key
- AWS Secret Key
To get these AWS credentials, see Amazon documentation here.
Extract
For each extract, the following information is required:
- S3 Bucket Name
- Load type
- Load a single file.
- Load all files inside a folder.
- Path Name
- When “Load a single file” is selected, this is the path name of a single file to load.
- When “Load all files inside a folder” is selected, this will indicate the path name of the folder containing the files to be loaded.
Data structure
When loading all files from inside a folder, the data structure of each file must be identical.
- File Type
- CSV
- JSON
- Parquet
- XLS
- XLSX
- XML
- DynamoDB export
- Header - If this is checked, the output column names are generated from the key names in the first row of data in the input files. Otherwise, the columns are named column1, column2, … columnN.
- Sheet Name - Applies only to XLS and XLSX files. Enter a value to load the data from the specified sheet. If this is left blank, Kleene loads the data from the first sheet.
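The Header option described above can be illustrated with a minimal sketch. This is not the connector's implementation: the function name `read_csv_rows` is hypothetical, and the sketch only mirrors the documented naming rule (first row as column names when Header is checked, otherwise column1, column2, … columnN) for a CSV input.

```python
import csv
import io


def read_csv_rows(text: str, header: bool) -> list[dict[str, str]]:
    """Hypothetical helper: turn CSV text into rows keyed by column name."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if header:
        # Header checked: the first row supplies the column names.
        columns, data = rows[0], rows[1:]
    else:
        # Header unchecked: generate column1, column2, ... columnN,
        # and treat every row (including the first) as data.
        columns = [f"column{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    return [dict(zip(columns, row)) for row in data]


sample = "id,name\n1,Ada\n2,Grace\n"
with_header = read_csv_rows(sample, header=True)
without_header = read_csv_rows(sample, header=False)
```

With Header unchecked, the first line of the file is kept as an ordinary data row, so `without_header` contains one more row than `with_header`.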
The following option and behavior apply when “Load all files inside a folder” has been selected:
- Load only files newer than X number of days ago - any files older than X number of days ago will be ignored
Once the files have been loaded, the list of ingested files will be stored within the _KLEENE_FILENAME column in the destination warehouse table.
- Filter Files - When this option is selected, you can load only the files that follow a certain pattern.
Enter a wildcard in the Regex pattern input to return specific files in a folder. For example, if you enter *.csv in the Regex pattern input, all files that follow the pattern <something>.csv will be extracted.
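The two folder options above (age cut-off and file filter) can be sketched as a single selection step. This is an illustration under stated assumptions, not the connector's code: `S3Object` and `select_files` are hypothetical names, the pattern is treated as a shell-style wildcard (via `fnmatchcase`) because `*.csv` is given as the example, and every key under the folder prefix is considered, including nested ones. How the connector actually interprets the pattern or nested folders is not stated in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one object listed in a bucket."""
    key: str
    last_modified: datetime


def select_files(
    objects: list[S3Object],
    folder: str,
    max_age_days: Optional[int] = None,
    pattern: Optional[str] = None,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the keys inside `folder` that pass both optional filters."""
    now = now or datetime.now(timezone.utc)
    folder = folder.strip("/")
    prefix = folder + "/" if folder else ""
    selected = []
    for obj in objects:
        if not obj.key.startswith(prefix):
            continue  # not inside the chosen folder
        if max_age_days is not None:
            cutoff = now - timedelta(days=max_age_days)
            if obj.last_modified < cutoff:
                continue  # older than X days: ignored
        filename = obj.key.rsplit("/", 1)[-1]
        if pattern is not None and not fnmatchcase(filename, pattern):
            continue  # does not follow the wildcard pattern
        selected.append(obj.key)
    return selected


utc = timezone.utc
now = datetime(2024, 1, 10, tzinfo=utc)
objects = [
    S3Object("exports/a.csv", datetime(2024, 1, 9, tzinfo=utc)),
    S3Object("exports/b.csv", datetime(2023, 12, 1, tzinfo=utc)),
    S3Object("exports/c.json", datetime(2024, 1, 9, tzinfo=utc)),
    S3Object("other/d.csv", datetime(2024, 1, 9, tzinfo=utc)),
]
recent_csv = select_files(objects, "exports", max_age_days=7, pattern="*.csv", now=now)
everything = select_files(objects, "exports/", now=now)
```

Here `recent_csv` keeps only the file that is both inside the folder, newer than seven days, and named like `<something>.csv`.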
Additional Info
- For incremental loads, only files that have not been loaded previously will be processed.
- Zipped files can also be loaded. Files ending with the .zip extension are automatically unzipped and loaded.
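The two behaviors above can be sketched in a few lines. The names `new_files` and `expand_if_zipped` are hypothetical, and the way previously loaded files are tracked here (a plain set of keys) is an assumption made for illustration. The document does not describe the connector's internal bookkeeping.

```python
import io
import zipfile


def new_files(candidate_keys: list[str], already_loaded: set[str]) -> list[str]:
    """Incremental load: keep only keys that were not loaded before."""
    return [key for key in candidate_keys if key not in already_loaded]


def expand_if_zipped(key: str, payload: bytes) -> dict[str, bytes]:
    """Return {name: content}; .zip payloads are unpacked in memory."""
    if not key.endswith(".zip"):
        return {key: payload}
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries inside the archive
        }


# Build a small archive in memory to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "id\n1\n")

unzipped = expand_if_zipped("batch.zip", buffer.getvalue())
untouched = expand_if_zipped("plain.csv", b"id\n1\n")
pending = new_files(["a.csv", "b.csv"], already_loaded={"a.csv"})
```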
Limitations
- When “Load all files inside a folder” has been selected, only files whose extension matches the selected file type are processed. For example, when File Type: JSON is selected, only files ending with .JSON are processed.
- For DynamoDB export files, a file extension of “JSON.gz” is expected for the input files.
- For JSON files, a list of JSON objects is expected on the top level, where each object is a single row in the output. If the top-level object is a single JSON object or hashmap, only a single row will be output.
- For XML files, no additional un-nesting or processing is done and only one row is output. The entire XML document is output in JSON format as a single row in the table column named content.
- Format and file limitations are rooted in the Snowflake environment. Please see a list of these here.
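The extension and JSON rules in the list above can be illustrated with a short sketch. Both helpers are hypothetical. The suffix table covers only a few file types, and the case-insensitive comparison is an assumption, since this document does not say whether extension matching is case-sensitive.

```python
import json
from typing import Any

# Assumed mapping from selected File Type to expected file suffix.
EXPECTED_SUFFIX = {
    "CSV": ".csv",
    "JSON": ".json",
    "DynamoDB export": ".json.gz",
}


def has_expected_extension(key: str, file_type: str) -> bool:
    """True when the key ends with the suffix expected for the file type."""
    return key.lower().endswith(EXPECTED_SUFFIX[file_type])


def json_to_rows(text: str) -> list[Any]:
    """A top-level list yields one row per element; anything else, one row."""
    parsed = json.loads(text)
    if isinstance(parsed, list):
        return parsed
    return [parsed]


many_rows = json_to_rows('[{"id": 1}, {"id": 2}]')
one_row = json_to_rows('{"id": 1, "tags": ["a", "b"]}')
```

A file whose top level is a single object therefore produces exactly one row, regardless of how much nested data it holds, which is why a list of objects is the expected input shape.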
Updated 6 months ago