This component downloads files from AWS S3 to `/data/out/files`.
## Features

- Supports `*` wildcards
- Handles subfolders
- Can process only new files
- Skips files stored in Glacier and Glacier Deep Archive
## Configuration Options

- `loginType` (required) -- login type (`credentials` or `role`)
- `accessKeyId` (required if `loginType` is `credentials`) -- AWS Access Key ID
- `#secretAccessKey` (required if `loginType` is `credentials`) -- AWS Secret Access Key
- `accountId` (required if `loginType` is `role`) -- AWS Account ID
- `bucket` (required) -- AWS S3 bucket name (the region will be autodetected)
- `key` (required) -- search key prefix, optionally ending with a `*` wildcard. All files downloaded using a wildcard are stored in `/data/out/files/wildcard`.
- `saveAs` (optional) -- store all downloaded files in a specified folder
- `includeSubfolders` (optional) -- download all subfolders; available only when using a wildcard in the search key prefix (see the sketch after this list)
    - The subfolder structure will be flattened, replacing `/` in the path with `-`, e.g., `folder1/file1.csv` => `folder1-file1.csv`.
    - Existing `-` characters will be escaped to avoid collisions with other `-` characters, e.g., `collision-file.csv` => `collision--file.csv`.
- `newFilesOnly` (optional) -- download only new files
    - The last downloaded file's timestamp is stored in the `lastDownloadedFileTimestamp` property of the state file.
    - If multiple files share the same timestamp, the `processedFilesInLastTimestampSecond` property records all files processed within that second.
- `limit` (optional, default `0`) -- maximum number of files to download
    - If `key` matches more files than `limit`, the oldest files are downloaded first.
    - When used with `newFilesOnly`, the extractor processes up to `limit` new files that have not yet been downloaded.
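To make the flattening rule concrete, here is a minimal sketch of the mapping, using a hypothetical `flattenKey` helper (an illustration of the rule above, not the component's actual code):

```php
<?php

// Hypothetical helper illustrating the flattening rule described above.
// Existing "-" characters are escaped first, so a "-" produced by
// flattening "/" can never collide with a "-" from the original key.
function flattenKey(string $key): string
{
    $escaped = str_replace('-', '--', $key); // collision-file.csv => collision--file.csv
    return str_replace('/', '-', $escaped);  // folder1/file1.csv  => folder1-file1.csv
}

echo flattenKey('folder1/file1.csv') . PHP_EOL;  // folder1-file1.csv
echo flattenKey('collision-file.csv') . PHP_EOL; // collision--file.csv
```

Escaping before flattening keeps the mapping unambiguous: `a-b.csv` and `a/b.csv` map to `a--b.csv` and `a-b.csv`, respectively.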
## Sample Configurations

Download a single file using credentials:

```json
{
    "parameters": {
        "loginType": "credentials",
        "accessKeyId": "AKIA****",
        "#secretAccessKey": "****",
        "bucket": "myBucket",
        "key": "myfile.csv",
        "includeSubfolders": false,
        "newFilesOnly": false
    }
}
```

Download a single file using a role:

```json
{
"parameters": {
"accountId": "1234567890",
"bucket": "myBucket",
"key": "myfile.csv",
"includeSubfolders": false,
"newFilesOnly": false
}
}{
"parameters": {
"accessKeyId": "AKIA****",
"#secretAccessKey": "****",
"bucket": "myBucket",
"key": "myfolder/*",
"saveAs": "myfolder",
"includeSubfolders": false,
"newFilesOnly": false
}
}{
"parameters": {
"accessKeyId": "AKIA****",
"#secretAccessKey": "****",
"bucket": "myBucket",
"key": "myfolder/*",
"includeSubfolders": true,
"newFilesOnly": true
}
}Note: state.json must be provided in this case.
{
"parameters": {
"accessKeyId": "AKIA****",
"#secretAccessKey": "****",
"bucket": "myBucket",
"key": "myfolder/*",
"includeSubfolders": true,
"newFilesOnly": true,
"limit": 100
}
}Note: state.json has to be provided in this case.
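For illustration, a state file for `newFilesOnly` runs might look like the sketch below. The property names come from the configuration options above; the value types and nesting are assumptions:

```json
{
    "lastDownloadedFileTimestamp": 1587063372,
    "processedFilesInLastTimestampSecond": [
        "myfolder/subfolder/file1.csv"
    ]
}
```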
## Development

- Create an AWS S3 bucket and IAM user using the `aws-services.json` CloudFormation template.
- Create a `.env` file. Use the outputs of the `aws-services` CloudFormation stack to populate the variables, along with your Redshift credentials.

```
AWS_S3_BUCKET=
AWS_REGION=
UPLOAD_USER_AWS_ACCESS_KEY=
UPLOAD_USER_AWS_SECRET_KEY=
DOWNLOAD_USER_AWS_ACCESS_KEY=
DOWNLOAD_USER_AWS_SECRET_KEY=
KEBOOLA_USER_AWS_ACCESS_KEY=
KEBOOLA_USER_AWS_SECRET_KEY=
ACCOUNT_ID=
ROLE_NAME=
KBC_PROJECTID=
KBC_STACKID=
```

- Build the Docker images:

```
docker-compose build
```
- Install the Composer packages:

```
docker-compose run --rm dev composer install --prefer-dist --no-interaction
```
- Run the tests:

```
docker-compose run --rm dev ./vendor/bin/phpunit
```
## License

MIT licensed, see the LICENSE file.