I have an AWS S3 bucket filled with data parameterized by date. I'd like to extract that data one date at a time using the AWS CLI (reference), specifically the aws s3 sync command.
The following command does what I expect it to do:
aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun
Running this command from my command line generates a (dryrun) download for every file in my bucket containing the substring 2018-01-17.
Great! To simplify the necessary file operations, I've written a small CLI wrapper around this command. The wrapper is written in Python and uses subprocess.run to do its work. The entire operation boils down to the following call:
subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun'])
The problem is that when I run this statement, I get a (dryrun) download back for every file in the bucket. That is, data is returned that corresponds with bucket entries from 01-18, 01-19, and so on. The --exclude/--include rules fail to apply, and the result is the same as if I had simply run aws s3 sync s3://my-bucket-1 .
Why does this occur?
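
One way to narrow this down, offered as a rough diagnostic sketch rather than a confirmed explanation: compare the arguments a POSIX shell would hand to aws with the list passed to subprocess.run above. The sketch assumes that shlex.split is a close enough stand-in for the shell's tokenization of this particular command line; the variable names and the differing_args helper are made up for illustration.

```python
import shlex

# The command exactly as typed at the shell prompt (copied from above).
shell_command = (
    'aws s3 sync s3://my-bucket-1 . '
    '--exclude "*" --include "*2018-01-17*" --dryrun'
)

# shlex.split approximates POSIX shell word splitting: the double quotes
# group characters into one word and are then removed from the result.
shell_argv = shlex.split(shell_command)

# The argument list passed to subprocess.run in the wrapper (copied from above).
# With a list and no shell=True, each element is handed to the program verbatim.
wrapper_argv = ['aws', 's3', 'sync', 's3://my-bucket-1', '.',
                '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun']


def differing_args(first, second):
    """Return the (first, second) pairs of arguments that are not identical."""
    return [(a, b) for a, b in zip(first, second) if a != b]


diff = differing_args(shell_argv, wrapper_argv)
for from_shell, from_wrapper in diff:
    print(f"shell passes {from_shell!r}, wrapper passes {from_wrapper!r}")
```

If the two lists differ only in the pattern arguments, the literal quote characters inside the Python strings would be the first thing to examine, since a pattern that begins and ends with a quote character may not match any key in the bucket.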