I have an AWS S3 bucket filled with data parameterized by date. I'd like to extract that data one date at a time using the AWS CLI (reference), specifically the aws s3 sync command.

The following command does what I expect it to do:

aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun

Running this command from my command line generates a (dryrun) download for every file in my bucket containing the substring 2018-01-17.

Great! To simplify the necessary file operations, I've written a small CLI wrapper around this command. The wrapper is written in Python and uses subprocess.run to do its work. The entire operation boils down to the following call:

subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun'])

The problem is that when I run this statement, I get a (dryrun) download back for every file in the bucket. That is, data is returned that corresponds with bucket entries from 01-18, 01-19, and so on. The --exclude/--include rules fail to apply, and the result is the same as if I had simply run aws s3 sync s3://my-bucket-1 .

Why does this occur?

Aleksey Bilogur
  • I think when using a list, it's unnecessary to quote your arguments. IE `'"*2018-01-17*"'` perhaps should be `'*2018-01-17*'`. See [this question](https://stackoverflow.com/questions/14928860/passing-double-quote-shell-commands-in-python-to-subprocess-popen) which describes a solution that uses unquoted arguments in a list where quotes would otherwise be used in the string version of the command. – sytech Jan 20 '18 at 17:27
  • A quick test confirms that this is the correct answer. I'm mystified *why* though. – Aleksey Bilogur Jan 20 '18 at 17:30
  • It's a design decision that was made. I guess the idea is that Python will do the right thing for you. Suppose you pass variables into commands that may or may not need quoting. In the string-version, quoting helps identify that the contained string within the double-quotes is part of the same argument. When you pass arguments in a list, it's already clear what's what... So the assumption is that if you have a `"` in the part of the command, it is to be interpreted literally. Hope that makes sense. – sytech Jan 20 '18 at 17:33

1 Answer

When using the list form of invocation, you should not use those additional double quotes. Normally, when your command is given as a single string, the quotes tell the shell that everything between them is part of a single argument. The shell then strips the quotes before the program ever sees them.

If you put double quotes inside a list item, they are treated as part of the argument itself: no shell is involved to strip them, so they are passed to the program literally (escaped where necessary). Consequently, nothing matches your include and exclude patterns, because each pattern now contains literal " characters.
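To see why the quoted pattern matches nothing, here is a minimal sketch using Python's fnmatch module as a rough stand-in for glob-style matching. It is only an approximation of how the AWS CLI applies its filters, and the object key is a hypothetical example, not a real entry from the bucket.

```python
from fnmatch import fnmatch

# Hypothetical object key, used only for illustration.
key = "logs/2018-01-18/part-0.csv"

# What the list form actually passed to --exclude: quote, star, quote.
quoted_pattern = '"*"'
# What the shell passes to --exclude after stripping the quotes.
plain_pattern = "*"

# The quoted pattern only matches names that begin and end with a literal ".
quoted_matches = fnmatch(key, quoted_pattern)
# The plain pattern matches every name.
plain_matches = fnmatch(key, plain_pattern)

print(quoted_matches, plain_matches)  # False True
```

Since the quoted exclude pattern matches no key, nothing is excluded, which would be consistent with every file in the bucket showing up in the dry run.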

So, the following should be the corrected arguments.

subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '*', '--include', '*2018-01-17*', '--dryrun'])
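
If you already have the command as a single string, one way to get the argument list is the standard library's shlex.split, which tokenizes the string with shell-like quoting rules. The sketch below only builds and prints the list; it does not invoke aws.

```python
import shlex

# The command exactly as it would be typed at a shell prompt.
command = 'aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun'

# shlex.split applies shell-like quoting rules: the double quotes group
# each pattern into one argument and are then removed.
args = shlex.split(command)

print(args)
# The resulting list could then be handed to subprocess.run(args).
```

Note that shlex.split follows POSIX shell rules by default, so this is a convenience for commands written in that style rather than a general-purpose parser for every shell.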
sytech