
I have written code that lists the files in an S3 location and then, in a loop, copies those files to a temp folder for processing. After processing, I move these files to an archive location.

This logic runs frequently as a cron job.

Lately, my code has been failing: subsequent executions somehow find files (which were moved in the previous execution) while listing, but then fail while trying to copy them, since the files don't actually exist.

The error I get is: A client error (404) occurred when calling the HeadObject operation: Key {some-file} does not exist.

Can someone please help me understand why I am facing this issue and how to resolve it?

Any help will be greatly appreciated.

Saurabh Agrawal
  • You may wish to use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/) to copy files. It has an `aws s3 cp` command and also an `aws s3 sync` command that can synchronize files between S3 and a local directory. – John Rotenstein Oct 17 '16 at 23:31
  • @johnRotenstein That is what I am using and stuck with this issue. The retry mechanism makes sense, wondering if there are any other options that I should evaluate to pick up the best one. – Saurabh Agrawal Oct 18 '16 at 00:18
  • It would be helpful to see some of your code. The important parts, at least. – jzonthemtn Oct 18 '16 at 15:28

2 Answers


You may be running into an S3 consistency issue. S3 provides read-after-write consistency when you PUT a new object into a bucket, but only eventual consistency for all other operations on objects — so a LIST can still return keys that have already been deleted or moved.

To resolve it, you may need to wait longer between subsequent executions of the script.
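If waiting between runs isn't enough, one option is to wrap the copy in a bounded retry with exponential backoff and jitter. This is a minimal sketch, not the asker's actual code — the `fn` callable stands in for whatever S3 copy call you make (e.g. via boto3 or by shelling out to `aws s3 cp`):

```python
import random
import time

def backoff_delays(attempts=5, base=1.0, cap=30.0):
    """Jittered exponential backoff delays in seconds: uniform(0, min(cap, base*2^i))."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

def retry(fn, attempts=5, base=1.0, cap=30.0, retriable=(Exception,)):
    """Call fn(); on a retriable failure, sleep a jittered delay and try again.

    The final attempt lets the exception propagate, so the caller still
    sees a real failure once the object is definitively missing.
    """
    for delay in backoff_delays(attempts - 1, base, cap):
        try:
            return fn()
        except retriable:
            time.sleep(delay)
    return fn()
```

With the AWS CLI instead of boto3, the same idea is a bounded shell loop that re-runs `aws s3 cp` with an increasing `sleep` between attempts.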

Jack Bracken
  • how do you define "longer" when you say "...need to wait longer...". (Though I understand it may depend on a specific scenario). Any guidelines? – Saurabh Agrawal Oct 17 '16 at 20:37
  • 1
    Typically just a few seconds, but there is no guarantee and in some extreme cases it can take much longer. http://www.stackdriver.com/eventual-consistency-really-eventual/ http://stackoverflow.com/questions/23786609/what-is-maximum-amazon-s3-replication-time-on-file-upload – Jack Bracken Oct 17 '16 at 20:45

When you create a new file, it might take time for it to be fully synchronized across all the AZs in the region, so you may need to retry a read that fails. Apply a random, increasing wait between retries (exponential backoff with jitter), both to avoid overloading your system and to reduce race conditions. Also, do not retry forever — make sure the current execution of the script finishes before the next one starts.
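A complementary fix, given that a 404 here usually means the previous run already moved the file, is to treat "key not found" as skippable rather than fatal. A sketch of that decision logic — `process_listing` and its `copy` callable are hypothetical stand-ins; with boto3 you would catch `botocore.exceptions.ClientError` and read `e.response["Error"]["Code"]` instead of the `KeyError` used here:

```python
def is_missing_key(error_code):
    """True for the error codes S3 returns when an object no longer exists."""
    return error_code in ("404", "NoSuchKey")

def process_listing(keys, copy):
    """Copy each listed key; skip keys that vanished since the LIST.

    `copy(key)` is assumed to raise KeyError(error_code) on failure --
    a stand-in for catching botocore's ClientError in real code.
    """
    copied, skipped = [], []
    for key in keys:
        try:
            copy(key)
            copied.append(key)
        except KeyError as e:
            if is_missing_key(e.args[0]):
                skipped.append(key)  # stale listing entry; already moved
            else:
                raise
    return copied, skipped
```

Logging the skipped keys is worthwhile so you can confirm they really were archived by an earlier run rather than silently lost.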

Another possibility would be to run your script as an AWS Lambda function triggered on S3 object creation, which removes the stale-LIST problem entirely.

YairCarel