Amazon itself provides a wrapper implemented over DistCp, namely S3DistCp.
S3DistCp is an extension of DistCp that is optimized to work with
Amazon Web Services (AWS), particularly Amazon Simple Storage Service
(Amazon S3). You use S3DistCp by adding it as a step in a job flow.
Using S3DistCp, you can efficiently copy large amounts of data from
Amazon S3 into HDFS where it can be processed by subsequent steps in
your Amazon Elastic MapReduce (Amazon EMR) job flow. You can also use
S3DistCp to copy data between Amazon S3 buckets or from HDFS to Amazon
S3.
Example: Copy log files from Amazon S3 to HDFS
The following example illustrates how to copy log files stored in an Amazon S3 bucket into HDFS. In this example, the --srcPattern option is used to limit the data copied to the daemon logs.
elastic-mapreduce --jobflow j-3GY8JC4179IOJ --jar \
s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
--args '--src,s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/,\
--dest,hdfs:///output,\
--srcPattern,.*daemons.*-hadoop-.*'
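To get a feel for what the --srcPattern value above would select, the regular expression can be tried locally before running the job. The following is only a rough sketch: the three object keys are hypothetical, invented for illustration, and it assumes the pattern has to match the whole path, which grep's -x flag approximates. S3DistCp itself uses Java regular expressions, so more complex patterns may behave differently under grep.

```shell
# Hypothetical object keys, invented for illustration only.
keys='s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/i-0001/daemons/datanode-hadoop-ip-10-0-0-1.log
s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/i-0001/bootstrap-actions/1/stdout
s3://myawsbucket/logs/j-3GY8JC4179IOJ/node/i-0002/daemons/tasktracker-hadoop-ip-10-0-0-2.log'

# -E: extended regular expressions.
# -x: the pattern must match the entire line (assumed here to mirror
#     a whole-path match in S3DistCp).
matched=$(printf '%s\n' "$keys" | grep -Ex '.*daemons.*-hadoop-.*')

# Print the keys that the pattern would keep.
printf '%s\n' "$matched"
```

Only the two keys under a daemons directory whose file names contain -hadoop- are printed; the bootstrap-actions key is filtered out.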