144

I'm trying to do a "hello world" with the new boto3 client for AWS.

The use-case I have is fairly simple: get an object from S3 and save it to a file.

In boto 2.X I would do it like this:

import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')

In boto 3, I can't find a clean way to do the same thing, so I'm manually iterating over the "Streaming" object:

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)

or

import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)

And it works fine. I was wondering: is there any "native" boto3 function that will do the same task?

Vor

7 Answers

229

There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:

import boto3

s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())

These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
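
Both functions also accept a `Config` argument for tuning the transfer; a minimal sketch using `boto3.s3.transfer.TransferConfig` (the bucket and file names are placeholders):

import boto3
from boto3.s3.transfer import TransferConfig

s3_client = boto3.client('s3')

# Switch to parallel multipart transfers for files over 8 MB
config = TransferConfig(multipart_threshold=8 * 1024 * 1024, max_concurrency=4)

s3_client.upload_file('big-file.bin', 'MyBucket', 'big-file.bin', Config=config)
s3_client.download_file('MyBucket', 'big-file.bin', '/tmp/big-file.bin', Config=config)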

Note that s3_client.download_file won't create the destination directory. It can be created beforehand with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
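
For example (the local path is a placeholder):

import pathlib

import boto3

local_path = pathlib.Path('/tmp/nested/dir/hello2.txt')
local_path.parent.mkdir(parents=True, exist_ok=True)
boto3.client('s3').download_file('MyBucket', 'hello-remote.txt', str(local_path))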

Daniel
  • @Daniel: Thanks for your reply. Can you expand your answer to cover uploading a file using multipart upload in boto3? – Rahul KP Nov 02 '15 at 07:16
  • @RahulKumarPatle the `upload_file` method will automatically use multipart uploads for large files. – Daniel Nov 03 '15 at 16:26
  • @Daniel - Regarding multipart_upload, I created a [SO question](http://stackoverflow.com/questions/34303775/complete-a-multipart-upload-with-boto3). The `upload_file` method doesn't seem to automatically use multipart upload for file sizes that exceed the `multipart_threshold` configuration; at least, I haven't been able to get it to work that way. I'd love to be wrong! Any help is greatly appreciated. – blehman Dec 16 '15 at 16:47
  • How do you pass your credentials using this approach? – JHowIX Feb 01 '16 at 21:31
  • @JHowIX you can either configure the credentials globally (e.g. see http://boto3.readthedocs.org/en/latest/guide/quickstart.html#configuration) or you can pass them when creating the client. See http://boto3.readthedocs.org/en/latest/reference/core/session.html#boto3.session.Session.client for more info on available options! – Daniel Feb 01 '16 at 21:52
  • It's beyond my understanding how the `.upload_file` and `.download_file` argument orders are not the same. – Vlad Nikiporoff Aug 05 '16 at 15:57
  • @VladNikiporoff "Upload from source to destination" "Download from source to destination" – jkdev Oct 30 '16 at 17:35
  • Hey @Daniel, a follow up question to the statement `s3_client.download_file won't create a directory`. What happens when this is called using API Gateway + Lambda? Where will it store the file, if I enter the `Filename` parameter as `/tmp/filename`? – Junkrat May 19 '21 at 09:42
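
Following up on the credentials discussion in the comments, a minimal sketch of passing credentials explicitly when creating the client (the key values are placeholders; in practice prefer `aws configure` or environment variables):

import boto3

s3_client = boto3.client(
    's3',
    aws_access_key_id='AKIA...',       # placeholder
    aws_secret_access_key='...',       # placeholder
    region_name='us-east-1',
)
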
63

boto3 now has a nicer interface than the client:

import boto3

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)  # key is the S3 key, local_filename the target path

This by itself isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure), but since resources are generally more ergonomic (for example, the S3 bucket and object resources are nicer than the client methods), this lets you stay at the resource layer without having to drop down.

Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.
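
For example, a resource accepts the same session-level arguments a client does; a minimal sketch (the region and names are placeholders):

import boto3

resource = boto3.resource('s3', region_name='us-east-1')
resource.Bucket('MyBucket').upload_file('/tmp/hello.txt', 'hello-remote.txt')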

quodlibetor
  • Great example, and to add on, since the original question asks about saving an object: the relevant method here is `my_bucket.upload_file()` (or `my_bucket.upload_fileobj()` if you have a BytesIO object). – SMX May 19 '17 at 16:06
  • Exactly where do the docs say that `resource` does a better job at retrying? I couldn't find any such indication. – Acumenus Oct 02 '19 at 16:25
46

For those of you who would like to simulate the boto2-style set_contents_from_string methods, you can try

import boto3
from cStringIO import StringIO

s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)

# fake_handle behaves like a file handle, so .read() returns the contents
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())

For Python 3:

In Python 3, both the StringIO and cStringIO modules are gone. Use the StringIO import from io instead:

from io import StringIO

To support both versions:

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
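
In Python 3 you can also skip the file-like wrapper, since put_object accepts bytes directly (the bucket and key are the same placeholders as above):

import boto3

s3c = boto3.client('s3')
s3c.put_object(
    Bucket='hello-world.by.vor',
    Key='data/hello.txt',
    Body='My string to save to S3 object'.encode('utf-8'),
)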
cgseller
18

# Preface: File is JSON with contents: {"name": "Android", "status": "ERROR"}

import boto3
import io
import json

s3 = boto3.resource('s3')

obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the raw bytes of the object; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))

print(new_dict['status'])
# Should print "ERROR"
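
The reverse direction works the same way with upload_fileobj; a short sketch reusing the placeholder bucket and key:

import io
import json

import boto3

s3 = boto3.resource('s3')
payload = io.BytesIO(json.dumps({'name': 'Android', 'status': 'ERROR'}).encode('utf-8'))
s3.Object('my-bucket', 'key-to-file.json').upload_fileobj(payload)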
Lord Sumner
  • Never put your AWS_ACCESS_KEY_ID or your AWS_SECRET_ACCESS_KEY in your code. These should be defined with the awscli `aws configure` command and they will be found automatically by `botocore`. – Miles Erickson Mar 16 '17 at 23:36
3

Note: I'm assuming you have configured authentication separately. The code below downloads a single object from an S3 bucket.

import boto3

# initiate the S3 resource
s3 = boto3.resource('s3')

# Download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
Tushar Niras
  • This code will not download from inside an S3 folder; is there a way to do that with this approach? – Marilu Aug 03 '20 at 16:01
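
Regarding the comment above: S3 has no real folders, only key prefixes, so passing the full key as the first argument works (the names here are placeholders):

import boto3

s3 = boto3.resource('s3')
s3.Bucket('mybucket').download_file('some/folder/hello.txt', '/tmp/hello.txt')
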
3

When you want to download a file with a different configuration than the default one, you can either use mpu.aws.s3_download(source, destination) directly or copy the code below:

import os

import boto3


def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)

from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)
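
A usage sketch with placeholder paths:

s3_download('s3://my-bucket/foo/bar.jpg', '/tmp/bar.jpg',
            exists_strategy='replace')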
Martin Thoma
2

If you wish to download a specific version of a file, you need to use get_object.

import boto3

bucket = 'bucketName'
prefix = 'path/to/file/'
filename = 'fileName.ext'

s3c = boto3.client('s3')
s3r = boto3.resource('s3')

if __name__ == '__main__':
    for version in s3r.Bucket(bucket).object_versions.filter(Prefix=prefix + filename):
        file = version.get()
        version_id = file.get('VersionId')
        obj = s3c.get_object(
            Bucket=bucket,
            Key=prefix + filename,
            VersionId=version_id,
        )
        with open(f"{filename}.{version_id}", 'wb') as f:
            for chunk in obj['Body'].iter_chunks(chunk_size=4096):
                f.write(chunk)

Ref: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html

Christian