I'm new to AWS and Python, and I'm trying to reconcile the total bucket size shown in the Metrics tab of the console with sizes calculated one folder at a time in a given bucket. I tried setting up an inventory configuration, but it doesn't show what I'm looking for.

I have an S3 bucket named my_bucket with versioning enabled.
It has 100 objects and 26 subfolders (with 100000+ objects in each subfolder and at least two versions of each object).

WHAT I AM TRYING TO DO: Calculate and display the total size, including versions, for each of the 26 subfolders.

A  Size 1GB  
B  Size 10TB    
.  
.  
.  
Z Size 13TB

HOW I AM TRYING TO DO IT: Find a solution that combines
the profile-based authentication from Link 1, using bucket.object_versions,
with the top-level folder size calculation from Link 2,
while also taking versions into account (Link 2 doesn't handle versions).

Link1 https://stackoverflow.com/a/58125684/4590025
Link2 https://stackoverflow.com/a/49763268/4590025

import boto3

PROFILE = "my_profile"
BUCKET = "my_bucket"

session = boto3.Session(profile_name = PROFILE)
s3 = session.resource('s3')
bucket = s3.Bucket(BUCKET)

#bucket.object_versions.do_something_with_it


conn = boto3.client('s3')

top_level_folders = dict()

for key in conn.list_objects(Bucket='my_bucket')['Contents']:

    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']


for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))

I also referred to https://stackoverflow.com/a/48867829, but I'm not sure how to combine the two. Currently, when I run it, I get the error below despite setting the session:

Traceback (most recent call last):
  File ".\folder_size.py", line 17, in <module>
    for key in conn.list_objects(Bucket='my_bucket')['Contents']:
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 622, in _make_api_call
    operation_model, request_dict, request_context)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\client.py", line 641, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\signers.py", line 160, in sign
    auth.add_auth(request)
  File "C:\Users\ginger\AppData\Local\Programs\Python\Python37\lib\site-packages\botocore\auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
PS C:\Users\ginger\test>
DJ_Stuffy_K
  • Let's start simple. `Unable to locate credentials` means that it cannot find credentials. Have you run `aws configure` to provide a set of credentials it can use? – John Rotenstein Dec 10 '20 at 09:18
  • **Second fact:** The `list_objects()` API call will only return a maximum of 1000 objects. You will need to use [paginators](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#paginators) to keep retrieving 1000 objects. Alternatively, I think you could use `s3_resource.Bucket('xxx').objects.all()` (using the **resource** method, not the **client** method) to loop through all objects. – John Rotenstein Dec 10 '20 at 09:20
  • Yes, sir. I use an IAM role that saves credentials to a profile named my_profile, and I can run commands such as `aws --profile my_profile s3 ls my_bucket`. – DJ_Stuffy_K Dec 10 '20 at 15:45
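
The pagination point raised in the comments can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names `sum_pages_by_folder` and `iter_object_pages` are hypothetical, and it assumes the `list_objects_v2` paginator of the boto3 S3 client together with the `my_profile` profile from the question. It counts current objects only, not older versions.

```python
from collections import defaultdict


def sum_pages_by_folder(pages):
    """Add up object sizes per top-level folder across listing pages."""
    totals = defaultdict(int)
    for page in pages:
        # A page for an empty bucket or prefix has no 'Contents' key.
        for entry in page.get("Contents", []):
            folder = entry["Key"].split("/")[0]
            totals[folder] += entry["Size"]
    return dict(totals)


def iter_object_pages(profile_name, bucket_name):
    """Yield every page of list_objects_v2 results (up to 1000 keys each)."""
    import boto3  # imported lazily so the helper above runs without boto3

    # Build the client from the profile-bound session, not boto3.client().
    session = boto3.Session(profile_name=profile_name)
    paginator = session.client("s3").get_paginator("list_objects_v2")
    yield from paginator.paginate(Bucket=bucket_name)


# Against a real bucket (requires valid credentials for the profile):
# totals = sum_pages_by_folder(iter_object_pages("my_profile", "my_bucket"))
# for folder, size in totals.items():
#     print("Folder: %s, size: %d" % (folder, size))
```

Keeping the summing separate from the AWS call makes it possible to check the arithmetic with hand-built pages before pointing it at a bucket with millions of keys.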

1 Answer

The issue is that the program uses:

conn = boto3.client('s3')

This creates the client from boto3's default session, so it ignores the profile that was set earlier:

session = boto3.Session(profile_name = PROFILE)

Thus, if you want to create an S3 client with the profile, then it should use:

conn = session.client('s3')

To avoid the problem with pagination, you could use the resource method to retrieve all objects:

# 'obj' avoids shadowing Python's built-in 'object'
for obj in bucket.objects.all():
    folder = obj.key.split('/')[0]
    print("Key %s in folder %s. %d bytes" % (obj.key, folder, obj.size))
...
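
Since the question asks for sizes that include every version, the same resource-based loop could be pointed at `bucket.object_versions.all()` instead. The following is a minimal sketch under a few assumptions, not a tested solution for this bucket: the function names and the `(root)` label for keys outside any folder are hypothetical, and delete markers are assumed to report no size, so they are counted as 0 bytes.

```python
from collections import defaultdict


def sum_versions_by_folder(versions):
    """Total the size of every object version per top-level folder."""
    totals = defaultdict(int)
    for version in versions:
        key = version.key
        # Keys without a '/' sit at the bucket root; '(root)' is a made-up label.
        folder = key.split("/", 1)[0] if "/" in key else "(root)"
        # Assumption: delete markers carry size None, so treat them as 0.
        totals[folder] += version.size or 0
    return dict(totals)


def folder_sizes_with_versions(profile_name, bucket_name):
    """Fetch all versions through a profile-bound session and total them."""
    import boto3  # imported lazily so the helper above runs without boto3

    session = boto3.Session(profile_name=profile_name)
    bucket = session.resource("s3").Bucket(bucket_name)
    # The resource collection handles pagination internally.
    return sum_versions_by_folder(bucket.object_versions.all())


# Against a real bucket (requires valid credentials for the profile):
# for folder, size in folder_sizes_with_versions("my_profile", "my_bucket").items():
#     print("Folder: %s, size: %d" % (folder, size))
```

With 100000+ objects per folder and multiple versions each, a full listing like this may take a long time, so it may be worth trying it on a single prefix first.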
John Rotenstein
  • Thanks for pointing that out and catching it. Is there a way to enforce profile_name=PROFILE with `conn = session.client('s3')` ? right now I'm not able to use the aws credentials of my profile via the session.client() route. – DJ_Stuffy_K Jan 14 '21 at 20:40
  • Use `session = boto3.Session(profile_name = PROFILE)` and then `conn = session.client('s3')`. – John Rotenstein Jan 15 '21 at 01:01