List S3 buckets using Python, AWS CLI

Updated On - April 12, 2020  |  By Mahesh Mogal

In this blog, we will learn how to list all buckets in our AWS account using Python and the AWS CLI. We will learn different ways to list buckets and filter them using tags.

Using AWS CLI

Listing all buckets

We can list buckets with the CLI in a single command.

aws s3api list-buckets --profile admin-analyticshut
Listing buckets with AWS CLI

If you have lots of buckets, this output becomes difficult to follow. But the AWS CLI supports a query parameter (a JMESPath expression), which we can use to extract only the required information from the output.

aws s3api list-buckets --query "Buckets[].Name" --profile admin-analyticshut
AWS CLI - listing buckets with the query flag
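The expression passed to `--query` is JMESPath. As a rough Python analogue (the sample `response` below only mirrors the shape of the `list-buckets` output; it is illustrative data, not a live call), `"Buckets[].Name"` amounts to:

```python
# Sample shaped like the list-buckets response (illustrative data only).
response = {
    "Buckets": [
        {"Name": "logs-bucket", "CreationDate": "2019-07-01T10:00:00Z"},
        {"Name": "data-bucket", "CreationDate": "2020-02-15T08:30:00Z"},
    ]
}

# Equivalent of --query "Buckets[].Name": keep only the bucket names.
names = [bucket["Name"] for bucket in response["Buckets"]]
print(names)  # → ['logs-bucket', 'data-bucket']
```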

We can also use jq (a lightweight command-line JSON parser) to do some funky things. The following loop prints each bucket name along with the tags associated with it.

for bucket in $(aws s3api list-buckets --profile admin-analyticshut | jq -r '.Buckets[].Name'); do
    echo "$bucket"
    tags=$(aws s3api get-bucket-tagging --bucket "$bucket" --profile admin-analyticshut | jq -c '.TagSet[] | {(.Key): .Value}' | tr '\n' '\t')
    echo "$tags"
done

Using Python

Listing all buckets

We can also easily list all buckets in the AWS account using Python.

import boto3
from botocore.exceptions import ClientError
#
# setting up configured profile on your machine.
# You can skip this step if you want to use the default AWS CLI profile.
#
boto3.setup_default_session(profile_name='admin-analyticshut')
#
# Option 1: S3 client returns a list of buckets with name and creation date.
#
s3 = boto3.client('s3')

response = s3.list_buckets()['Buckets']
for bucket in response:
    print('Bucket name: {}, Created on: {}'.format(bucket['Name'], bucket['CreationDate']))

When we run the above code we will get the following output.

Python - listing buckets with boto3 client
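Since each entry in the `Buckets` list is a plain dict, we can post-process it with ordinary Python. As a sketch (the sample list below only mirrors the response shape; boto3 returns `CreationDate` as timezone-aware datetime objects), here is how to print buckets newest first:

```python
from datetime import datetime, timezone

# Sample shaped like the 'Buckets' list from list_buckets (illustrative data,
# not a live call).
buckets = [
    {"Name": "data-bucket", "CreationDate": datetime(2020, 2, 15, tzinfo=timezone.utc)},
    {"Name": "logs-bucket", "CreationDate": datetime(2019, 7, 1, tzinfo=timezone.utc)},
]

# Sort by creation date, newest first.
for bucket in sorted(buckets, key=lambda b: b["CreationDate"], reverse=True):
    print(bucket["Name"])
```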

Boto3 also provides us with Bucket resources. We can use the collection's all() method to list all buckets in the AWS account.

import boto3
from botocore.exceptions import ClientError
#
# setting up configured profile on your machine.
# You can skip this step if you want to use the default AWS CLI profile.
#
boto3.setup_default_session(profile_name='admin-analyticshut')
#
# Option 2: S3 resource object returns a collection of all bucket resources.
# This is useful if we want to further process each bucket resource.
#
s3 = boto3.resource('s3')
buckets = s3.buckets.all()

for bucket in buckets:
    print(bucket)
Python - listing AWS buckets with Boto3 resource

I also tried filtering buckets based on tags. You can have hundreds, if not thousands, of buckets in an account, and the best way to filter them is using tags. Boto3 does provide a filter method for the bucket collection, but I did not find how to use it for tags. So I used a workaround to filter buckets by tag value in Python.

#
# Option 3: Filtering buckets.
# This did not work as I expected. There is a filter function for the bucket
# collection, but there is no mention of how to use it for tags.
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.ServiceResource.buckets
# If anyone knows how to use it, please let us all know.
# buckets1 = s3.buckets.filter(Filters=[{'Name': 'tag:Status', 'Values': ['Logs']}])


# Filtering all buckets with a specific tag value.
# The same method can be used to filter buckets with a specific string in the name.
for bucket in buckets:
    try:
        tag_set = s3.BucketTagging(bucket.name).tag_set
        for tag in tag_set:
            if tag['Key'] == 'Status' and tag['Value'] == 'Logs':
                print(bucket.name)
    except ClientError:
        # get-bucket-tagging raises an error when a bucket has no tags.
        pass
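The Key/Value check in the loop above can be pulled out into a small helper, which also makes it easy to test without touching AWS. This is a sketch: `bucket_has_tag` is a hypothetical name, and the sample tag set below only mirrors the shape boto3's `BucketTagging.tag_set` returns.

```python
def bucket_has_tag(tag_set, key, value):
    """Return True if a TagSet (list of {'Key': ..., 'Value': ...} dicts)
    contains the given Key/Value pair."""
    return any(t.get('Key') == key and t.get('Value') == value for t in tag_set)

# Sample shaped like BucketTagging.tag_set (illustrative data).
sample_tag_set = [
    {'Key': 'Status', 'Value': 'Logs'},
    {'Key': 'Team', 'Value': 'Analytics'},
]

print(bucket_has_tag(sample_tag_set, 'Status', 'Logs'))  # → True
print(bucket_has_tag(sample_tag_set, 'Status', 'Prod'))  # → False
```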

If you find out how to use the filter method for this approach, please let me know. Here is the relevant boto3 documentation:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.ServiceResource.buckets

Conclusion

We have learned how to list buckets in an AWS account using the CLI as well as Python. Next in this series, we will learn more about performing S3 operations using the CLI and Python. If you are interested, please subscribe to the newsletter. See you in the next blog.
