How to extract and manipulate all the .zip files stored in a folder of an Amazon S3 bucket.

Part 1:- The Extraction

OK, let's be frank! Working with S3 objects is not easy, especially when you have to manipulate them on the fly. The problem becomes even more complex when you are dealing with .zip files stored as S3 objects in a folder of an S3 bucket. And if the problem you are trying to solve requires you to fetch all the zip files stored in an S3 location, unzip them, perform operations on the files inside them, and then do something with the zip files themselves, you can imagine the dreaded path of complexity you are headed towards :(

Don't worry, I have been in the same position you are in right now and, as such, I feel I am the best person to help you with your problem. So, without further ado, let's get started. And again, don't worry! I will keep things simple but effective.

We will start off by learning how to find all the zip files that are stored in an S3 location.

To kick things off, we will be using the boto3 module. A typical location of an object on S3 looks something like this :- s3://bucket-name/folder1/folder2/file1.zip
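If you start from a full location string like the one above, you need the bucket name and the folder prefix as separate values. Here is a minimal sketch of one way to split them, using only the standard library. The helper name `split_s3_uri` is hypothetical and not part of boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// location into (bucket, folder prefix, object key).

    Hypothetical helper for illustration; boto3 does not provide it.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket = parsed.netloc
    # The object key is the path without its leading slash
    key = parsed.path.lstrip("/")
    # Everything up to and including the last "/" acts as the folder prefix
    prefix = key[: key.rfind("/") + 1]
    return bucket, prefix, key


bucket, prefix, key = split_s3_uri("s3://bucket-name/folder1/folder2/file1.zip")
print(bucket, prefix, key)
```

The bucket and prefix returned here are the two values that the listing call needs.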

So, in order to fetch the keys of all the zip files from this particular S3 location, our code would look something like this.

import boto3
import zipfile
from io import BytesIO
bucket = 'bucket-name' # S3 bucket name 

# Create a boto3 client, which exposes the low-level S3 API.
# use_ssl=False is dropped here so that requests go over HTTPS, the default.
s3_client = boto3.client('s3')
prefix = "folder1/folder2/" # folder path of the zip files location 


# List the objects directly under the prefix and keep only the .zip keys.
# A single list_objects_v2 call returns at most 1,000 keys.
zipped_keys = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
zip_file_list = []
# 'Contents' is absent from the response when nothing matches the prefix
for key in zipped_keys.get('Contents', []):
    if key['Key'].endswith('.zip'):
        zip_file_list.append(key['Key'])
print(zip_file_list)

Output:- ['folder1/folder2/z1.zip', 'folder1/folder2/z2.zip']
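
A single `list_objects_v2` call returns at most 1,000 keys, so a folder holding more objects than that needs more than one request. Here is a minimal sketch of the same listing done with boto3's paginator. The function name `list_zip_keys` is hypothetical, and it assumes you pass in a client created with `boto3.client('s3')`.

```python
def list_zip_keys(s3_client, bucket, prefix):
    """Return the keys of all .zip objects directly under `prefix`.

    Hypothetical helper; `s3_client` is assumed to be a boto3 S3 client.
    """
    # The paginator follows continuation tokens across result pages for us
    paginator = s3_client.get_paginator("list_objects_v2")
    zip_keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        # A page has no 'Contents' entry when it holds no objects
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".zip"):
                zip_keys.append(obj["Key"])
    return zip_keys
```

Calling `list_zip_keys(s3_client, bucket, prefix)` with the values defined earlier should give the same list as before, just without the 1,000-key ceiling.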

In the next post, we will learn how to manipulate the fetched zip files.
