Benchmarking the new AWS S3 Express One Zone
At AWS re:Invent 2023, AWS announced the new Amazon S3 Express One Zone storage class. This new service provides powerful new capabilities for low-latency applications.
With S3 Express One Zone, you get single-digit millisecond access, with data access speeds up to 10x faster and request costs up to 50% lower than standard S3. Your data is redundantly stored on multiple devices within a single Availability Zone (AZ), with 99.95% availability within that AZ.
Until now, we could only choose the AWS Region in which our standard S3 bucket would be located. With S3 Express One Zone, you must select the specific AWS Availability Zone where the bucket will be created and where all your data will be stored. Data is stored in a different bucket type, an S3 directory bucket, which supports hundreds of thousands of requests per second. This new bucket type has a hierarchical namespace and stores object key names in a directory-like manner, as opposed to the flat key structure of traditional S3 buckets.
S3 Express One Zone introduces a new session-based authorization capability that reduces the latency associated with S3 request authorization. This capability is used to create, and periodically refresh, your connection sessions to the new bucket type.
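Under the hood this is the CreateSession API. The AWS CLI and SDKs create and refresh these sessions for you transparently, but you can also call the API directly to see the short-lived temporary credentials it returns (the bucket name here is the example one created later in this post):

aws s3api create-session \
  --bucket example-express-bucket--use1-az4--x-s3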
Let's first review how AWS manages Availability Zones (AZs) in its infrastructure.
An Availability Zone is one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region. To optimize low-latency retrievals, objects in the Amazon S3 Express One Zone storage class are redundantly stored in S3 directory buckets in a single Availability Zone that's local to your compute workload. When you create a directory bucket, you choose the Availability Zone and AWS Region where your bucket will be located.
AWS maps the physical Availability Zones randomly to the Availability Zone names for each AWS account. This approach helps to distribute resources across the Availability Zones in an AWS Region, instead of resources likely being concentrated in the first Availability Zone for each Region. As a result, the Availability Zone us-east-1a for your AWS account might not represent the same physical location as us-east-1a for a different AWS account.
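This is also why directory buckets are tied to AZ IDs (such as use1-az4) rather than AZ names. You can check how AZ names map to AZ IDs in your own account with a quick CLI call:

aws ec2 describe-availability-zones \
  --region us-east-1 \
  --query 'AvailabilityZones[].{Name:ZoneName,Id:ZoneId}' \
  --output table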
As of today, S3 Express One Zone is supported only in a limited set of Regions and Availability Zones; the current list is available in the AWS documentation.
Benchmarking
I decided to test this new storage class and compare it with the standard one. The idea is to compare how fast file transfers can be made to each.
To carry out the tests, I created two buckets using Terraform: a standard bucket and a directory bucket (Express One Zone).
For transferring the files, I used the AWS CLI with the aws s3 sync command to send the files from an EC2 instance to the two buckets.
The Terraform code is quite simple; you can see it below.
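The basic form of the command is the same for both bucket types; only the destination changes. Here files/ is the local directory the script shown later writes to, and <destination-bucket> is a placeholder for either bucket name:

aws s3 sync files/ s3://<destination-bucket>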
The code to create an S3 Express One Zone bucket:
resource "aws_s3_directory_bucket" "example_express_bucket" {
  # Directory bucket names must follow the [bucket_name]--[azid]--x-s3 format
  bucket = "example-express-bucket--use1-az4--x-s3"

  # Availability Zone ID where the bucket (and all of its data) will live
  location {
    name = "use1-az4"
  }

  force_destroy = true
}
Keep in mind that for this type of bucket, the name must follow the format [bucket_name]--[azid]--x-s3, where azid is the Availability Zone ID.
The code to create an S3 Standard bucket:
resource "aws_s3_bucket" "example_classical_bucket" {
  bucket        = "example-classical-bucket"
  force_destroy = true
}
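With both resources defined, the usual Terraform workflow deploys them:

terraform init
terraform apply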
Both buckets were created successfully, and we can see them in the AWS console.
In order to test transfer speed, we have to send files to (and/or retrieve them from) the buckets from an EC2 instance. For that, I spun up an EC2 instance and wrote a small shell script that creates random files of a specific size for our benchmark tests.
The shell script takes two parameters: the number of files to generate (file_numbers) and the size of each file (file_size, which accepts dd size suffixes such as k or M).
For example, to generate 100 files of 100 KB each, you execute ./random.sh 100 100k
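They can also be verified from the CLI. Note that directory buckets have their own listing API and do not show up in the regular bucket list:

# Standard (general purpose) buckets
aws s3 ls

# Directory buckets (S3 Express One Zone)
aws s3api list-directory-buckets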
You can see the script below.
#!/bin/bash

# Check if the correct number of arguments is provided
if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <file_numbers> <file_size>"
  echo "Example: $0 100 200k"
  exit 1
fi

file_numbers=$1
file_size=$2

# Make sure the output directory exists
mkdir -p files

for i in $(seq 1 "$file_numbers")
do
  # Generate random data and write it to the file
  dd if=/dev/urandom of="files/$i.txt" bs="$file_size" count=1 status=progress
done
Experiments
Now we've reached the fun part. I executed two experiments transmitting files from the EC2 instance to AWS S3, focusing on the assessment of write operations.
For experiment 1, I generated 1,000 files of 1 MB each with this command: ./random.sh 1000 1M
For experiment 2, I generated 50,000 files of 100 KB each with the following command: ./random.sh 50000 100k
We can see the files were created on the EC2 instance and are ready to be uploaded to AWS S3.
In the case of experiment 1, we can see the files' sizes:
We can confirm the files were uploaded, and we can see them in the AWS console.
When uploading the files to S3, I measured how long each experiment took and the average bandwidth of the transfer operation.
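The timing can be reproduced with commands along these lines, using the bucket names from the Terraform code above and the standard time shell built-in:

# Upload to the standard bucket
time aws s3 sync files/ s3://example-classical-bucket

# Upload to the directory bucket (S3 Express One Zone)
time aws s3 sync files/ s3://example-express-bucket--use1-az4--x-s3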
Results
Here I present the results and findings from the experiments. In the table below, we can see the results for each experiment.
We can notice that there is no big difference in the first experiment: 17 seconds vs. 15 seconds is not huge. This is because we sent 1,000 files of 1 MB each, and the new storage class (S3 Express One Zone) provides better performance when transferring smaller objects. The second experiment confirms this: sending more, and smaller, files than in experiment 1, the transfer to the new bucket type completed in half the time.
The reason is that latency usually impacts small files much more than larger ones, as we can observe in our experiments: the per-request overhead is amortized over very little data for a 100 KB object, while for a 1 MB object throughput dominates.
But I wasn't completely happy with the results. The documentation mentions that we can obtain up to 10x speed, but I only got 2x. After further investigation, I realized that we can tune the aws s3 sync command to improve performance by modifying the value of max_concurrent_requests. This setting controls the number of requests that can be sent to Amazon S3 at a time. The default value is 10, but you can increase it to a higher value.
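This setting lives in the AWS CLI's S3 configuration and can be changed with aws configure; the value of 100 below is just an illustrative choice:

# Allow up to 100 concurrent S3 requests (the default is 10)
aws configure set default.s3.max_concurrent_requests 100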
Then I re-ran the second experiment and obtained better results.
We gained 5x in performance! That's great.
To conclude:
The new S3 Express One Zone storage class (a.k.a. directory buckets) performs better on large numbers of smaller files.
AWS S3 batch operations will benefit the most from it.
I performed the experiments with the AWS CLI, but if you have an application that uses, for instance, the AWS SDK and can perform batch operations in parallel, then this new storage class is a good candidate for gaining performance.
To get the most benefit, you must place your workload in the same Availability Zone as your directory bucket (see the snippet below).
Be aware that the directory bucket structure is different from that of traditional S3 buckets.
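As a final helper, this is how you can resolve an AZ ID such as use1-az4 to the AZ name it maps to in your account, so you can place your compute in the matching zone:

aws ec2 describe-availability-zones \
  --zone-ids use1-az4 \
  --query 'AvailabilityZones[].ZoneName' \
  --output text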