How to Understand and Evaluate S3 Costs over the Data Lifecycle

AWS Simple Storage Service aka AWS S3

AWS Simple Storage Service, S3 for short, is a cheap, secure, durable, highly available object storage service for storing an unlimited amount of data. S3 is simple to use and, at least in theory, scales without limit in terms of data storage. It is telling that nobody has complained of hitting a ceiling so far, and nobody is likely to in the foreseeable future.

In S3, data is stored across multiple devices across a minimum of three availability zones in a highly redundant manner. S3 is a key-based object store that can hold any type of data. It is based on a global infrastructure, and using it is as simple as making standard HTTPS calls.

S3 is popular and easy to use, provides many features, and is still evolving to support a large number of data storage and management use cases across organizations big and small.

But there is something that has everyone worried, namely, S3 usage costs. Costs are not a worry on their own, given that S3 is indeed quite an inexpensive storage option in the cloud. What bothers people is the lack of visibility into the various factors that affect cost calculations. These factors are too detailed to be taken lightly.

In this article, we will take a scenario, neither a very happy one nor very complex, as an example to calculate usage costs, which hopefully will make us aware of its fine details.

AWS S3 Pricing

S3 supports a very wide range of data storage and management scenarios, and a large number of its technical features and usage criteria carry costs of their own. Given all these features, it becomes quite challenging to visualize and evaluate how costs add up.

Typically, the cost of S3 is calculated from data storage; S3 API usage, that is, requests and data retrievals; network data transfer; and the other S3 features for data management and operations, i.e., management and replication.

There are different data storage classes in S3, and storage pricing differs between them; pricing also varies across the API calls used on the data, such as uploads, listing, downloads, or retrievals.

S3 Pricing is listed at https://aws.amazon.com/s3/pricing/. We will refer to this page for all calculations.

Data Scenario and Assumptions

We have our S3 bucket in the Mumbai region. Suppose we upload 10240 GB (10 TB) of data per day, every day from 1st March 2020 until 31st March 2020, and that each day's 10 TB consists of 10 million objects. This data is uploaded to the S3 bucket from application servers, into the Standard Storage Class, using S3 Object PUT requests.

We will assume that each object is larger than the S3 Intelligent Tier minimum object size criterion, i.e., 128KB. Also, we will assume that the objects in the S3 Intelligent Tier would be in the Frequent access Tier for the first 30 days and in the Infrequent access Tier for the next 30 days.

The next assumption is that new uploads on each day would occur at 12:00 AM (midnight) and that the new data is stored for the full 24 hours on the day of the upload.

In addition, we assume that this is a single AWS account billed individually, that it is not part of any AWS Organization with any other member accounts, and that the free tier coverage is over.

We will use the S3 pricing page for all the price rates and units for calculations pertaining to the Mumbai region.

The S3 Lifecycle Transition—moving data to reduce costs

We can create Lifecycle Transitions to move data to ensure lower storage cost. Here, we will create a Lifecycle Transition using the following steps:

  • Keep data in S3 Standard for 30 days after uploading to S3 Standard.
  • After 30 days of creation, move the data to the S3 Intelligent Tier.
  • After 90 days of creation, move the data to the Glacier Deep Archive Tier.
  • After 120 days of creation, delete the data.
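The four steps above could be written as a single lifecycle rule. The sketch below only builds the rule as a plain Python dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID and the empty prefix (meaning "every object in the bucket") are assumptions made for illustration, and no call to AWS is made.

```python
def build_lifecycle_configuration():
    """Return a lifecycle configuration matching the four steps above."""
    return {
        "Rules": [
            {
                "ID": "march-data-lifecycle",  # hypothetical rule name
                "Filter": {"Prefix": ""},      # assumption: apply to every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
                ],
                "Expiration": {"Days": 120},
            }
        ]
    }


config = build_lifecycle_configuration()
for step in config["Rules"][0]["Transitions"]:
    print(step["Days"], step["StorageClass"])
```

With boto3 installed and credentials configured, this dictionary would be passed as the `LifecycleConfiguration` argument, together with the bucket name.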

 

Now, without further theoretical talk, let’s begin the journey of cost calculations.

S3 Standard Cost for March

Cost of Uploads

We calculate the price of uploads, i.e., PUT API calls as shown below:

  • The PUT per 1000 requests in S3 Standard = $0.005

(A) So, the cost of Total PUTs = 10,000,000 * $0.005 / 1000 * 31 = $1550
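As a quick check of (A), request charges can be sketched as a one-line function. The function and constant names are illustrative; the rate is the one quoted above.

```python
def request_cost(request_count, price_per_1000):
    """Cost in USD for requests that are billed per 1,000 requests."""
    return request_count * price_per_1000 / 1000


OBJECTS_PER_DAY = 10_000_000
PUT_PRICE_PER_1000 = 0.005  # S3 Standard PUT rate used in this article
UPLOAD_DAYS = 31            # 1st to 31st March

put_cost = request_cost(OBJECTS_PER_DAY * UPLOAD_DAYS, PUT_PRICE_PER_1000)
print(round(put_cost, 2))  # 1550.0
```

The same function applies to the Lifecycle Transition requests later in the article, since they are also billed per 1,000 requests.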

Storage cost

Now, we will calculate the price for the total data stored in S3 Standard for a month—from 1st March to 31st March. We can calculate data in terms of only GB-Month because data is moving to other tiers and each object would add to monthly pricing for only the amount of time it is stored in S3 Standard during the month.

Accordingly, the price for data storage for this one-month period is as follows:

  • Total number of days of storage = 31 days
  • Price of data Storage in S3 Standard for the first 50 TB per month = $0.025 per GB
  • Price of data Storage in S3 Standard for the next 450 TB per month = $0.024 per GB

Since data is changing every day, we will calculate the effective storage for March in terms of GB-Month. This calculation will signify that the changing data size, on a cumulative basis, was in total that much GB per month for the month of March. Notably, any new data uploaded to S3 Standard will remain there for 30 days.

So, the calculation of cost for March would be as follows:

  • Total GB-Month = (data uploaded on 1st March, in Standard till 30th March) + (data uploaded on 2nd March, in Standard till 31st March) + (data uploaded on 3rd March, for its GB-Month till 31st March) + … + (data uploaded on 31st March)
  • Hence, total GB-Month for March = (10240 GB * 30 days / 31 days per month) + (10240 GB * (31-1) days / 31 days per month) + (10240 GB * (31-2) days / 31 days per month) + … + (10240 GB * (31-30) days / 31 days per month) = (10240 GB / 31 days per month) * (30 + (31-1) + (31-2) + … + (31-30)) days
  • This works out to (10240 / 31) * ((31*31) - (1+1+2+…+30)) GB-Month, which simplifies to (10240 * (961 - 466) / 31) GB-Month = (10240 * 495 / 31) GB-Month. The final figure amounts to 163509.677 GB-Month, or 159.677 TB-Month

(B) The cost of storage for the month of March is 50 TB-Month at $0.025 per GB + 109.677 TB-Month at $0.024 per GB = 50 * 1024 * $0.025 + 109.677 * 1024 * $0.024, amounting to $3975.42
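The GB-Month bookkeeping behind (B) can be reproduced with a short loop. This is a minimal sketch under the article's assumptions (one 10 TB batch per day, 30 days in Standard, the two price tiers quoted above); the helper names are illustrative. Because it does not round 159.677 TB before pricing, the result can differ from (B) by about a cent.

```python
GB_PER_TB = 1024
DAILY_UPLOAD_GB = 10 * GB_PER_TB
DAYS_IN_MARCH = 31
DAYS_IN_STANDARD = 30


def march_standard_batch_days():
    """Sum, over the 31 daily uploads, of the days each spends in Standard in March."""
    total = 0
    for upload_day in range(1, DAYS_IN_MARCH + 1):
        last_day_in_standard = upload_day + DAYS_IN_STANDARD - 1
        last_day_in_march = min(last_day_in_standard, DAYS_IN_MARCH)
        total += last_day_in_march - upload_day + 1
    return total


def tiered_storage_cost(gb_month, tiers):
    """Apply tiered per-GB prices; tiers is a list of (tier_size_gb, price_per_gb)."""
    cost, remaining = 0.0, gb_month
    for size_gb, price in tiers:
        used = min(remaining, size_gb)
        cost += used * price
        remaining -= used
    return cost


STANDARD_TIERS = [(50 * GB_PER_TB, 0.025), (450 * GB_PER_TB, 0.024)]

batch_days = march_standard_batch_days()                 # 495
gb_month = DAILY_UPLOAD_GB * batch_days / DAYS_IN_MARCH  # about 163509.68
cost = tiered_storage_cost(gb_month, STANDARD_TIERS)
print(batch_days, round(gb_month, 3), round(cost, 2))
```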

Transition to Intelligent Tier

After 30 days from creation, data would be moved to the S3 Intelligent Tier and remain there for the next 60 days. For the first 30 days, it would be in Frequent Access Tier; then it will move to the Infrequent Access Tier for the remaining days.

For simplicity, we assume that there is no access while data is stored in S3 IT.

If some data is accessed frequently beyond the first 30 days in S3 IT, such data would remain in the Frequent Access Tier; but we would need to dig deeper to find out which data subsets have been accessed frequently and at what times; this is a difficult task. However, we can make estimations using the Bucket’s CloudWatch Storage Metrics Graph.

  • Objects uploaded to S3 on 1st March would move to S3 IT on 31st March and remain there till 29th May
  • Objects uploaded to S3 on 2nd March would move to S3 IT on 1st April and remain there till 30th May
  • Objects uploaded to S3 on 3rd March would remain in S3 standard on 1st April; then they move to S3 IT on 2nd April and remain there till 31st May
  • Objects uploaded to S3 on 4th March would remain in S3 standard from 1st to 2nd April; these objects then move to S3 IT on 3rd April and remain there till 1st June

Similarly, data uploaded to S3 on other days would move to S3 IT after 30 days. Hence, the date of this transition would fall somewhere in April. So, for days in April before this date, this data would be in S3 Standard. From this date onwards, the data would be in S3 IT and then remain there for a total of 60 days.

Finally, data uploaded to S3 on 31st March would be in S3 Standard from 1st April to 29th April, and then would move to S3 IT on 30th April; it will remain there till 28th June.
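The dates in this section follow mechanically from the upload date, so they can be checked with a few lines of date arithmetic. The sketch below assumes the lifecycle used in this article (30, 60, 90 and 120 days after creation) and no access while data is in S3 IT; the function name and dictionary keys are illustrative.

```python
from datetime import date, timedelta


def lifecycle_dates(upload_date):
    """Dates on which a batch enters each stage, counted from its upload date."""
    return {
        "standard": upload_date,
        "it_frequent": upload_date + timedelta(days=30),
        "it_infrequent": upload_date + timedelta(days=60),  # assumes no access in S3 IT
        "deep_archive": upload_date + timedelta(days=90),
        "deleted": upload_date + timedelta(days=120),
    }


for day in (1, 2, 31):
    stages = lifecycle_dates(date(2020, 3, day))
    print(day, {name: d.isoformat() for name, d in stages.items()})
```

The last day a batch spends in a stage is the day before the next stage begins; for example, the 1st March batch is in S3 IT until 29th May and enters Deep Archive on 30th May.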

The S3 Intelligent Tier Cost for March

Objects uploaded to S3 on 1st March would move to S3 IT on 31st March and remain there till 29th May.

Transition Cost

  • The Lifecycle Transition Price to S3 IT is $0.01 per 1000 Lifecycle Transition requests (objects)

(C) Thus, the cost of Transition to S3 IT on 31st March is 10,000,000 * $0.01 / 1000, which works out to $100

Storage Cost

  • The price for Storage in Frequent Access Tier, first 50 TB / Month is $0.025 per GB

(D) Thus, the cost of data storage for 31st March is 10240GB * 1 day / 31 days per month * $0.025 per GB, which equates to (10240 * 0.025) / 31, totaling $8.258

Monitoring and Automation Cost

  • For Monitoring and Automation, all Storage / Month = $0.0025 per 1,000 objects

(E) Monitoring and Automation Cost for March = 10,000,000 * $0.0025 / 1000 = $25

The S3 Standard Tier Cost for April

Storage Cost

  • The total GB-Month in April is 10TB Data on 1st April for 3rd March upload + 10TB Data on 1st and 2nd April for 4th March upload + … + 10TB Data from 1st to 29th April for 31st March upload.

So, the actual calculations are as follows:

(10TB * 1 day / 30 days per month) + (10TB * 2 days / 30 days per month) + … + (10TB * 29 days / 30 days per month)

This equates to 10 * (1+2+…+29) / 30 TB-Month = 10 * 435 / 30 TB-Month, totaling 145 TB-Month

(F) The cost of storage for Standard Tier in April = ((50 * 1024) GB * $0.025 per GB) + ((95 * 1024) GB * $0.024 per GB), which is $3614.72

S3 Intelligent Tier Cost for April

The objects uploaded on 1st March transitioned to the S3 Intelligent Tier on 31st March; these objects will be in the S3 Intelligent Tier (S3 IT) in April from 1st April to 29th April in the Frequent access Tier; then in the Infrequent access Tier on 30th April.

Objects uploaded from 2nd March to 31st March will get transitioned to S3 Intelligent Tier (S3 IT) in April from 1st April till 30th April and remain there accordingly for 60 days.

Transition Cost

  • The Lifecycle Transition price to S3 IT is $0.01 per 1000 Lifecycle Transition requests (objects)
  • Total Objects Transitioned to S3 IT in April are 10,000,000 * 30, that is, 300,000,000

(G) So, the cost for Lifecycle Transition in April is 300,000,000 * $0.01 / 1000, which amounts to $3000

Storage Cost

  • The total TB-Month of data in S3 IT for the month of April is ((10TB * 29 days / 30 days per month) + (10TB * 30 days / 30 days per month) + (10TB * (30-1) days / 30 days per month) + (10TB * (30-2) days / 30 days per month) + … + (10TB * (30-29) days / 30 days per month)) (Frequent access) + (10TB * 1 day / 30 days per month) (Infrequent access)
  • This translates to 10TB * (29 + 30 + (30-1) + (30-2) + … + (30-29)) days / 30 days per month (Frequent access) + (10TB * 1 day / 30 days per month) (Infrequent access);
  • that is, 10 * (29 + (30*30) - (1+2+…+29)) / 30 TB-Month = 164.67 TB-Month (Frequent access) + 1/3 TB-Month (Infrequent access)

(H) Thus, the cost for S3 IT storage = (50 * 1024 * $0.025) + (114.67 * 1024 * $0.024) + (1/3 * 1024 * $0.019), which works out to $4104.533
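The day counts behind these sums are easy to get wrong by hand, so it may help to let a small simulation count them. The sketch below walks every daily batch through its first 90 days and counts batch-days per month and tier, under the article's assumptions (30 days in Standard, then 30 days each in the Frequent and Infrequent access Tiers); all names are illustrative.

```python
from collections import Counter
from datetime import date, timedelta


def tier_on(age_days):
    """Storage tier of a batch that is age_days old (0 = upload day)."""
    if age_days < 30:
        return "standard"
    if age_days < 60:
        return "it_frequent"
    if age_days < 90:
        return "it_infrequent"  # assumes no access while in S3 IT
    return None                 # Deep Archive or deleted; not tracked here


def batch_days_by_month_and_tier():
    """Count batch-days per (month, tier) for uploads on 1st to 31st March 2020."""
    counts = Counter()
    for offset in range(31):
        upload = date(2020, 3, 1) + timedelta(days=offset)
        for age in range(90):
            day = upload + timedelta(days=age)
            counts[(day.month, tier_on(age))] += 1
    return counts


counts = batch_days_by_month_and_tier()
# One batch is 10 TB, so TB-Month = 10 * batch-days / days in the month.
print(10 * counts[(4, "it_frequent")] / 30)    # April, Frequent access
print(10 * counts[(4, "it_infrequent")] / 30)  # April, Infrequent access
```

The same counter also gives the batch-days for S3 Standard in March and April and for S3 IT in May and June.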

Monitoring and Automation Cost

  • For Monitoring and Automation, all Storage / Month is $0.0025 per 1,000 objects

(I) So, the cost of S3 Intelligent Tier Management is ($0.0025 / 1000) * (( (10Million * 29 days / 30 days per month) + (10Million * 30 days / 30 days per month) + (10Million * (30-1) days / 30 days per month) + (10Million * (30-2) days / 30 days per month) + (10Million * (30-3) days / 30 days per month) + … + (10Million * (30-29) days / 30 days per month) ) (Frequent access) + (10Million * 1 day / 30 days per month)) (Infrequent access))

Hence, the actual calculations are

  • ($0.0025 / 1000) * (10Million * (29 + 30 + (30-1) + (30-2) + (30-3) + … + (30-29) + 1) days / 30 days per month),
  • which translates to ($0.0025 / 1000) * (10Million * (30 + 30 + (30-1) + (30-2) + (30-3) + … + (30-29)) days / 30 days per month);
  • that is, ($0.0025 / 1000) * ((10Million * 495 days) / 30 days per month) = ($0.0025 / 1000) * 165 Million Object-Month, amounting to $412.5
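The monitoring charge can be sketched in the same way. The function below prorates the per-object fee by the days stored in the month, which is the approach this article takes; whether AWS bills it exactly this way is an assumption, and the names are illustrative.

```python
def monitoring_cost(object_days, days_in_month, price_per_1000=0.0025):
    """S3 IT monitoring charge for a month, prorated by days stored."""
    object_months = object_days / days_in_month
    return object_months * price_per_1000 / 1000


OBJECTS_PER_BATCH = 10_000_000
# April batch-days: Frequent access (29 + 30 + 29 + ... + 1) plus 1 Infrequent access day.
APRIL_BATCH_DAYS = 29 + sum(range(1, 31)) + 1

april = monitoring_cost(OBJECTS_PER_BATCH * APRIL_BATCH_DAYS, 30)
print(round(april, 2))  # 412.5
```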

S3 Intelligent Tier Cost for May

  • Objects uploaded on 1st March will get transitioned to the S3 Intelligent Tier (S3 IT) Infrequent access Tier on 30th April; they will remain there accordingly for 30 days, i.e., on 30th April and from 1st May until 29th May.
  • Objects uploaded from 2nd March to 31st March will get transitioned to the S3 Intelligent Tier (S3 IT) in April, from 1st April till 30th April. They will be in the Frequent access Tier for 30 days. Then these objects will be automatically moved to the S3 Intelligent Tier (S3 IT) Infrequent access Tier for another 30 days.
  • Object transitioned to S3 IT on 1st April will be in the Infrequent access Tier from 1st May to 30th May.
  • Object transitioned to S3 IT on 2nd April will be in the Frequent access Tier on 1st May. Then, it will be in the Infrequent access Tier from 2nd May to 31st May.
  • Object transitioned to S3 IT on 3rd April will be in the Frequent access Tier on 1st and 2nd May. Then, it will be in the Infrequent access Tier from 3rd May to 31st May and on 1st June.
  • Similarly, transition will occur for data uploaded on other dates.
  • Object transitioned to S3 IT on 30th April will be in the Frequent access Tier from 1st May up to 29th May. Then, it will be in the Infrequent access Tier from 30th May to 31st May and from 1st June to 28th June.

Transition Cost

Since there is no transition from the S3 standard to the S3 Intelligent Tier in May, and S3 Intelligent Tier’s sub-Tiers (i.e., Frequent/Infrequent Access Tiers) don’t count for Transition, there is zero cost of Lifecycle transition this month.

Storage Cost

For Frequent Access Tier,

  • Total GB-Month of Data in S3 IT for month of May = (10TB * 1 day / 31 days per month) + (10TB * 2 days / 31 days per month) + (10TB * 3 days / 31 days per month) + … + (10TB * 29 days / 31 days per month) (Frequent access);
  • this translates to 10TB * (1+2+3+…+29) days / 31 days per month;
  • that is, 10 * (1+2+3+…+29) / 31 TB-Month, resulting in 140.322 TB-Month

(J) So, the cost for S3 IT storage (Frequent access) = (50 * 1024 * $0.025) + (90.322 * 1024 * $0.024), which is $3499.753472

For Infrequent Access Tier,

  • The total GB-Month of Data in S3 IT for month of May is (10TB * 29 days / 31 days per month) + (10TB * 30 days / 31 days per month) + (10TB * (31-1) days / 31 days per month) + (10TB * (31-2) days / 31 days per month) + … + (10TB * (31-29) days / 31 days per month) (Infrequent access);
  • this would be 10TB * (29 + 30) days / 31 days per month + 10TB * (31*29 – (1+2+3+…+29)) days / 31 days per month;
  • which equates to, 10 * (29 + 30 + 31*29 – (1+2+3+…+29)) / 31 TB-Month, i.e, 168.7096 TB-Month

(K) The cost for S3 IT storage (Infrequent access) is then 168.7096 * 1024 * $0.019, totaling $3282.41398

Monitoring and Automation Cost

  • Monitoring and Automation, All Storage / Month is $0.0025 per 1,000 objects

(L) The cost of S3 Intelligent Tier Management is ($0.0025 / 1000) * (Frequent Access Tier: (10Million * 1 day / 31 days per month) + (10Million * 2 days / 31 days per month) + (10Million * 3 days / 31 days per month) + … + (10Million * 29 days / 31 days per month) + Infrequent Access Tier: (10Million * 29 days / 31 days per month) + (10Million * 30 days / 31 days per month) + (10Million * (31-1) days / 31 days per month) + (10Million * (31-2) days / 31 days per month) + … + (10Million * (31-29) days / 31 days per month));

this would translate to ($0.0025 / 1000) * (10Million * (1+2+3+…+29) days / 31 days per month +10Million * (29+30+(31-1) +(31-2) +…+(31-29)) days / 31 days per month);

that is, ($0.0025 / 1000) * ((10 * ((1+2+3+…+29) + (29 + 30 + (29 * 31) – (1+2+3+…+29)))/ 31) MillionObject-Month);

further, ($0.0025 / 1000) * ((10 * (29 + 30 + (29 * 31)) / 31) MillionObject-Month);

thus, ($0.0025 / 1000) * (309.032258065 MillionObject-Month), amounting to $772.580645

S3 Glacier Deep Archive Tier Cost for May

Objects uploaded to S3 will finally get transitioned to the S3 Glacier Deep Archive Tier after 90 days of creation and remain there accordingly for the remaining period of 30 days before being completely deleted from S3.

Object uploaded on March 1st will get transitioned to the Deep Archive on May 30th and remain there on 30th and 31st May; subsequently, it will remain in the Deep Archive from 1st June to 28th June, which completes its 30 days there.

Similarly, the object uploaded on March 2nd will get transitioned to the Deep Archive on May 31st and remain there on 31st May; then it will remain in the Deep Archive from 1st June to 29th June.

Transition Cost        

  • The price of Lifecycle Transition Requests per 1000 requests is $0.07

(M) Thus, the cost of Transition to the Deep Archive in May is 2 * 10,000,000 * $0.07 / 1000, that is, $1400

Storage Cost

The storage Price for objects in Deep Archive is affected by 3 factors:

  1. The total data size stored
  2. The total S3 standard index size @ 8KB per object
  3. The total S3 Glacier Deep Archive index size @ 32KB per object

Calculations are as follows:

  • The total Object Count transitioned to Deep Archive in May is 20,000,000
  • So, the total Data Size = 2 * 10TB
  • The total GB-Month of data size in May is (10TB * 2 days / 31 days per month) + (10TB * 1 day / 31 days per month) + (32KB * 10,000,000 * 1 day / 31 days per month) (Deep Archive) + (8KB * 10,000,000 * 1 day / 31 days per month) (S3 Standard);
  • that is, (990.967741935 GB-Month + 9.84438 GB-Month) (Deep Archive) + (2.461095 GB-Month) (S3 Standard);
  • so, 1000.81212194 GB-Month (Deep Archive) + (2.461095 GB-Month) (S3 Standard)
  • The Deep Archive Storage Price = All Storage / Month is $0.002 per GB
  • The S3 Standard Price = First 50 TB / Month is $0.025 per GB

(N) Thus, the cost of Storage in Deep Archive in May = 1000.81212194 * $0.002 + 2.461095 * $0.025, totaling $2.06315

S3 Intelligent Tier Cost for June

  • Object transitioned to S3 IT on 2nd April will be in Frequent access Tier up to 1st May. Then, in Infrequent access Tier from 2nd May to 31st May.
  • Object transitioned to S3 IT on 3rd April will be in Frequent access Tier up to 2nd May. Then, in Infrequent access Tier from 3rd May to 31st May and on 1st June.
  • Similarly, object transitioned to S3 IT on 4th April will be in Frequent access Tier on 1st, 2nd and 3rd May. Then, in Infrequent access Tier from 4th May to 31st May and then on 1st and 2nd June.
  • These considerations apply similarly to objects transitioned to S3 IT later in April.
  • Object transitioned to S3 IT on 30th April will be in Frequent access Tier from 1st May up to 29th May. Then, in Infrequent access Tier from 30th May to 31st May; subsequently, from 1st June to 28th June.

Storage Cost

  • The total TB-Month in June can be calculated as (10TB * 1 day / 30 days per month) + (10TB * 2 days / 30 days per month) + (10TB * 3 days / 30 days per month) + … + (10TB * 28 days / 30 days per month);
  • which is, 10 * (1+2+3+…+28) / 30 TB-Month, totaling 135.33333 TB-Month

(O) Then, the cost for S3 IT storage (Infrequent access) is (10 * (1+2+3+…+28) / 30 * 1024 * $0.019), totaling $2633.04533

S3 Glacier Deep Archive Tier Cost for the remaining period

Remember, objects will be transitioned to S3 Glacier Deep Archive after 90 days from creation and deleted after 120 days of creation.

  • Price of Lifecycle Transition Requests per 1000 requests is $0.07
  • Price of Lifecycle Transition Requests for Deletion is $0
  • Deep Archive storage Price = All Storage / Month is $0.002 per GB
  • The S3 Standard Price = First 50 TB / Month is $0.025 per GB, next 450 TB / Month is $0.024

The minimum storage duration in the Deep Archive is 180 days. So, any object in Deep Archive that is deleted before 180 days from the day of transition to Deep Archive will be charged as if it were stored for 180 days after transitioning there.

From the S3 Pricing Page:

Objects that are archived to S3 Glacier and S3 Glacier Deep Archive have a minimum 90 days and 180 days of storage, respectively. Objects deleted before 90 days and 180 days incur a pro-rated charge equal to the storage charge for the remaining days. Objects that are deleted, overwritten, or transitioned to a different storage class before the minimum storage duration will incur the normal storage usage charge plus a pro-rated request charge for the remainder of the minimum storage duration.

This applies to the current scenario; thus, we need to calculate cost for 180 days in Deep Archive, instead of 30 days.

Calculations

  • Object transitioned to Deep Archive on 30th May will remain in Deep Archive in June from 1st up to 28th June.
  • Object transitioned to Deep Archive on 31st May will remain in Deep Archive in June from 1st up to 29th June.
  • Object transitioned to S3 IT Infrequent access Tier on 2nd May will be transitioned to Deep Archive on 1st June. Then, it will remain there up to 30th June.
  • Object transitioned to S3 IT Infrequent access Tier on 3rd May will be transitioned to Deep Archive on 2nd June. Then, it will remain there up to 30th June and on 1st July.
  • This logic is similarly applied to later objects.
  • Object transitioned to S3 IT Infrequent access Tier on 30th May will be transitioned to the Deep Archive after 28th June. Then, it will remain there on 29th June and 30th June and from 1st July to 28th July.

Transition Cost

  • The price of Lifecycle Transition Requests per 1000 requests is $0.07

(P) So, the cost of Transition to Deep Archive in June is 29 * 10,000,000 * $0.07 / 1000, which works out to $20300.

Again, the minimum Deep Archive storage period is 180 days.

  • Object uploaded to S3 on 1st March has transitioned to Deep Archive on 30th May.
  • The month-wise days it should have been in Deep Archive, if not deleted early, would be as follows: 30th May and 31st May; the full months of June, July, August, September, and October; then the days from 1st November until 25th November.
  • Object uploaded to S3 on 31st March has transitioned to Deep Archive on 29th June.
  • The month-wise days it should have been in Deep Archive, if not deleted early, would be: 29th and 30th June; the full months of July, August, September, October, and November; then the days from 1st December up to 25th December.

All data transitioned in between will fill up the middle months and days, in chronological order, between these two extremities.
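The end of the 180-day billing window for any batch can be checked with date arithmetic. This is a minimal sketch that counts the transition day as day 1, as the article does; the function name is illustrative.

```python
from datetime import date, timedelta

MIN_DEEP_ARCHIVE_DAYS = 180


def last_billed_day(transition_date, min_days=MIN_DEEP_ARCHIVE_DAYS):
    """Last day charged for an object moved to Deep Archive and deleted early."""
    return transition_date + timedelta(days=min_days - 1)


print(last_billed_day(date(2020, 5, 30)))  # 2020-11-25
print(last_billed_day(date(2020, 6, 29)))  # 2020-12-25
```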

Storage Cost

As stated earlier, there are 3 components for Deep Archive Storage:

  1. The total data size stored
  2. A total S3 standard index size of 8KB per object
  3. A total S3 Glacier Deep Archive index size of 32KB per object

(For early deletion, I will be adding the two index components too. If anybody finds it incorrect, please provide your comment; I would be happy to correct)

In case of Early deletion, as referred above from the Pricing page, storage cost would be equal to storage for 180 days from the date of transition to Deep Archive. There is also a pro-rated request charge for early deletion cases, but I am not able to find the pricing unit for that, so I will be skipping such request charges. (I will try to figure it out and update later, or if anybody has any information, please provide in a comment below; I would be happy to update and appreciate your inputs.)

Deep Archive Storage Costs in June

  • The total TB-Month for Data Objects is calculated as (10TB * 30 days / 30 days per month) + (10TB * (30-1) days / 30 days per month) + (10TB * (30-2) days / 30 days per month) + … + (10TB * (30-27) days / 30 days per month) + (10TB * (30-28) days / 30 days per month);
  • this equates to 10TB * ((30-0) + (30-1) + (30-2) + … + (30-27) + (30-28)) days / 30 days per month;
  • that is, 10TB * (30*29 – (1+2+…+28)) days/ 30 days per month = 10 * ((30*29) – (14*29)) / 30 TB-Month;
  • which amounts to 154.666666667 TB-Month
  • The total GB-Month for Deep Archive Index Objects is (32KB * 10,000,000 * 30 days / 30 days per month) + (32KB * 10,000,000 * (30-1) days / 30 days per month) + (32KB * 10,000,000  * (30-2) days / 30 days per month) + … + (32KB * 10,000,000  * (30-27) days / 30 days per month) + (32KB * 10,000,000  * (30-28) days / 30 days per month);
  • so, 32KB * 10,000,000 * ((30*29) – (14*29)) / 30 KB-Month, which amounts to 4720.05208333 GB-Month
  • The total GB-Month for S3 Index Objects = (8KB * 10,000,000 * 30 days / 30 days per month) + (8KB * 10,000,000 * (30-1) days / 30 days per month) + (8KB * 10,000,000 * (30-2) days / 30 days per month) + … + (8KB * 10,000,000 * (30-27) days / 30 days per month) + (8KB * 10,000,000 * (30-28) days / 30 days per month);
  • so, 8KB * 10,000,000 * ((30*29) – (14*29)) / 30 KB-Month, totaling 1180.01302083 GB-Month

(Q) Thus, the total Costs of Storage in June are ((154.666666667 * 1024) + 4720.05208333) * $0.002 + (1180.01302083 * $0.025); this amounts to $355.697763

Deep Archive Storage Costs in July

  • The total TB-Month for Data Objects is 31 * (10TB * 31 days / 31 days per month), amounting to 310 TB-Month
  • The total GB-Month for Deep Archive Index Objects is 31 * (32KB * 10,000,000 * 31 days / 31 days per month), that is, 9460.44921875 GB-Month
  • The total GB-Month for S3 Index Objects is 31 * (8KB * 10,000,000 * 31 days / 31 days per month), which, is 2365.11230469 GB-Month

(R) So, the total Costs of Storage in July = (((310 * 1024) + 9460.44921875) * $0.002) + (2365.11230469 * $0.025), totaling $712.928706
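The three storage components can be folded into one helper so that each month only needs its number of batch-months. The sketch below uses the article's figures (10 TB and 10 million objects per batch, 32KB of index per object at the Deep Archive rate, 8KB per object at the first-tier S3 Standard rate); the function and parameter names are illustrative.

```python
KB_PER_GB = 1024 * 1024
GB_PER_TB = 1024
OBJECTS_PER_BATCH = 10_000_000


def deep_archive_month_cost(batch_months, data_tb_per_batch=10,
                            da_price=0.002, standard_price=0.025):
    """Storage cost for batch_months batch-months in Deep Archive, with index overhead.

    Per object, 32 KB is billed at the Deep Archive rate and 8 KB at the
    S3 Standard rate.
    """
    data_gb = batch_months * data_tb_per_batch * GB_PER_TB
    da_index_gb = batch_months * OBJECTS_PER_BATCH * 32 / KB_PER_GB
    std_index_gb = batch_months * OBJECTS_PER_BATCH * 8 / KB_PER_GB
    return (data_gb + da_index_gb) * da_price + std_index_gb * standard_price


# July: all 31 batches are charged for the full month.
july = deep_archive_month_cost(31)
print(round(july, 2))
```

For a partial month, the batch-months argument is the number of batch-days divided by the days in that month, for example 464 / 30 for the June figure above.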

Deep Archive Storage Costs in August

  • The total TB-Month for Data Objects is 31 * (10TB * 31 days / 31 days per month), that is, 310 TB-Month
  • The total GB-Month for Deep Archive Index Objects is 31 * (32KB * 10,000,000 * 31 days / 31 days per month), which is 9460.44921875 GB-Month
  • The total GB-Month for S3 Index Objects is 31 * (8KB * 10,000,000 * 31 days / 31 days per Month), amounting to 2365.11230469 GB-Month

(S) So, the total Costs of Storage in August are calculated (((310 * 1024) + 9460.44921875) * $0.002) + (2365.11230469 * $0.025); this amounts to $712.928706

Deep Archive Storage Costs in September

  • The total TB-Month for Data Objects is 31 * (10TB * 30 days / 30 days per month), that is, 310 TB-Month
  • The total GB-Month for Deep Archive Index Objects is 31 * (32KB * 10,000,000 * 30 days / 30 days per month), which is 9460.44921875 GB-Month
  • The total GB-Month for S3 Index Objects is 31 * (8KB * 10,000,000 * 30 days / 30 days per month), which equals 2365.11230469 GB-Month

(T) So, the total Cost of Storage in September is (((310 * 1024) + 9460.44921875) * $0.002) + (2365.11230469 * $0.025), totaling $712.928706

Deep Archive Storage Costs in October

  • The total TB-Month for Data Objects is 31 * (10TB * 31 days / 31 days per month), that is, 310 TB-Month
  • The total GB-Month for Deep Archive Index Objects is 31 * (32KB * 10,000,000 * 31 days / 31 days per month), amounting to 9460.44921875 GB-Month
  • The total GB-Month for S3 Index Objects is 31 * (8KB * 10,000,000 * 31 days / 31 days per month), which is 2365.11230469 GB-Month

(U) So, the total Cost of Storage in October is (((310 * 1024) + 9460.44921875) * $0.002) + (2365.11230469 * $0.025); the total for October is $712.928706

Deep Archive Storage Costs in November

  • The total TB-Month for Data Objects = 31 * (10TB * 25 days / 30 days per month) + 30 * (10TB * 1 day / 30 days per month)  + 29 * (10TB * 1 day / 30 days per month)  + 28 * (10TB * 1 day / 30 days per month)  + 27 * (10TB * 1 day / 30 days per month)  + 26 * (10TB * 1 day / 30 days per month);
  • that is, 10 * ((31*25) + 30 + 29 + 28 + 27 + 26) / 30 TB-Month, which equals 305 TB-Month
  • The total GB-Month for Deep Archive Index Objects is 31 * (32KB * 10,000,000 * 25 days / 30 days per month) + 30 * (32KB * 10,000,000 * 1 day / 30 days per month)  + 29 * (32KB * 10,000,000 * 1 day / 30 days per month)  + 28 * (32KB * 10,000,000 * 1 day / 30 days per month)  + 27 * (32KB * 10,000,000 * 1 day / 30 days per month)  + 26 * (32KB * 10,000,000 * 1 day / 30 days per month);
  • this equates to 32 * 10,000,000 * ((31*25) + 30 + 29 + 28 + 27 + 26) / 30 KB-Month, that is, 9307.86132813 GB-Month
  • The total GB-Month for S3 Index Objects is 31 * (8KB * 10,000,000 * 25 days / 30 days per month) + 30 * (8KB * 10,000,000 * 1 day / 30 days per month)  + 29 * (8KB * 10,000,000 * 1 day / 30 days per month)  + 28 * (8KB * 10,000,000 * 1 day / 30 days per month)  + 27 * (8KB * 10,000,000 * 1 day / 30 days per month)  + 26 * (8KB * 10,000,000 * 1 day / 30 days per month);
  • thus, 8 * 10,000,000 * ((31*25) + 30 + 29 + 28 + 27 + 26) / 30 KB-Month, which is 2326.96533203 GB-Month

(V) So, the total Costs of Storage in November are calculated as (((305 * 1024) + 9307.86132813) * $0.002) + (2326.96533203 * $0.025); the total cost of storage for November is $701.429856

Deep Archive Storage Costs in December

  • The total TB-Month for Data Objects is (10TB * 25 days / 31 days per month) + (10TB * 24 days / 31 days per month) + (10TB * 23 days / 31 days per month) + … + (10TB * 1 day / 31 days per month);
  • this equates to 10 * (25 + 24 + 23 + … + 1) / 31 TB-Month, amounting to 104.838709677 TB-Month
  • The total GB-Month for Deep Archive Index Objects is (32KB * 10,000,000 * 25 days / 31 days per month) + (32KB * 10,000,000 * 24 days / 31 days per month) + (32KB * 10,000,000 * 23 days / 31 days per month) + … + (32KB * 10,000,000 * 1 day / 31 days per month);
  • thus, 32 * 10,000,000 * (25 + 24 + 23 + … + 1) / 31 KB-Month, that is, 3199.4235131 GB-Month
  • The total GB-Month for S3 Index Objects = (8KB * 10,000,000 * 25 days / 31 days per month) + (8KB * 10,000,000 * 24 days / 31 days per month) + (8KB * 10,000,000 * 23 days / 31 days per month) + … + (8KB * 10,000,000 * 1 day / 31 days per month);
  • hence, 8 * 10,000,000 * (25 + 24 + 23 + … + 1) / 31 KB-Month, totaling 799.855878276 GB-Month

(W) So, the total Cost of Storage in December is (((104.838709677 * 1024) + 3199.4235131) * $0.002) + (799.855878276 * $0.025); this amounts to $241.104921
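The batch-day totals used for July, November and December can be cross-checked by counting the billed days of every batch. This sketch assumes each of the 31 batches is charged for exactly 180 days starting on its transition date, with transitions on consecutive days from 30th May 2020; the function name and defaults are illustrative.

```python
from collections import Counter
from datetime import date, timedelta


def billed_batch_days_by_month(first_transition=date(2020, 5, 30),
                               batches=31, min_days=180):
    """Count billed Deep Archive batch-days per month, min_days per batch."""
    counts = Counter()
    for offset in range(batches):
        start = first_transition + timedelta(days=offset)
        for day_index in range(min_days):
            counts[(start + timedelta(days=day_index)).month] += 1
    return counts


counts = billed_batch_days_by_month()
print(counts[7], counts[11], counts[12])  # 961 915 325
```

Dividing by the days in the month and multiplying by 10 TB gives the TB-Month figures: 961 / 31 * 10 = 310 for July, 915 / 30 * 10 = 305 for November, and 325 / 31 * 10, about 104.84, for December.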

Total Cost of Data Storage and Management for Data uploaded for the Month of March throughout its Lifecycle

  • The total Cost is A ($1550) + B ($3975.42) + C ($100) + D ($8.258) + E ($25) + F ($3614.72) + G ($3000) + H ($4104.533) + I ($412.5) + J ($3499.753) + K ($3282.414) + L ($772.580645) + M ($1400) + N ($2.06315) + O ($2633.04533) + P ($20300) + Q ($355.697763) + R ($712.928706) + S ($712.928706) + T ($712.928706) + U ($712.928706) + V ($701.429856) + W ($241.104921);
  • this amounts to $52,830.2345

It's roughly $53K! Whoa! And the data was meant to be kept for just 120 days.

Some Analysis and Wise words

We see that a total of 310 TB of data, uploaded in chunks of 10 TB and 10 million objects on each day of March and stored for just 120 days, incurs very high costs.

It was a highly simplified model, where data was uploaded at exactly 12:00 AM; but in a real-life scenario, uploading 10 TB of data with 10 million objects would take hours to days, depending on the approach used to transfer data into S3. Also, data was moved across tiers or storage classes for lower storage costs. We can see that the data transition costs (~$20K in June) are huge because of the huge count of objects in the data (10 million uploaded per day in March). We also simplified the case by assuming that every object is larger than the minimum object size for the S3 IT Tier (128KB), which would not always be the case in practice.

We can see that there is a huge cost implication due to early deletion from Deep Archive. Deep Archive also has a data index overhead that the user has to bear. If you delete objects before the minimum storage period, the early deletion charge works like a penalty. The minimum storage period is 180 days for Deep Archive, 90 days for Glacier, and 30 days for the S3 IT and S3 IA Tiers.

We simplified the scenario in such a way that data would be uploaded only once a day; data would not be deleted in between but only managed by Lifecycle Transition. Also, data in the S3 Intelligent Tier would all move to its Infrequent access tier after the first 30 days of transition into the Intelligent Tier.

All these simplifications made it possible to make the costs calculations easier. We have calculated the storage size in terms of Size-Month (e.g., TB-Month or GB-Month), which is because data is moving in time and it stays in a particular class for only a limited time in a month.

We can use CloudWatch metrics graphs to visualize the amount of data in a Storage class in a month, given that the data is moving. The graph would give a good visualization of the average data size for monthly calculations.

In our calculations, we only included data uploaded in the month of March; even then, for a Lifecycle of 120 days, the total ran to tens of thousands of dollars. What if we keep adding data every month and it remains there for a longer period?

Moreover, we did not try downloading data from the S3 standard or the S3 Intelligent Tier (GET requests are priced at $0.0004 per 1000 requests). Besides, we did not try LIST (priced at $0.005 per 1000 requests) or any other requests, which are almost always used in real scenarios. Also, we did not try downloading data to regions other than the bucket's region, or from AWS services in other regions, or over the public internet, all of which count as Data Transfer; believe me, if not planned well, it is very costly. By contrast, data transfer within the same region is free, for example, to an EC2 instance in Mumbai, where this bucket is located. (Importantly, AWS support confirmed around mid-March 2020 that this within-region transfer can be free even over the public internet.)

Sometimes, data is retrieved from Deep Archive; such a retrieval request has its own costs based on multiple retrieval pricing tiers. Retrieval first restores data temporarily into the RRS storage class, which has its own storage cost in addition to the retrieval charge. Then this data has to be copied to the S3 standard or elsewhere, which again adds the storage cost at the destination, along with the cost of the COPY API requests that read the data from the RRS class. Be warned that retrieval is costly and tiresome, with four sources of charges: the retrieval itself, the temporary restored copy, the COPY requests, and storage at the destination.

There are many other sources and situations adding to high costs in S3; for example, S3 Multipart uploads, S3 Batch, S3 SELECT, S3 Transfer Acceleration, S3 Replication, etc.

CONCLUSION

AWS S3 is a cheap and highly scalable storage option for diverse data needs. But its usage for data storage and management has many considerations that have cost implications. They are so fine-grained that it is easy to miss either the details of the calculations or the details of the features being used. So, it is wise to be cautious of the cost implications before planning any serious S3 usage. It's good to evaluate the usage scenarios and design a data-based solution that is good for data availability, usage, management, and costs.

One should carefully analyze the pricing page and the user guide to understand the nuances of S3 pricing, and then evaluate current data usage in S3 or plans for future use to help run cost-optimized workloads. If workloads are planned in advance, designed diligently, and executed well, they help protect healthy profit margins for your company.

 
