Table of Contents
- S3
- Features
- Storage Classes
- Object Lifecycle Management
- Access Control List (ACL)
- Pre-signed URL
- Requester Pays
- Use-Cases
S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides management features so that you can optimize, organize, and configure access to your data to meet your specific business, organizational, and compliance requirements.
Features
Feature | Description |
---|---|
High Durability | S3 provides 99.999999999% durability for objects by storing copies in multiple systems across at least three AZs. |
Scalability | S3 can store an unlimited amount of data, from small files to large datasets, scaling automatically as storage needs grow. |
Security | Offers comprehensive security features like encryption in transit and at rest, IAM policies, S3 Bucket Policies, and Access Control Lists. |
Data Lifecycle Management | Automatically moves objects between different storage classes and archives or deletes data based on defined rules. |
Versioning | Keeps multiple versions of an object in the same bucket, allowing you to preserve, retrieve, and restore every version. |
Cross-Region Replication (CRR) | Automatically replicates data across AWS regions for enhanced availability and compliance. |
Event Notifications | Sends notifications when specified events occur in your bucket, using SNS, SQS, or Lambda. |
Storage Class Analysis | Monitors access patterns and suggests when to move data to more cost-effective storage classes. |
S3 Select and Glacier Select | Retrieve only a subset of data from an object by using simple SQL expressions. |
Transfer Acceleration | Speeds up the transfer of files by utilizing Amazon CloudFront’s globally distributed edge locations. |
Multipart Upload | Allows large files to be uploaded in smaller parts concurrently, improving throughput and the ability to resume uploads. |
Object Lock | Provides WORM (Write Once Read Many) capability to prevent object deletion or modification for a fixed amount of time. |
Other important features are:
- Bucket names are globally unique; each bucket is created in a single Region, but it can be accessed globally
- S3 pre-signed URL: a short-term access URL that is valid only for a limited time (e.g. giving a user temporary access to an exam file)
- MFA Delete protection: require MFA before object versions can be permanently deleted
- CORS: if a web page served from one origin requests resources from an S3 bucket in a different origin, you need to enable CORS on that bucket (example below)
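A minimal sketch of enabling CORS from the AWS CLI, assuming a placeholder bucket my-frontend-assets and allowed origin https://example.com:
# cors.json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
# apply the configuration to the bucket
aws s3api put-bucket-cors --bucket my-frontend-assets --cors-configuration file://cors.json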
Storage Classes
Each object in Amazon S3 has a storage class associated with it. For example, if you list the objects in an S3 bucket, the console shows the storage class for all the objects in the list. Amazon S3 offers a range of storage classes for the objects that you store. You choose a class depending on your use case scenario and performance access requirements. All of these storage classes offer high durability.
Storage Class | Use Case Examples | Availability | Durability | Minimum Storage Duration | Minimum Billable Object Size | Retrieval Fee | First Byte Latency |
---|---|---|---|---|---|---|---|
S3 Standard | Frequently accessed data, general-purpose | 99.99% | 99.999999999% | None | None | No | Milliseconds |
S3 Intelligent-Tiering | Data with unknown or changing access patterns | 99.9% | 99.999999999% | None | 128KB | No (for frequent access tier) | Milliseconds |
S3 Standard-IA | Long-lived, infrequently accessed data | 99.9% | 99.999999999% | 30 days | 128KB | Yes | Milliseconds |
S3 One Zone-IA | Infrequently accessed data, not requiring multiple AZ data resilience | 99.5% | 99.999999999% | 30 days | 128KB | Yes | Milliseconds |
S3 Glacier Instant Retrieval | Archive data with rapid retrieval | 99.9% | 99.999999999% | 90 days | 128KB | Yes | Milliseconds to seconds |
S3 Glacier Flexible Retrieval (formerly Glacier) | Archive data accessed once or twice a year | 99.99% | 99.999999999% | 90 days | 128KB | Yes | Minutes to hours |
S3 Glacier Deep Archive | Long-term archive, accessed very infrequently | 99.9% | 99.999999999% | 180 days | 128KB | Yes | 12 hours (standard), 48 hours (bulk) |
Setting the storage class of an object
- To set and update object storage classes, you can use the Amazon S3 console, AWS SDKs, or the AWS CLI.
- Amazon S3 API operations support setting (or updating) the storage class of objects as follows:
  - When creating a new object, you can specify its storage class. For example, when creating objects by using the PUT Object, POST Object, and Initiate Multipart Upload API operations, you add the x-amz-storage-class request header to specify a storage class. If you don't add this header, Amazon S3 uses S3 Standard, the default storage class.
  - You can also change the storage class of an object that is already stored in Amazon S3 to any other storage class by making a copy of the object with the PUT Object - Copy API operation. However, you can't use PUT Object - Copy to copy objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. You also can't transition from S3 One Zone-IA to S3 Glacier Instant Retrieval.
  - To change the class, you copy the object in the same bucket using the same key name and specify the request headers as follows (see the CLI sketch below):
    - Set the x-amz-metadata-directive header to COPY.
    - Set the x-amz-storage-class header to the storage class that you want to use.
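A minimal CLI sketch of both approaches (bucket name, key, and storage classes are placeholders):
# set the storage class when creating the object (x-amz-storage-class header)
aws s3api put-object --bucket rsc-test-bucket --key reports/2021.csv \
  --body ./2021.csv --storage-class STANDARD_IA

# change the storage class of an existing object by copying it onto itself
aws s3api copy-object --bucket rsc-test-bucket --key reports/2021.csv \
  --copy-source rsc-test-bucket/reports/2021.csv \
  --metadata-directive COPY --storage-class GLACIER_IR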
Object Lifecycle Management
- Transition action: defines when objects move to a different storage class, e.g. transition objects to S3 Standard-IA 30 days after creation (objects must be at least 30 days old before they can transition to Standard-IA or One Zone-IA). Similarly, within S3 Intelligent-Tiering, objects that have not been accessed for 30 consecutive days are moved automatically from the frequent to the infrequent access tier. See the rule sketch below.
- Expiration action: objects are deleted asynchronously, so there may be some delay (eventual consistency); to check whether an object has actually been deleted, issue a GET (or HEAD) request. When you open Lifecycle and add a rule, the rule can be applied either to the entire bucket or to a single 'folder' (prefix) in the bucket.
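A minimal lifecycle rule sketch via the CLI (bucket name, prefix, rule ID, days, and storage classes are placeholders):
# lifecycle.json
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
# apply the rule to the bucket
aws s3api put-bucket-lifecycle-configuration --bucket rsc-test-bucket \
  --lifecycle-configuration file://lifecycle.json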
Access Control List (ACL)
- Each bucket and object has an ACL attached to it that defines the type of access. When a request is received against an S3 resource, S3 checks the corresponding ACL to verify that the requester has the necessary permissions.
- When you create a bucket or upload an object, S3 creates a default ACL that grants the resource owner full control over the resource.
aws s3api get-object-acl --bucket rsc-test-bucket --key abc
{
"Owner": {
"DisplayName": "awspetsafemasterdevelop",
"ID": "4fccb070bc8890e265d43" // creater or called Canonical User Id
},
"Grants": [
{
"Grantee": {
"DisplayName": "awspetsafemasterdevelop",
"ID": "4fccb070bc8890e265d43", //default assigned to creater
"Type": "CanonicalUser"
},
"Permission": "FULL_CONTROL"
}
]
}
Canned ACL & Grants
You grant permissions on objects through grants (a maximum of 100 grants per ACL). In a request you can either use a canned ACL or define the grants explicitly; you cannot do both.
- Canned ACL: S3 supports a set of predefined grants, known as canned ACLs. Each canned ACL has a predefined set of grantees and permissions and is set with the x-amz-acl request header, e.g.:
curl ... -H "x-amz-acl:private" https://mybucket.s3.amazonaws.com/test.json
- Explicit grants: alternatively, specify grantees and permissions directly with the x-amz-grant-* request headers, such as x-amz-grant-read or x-amz-grant-full-control (see the put-object-acl sketch below).
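A minimal sketch of applying an ACL to an existing object with the CLI, reusing the rsc-test-bucket example above (the canonical user ID is a placeholder):
# apply a canned ACL
aws s3api put-object-acl --bucket rsc-test-bucket --key abc --acl private

# or grant explicitly to a specific canonical user
aws s3api put-object-acl --bucket rsc-test-bucket --key abc \
  --grant-read id=4fccb070bc8890e265d43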
Pre-signed URL
Share Object with Others Temporarily
- By default, an object is private, but the object/bucket owner can generate a pre-signed URL for it through the CLI, the console, the SDKs, or by signing a request with SigV4.
- A pre-signed URL has the signing credentials, signature, and (for temporary credentials) a security token embedded as query parameters.
- A pre-signed URL expires after its TTL; the maximum validity depends on the credentials used to sign it:
  - IAM instance profile: valid up to 6 hours
  - STS: valid up to 36 hours
  - IAM user: valid up to 7 days
aws s3 presign s3://mybucket/myimage.png
# returns
https://mybucket.s3.us-east-1.amazonaws.com/myimage.png?
X-Amz-Algorithm=AWS4-HMAC-SHA256&
X-Amz-Credential=ASIAXQYGD74PCZPYL37D%2F20210528%2Fus-east-1%2Fs3%2Faws4_request&
X-Amz-Date=20210528T104921Z&
X-Amz-Expires=3600&
X-Amz-SignedHeaders=host&
X-Amz-Security-Token=IQoJb3...D&
X-Amz-Signature=1753...
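The default expiry is 3600 seconds (1 hour); a custom TTL can be requested with --expires-in, and anyone holding the URL can then fetch the object, e.g. with curl (bucket, key, and the 10-minute TTL are placeholders):
# generate a URL valid for 10 minutes, then download with it
aws s3 presign s3://mybucket/myimage.png --expires-in 600
curl -o myimage.png "https://mybucket.s3.us-east-1.amazonaws.com/myimage.png?X-Amz-Algorithm=...&X-Amz-Signature=..."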
Requester Pays
- Example: NASA hosts large HD images/videos and would like to shift the data-transfer cost to the third parties who sell those images/videos on a marketplace.
- In general, bucket owners pay for all S3 storage and data transfer costs. When Requester Pays is enabled, the requester pays for the data transfer, and the bucket owner pays for the data storage.
- Anonymous access to this bucket is disabled. Instead, the request must come from an IAM role or IAM user so that the requester's AWS account can be charged appropriately.
- The requester must send a header, x-amz-request-payer, which confirms that the requester knows that they will be charged for the download. To access objects in Requester Pays buckets, requests must include one of the following.
  - GET/HEAD/POST requests: include x-amz-request-payer=requester in the header
  - AWS SigV4 signed requests: include x-amz-request-payer=requester in the request
  - If the request succeeds (the requester is charged), the response includes the header x-amz-request-charged:requester. Otherwise, S3 returns a 403 error.
- The bucket may also need a policy that allows these users (or an IAM role they can assume) to access it, and you need to handle authentication of requesters yourself
- Requester Pays also works with pre-signed URLs (see the CLI sketch below)
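A minimal Requester Pays sketch with the CLI, assuming a placeholder bucket nasa-hd-media owned by the data provider:
# bucket owner enables Requester Pays
aws s3api put-bucket-request-payment --bucket nasa-hd-media \
  --request-payment-configuration Payer=Requester

# an authenticated requester downloads an object and accepts the transfer charge
aws s3api get-object --bucket nasa-hd-media --key images/earth.tif \
  --request-payer requester earth.tif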
Use-Cases
S3 or EFS
- Every 15 minutes, a service needs to write 20 GB of weather data. Would EFS or S3 be more feasible to use?
- Access to EFS is faster than fetching objects from S3 buckets. Although Amazon S3 offers strong read-after-write consistency, billions of objects would be overwritten in the S3 bucket every 15 minutes, which could take longer than writing to Amazon EFS.
Customer Encryption
Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) protects data at rest in the S3 bucket with keys that you manage; because SSE-C requests must be sent over HTTPS, data is also protected in transit.
- For Amazon S3 REST API calls, you have to include the following HTTP Request Headers:
- x-amz-server-side-encryption-customer-algorithm
- x-amz-server-side-encryption-customer-key
- x-amz-server-side-encryption-customer-key-MD5
- For pre-signed URLs, you should specify the algorithm using the x-amz-server-side-encryption-customer-algorithm request header (see the CLI sketch below).
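A minimal SSE-C sketch with the high-level CLI (bucket, key, and the locally generated key file are placeholders; the same key must be supplied again for every download):
# generate a 256-bit key locally; S3 never stores it
openssl rand -out sse-c.key 32

# upload with SSE-C (sets the x-amz-server-side-encryption-customer-* headers)
aws s3 cp ./report.pdf s3://rsc-test-bucket/report.pdf \
  --sse-c AES256 --sse-c-key fileb://sse-c.key

# download: the identical algorithm and key must be provided again
aws s3 cp s3://rsc-test-bucket/report.pdf ./report.pdf \
  --sse-c AES256 --sse-c-key fileb://sse-c.key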