Table of Contents
- S3
- Features
- Storage Classes
- Object Lifecycle Management
- Access Control List (ACL)
- Pre-signed URL
- Requester Pays
- Use-Cases
S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can use Amazon S3 to store and protect any amount of data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides management features so that you can optimize, organize, and configure access to your data to meet your specific business, organizational, and compliance requirements.
Features
Feature | Description |
---|---|
High Durability | S3 provides 99.999999999% durability for objects by storing copies in multiple systems across at least three AZs. |
Scalability | S3 can store an unlimited amount of data, from small files to large datasets, scaling automatically as storage needs grow. |
Security | Offers comprehensive security features like encryption in transit and at rest, IAM policies, S3 Bucket Policies, and Access Control Lists. |
Data Lifecycle Management | Automatically moves objects between different storage classes and archives or deletes data based on defined rules. |
Versioning | Keeps multiple versions of an object in the same bucket, allowing you to preserve, retrieve, and restore every version. |
Cross-Region Replication (CRR) | Automatically replicates data across AWS regions for enhanced availability and compliance. |
Event Notifications | Sends notifications when specified events occur in your bucket, using SNS, SQS, or Lambda. |
Storage Class Analysis | Monitors access patterns and suggests when to move data to more cost-effective storage classes. |
S3 Select and Glacier Select | Retrieve only a subset of data from an object by using simple SQL expressions. |
Transfer Acceleration | Speeds up the transfer of files by utilizing Amazon CloudFront’s globally distributed edge locations. |
Multipart Upload | Allows large files to be uploaded in smaller parts concurrently, improving throughput and the ability to resume uploads. |
Object Lock | Provides WORM (Write Once Read Many) capability to prevent object deletion or modification for a fixed amount of time. |
Other important features are:
- Bucket names are globally unique; each bucket is created in a single Region, but it can be accessed globally
- S3 pre-signed URL: a short-term access URL that is valid only for a limited time (e.g. giving a user temporary access to an exam file)
- MFA Delete protection: require MFA before object versions can be permanently deleted
- CORS: if a web page served from one origin requests resources from an S3 bucket in a different origin, you need to enable CORS on that bucket (example below)
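A minimal sketch of enabling CORS from the AWS CLI, assuming a placeholder bucket my-frontend-assets and allowed origin https://example.com:
# cors.json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://example.com"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
# apply the configuration to the bucket
aws s3api put-bucket-cors --bucket my-frontend-assets --cors-configuration file://cors.json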
Storage Classes
Each object in Amazon S3 has a storage class associated with it. For example, if you list the objects in an S3 bucket, the console shows the storage class for all the objects in the list. Amazon S3 offers a range of storage classes for the objects that you store. You choose a class depending on your use case scenario and performance access requirements. All of these storage classes offer high durability.
Storage Class | Use Case Examples | Availability | Durability | Minimum Storage Duration | Minimum Billable Object Size | Retrieval Fee | First Byte Latency |
---|---|---|---|---|---|---|---|
S3 Standard | Frequently accessed data, general-purpose | 99.99% | 99.999999999% | None | None | No | Milliseconds |
S3 Intelligent-Tiering | Data with unknown or changing access patterns | 99.9% | 99.999999999% | None | 128KB | No (for frequent access tier) | Milliseconds |
S3 Standard-IA | Long-lived, infrequently accessed data | 99.9% | 99.999999999% | 30 days | 128KB | Yes | Milliseconds |
S3 One Zone-IA | Infrequently accessed data, not requiring multiple AZ data resilience | 99.5% | 99.999999999% | 30 days | 128KB | Yes | Milliseconds |
S3 Glacier Instant Retrieval | Archive data with rapid retrieval | 99.9% | 99.999999999% | 90 days | 128KB | Yes | Milliseconds to seconds |
S3 Glacier Flexible Retrieval (formerly Glacier) | Archive data accessed once or twice a year | 99.99% | 99.999999999% | 90 days | 128KB | Yes | Minutes to hours |
S3 Glacier Deep Archive | Long-term archive, accessed very infrequently | 99.9% | 99.999999999% | 180 days | 128KB | Yes | 12 hours (standard), 48 hours (bulk) |
Setting the storage class of an object
- To set and update object storage classes, you can use the Amazon S3 console, AWS SDKs, or the AWS CLI.
- Amazon S3 API operations support setting (or updating) the storage class of objects as follows:
  - When creating a new object, you can specify its storage class. For example, when creating objects by using the PUT Object, POST Object, and Initiate Multipart Upload API operations, you add the x-amz-storage-class request header to specify a storage class. If you don't add this header, Amazon S3 uses S3 Standard, the default storage class.
  - You can also change the storage class of an object that is already stored in Amazon S3 to any other storage class by making a copy of the object with the PUT Object - Copy API operation. However, you can't use PUT Object - Copy to copy objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. You also can't transition from S3 One Zone-IA to S3 Glacier Instant Retrieval.
  - To change the class, you copy the object in the same bucket using the same key name and specify the request headers as follows (see the CLI sketch below):
    - Set the x-amz-metadata-directive header to COPY.
    - Set the x-amz-storage-class header to the storage class that you want to use.
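A minimal CLI sketch of both approaches (bucket name, key, and storage classes are placeholders):
# set the storage class when creating the object (x-amz-storage-class header)
aws s3api put-object --bucket rsc-test-bucket --key reports/2021.csv \
  --body ./2021.csv --storage-class STANDARD_IA

# change the storage class of an existing object by copying it onto itself
aws s3api copy-object --bucket rsc-test-bucket --key reports/2021.csv \
  --copy-source rsc-test-bucket/reports/2021.csv \
  --metadata-directive COPY --storage-class GLACIER_IR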
Object Lifecycle Management
- Transition action: defines when objects move to a different storage class, e.g. transition objects to S3 Standard-IA 30 days after creation (objects must be at least 30 days old before they can transition to Standard-IA or One Zone-IA). Similarly, within S3 Intelligent-Tiering, objects that have not been accessed for 30 consecutive days are moved automatically from the frequent to the infrequent access tier. See the rule sketch below.
- Expiration action: objects are deleted asynchronously, so there may be some delay (eventual consistency); to check whether an object has actually been deleted, issue a GET (or HEAD) request. When you open Lifecycle and add a rule, the rule can be applied either to the entire bucket or to a single 'folder' (prefix) in the bucket.
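A minimal lifecycle rule sketch via the CLI (bucket name, prefix, rule ID, days, and storage classes are placeholders):
# lifecycle.json
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
# apply the rule to the bucket
aws s3api put-bucket-lifecycle-configuration --bucket rsc-test-bucket \
  --lifecycle-configuration file://lifecycle.json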
Access Control List (ACL)
- Each bucket and object has an ACL attached to it that defines the type of access. When a request is received against an S3 resource, S3 checks the corresponding ACL to verify that the requester has the necessary permissions.
- When you create a bucket or upload an object, S3 creates a default ACL that grants the resource owner full control over the resource.
aws s3api get-object-acl --bucket rsc-test-bucket --key abc
{
"Owner": {
"DisplayName": "awspetsafemasterdevelop",
"ID": "4fccb070bc8890e265d43" // creater or called Canonical User Id
},
"Grants": [
{
"Grantee": {
"DisplayName": "awspetsafemasterdevelop",
"ID": "4fccb070bc8890e265d43", //default assigned to creater
"Type": "CanonicalUser"
},
"Permission": "FULL_CONTROL"
}
]
}
Canned ACL & Grants
You grant permissions on objects through grants (a maximum of 100 grants per ACL). In a request you can either use a canned ACL or define the grants explicitly; you cannot do both.
- Canned ACL: S3 supports a set of predefined grants, known as canned ACLs. Each canned ACL has a predefined set of grantees and permissions and is set with the x-amz-acl request header, e.g.:
curl ... -H "x-amz-acl:private" https://mybucket.s3.amazonaws.com/test.json
- Explicit grants: alternatively, specify grantees and permissions directly with the x-amz-grant-* request headers, such as x-amz-grant-read or x-amz-grant-full-control (see the put-object-acl sketch below).
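A minimal sketch of applying an ACL to an existing object with the CLI, reusing the rsc-test-bucket example above (the canonical user ID is a placeholder):
# apply a canned ACL
aws s3api put-object-acl --bucket rsc-test-bucket --key abc --acl private

# or grant explicitly to a specific canonical user
aws s3api put-object-acl --bucket rsc-test-bucket --key abc \
  --grant-read id=4fccb070bc8890e265d43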
Pre-signed URL
Share Object with Others Temporarily
- By default, an object is private, but the object/bucket owner can generate a pre-signed URL for it through the CLI, the console, the SDKs, or by signing a request with SigV4.
- A pre-signed URL has the signing credentials, signature, and (for temporary credentials) a security token embedded as query parameters.
- A pre-signed URL expires after its TTL; the maximum validity depends on the credentials used to sign it:
  - IAM instance profile: valid up to 6 hours
  - STS: valid up to 36 hours
  - IAM user: valid up to 7 days
aws s3 presign s3://mybucket/myimage.png
# returns
https://mybucket.s3.us-east-1.amazonaws.com/myimage.png?
X-Amz-Algorithm=AWS4-HMAC-SHA256&
X-Amz-Credential=ASIAXQYGD74PCZPYL37D%2F20210528%2Fus-east-1%2Fs3%2Faws4_request&
X-Amz-Date=20210528T104921Z&
X-Amz-Expires=3600&
X-Amz-SignedHeaders=host&
X-Amz-Security-Token=IQoJb3...D&
X-Amz-Signature=1753...
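The default expiry is 3600 seconds (1 hour); a custom TTL can be requested with --expires-in, and anyone holding the URL can then fetch the object, e.g. with curl (bucket, key, and the 10-minute TTL are placeholders):
# generate a URL valid for 10 minutes, then download with it
aws s3 presign s3://mybucket/myimage.png --expires-in 600
curl -o myimage.png "https://mybucket.s3.us-east-1.amazonaws.com/myimage.png?X-Amz-Algorithm=...&X-Amz-Signature=..."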
Requester Pays
- Example: NASA hosts large HD images/videos and would like to shift the data-transfer cost to the third parties who sell those images/videos on a marketplace.
- In general, bucket owners pay for all S3 storage and data transfer costs. When Requester Pays is enabled, the requester pays for the data transfer, and the bucket owner pays for the data storage.
- Anonymous access to this bucket is disabled. Instead, the request must come from an IAM role or IAM user so that the requester's AWS account can be charged appropriately.
- The requester must send a header, x-amz-request-payer, which confirms that the requester knows that they will be charged for the download. To access objects in Requester Pays buckets, requests must include one of the following.
  - GET/HEAD/POST requests: include x-amz-request-payer=requester in the header
  - AWS SigV4 signed requests: include x-amz-request-payer=requester in the request
  - If the request succeeds (the requester is charged), the response includes the header x-amz-request-charged:requester. Otherwise, S3 returns a 403 error.
- The bucket may also need a policy that allows these users (or an IAM role they can assume) to access it, and you need to handle authentication of requesters yourself
- Requester Pays also works with pre-signed URLs (see the CLI sketch below)
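A minimal Requester Pays sketch with the CLI, assuming a placeholder bucket nasa-hd-media owned by the data provider:
# bucket owner enables Requester Pays
aws s3api put-bucket-request-payment --bucket nasa-hd-media \
  --request-payment-configuration Payer=Requester

# an authenticated requester downloads an object and accepts the transfer charge
aws s3api get-object --bucket nasa-hd-media --key images/earth.tif \
  --request-payer requester earth.tif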
Use-Cases
S3 or EFS
- Every 15 minutes, a service needs to write 20 GB of weather data. Would EFS or S3 be more feasible to use?
- Access to EFS is faster than fetching objects from S3 buckets. Although Amazon S3 offers strong read-after-write consistency, billions of objects would be overwritten in the S3 bucket every 15 minutes, which could take longer than writing to Amazon EFS.
Customer Encryption
Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) protects data at rest in the S3 bucket with keys that you manage; because SSE-C requests must be sent over HTTPS, data is also protected in transit.
- For Amazon S3 REST API calls, you have to include the following HTTP Request Headers:
- x-amz-server-side-encryption-customer-algorithm
- x-amz-server-side-encryption-customer-key
- x-amz-server-side-encryption-customer-key-MD5
- For pre-signed URLs, you should specify the algorithm using the x-amz-server-side-encryption-customer-algorithm request header (see the CLI sketch below).
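A minimal SSE-C sketch with the high-level CLI (bucket, key, and the locally generated key file are placeholders; the same key must be supplied again for every download):
# generate a 256-bit key locally; S3 never stores it
openssl rand -out sse-c.key 32

# upload with SSE-C (sets the x-amz-server-side-encryption-customer-* headers)
aws s3 cp ./report.pdf s3://rsc-test-bucket/report.pdf \
  --sse-c AES256 --sse-c-key fileb://sse-c.key

# download: the identical algorithm and key must be provided again
aws s3 cp s3://rsc-test-bucket/report.pdf ./report.pdf \
  --sse-c AES256 --sse-c-key fileb://sse-c.key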