How to choose appropriate AWS storage solutions?

Sruti Samatkar
Published in Analytics Vidhya
6 min read · Dec 14, 2020

AWS: Storage Design

AWS offers many storage options, so it’s important to understand the services that are available and pick the appropriate ones for your needs. There is Simple Storage Service (S3), Glacier, CloudFront, Elastic Block Store (EBS), Storage Gateway and the Snow family.

If you’ve ever seen the name S3 and wondered what it means, it’s just the three S’s: Simple Storage Service. It was one of the first storage services that Amazon ever offered with AWS. We also have Glacier. Glacier is for archival data: a place to put a large amount of data that you want to keep for a long time, but that you don’t need to access frequently or instantly.

Then we also have CloudFront. CloudFront is about getting content close to your users. With CloudFront, quite often what you’re doing is simply making sure that web content, the kind of data that’s accessed frequently by your website visitors, is cached at an edge location that’s near the customer.

Then we also have Elastic Block Store, or EBS. Elastic Block Store is the best storage solution to use for your instances, when you want those instances to have very fast access, because we’re talking about block-level access, rather than object-level access. S3 is object-level while EBS is block-level.

The Storage Gateway is an appliance that you put on your local network, either a software appliance or a hardware appliance, that connects your on-premises environment to storage in the Amazon cloud, so that you can access that storage as if it were local. The Snow family is a collection of three primary products for migrating data from your local data stores into the cloud when you have massive amounts of data to move. We’ll touch on these again later. The final category of storage service is databases.

Let’s see each one in detail:

Amazon Simple Storage Service (S3): one of the first storage services that Amazon offered with AWS.

Amazon S3
  • S3 is about object storage.
  • Data in S3 is distributed across at least 3 Availability Zones, except in the One Zone-IA storage class, which keeps data in a single zone and is less expensive.
  • S3 supports encryption and automatic data classification.
  • Big Data analytics can run directly against stored data.

Getting data into S3:

  1. API
  2. AWS Direct Connect: a dedicated private network connection from our network into AWS, over which data can be transferred.
  3. Storage Gateway: data can be stored locally and replicated into S3 buckets.
  4. Kinesis Firehose: a way to stream large amounts of analytical data into S3 buckets.
  5. Transfer Acceleration: speeds up uploads by routing them through CloudFront edge locations.
  6. Snow Family: Snowball (petabyte-scale) or Snowball Edge (100 TB)
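To get a feel for when a Snow device beats the network, it helps to estimate how long a transfer would take over your link. The sketch below is only back-of-the-envelope arithmetic: the function names, the 80% link utilization and the 7-day cutoff are assumptions made for illustration, not AWS guidance.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over a link_mbps link.

    utilization is the assumed fraction of the link we can actually use.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("sizes and speeds must be positive, utilization in (0, 1]")
    bits = data_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400                        # seconds -> days


def suggest_ingest_path(data_tb: float, link_mbps: float, max_days: float = 7.0) -> str:
    """Suggest a Snow device when the network transfer would exceed max_days.

    max_days is a hypothetical cutoff; pick whatever deadline fits your project.
    """
    if transfer_days(data_tb, link_mbps) <= max_days:
        return "network transfer"
    return "Snow family device"


if __name__ == "__main__":
    for size_tb, mbps in [(1, 1000), (500, 100)]:
        days = transfer_days(size_tb, mbps)
        print(f"{size_tb} TB over {mbps} Mbps: {days:.1f} days -> {suggest_ingest_path(size_tb, mbps)}")
```

With these assumptions, a terabyte over a gigabit link is a matter of hours, while hundreds of terabytes over a 100 Mbps link would take well over a year, which is the kind of case the Snow family targets.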

Common S3 Operations:

  • Creating and deleting buckets
  • Writing objects
  • Reading objects
  • Deleting objects
  • Managing object properties
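The operations above map fairly directly onto API calls. Here is a minimal sketch that walks through them in order, assuming `s3` is a boto3-style S3 client (for example the object returned by `boto3.client("s3")`). The helper name `s3_round_trip` and the bucket and key values are made up for illustration.

```python
def s3_round_trip(s3, bucket: str, key: str, text: str) -> str:
    """Create a bucket, write an object, read it back, then clean up.

    `s3` is assumed to expose the boto3 S3 client methods used below.
    """
    # Creating a bucket. Outside us-east-1, boto3 also expects a
    # CreateBucketConfiguration with a LocationConstraint.
    s3.create_bucket(Bucket=bucket)

    # Writing an object: the body is sent as bytes.
    s3.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))

    # Reading an object: the response carries a stream under "Body".
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Deleting the object, then the (now empty) bucket.
    s3.delete_object(Bucket=bucket, Key=key)
    s3.delete_bucket(Bucket=bucket)

    return body.decode("utf-8")
```

Passing the client in as a parameter keeps the sketch free of credentials and makes it easy to try against a stand-in object before pointing it at a real account.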

S3 Advanced Features:

  • Prefixes and delimiters: used to organize objects into a folder-like hierarchy.
  • Storage classes: S3 Standard, S3 Infrequent Access (IA), S3 Reduced Redundancy Storage (RRS) and Glacier.
  • Object Lifecycle Management: as objects age, they can be moved automatically from Standard to IA to Glacier according to rules we define, while still remaining in our bucket.
  • Encryption: AES-256 server-side encryption.
  • Multi-factor authentication
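Lifecycle management is easier to picture as a concrete rule. The sketch below builds a rule that moves objects under a prefix to IA and then to Glacier. The dictionary is shaped the way boto3's `put_bucket_lifecycle_configuration` expects its `LifecycleConfiguration` argument; the helper name, the rule ID format and the 30/90-day defaults are assumptions for illustration.

```python
def lifecycle_rule(prefix: str, ia_after_days: int = 30, glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration: Standard -> IA -> Glacier for one prefix."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                # Hypothetical naming scheme for the rule ID.
                "ID": f"tier-down-{prefix.strip('/') or 'all'}",
                # Only objects whose key starts with this prefix are affected.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Days are counted from object creation.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(lifecycle_rule("logs/"))
```

With a real client, the result could be applied with something like `s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=lifecycle_rule("logs/"))`, where the bucket name is a placeholder.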

Glacier: Data that is not frequently accessed is stored in Glacier, which is the least expensive storage service.

Amazon Glacier
  • Where S3 has buckets, Glacier has vaults.
  • Also called archival data storage.
  • There are three access methods:
      o Expedited: data can be accessed in 1–5 minutes; this is the most expensive option.
      o Standard: data can be accessed in 3–5 hours.
      o Bulk: data can be accessed in 5–12 hours.
  • We can choose the region in which data is stored.
  • Data is stored with AES 256-bit encryption.
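Choosing between the three access methods is mostly a question of how long you can wait. As a rough sketch, the function below picks the slowest tier that still meets a deadline, using the upper end of each time range quoted above. It assumes that slower tiers are cheaper (Bulk cheapest, Expedited most expensive); the function and constant names are made up for illustration.

```python
# (tier name, worst-case wait in hours), ordered from cheapest to most
# expensive under the assumption that slower retrieval costs less.
RETRIEVAL_TIERS = [
    ("Bulk", 12.0),
    ("Standard", 5.0),
    ("Expedited", 5 / 60),
]


def pick_retrieval_tier(max_wait_hours: float) -> str:
    """Return the cheapest tier whose worst-case wait fits within max_wait_hours."""
    for name, worst_case_hours in RETRIEVAL_TIERS:
        if worst_case_hours <= max_wait_hours:
            return name
    raise ValueError("no Glacier tier is fast enough; keep this data in S3 instead")


if __name__ == "__main__":
    for hours in (24, 6, 0.5):
        print(f"need data within {hours} h -> {pick_retrieval_tier(hours)}")
```

If even Expedited is too slow, the data probably does not belong in Glacier in the first place, which is why the function raises an error instead of guessing.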

Glacier Integration:

1. S3 cold data can be automatically moved into Glacier.

2. Snow Devices can be used to import data.

3. Storage Gateway can be connected to Glacier.

Glacier features:

  • Data Retrieval: up to 5% of data can be retrieved at no charge every month.
  • Vaults can be configured to limit cost.
  • A single AWS account can create up to 1000 vaults per region.
  • Only empty vaults can be deleted.
  • Glacier supports multipart uploads of archives, so a large archive is not required to be uploaded in a single action.
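The idea behind multipart uploads is simply to cut a large archive into fixed-size parts and send each part as its own request. The sketch below shows only the splitting step, as plain arithmetic. The helper name and the 8 MiB default are assumptions; Glacier places its own constraints on allowed part sizes, so check the service documentation before choosing one.

```python
MIB = 1024 * 1024


def part_ranges(total_bytes: int, part_bytes: int = 8 * MIB) -> list:
    """Split an archive of total_bytes into (first_byte, last_byte) ranges.

    Ranges are inclusive on both ends; the last part may be shorter.
    """
    if total_bytes <= 0 or part_bytes <= 0:
        raise ValueError("sizes must be positive")
    return [
        (start, min(start + part_bytes, total_bytes) - 1)
        for start in range(0, total_bytes, part_bytes)
    ]


if __name__ == "__main__":
    # A 20 MiB archive in 8 MiB parts: two full parts and one 4 MiB remainder.
    for first, last in part_ranges(20 * MIB):
        print(f"bytes {first}-{last} ({(last - first + 1) // MIB} MiB)")
```

Each range would then be uploaded separately, and a failed part can be retried without resending the whole archive.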

Elastic Block Store (EBS): Elastic Block Store is the best storage solution to use for your instances when you want those instances to have very fast access, because we’re talking about block-level access rather than object-level access. S3 is object-level and EBS is block-level.

EBS Overview:

  • A volume is typically attached to only one instance at a time.
  • It is used for durable storage for EC2 instances.
  • It provides block-level storage to other AWS services, such as EC2.

EBS volume types:

1. Magnetic: lowest cost and is the slowest.

2. SSD (Solid State Drive): fast, chip-based storage with much higher performance than magnetic.

a. General Purpose SSD

b. Provisioned IOPS

Protecting EBS Data:

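· Snapshot: a point-in-time copy of a volume, used to create an exact duplicate of the volume for another instance or to restore the volume to that state at a later time for recovery purposes.

· Volume Recovery: detaching a volume from one instance and attaching it to another.

· Encryption: volumes and their snapshots can be encrypted.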
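Snapshots and recovery can be sketched in two small steps: take a snapshot of a volume, then create a new volume from that snapshot. The example assumes `ec2` is a boto3-style EC2 client (for example `boto3.client("ec2")`); the helper names are made up, and the volume IDs and Availability Zone you pass in would come from your own account.

```python
def snapshot_volume(ec2, volume_id: str, note: str) -> str:
    """Take a point-in-time snapshot of an EBS volume and return its ID."""
    response = ec2.create_snapshot(VolumeId=volume_id, Description=note)
    return response["SnapshotId"]


def restore_volume(ec2, snapshot_id: str, availability_zone: str) -> str:
    """Create a new volume from a snapshot and return the new volume's ID.

    The volume must be created in the same Availability Zone as the
    instance it will later be attached to.
    """
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    return response["VolumeId"]
```

The new volume can then be attached to a replacement instance, which covers both the snapshot and the volume recovery items in the list above.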

Elastic File System (EFS): EFS is like NAS (Network Attached Storage), but it lives in the cloud and is built for cloud resources.

· EFS is shareable: multiple instances can access it.

· EFS can be accessed through NFSv4 (network file system).

· EC2 instances can use EFS shares.

· EFS configuration requires a VPC (Virtual Private Cloud). A VPC is a collection of cloud resources that you manage together in your own private, isolated space.
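Because EFS is reached over NFSv4, an EC2 instance mounts it much like any other NFS share. As a sketch, the function below assembles the mount command from a file system ID and region, using the usual EFS DNS naming pattern and commonly recommended NFS options. The helper name, the example ID and the `/mnt/efs` mount point are assumptions for illustration.

```python
def efs_mount_command(file_system_id: str, region: str, mount_point: str = "/mnt/efs") -> str:
    """Build an NFSv4.1 mount command for an EFS file system."""
    # EFS file systems are addressed by a per-region DNS name.
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    options = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
    return f"sudo mount -t nfs4 -o {options} {dns_name}:/ {mount_point}"


if __name__ == "__main__":
    # fs-12345678 is a placeholder ID, not a real file system.
    print(efs_mount_command("fs-12345678", "us-east-1"))
```

Running the same command on several instances inside the VPC is what makes the share visible to all of them at once.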

Storage Gateway: The Storage Gateway is an appliance that you put on your local network, either a software appliance or a hardware appliance, that connects your on-premises environment to storage in the Amazon cloud, so that you can access that storage as if it were local. It falls under integrating on-premises storage with AWS.

Types of Storage Gateway:

1. File based (NFS)

2. Volume based

3. Tape based

Conclusion: Cloud storage is a critical component of cloud computing because it holds the information used by applications. Big data analytics, data warehouses, Internet of Things (IoT), databases, and backup and archive applications all rely on some form of data storage architecture.

Cloud storage is typically more reliable, scalable, and secure than traditional on-premises storage systems. AWS offers a complete range of cloud storage services to support both application and archival compliance requirements. Each service differs in usage patterns, performance, durability and availability, scalability and elasticity, security, interface, and cost model. While this overview gives you a better understanding of the features and characteristics of these cloud services, it is crucial to understand your own workloads and requirements, and then decide which storage service is best suited to your needs.
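To tie the pieces together, here is a deliberately simplified sketch of the decision process described in this article. The function name, its parameters and the order of the checks are assumptions chosen for illustration; a real decision would also weigh cost, performance and durability requirements.

```python
def suggest_storage(access: str, shared: bool = False,
                    archival: bool = False, on_premises: bool = False) -> str:
    """Suggest an AWS storage service from a few coarse workload traits.

    access: "object", "block" or "file"
    shared: whether several instances need the same storage
    archival: rarely accessed data kept for a long time
    on_premises: local applications need to reach cloud storage
    """
    if on_premises:
        return "Storage Gateway"   # bridge local systems to AWS storage
    if archival:
        return "Glacier"           # cheapest, slow to retrieve
    if access == "object":
        return "S3"                # object-level storage
    if access == "file":
        return "EFS"               # NFS share for multiple instances
    if access == "block":
        return "EFS" if shared else "EBS"   # EBS suits a single instance
    raise ValueError(f"unknown access type: {access!r}")


if __name__ == "__main__":
    print(suggest_storage("object"))                 # website assets, data lake
    print(suggest_storage("block"))                  # boot or database volume
    print(suggest_storage("block", shared=True))     # shared across instances
    print(suggest_storage("object", archival=True))  # long-term backups
```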
