Architecting applications on AWS is challenging. On the one hand, you need a broad understanding of AWS services. On the other hand, you have to know the details as well. In this blog post, I outline the mindset you need to build on AWS successfully.
Without the details, you might miss that one of the services is unavailable in your region or not in scope in the required compliance framework. Ignoring the details can also lead to high costs or reduced availability.
Regional availability of services
Not all AWS services are available in all regions. Newer services, as well as smaller regions, tend to have lower service coverage. Regions like us-east-1 and eu-west-1 usually have the highest coverage.
AWS publishes a list of supported services per region. Unfortunately, the list does not allow comparing multiple regions. A community project solves that need.
There is still a slight chance that a single feature is not available in your region. Unfortunately, there is no table at that granularity. You have to read the docs carefully or test it out!
Availability
Some AWS services are distributed worldwide by default. Examples are Route 53 or DynamoDB Global Tables. Other services are regional. A regional service is distributed across multiple availability zones within one region. Examples are S3 buckets, SQS queue, SNS topics. The last category of services is zonal. A zonal service runs in a single availability zone. Examples are EC2 instances and EBS volumes.
Depending on your availability requirements, you have to select the services with care or implement additional measures to achieve higher availability. For example, run multiple EC2 instances in an Auto Scaling Group spread across availability zones.
Costs
AWS services are, in almost all cases, priced based on usage. To create a cost-efficient architecture, you have to familiarize yourself with the cost model of each service.
Don’t assume that services are priced in the same way. They all have their own pricing model using different dimensions. Also, keep in mind that prices vary in each region.
One big factor that is often omitted is that you pay for traffic. The network design has a considerable impact on your bill. A rule of thumb: As long as the pricing page does not explicitly state traffic is free, you pay for it.
Compliance
Are you tasked to create an architecture in compliance with a framework (e.g., SOC)? You might read some marketing material that tells you: AWS is SOC compliant!? Only the services and regions in scope are. For example, ElastiCache for Memcached is not in scope as well as the regions in China.
The latest SOC update includes 133 services (out of over 200) and 23 regions (out of 25). This leads us to the question: What is in scope? Rule of thumb: New services and regions aren’t. You can find a list here.
To see if your region is in scope, you have to look at AWS’s detailed PDF report. The details also include a list of edge locations that are in scope.
Quotas
AWS protects your bill and its infrastructure with quotas. By default, you are not allowed to launch 100 EC2 instances or create 100 VPCs.
Some quotas can be increased while others can’t. For example, if your architecture relies on creating a VPC per customer, you should know the upper limit for that quota. If your AWS bill is significant enough for AWS, they might make exceptions for you. But you should know that upfront.
Limitations
Every AWS service has limitations. One of my favorite limitations is the network throughput of EC2 instances. If your EC2 instance has guaranteed 10 Gbit/s connectivity, you only get 5Gbit/s to and from the Internet. Surprising, right? If you miss that detail, a traffic-bound architecture will become twice expensive as you thought.
I share another limitation with you: EC2 blocks outbound traffic on port 25 by default. No matter what your security group says. You will have a hard time debugging this.
For every AWS service that you use, make sure to read the FAQs and docs carefully. Unfortunately, AWS makes it harder to find the limitations every year.
SLAs
EC2 instances have an SLA of 99.99%?! If you look closer, 99.99% does not apply to a single instance. The SLO 99.99% assumes that EC2 instances are deployed concurrently across two or more AZs in the same region.
The SLO for a single instance is 99.5%. The question now is what happens if AWS misses the SLO. And that’s where the SLA gets interesting. If AWS misses an SLO, you earn credits worth 10% of your EC2 costs in that month. That’s it. If AWS misses the SLO badly, AWS provides up to 100% in credits.
Other services provide different or no SLAs at all. Keep that in mind.
IAM Capabilities
Access to AWS services is granted by the IAM service. You control access by defining policies. Your architecture might rely on IAM to achieve tenant isolation.
Unfortunately, not all AWS services support the same set of features. Some services allow you to limit access to specific resources. Others allow you to restrict access based on tags in policies. But don’t take this for granted. Double-check the AWS IAM docs.
Summary
To select the best services for your architecture, you need a broad understanding of all AWS services. Wide enough to know what services are good for. But you also need a deep understanding of each service you include in your architecture.
I listed many reasons why the details matter. I hope my tips help you to avoid common pitfalls when architecting on AWS.