Your application is using an ELB in front of an Auto Scaling group of web/application servers
deployed across two AZs and a Multi-AZ RDS instance for data persistence. The database CPU
usage is often above 80%, and 90% of the I/O operations on the database are reads. To improve
performance, you recently added a single-node Memcached ElastiCache cluster to cache
frequent DB query results. Over the next few weeks, the overall workload is expected to grow by 30%.
Do you need to change anything in the architecture to maintain the high availability of the
application with the anticipated additional load? Why?
A.
Yes, you should deploy two Memcached ElastiCache Clusters in different AZs because the RDS
instance will not be able to handle the load if the cache node fails.
B.
No, if the cache node fails you can always get the same data from the DB without having any
availability impact.
C.
No, if the cache node fails the automated ElastiCache node recovery feature will prevent any
availability impact.
D.
Yes, you should deploy the Memcached ElastiCache Cluster with two nodes in the same AZ as
the RDS DB master instance to handle the load if one cache node fails.
Explanation:
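With a single-node Memcached ElastiCache cluster, a failure of that node is a total cache failure.
Even though AWS automatically replaces the failed node, there are no other nodes in the cluster to
serve cached data in the meantime, and the replacement node starts with an empty cache. Every read
then falls through to the RDS instance, which is already above 80% CPU before the expected 30% growth.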
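The explanation rests on a cache-aside read path: the application checks the cache first and queries the database only on a miss. The Python sketch below is a toy model under stated assumptions, not an ElastiCache or Memcached client. `FakeDatabase`, `SingleNodeCache`, `read` and `db_reads_for` are hypothetical names, and a failed node is modeled as a miss for every key (a real client would see connection errors and would typically fall back to the database the same way). It counts how many steady-state reads reach the database with the single cache node up versus failed.

```python
from typing import Optional

# Toy cache-aside model. FakeDatabase and SingleNodeCache are hypothetical
# stand-ins for the RDS instance and a one-node Memcached cluster; they are
# not AWS or Memcached client APIs.


class FakeDatabase:
    """Counts how many reads actually reach the database."""

    def __init__(self) -> None:
        self.reads = 0

    def query(self, key: str) -> str:
        self.reads += 1
        return f"row-for-{key}"


class SingleNodeCache:
    """In-memory stand-in for a single Memcached node that can fail."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}
        self.up = True

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            return None  # assumption: a failed node behaves like a miss for every key
        return self.store.get(key)

    def set(self, key: str, value: str) -> None:
        if self.up:
            self.store[key] = value

    def fail(self) -> None:
        self.up = False
        self.store.clear()  # Memcached has no replication: the cached data is gone


def read(key: str, cache: SingleNodeCache, db: FakeDatabase) -> str:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = db.query(key)
        cache.set(key, value)
    return value


def db_reads_for(requests: list[str], cache_fails: bool) -> int:
    """Database reads during a steady-state pass, after a warm-up pass."""
    db, cache = FakeDatabase(), SingleNodeCache()
    for key in requests:  # warm-up pass populates the cache
        read(key, cache, db)
    warm_up_reads = db.reads
    if cache_fails:
        cache.fail()
    for key in requests:  # steady-state pass
        read(key, cache, db)
    return db.reads - warm_up_reads


hot_keys = [f"user:{i % 100}" for i in range(10_000)]  # 100 distinct hot keys
print(db_reads_for(hot_keys, cache_fails=False))  # → 0
print(db_reads_for(hot_keys, cache_fails=True))   # → 10000
```

With a warm cache, every repeated read is absorbed; once the only node fails, all 10,000 reads in the pass go to the database. In the scenario, that spike would land on an instance that is already heavily loaded.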
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html
Mitigating Node Failures
To mitigate the impact of a node failure, spread your cached data over more nodes. Because
Memcached does not support replication, a node failure will always result in some data loss from
your cluster.
When you create your Memcached cluster you can create it with 1 to 20 nodes, or more by
special request. Partitioning your data across a greater number of nodes means you’ll lose less
data if a node fails. For example, if you partition your data across 10 nodes, any single node
stores approximately 10% of your cached data. In this case, a node failure loses approximately
10% of your cache, which must be repopulated once a replacement node is created and
provisioned.
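The 1/N arithmetic in the paragraph above can be checked with a short simulation. This Python sketch assumes simple modulo hashing over an MD5 digest to assign keys to nodes; `node_for` and `fraction_lost` are hypothetical helpers. Many Memcached clients use consistent hashing instead, which likewise gives each node roughly a 1/N share of the keys.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (simple modulo partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_lost(keys: list[str], num_nodes: int, failed_node: int = 0) -> float:
    """Fraction of cached keys that lived on the single failed node."""
    lost = sum(1 for k in keys if node_for(k, num_nodes) == failed_node)
    return lost / len(keys)


keys = [f"query:{i}" for i in range(100_000)]
for n in (1, 2, 5, 10, 20):
    # One node always loses everything; ten nodes lose roughly 10% each.
    print(n, round(fraction_lost(keys, n), 3))
```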
Mitigating Availability Zone Failures
To mitigate the impact of an Availability Zone failure, locate your nodes in as many Availability
Zones as possible. In the unlikely event of an AZ failure, you will lose only the data cached in that
AZ, not the data cached in the other AZs.