You are a solutions architect working for a company that specializes in ingesting large data feeds (using
Kinesis) and then analyzing these feeds using Elastic MapReduce (EMR). The results are then stored in a
custom MySQL database hosted on an EC2 instance that has 3 volumes: the root/boot volume, and
2 additional volumes that are striped into a RAID 0. Your company recently had an outage and lost
some key data, and has since decided that it needs to run nightly backups. Your application is only used
during office hours, so you can afford to have some down time in the middle of the night if required. You decide
to take a snapshot of all three volumes every 24 hours. In what manner should you do this?
A. Take a snapshot of each volume independently, while the EC2 instance is running.
B. Stop the EC2 instance and take a snapshot of each volume independently. Once the snapshots are
complete, start the EC2 instance and ensure that all relevant volumes are remounted.
C. Add two additional volumes to the existing RAID 0 volume and mirror these volumes, creating a RAID 10.
Take a snapshot of only the two new volumes.
D. Create a read replica of the existing EC2 instance and then take your snapshots from the read replica and
not the live EC2 instance.
Why is A not right?
We don’t need to shut down the EC2 instance to take a snapshot, after all.
If the instance is not shut down, any data still held in cache (not yet flushed to disk) will not be part of the snapshot, because a snapshot only captures the data that has been written to the volumes. The question also states that “Your application is only used
during office hours, so you can afford to have some down time in the middle of the night if required”, so we can take the snapshots after the instance has been stopped.
Thanks for the good explanation. Very helpful.
Hey Jeremy!
Will the data on the root volume be lost when the instance is stopped?
Ans: B.
In the real world, we would advise D but since it can be shut down after hours, B is the cheaper option. Go with B.
It’s a “custom” DB, so an AWS “read replica” isn’t an option, because the database isn’t run on an AWS managed service.
B
Regarding D, don’t read replicas apply only to RDS?
Only if you had instance store (ephemeral) storage. With EBS, no.
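
For anyone who wants to see what option B looks like in practice, here is a rough sketch of the stop, snapshot, start sequence. It assumes `ec2` is a boto3 EC2 client; the function name `snapshot_all_volumes` and the snapshot description text are made up for illustration and are not from the question. Treat it as a starting point, not a finished backup job.

```python
def snapshot_all_volumes(ec2, instance_id):
    """Stop an instance, snapshot each attached EBS volume, then restart it.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the list of snapshot IDs that were created.
    """
    # Stop the instance first so nothing is writing to the volumes and any
    # cached data has been flushed before the snapshots are taken.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Look up every EBS volume attached to the instance
    # (the root volume plus the two RAID members in the question).
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    mappings = reservations[0]["Instances"][0]["BlockDeviceMappings"]
    volume_ids = [m["Ebs"]["VolumeId"] for m in mappings if "Ebs" in m]

    snapshot_ids = []
    try:
        # Snapshot each volume independently, as option B describes.
        for volume_id in volume_ids:
            response = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=f"Nightly backup of {volume_id} ({instance_id})",
            )
            snapshot_ids.append(response["SnapshotId"])

        # Option B waits for the snapshots to complete before restarting.
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=snapshot_ids)
    finally:
        # Bring the instance back up even if a snapshot call failed.
        ec2.start_instances(InstanceIds=[instance_id])

    return snapshot_ids
```

You would call it as `snapshot_all_volumes(boto3.client("ec2"), "i-0123456789abcdef0")` (a placeholder instance ID), with boto3 installed and credentials configured. After the instance starts again, the RAID array may still need to be reassembled and mounted inside the OS, which this sketch does not cover.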