You have recently joined a startup company building sensors to measure street noise and air
quality in urban areas.
The company has been running a pilot deployment of around 100 sensors for 3 months. Each
sensor uploads 1KB of sensor data every minute to a backend hosted on AWS. During the pilot,
you measured a peak of 10 IOPS on the database, and you stored an average of 3GB of sensor
data per month in the database.
The current deployment consists of a load-balanced, auto-scaled ingestion layer using EC2
instances, and a PostgreSQL RDS database with 500GB of standard storage.
The pilot is considered a success, and your CEO has managed to get the attention of some
potential investors. The business plan requires a deployment of at least 100k sensors, which
the backend needs to support.
You also need to store sensor data for at least two years to be able to compare year-over-year
improvements.
To secure funding, you have to make sure that the platform meets these requirements and leaves
room for further scaling.
Which setup will meet the requirements?
A.
Replace the RDS instance with a 6-node Redshift cluster with 96TB of storage
B.
Keep the current architecture, but upgrade RDS storage to 3TB and 10k provisioned IOPS
C.
Ingest data into a DynamoDB table and move old data to a Redshift cluster
D.
Add an SQS queue to the ingestion layer to buffer writes to the RDS instance
Explanation:
The POC solution is being scaled up by a factor of 1,000 (from 100 sensors to 100k sensors),
which means it will require roughly 3GB/month x 1,000 x 24 months = 72TB of storage to retain
24 months' worth of data. This rules out RDS as a possible DB solution, which leaves you with
Redshift.
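To make that arithmetic explicit, here is a quick back-of-the-envelope sizing sketch in Python,
using only the pilot figures quoted in the question and assuming they scale linearly with the
number of sensors:

    # Rough sizing sketch (assumes the pilot figures scale linearly)
    pilot_sensors = 100
    target_sensors = 100_000
    pilot_storage_gb_per_month = 3
    pilot_peak_iops = 10
    retention_months = 24

    scale = target_sensors / pilot_sensors                      # 1,000x
    storage_tb = pilot_storage_gb_per_month * scale * retention_months / 1_000
    peak_iops = pilot_peak_iops * scale

    print(f"Storage needed: {storage_tb:.0f} TB")               # 72 TB
    print(f"Estimated peak IOPS: {peak_iops:.0f}")              # 10,000

72TB is far beyond the 3TB proposed in option B, while the 96TB Redshift cluster in option A
still leaves room for further scaling.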
I believe DynamoDB is more cost-effective and scales better for ingest than EC2 instances in
an Auto Scaling group.
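Purely to illustrate that point, here is a minimal boto3 sketch of per-reading ingest into
DynamoDB; the table name, key schema, and attribute names below are assumptions for the
example, not part of the question:

    # Illustrative only: writing one ~1KB sensor reading into an assumed
    # "SensorReadings" DynamoDB table. Table name, keys, and attributes
    # are hypothetical.
    import time
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("SensorReadings")   # hypothetical table name

    def ingest_reading(sensor_id, noise_db, pm25):
        # One item per sensor per minute; writes are spread across
        # partitions keyed by sensor_id, so ingest scales horizontally.
        table.put_item(
            Item={
                "sensor_id": sensor_id,          # assumed partition key
                "timestamp": int(time.time()),   # assumed sort key (epoch seconds)
                "noise_db": noise_db,
                "pm25": pm25,
            }
        )

    ingest_reading("sensor-0001", noise_db=62, pm25=14)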
Also, this example solution from AWS is somewhat similar, for reference:
http://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_timeseriesprocessing_16.pdf