Which of the below mentioned options is the most cost e…

With Amazon Elastic MapReduce (Amazon EMR) you can analyze and process vast amounts of
data. The cluster is managed using an open-source framework called Hadoop. You have set up
an application to run Hadoop jobs. The application reads data from DynamoDB and generates a
temporary file of 100 TB.
The whole process runs for 30 minutes and the output of the job is stored in S3. Which of the
options below is the most cost-effective solution in this case?

A.
Use Spot Instances to run Hadoop jobs and configure them with EBS volumes for persistent data
storage.

B.
Use Spot Instances to run Hadoop jobs and configure them with ephemeral storage for output file
storage.

C.
Use On-Demand Instances to run Hadoop jobs and configure them with EBS volumes for
persistent storage.

D.
Use On-Demand Instances to run Hadoop jobs and configure them with ephemeral storage for
output file storage.

Explanation:
AWS EC2 Spot Instances allow users to name their own price for EC2 computing capacity.
The user bids on spare Amazon EC2 instances, and the instances run whenever the bid
exceeds the current Spot Price. The Spot Instance pricing model complements the On-Demand
and Reserved Instance pricing models, providing potentially the most cost-effective option for
obtaining compute capacity, depending on the application. The only challenge with a Spot
Instance is data persistence, as the instance can be terminated whenever the Spot Price exceeds
the bid price. In the current scenario the Hadoop job is temporary and runs for only 30
minutes. It fetches its input from DynamoDB, which is persistent, and writes its final output
to S3. Thus, even if the instance gets terminated there will be no data loss and the job can
simply be re-run. As the large intermediate file is temporary, storing it on ephemeral
(instance store) storage avoids paying for EBS volumes, so option B is the most cost effective.
http://aws.amazon.com/ec2/purchasing-options/spot-instances/
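
The cost reasoning above can be sketched as a small comparison of the four options. This is only an illustration: every rate, the instance count, and the simplified billing model (cost proportional to runtime, EBS prorated over a 730-hour month) are hypothetical placeholders, not actual AWS prices. The function name job_cost is also made up for this sketch.

```python
# Hypothetical placeholder values, NOT real AWS prices.
ON_DEMAND_RATE = 1.00       # assumed $/instance-hour, On-Demand
SPOT_RATE = 0.30            # assumed $/instance-hour, Spot
EBS_RATE_GB_MONTH = 0.10    # assumed $/GB-month for EBS
INSTANCES = 50              # assumed cluster size
RUNTIME_HOURS = 0.5         # the job runs for 30 minutes (from the question)
TEMP_GB = 100 * 1024        # the 100 TB temporary file (from the question)
HOURS_PER_MONTH = 730.0     # used to prorate the monthly EBS rate


def job_cost(hourly_rate, ebs_gb=0.0):
    """Estimate one job run: compute cost plus prorated EBS cost (simplified)."""
    compute = hourly_rate * INSTANCES * RUNTIME_HOURS
    storage = ebs_gb * EBS_RATE_GB_MONTH * (RUNTIME_HOURS / HOURS_PER_MONTH)
    return compute + storage


# Ephemeral (instance store) storage is treated as included in the instance price.
costs = {
    "A": job_cost(SPOT_RATE, ebs_gb=TEMP_GB),       # Spot + EBS
    "B": job_cost(SPOT_RATE),                       # Spot + ephemeral
    "C": job_cost(ON_DEMAND_RATE, ebs_gb=TEMP_GB),  # On-Demand + EBS
    "D": job_cost(ON_DEMAND_RATE),                  # On-Demand + ephemeral
}

cheapest = min(costs, key=costs.get)
for option, cost in sorted(costs.items()):
    print(f"Option {option}: ${cost:.2f}")
print("Cheapest option:", cheapest)
```

Under these assumptions the ordering does not depend on the exact numbers: as long as the Spot rate is below the On-Demand rate and EBS costs anything at all, Spot with ephemeral storage comes out lowest.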
