You need to analyze a large amount of data stored on Amazon S3 using Amazon Elastic MapReduce. You are using the cc2.8xlarge instance type, whose CPUs are mostly idle during processing.
Which of the below would be the most cost-efficient way to reduce the runtime of the job?

A.
Create more, smaller files on Amazon S3.

B.
Add additional cc2.8xlarge instances by introducing a task group.

C.
Use smaller instances that have higher aggregate I/O performance.

D.
Create fewer, larger files on Amazon S3.





DakkuDaddy

C. Use smaller instances that have higher aggregate I/O performance.

https://aws.amazon.com/elasticmapreduce/faqs/

A, D: irrelevant to the idle-CPU problem.
B: adding more of the same instances? Come on, the CPUs are already idle; moving to smaller instances with better aggregate I/O is the option!

This is the only line in the FAQ relevant to supporting C. It describes when you need more capacity, but here the CPUs are idle, so think of it in reverse:

As a general guideline, we recommend that you limit 60% of your disk space to storing the data you will be processing, leaving the rest for intermediate output. Hence, given 3x replication on HDFS, if you were looking to process 5 TB on m1.xlarge instances, which have 1,690 GB of disk space, we recommend your cluster contains at least (5 TB * 3) / (1,690 GB * .6) = 15 m1.xlarge core nodes. You may want to increase this number if your job generates a high amount of intermediate data or has significant I/O requirements.
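
To make the FAQ's rule of thumb concrete, here is a minimal sketch of that sizing arithmetic in Python. The function name and parameters are illustrative, not part of any AWS API; it just reproduces the (data * replication) / (disk * 0.6) calculation quoted above.

import math

def min_core_nodes(data_tb, disk_gb_per_node, replication=3, usable_fraction=0.6):
    # Estimate the minimum number of HDFS core nodes, per the EMR FAQ
    # rule of thumb. The FAQ reserves ~40% of disk for intermediate
    # output, so only `usable_fraction` of each node's disk holds data.
    data_gb = data_tb * 1000                         # FAQ treats 1 TB as 1,000 GB
    needed_gb = data_gb * replication                # HDFS keeps `replication` copies
    usable_gb = disk_gb_per_node * usable_fraction   # space left for input data per node
    return math.ceil(needed_gb / usable_gb)

# The FAQ's worked example: 5 TB on m1.xlarge nodes with 1,690 GB of disk each.
print(min_core_nodes(5, 1690))  # -> 15

Running this reproduces the FAQ's figure of 15 m1.xlarge core nodes: (5 TB * 3) / (1,690 GB * 0.6) is about 14.8, rounded up to 15.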

Srinivasu M

C.
Use smaller instances that have higher aggregate I/O performance.