Your department creates regular analytics reports from your company’s log files. All log data is collected in
Amazon S3 and processed by daily Amazon Elastic MapReduce (EMR) jobs that generate daily PDF reports and
aggregated tables in CSV format for an Amazon Redshift data warehouse.
Your CFO requests that you optimize the cost structure for this system.
Which of the following alternatives will lower costs without compromising average performance of the system
or data integrity for the raw data?

A.
Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot Instances to Amazon
EMR jobs. Use Reserved Instances for Amazon Redshift.

B.
Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot Instances and Reserved
Instances for Amazon EMR jobs. Use Reserved Instances for Amazon Redshift.

C.
Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs.
Use Reserved Instances for Amazon Redshift.

D.
Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot
Instances for Amazon Redshift.


DakkuDaddy

Answer is A. Agree with Sandeep.

A. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot Instances to Amazon
EMR jobs. Use Reserved Instances for Amazon Redshift.
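A minimal sketch of option A's storage split, assuming a hypothetical key layout where derived outputs end in .pdf or .csv and everything else is raw log data. The helper names (`storage_class_for`, `put_object_args`) are illustrative only; the returned dict mirrors the `StorageClass` parameter of boto3's S3 `put_object`, but no AWS call is made here.

```python
# Hypothetical convention: derived, reproducible outputs end in these suffixes.
DERIVED_SUFFIXES = (".pdf", ".csv")


def storage_class_for(key: str) -> str:
    """Route derived PDF/CSV outputs to RRS; keep everything else
    (the raw logs, which cannot be regenerated) in STANDARD."""
    if key.lower().endswith(DERIVED_SUFFIXES):
        return "REDUCED_REDUNDANCY"
    return "STANDARD"


def put_object_args(bucket: str, key: str, body: bytes) -> dict:
    """Keyword arguments shaped for boto3's s3_client.put_object(**args)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": storage_class_for(key),
    }


if __name__ == "__main__":
    for k in ["raw/2015-01-01/app.log.gz", "reports/2015-01-01.pdf", "tables/daily.CSV"]:
        print(k, "->", storage_class_for(k))
```

The point of the split is that the PDFs and CSV tables can be rebuilt by rerunning the EMR jobs, while the raw logs cannot.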

C: Not possible, as it is for a temporary purpose.
Core nodes should be reserved for the capacity that is required until your cluster completes (temporary).
EMR supports Spot Instances; only the AWS GovCloud (US) region does not support them.

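B, C: In any case, using RRS for all data is not recommended. RRS is designed for lower durability than S3 Standard, so keeping the raw logs there risks the data integrity the question asks to preserve.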
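To put the durability gap in numbers, here is a back-of-the-envelope sketch using the published design targets (S3 Standard 99.999999999% annual durability, RRS 99.99%). It assumes a simplified model in which each object is lost independently with probability (1 - durability) per year, which is not how AWS itself models durability, and the object count is a hypothetical value.

```python
# Annual design durability targets for the two storage classes.
STANDARD_DURABILITY = 0.99999999999  # 11 nines
RRS_DURABILITY = 0.9999              # 4 nines


def expected_annual_loss(num_objects: int, durability: float) -> float:
    """Expected objects lost per year under a simple independent-loss model."""
    return num_objects * (1.0 - durability)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of raw log objects
    print(f"STANDARD: {expected_annual_loss(n, STANDARD_DURABILITY):.4f} objects/year")
    print(f"RRS:      {expected_annual_loss(n, RRS_DURABILITY):.1f} objects/year")
```

Under this model, ten million raw log objects on RRS would lose on the order of a thousand objects per year, versus a small fraction of one object on Standard.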

D: Not possible, because Amazon Redshift has no Spot option; AWS recommends Reserved Instances for it:

Reserved Instances (a.k.a. Reserved Nodes) are appropriate for steady-state production workloads, and offer significant discounts over On-Demand pricing.

https://aws.amazon.com/redshift
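Why "steady-state" matters can be shown with a break-even calculation. This is a sketch with entirely hypothetical prices (the function names and rates are illustrative, not actual Redshift pricing). It assumes a 1-year reservation billed for every hour of the term, whether or not the node is used.

```python
HOURS_PER_YEAR = 8760


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """On-Demand capacity is billed only for the hours it runs."""
    return hourly_rate * hours_used


def reserved_cost(upfront: float, hourly_rate: float) -> float:
    """A 1-year reservation is billed for every hour of the term, used or not."""
    return upfront + hourly_rate * HOURS_PER_YEAR


def break_even_utilization(od_rate: float, upfront: float, ri_rate: float) -> float:
    """Fraction of the year a node must run before reserving becomes cheaper."""
    return reserved_cost(upfront, ri_rate) / (od_rate * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical rates: $1.00/h On-Demand vs $2000 upfront + $0.25/h reserved.
    u = break_even_utilization(1.0, 2000.0, 0.25)
    print(f"Reserving pays off above {u:.1%} utilization")
```

A data warehouse that is queried around the clock runs at 100% utilization, well above any such break-even point, which is why Reserved Nodes suit Redshift while bursty, interruptible EMR work suits Spot.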

Last but not least, it's A because:

Q: What are some EMR best practices?

If you are running EMR in production you should specify an AMI version, Hive version, Pig version, etc. to make sure the version does not change unexpectedly (e.g. when EMR later adds support for a newer version). If your cluster is mission critical, only use Spot instances for task nodes because if the Spot price increases you may lose the instances. In development, use logging and enable debugging to spot and correct errors faster. If you are using GZIP, keep your file size to 1–2 GB because GZIP files cannot be split.

https://aws.amazon.com/elasticmapreduce/faqs/
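The "Spot only for task nodes" advice maps onto the instance-group layout of an EMR cluster. The sketch below builds an `InstanceGroups` list in the shape boto3's EMR `run_job_flow` accepts under `Instances={"InstanceGroups": [...]}`. The function name, instance type, counts, and bid price are hypothetical placeholders, and no cluster is launched.

```python
from typing import Dict, List


def emr_instance_groups(instance_type: str, core_count: int,
                        task_count: int, spot_bid: str) -> List[Dict]:
    """Master and core nodes stay On-Demand (core nodes hold HDFS data, so
    losing them can fail the job); only the stateless task nodes use Spot."""
    return [
        {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
        {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
         "BidPrice": spot_bid,  # the API expects the bid as a string, in USD
         "InstanceType": instance_type, "InstanceCount": task_count},
    ]


if __name__ == "__main__":
    for group in emr_instance_groups("m4.large", 2, 4, "0.10"):
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

If the Spot price rises and the task nodes are reclaimed, the core nodes still hold the HDFS data, so the daily job slows down instead of failing.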