Your department creates regular analytics reports from your company’s log files. All log data is collected in
Amazon S3 and processed by daily Amazon Elastic MapReduce (EMR) jobs that generate daily PDF reports and
aggregated tables in CSV format for an Amazon Redshift data warehouse.
Your CFO requests that you optimize the cost structure for this system.
Which of the following alternatives will lower costs without compromising average performance of the system
or data integrity for the raw data?
A.
Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot Instances to Amazon
EMR jobs. Use Reserved Instances for Amazon Redshift.
B.
Use reduced redundancy storage (RRS) for all data in S3. Use a combination of Spot Instances and Reserved
Instances for Amazon EMR jobs. Use Reserved Instances for Amazon Redshift.
C.
Use reduced redundancy storage (RRS) for all data in Amazon S3. Add Spot Instances to Amazon EMR jobs.
Use Reserved Instances for Amazon Redshift.
D.
Use reduced redundancy storage (RRS) for PDF and CSV data in S3. Add Spot Instances to EMR jobs. Use Spot
Instances for Amazon Redshift.
My answer would be A.
B, C – RRS for ‘ALL’ S3 data is not recommended. If the raw log files are lost, they cannot be recovered, whereas the PDF and CSV outputs can be regenerated from the logs.
D – Spot Instances for Redshift are not possible, I think.
Option A uses only Spot Instances for EMR, which is not advisable.
I think it's B.
Answer is A – Agree with Sandeep
A. Use reduced redundancy storage (RRS) for PDF and CSV data in Amazon S3. Add Spot Instances to Amazon
EMR jobs. Use Reserved Instances for Amazon Redshift.
C - not possible, as it is for temporary purposes.
Core nodes should be reserved for the capacity that is required until your cluster completes (temporary).
EMR supports Spot Instances; only the AWS GovCloud (US) region does not support them.
B, C - RRS for all data is not recommended in any case.
D - not possible, as Redshift offers no Spot pricing; Reserved Instances are the recommended way to cut its cost.
Reserved Instances (a.k.a. Reserved Nodes) are appropriate for steady-state production workloads, and offer significant discounts over On-Demand pricing.
https://aws.amazon.com/redshift
Last but not least, it's A because:
Q: What are some EMR best practices?
If you are running EMR in production you should specify an AMI version, Hive version, Pig version, etc. to make sure the version does not change unexpectedly (e.g. when EMR later adds support for a newer version). If your cluster is mission critical, only use Spot instances for task nodes because if the Spot price increases you may lose the instances. In development, use logging and enable debugging to spot and correct errors faster. If you are using GZIP, keep your file size to 1–2 GB because GZIP files cannot be split.
https://aws.amazon.com/elasticmapreduce/faqs/
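To make the "Spot only for task nodes" advice concrete, here is a rough sketch of how the instance groups for such a cluster could be described. The dictionary keys (Name, InstanceRole, Market, InstanceType, InstanceCount, BidPrice) follow the shape boto3's EMR run_job_flow accepts under Instances.InstanceGroups; the helper names, the m5.xlarge instance type, and the node counts are placeholders I made up for illustration. Nothing here calls AWS, it only builds the configuration.

```python
from typing import Dict, List, Optional


def instance_group(name: str, role: str, market: str, instance_type: str,
                   count: int, bid_price: Optional[str] = None) -> Dict:
    """Build one instance-group entry (hypothetical helper, not part of boto3)."""
    group = {
        "Name": name,
        "InstanceRole": role,      # "MASTER", "CORE" or "TASK"
        "Market": market,          # "ON_DEMAND" or "SPOT"
        "InstanceType": instance_type,
        "InstanceCount": count,
    }
    # A bid price only makes sense for Spot groups; it is optional here.
    if market == "SPOT" and bid_price is not None:
        group["BidPrice"] = bid_price
    return group


def build_instance_groups(core_count: int, task_count: int,
                          instance_type: str = "m5.xlarge",  # assumed placeholder type
                          task_bid_price: Optional[str] = None) -> List[Dict]:
    """Master and core nodes stay On-Demand; only the added task nodes use Spot."""
    return [
        instance_group("master", "MASTER", "ON_DEMAND", instance_type, 1),
        instance_group("core", "CORE", "ON_DEMAND", instance_type, core_count),
        instance_group("task-spot", "TASK", "SPOT", instance_type, task_count,
                       task_bid_price),
    ]


if __name__ == "__main__":
    for group in build_instance_groups(core_count=2, task_count=4):
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

Losing the Spot task nodes only slows the job down, since the core nodes that hold HDFS data are not on Spot in this layout.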
100% A
B and C put all data in RRS, so no-go.
D uses Spot Instances for Redshift, so no-go.
In A we only add more instances (here Spot).
A is the right answer.
B and C are wrong because you shouldn’t use RRS for ALL data.
D is wrong because you can’t use Spot Instances for Redshift. It is the same as RDS – Redshift is always up and running, not something you launch and terminate at any time.
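The elimination argument above can be written down as a tiny check. This is just a sketch of the reasoning in this thread, not anything official: each option is reduced to two properties (which data goes to RRS, and how Redshift is priced), and the field names and values are made up for illustration.

```python
# Each option reduced to the two properties the thread argues about.
# "derived" means only the regenerable PDF/CSV outputs go to RRS.
OPTIONS = {
    "A": {"rrs_scope": "derived", "redshift_pricing": "reserved"},
    "B": {"rrs_scope": "all", "redshift_pricing": "reserved"},
    "C": {"rrs_scope": "all", "redshift_pricing": "reserved"},
    "D": {"rrs_scope": "derived", "redshift_pricing": "spot"},
}


def violations(option):
    """Return the reasons an option breaks the question's constraints."""
    problems = []
    if option["rrs_scope"] == "all":
        problems.append("raw logs in RRS cannot be regenerated if lost")
    if option["redshift_pricing"] == "spot":
        problems.append("Redshift has no Spot pricing")
    return problems


def surviving_options(options):
    """Names of the options with no violations, in alphabetical order."""
    return [name for name, opt in sorted(options.items()) if not violations(opt)]


if __name__ == "__main__":
    print(surviving_options(OPTIONS))  # prints ['A']
```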
B
A
A