A customer has a 10 Gbps AWS Direct Connect connection to an AWS region where they have a web application
hosted on Amazon Elastic Compute Cloud (EC2). The application has dependencies on an on-premises
mainframe database that uses a BASE (Basically Available, Soft state, Eventual consistency) rather than an ACID
(Atomicity, Consistency, Isolation, Durability) consistency model. The application is exhibiting undesirable
behavior because the database is not able to handle the volume of writes. How can you reduce the load on
your on-premises database resources in the most cost-effective way?
A.
Use an Amazon Elastic MapReduce (EMR) S3DistCp as a synchronization mechanism between the on-premises database and a Hadoop cluster on AWS.
B.
Modify the application to write to an Amazon SQS queue and develop a worker process to flush the queue
to the on-premises database.
C.
Modify the application to use DynamoDB to feed an EMR cluster which uses a map function to write to the
on-premises database.
D.
Provision an RDS read-replica database on AWS to handle the writes and synchronize the two databases
using Data Pipeline.
Explanation:
https://aws.amazon.com/blogs/aws/category/amazon-elastic-map-reduce/
The answer is B. Option A doesn’t make sense because the problem concerns a database, and S3 is object storage, not a database.
To support my answer, look at the benefits SQS offers. SQS is a pull-based messaging mechanism that delivers each message at least once to the recipient. It can hold any number of messages, but message size is limited to 256 KB. It supports a high volume of writes, so we can attach SQS to the application that generates the large number of write operations.
SQS acts as a queue that organizes the write requests and passes them on to the on-premises DB in an orderly way, as opposed to all requests arriving together. However, SQS will not change the total volume of writes.
SQS will manage the writes so that they do not all happen simultaneously.
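The buffering behavior described above can be sketched with an in-memory stand-in. This is only an illustration under assumptions: `queue.Queue` substitutes for the SQS queue, and `FakeMainframeDb`, `drain`, and the numbers are hypothetical. A real worker would poll SQS and call the mainframe's own client instead.

```python
import queue


class FakeMainframeDb:
    """Hypothetical stand-in for the on-premises database."""

    def __init__(self, max_writes_per_tick):
        self.max_writes_per_tick = max_writes_per_tick
        self.rows = []

    def write_batch(self, batch):
        # Refuse anything above what the database can absorb in one tick.
        if len(batch) > self.max_writes_per_tick:
            raise RuntimeError("database overloaded")
        self.rows.extend(batch)


def drain(buffer, db):
    """Worker loop: flush the buffer to the database at a pace it can handle.

    Returns the number of ticks needed to empty the buffer.
    """
    ticks = 0
    while not buffer.empty():
        batch = []
        while len(batch) < db.max_writes_per_tick and not buffer.empty():
            batch.append(buffer.get_nowait())
        db.write_batch(batch)
        ticks += 1
    return ticks


# The application enqueues a burst of 25 writes instead of hitting the DB directly.
buffer = queue.Queue()
for i in range(25):
    buffer.put({"order_id": i})

db = FakeMainframeDb(max_writes_per_tick=10)
ticks = drain(buffer, db)
print(ticks, len(db.rows))
```

All 25 writes still reach the database, so the total volume is unchanged. The queue only spreads the burst over several ticks so that no single batch exceeds what the database can take.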
Using S3DistCp, you can efficiently copy a large amount of data from Amazon S3 into the HDFS datastore of your cluster.
So B will be suitable.
b
answer is b
https://aws.amazon.com/sqs/faqs/
answer is B
B
B
Option B relies on a worker process, and option A has a synchronization mechanism. Which is more cost-effective?
B should be the answer.
A and C use Amazon Elastic MapReduce technologies, and I could not find any relationship between them and the question’s requirement.
D proposes to “synchronize the two databases using Data Pipeline”, but that way the customer needs to keep a database on both sides: the on-premises DB and an RDS DB on AWS. That conflicts with the stated requirement of a “…mainframe database that uses a BASE…” model.
B is the answer.
why not A?
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html
Apache DistCp is an open-source tool you can use to copy large amounts of data. DistCp uses MapReduce to copy in a distributed manner—sharing the copy, error handling, recovery, and reporting tasks across several servers. For more information about the Apache DistCp open source project,
EMR is a solution for processing a large data set by splitting up the work and then merging the results. If EMR is sitting behind your web server, it’s because it is delivering some sort of reporting or analytics. This question appears to describe a transactional system that is having write I/O issues. While data ingest is something that EMR does well, it is built to plow through large data sets and produce results. Maybe with some sets of data EMR could be used as part of the ingest process, to structure the data so that the target can ingest it more easily, but we are not guaranteed that it would even have all of the data needed to process the input.
The bigger clue may be that the option says to use this as a synchronization mechanism. What synchronization capability exists between EMR and this unnamed legacy database?
In this example, SQS allows the back-end database to ingest at a pace that it can handle and still remain consistent. You still have to assume that there will be a time when it can eventually catch up.
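The catch-up assumption can be made concrete with a little arithmetic. The sketch below uses made-up rates (they are not from the question) and a hypothetical helper `backlog_over_time` to track queue depth second by second.

```python
def backlog_over_time(arrivals_per_sec, db_capacity_per_sec):
    """Return the queue depth at the end of each second."""
    depth = 0
    history = []
    for arrivals in arrivals_per_sec:
        depth += arrivals                         # new writes enter the queue
        depth -= min(depth, db_capacity_per_sec)  # worker flushes what the DB can take
        history.append(depth)
    return history


# Assumed numbers: a 3-second burst of 500 writes/s, then 100 writes/s,
# against a database that sustains 200 writes/s.
arrivals = [500, 500, 500] + [100] * 10
history = backlog_over_time(arrivals, db_capacity_per_sec=200)
print(history)
```

With these numbers the backlog peaks at 900 messages and only drains because the steady-state arrival rate is below the database's capacity. If arrivals stayed above 200 writes/s indefinitely, the queue would keep growing and the database would never catch up.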
A.
Use an Amazon Elastic MapReduce (EMR) S3DistCp as a synchronization mechanism between the on-premises database and a Hadoop cluster on AWS.
- Not cost-effective.
C.
Modify the application to use DynamoDB to feed an EMR cluster which uses a map function to write to the
on-premises database.
- I think DynamoDB is here just to distract by drawing attention to BASE. It may be suitable, but it complicates the design and adds cost.
D.
Provision an RDS read-replica database on AWS to handle the writes and synchronize the two databases
using Data Pipeline.
- RDS read replicas are for MySQL, MariaDB, and PostgreSQL. Not applicable here. Easily ruled out.
The correct answer should be:
B.
Modify the application to write to an Amazon SQS queue and develop a worker process to flush the queue
to the on-premises database.
i) It’s BASE, so we can use SQS; under an eventual consistency model there is no hurry to write or read the data.
ii) It is cost-effective, as SQS is the only item introduced here.
Answer is B