A newspaper organization has an on-premises application which allows the public to search its back catalogue
and retrieve individual newspaper pages via a website written in Java. They have scanned the old newspapers
into JPEGs (approx. 17 TB) and used Optical Character Recognition (OCR) to populate a commercial search
product. The hosting platform and software are now end of life, and the organization wants to migrate its
archive to AWS and produce a cost-efficient architecture that is still designed for availability and durability.
Which is the most appropriate?
A.
Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search
application on EC2 instances and configure with auto-scaling and an Elastic Load Balancer.
B.
Model the environment using CloudFormation, use an EC2 instance running Apache webserver and an open
source search application, stripe multiple standard EBS volumes together to store the JPEGs and search index.
C.
Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query
processing, and use Elastic Beanstalk to host the website across multiple availability zones.
D.
Use a single-AZ RDS MySQL instance to store the search index and the JPEG images, use an EC2 instance to
serve the website and translate user queries into SQL.
E.
Use a CloudFront download distribution to serve the JPEGs to the end users and install the current
commercial search product, along with a Java container for the website, on EC2 instances and use Route53
with DNS round-robin.
The answer is C. CloudSearch is the perfect option for the search-related content.
c
C is correct.
Can CloudSearch perform the optical character recognition?
I think the OCR task is done offline in this question, though that is not the best way.
They did the OCR to build their existing site. They already have an index that matches content to pages.
CloudSearch is the perfect option for the search-related content.
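To make the point about the existing OCR index concrete, here is a minimal sketch of how already-extracted page text could be packaged as a CloudSearch document batch (a JSON array of "add" operations, each with an id and fields). The record layout, the field names (issue_date, page_number, ocr_text, s3_key) and the sample values are assumptions for illustration only; they are not from the question. The sketch only builds the JSON and does not call AWS.

```python
import json


def build_cloudsearch_batch(pages):
    """Build a CloudSearch-style document batch (JSON string) from OCR'd pages.

    Each page is a dict with hypothetical keys: issue_date, page_number,
    ocr_text, s3_key. The field names are illustrative assumptions and would
    have to match the index fields configured on the search domain.
    """
    batch = []
    for page in pages:
        # Keep the document id to digits, letters and underscores.
        doc_id = f"{page['issue_date'].replace('-', '_')}_p{page['page_number']}"
        batch.append({
            "type": "add",
            "id": doc_id,
            "fields": {
                "issue_date": page["issue_date"],
                "page_number": page["page_number"],
                "ocr_text": page["ocr_text"],
                # Pointer back to the scanned JPEG stored in S3.
                "s3_key": page["s3_key"],
            },
        })
    return json.dumps(batch)


# Hypothetical sample record, standing in for one row of the existing OCR output.
sample_pages = [
    {
        "issue_date": "1923-04-01",
        "page_number": 3,
        "ocr_text": "Example headline text recovered by OCR",
        "s3_key": "scans/1923/04/01/page-0003.jpg",
    },
]

if __name__ == "__main__":
    print(build_cloudsearch_batch(sample_pages))
```

The idea is that a search hit returns the s3_key, and the website then serves that JPEG from S3, so no OCR has to happen inside CloudSearch.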
Answer C
Is this really an SAA question?
Yes! This is an SAA question.