You need to analyze a customer’s clickstream data on a website so they can do behavioral
analysis. Your customer needs to know what sequence of pages and ads their customers clicked on. This data
will be used in real time to modify the page layouts as customers click through the site, to increase stickiness
and advertising click-through. Which option meets the requirements for capturing and analyzing this data?
A.
Log clicks in weblogs by URL, store to Amazon S3, and then analyze with Elastic MapReduce
B.
Push web clicks by session to Amazon Kinesis and analyze behavior using Kinesis workers
C.
Write click events directly to Amazon Redshift and then analyze with SQL
D.
Publish web clicks by session to an Amazon SQS queue, then periodically drain these events to Amazon RDS
and analyze with SQL
Explanation:
http://www.slideshare.net/AmazonWebServices/aws-webcast-introduction-to-amazon-kinesis
Option B
Key word is real time!
Real time = Kinesis.
B.
Push web clicks by session to Amazon Kinesis and analyze behavior using Kinesis workers
OK ‘real time’ is mentioned => Kinesis
I think it’s ‘A’
refer this architecture diagram
http://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_adserving_06.pdf
In step three “… This information is contained in the log files of the clickthrough web servers, which are periodically uploaded to Amazon S3.” So, it says “periodically” and we need “This data will be used in real time ..”.
So, this design is not suitable for real time. I think the answer is B.
Thoughts?
Agree with Saad, Kinesis Analytics seems like a good fit here, purely because of the phrase “real-time”. If the answers weren’t needed in real-time, Andy’s citation of the published reference architecture would be a good fit.
AWS articles:
https://aws.amazon.com/blogs/big-data/real-time-clickstream-anomaly-detection-with-amazon-kinesis-analytics/
B is the right answer.
Use Amazon Kinesis Streams to collect and process large streams of data records in real time.
http://docs.aws.amazon.com/streams/latest/dev/introduction.html
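To make "push web clicks by session" concrete, here is a rough sketch of the producer side. It assumes boto3 is installed and that a stream already exists; the stream name `clickstream`, the event fields, and the helper names `build_click_record` and `push_click` are made up for illustration and do not come from the question. Using the session ID as the partition key means all clicks from one session go to the same shard, so they can be read back in order.

```python
import json
import time

# Hypothetical stream name, assumed to exist already.
STREAM_NAME = "clickstream"


def build_click_record(session_id, page_url, ad_id=None, ts=None):
    """Build the keyword arguments for a Kinesis PutRecord call."""
    event = {
        "session_id": session_id,
        "page_url": page_url,
        "ad_id": ad_id,
        "ts": time.time() if ts is None else ts,
    }
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(event).encode("utf-8"),
        # Same session -> same partition key -> same shard.
        "PartitionKey": session_id,
    }


def push_click(client, session_id, page_url, ad_id=None, ts=None):
    """Send one click event; `client` is expected to be a boto3 Kinesis client."""
    record = build_click_record(session_id, page_url, ad_id=ad_id, ts=ts)
    return client.put_record(**record)


if __name__ == "__main__":
    import boto3  # imported here so the helpers above work without AWS access

    kinesis = boto3.client("kinesis")
    push_click(kinesis, "session-123", "/products/42", ad_id="ad-7")
```

In a real site you would probably batch events with `put_records` rather than sending one call per click, but the partitioning idea stays the same.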
B. Push web clicks by session to Amazon Kinesis and analyze behavior using Kinesis workers
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
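For the "analyze behavior using Kinesis workers" half, a worker could keep a running click path per session as records arrive. The sketch below is only the in-memory logic, under the assumption that each record looks like what the Kinesis GetRecords API returns (a dict with a `Data` bytes payload) and that the payload is JSON with `session_id`, `page_url` and `ts` fields. Those field names and the function names are hypothetical. In practice the records would come from the Kinesis Client Library or a `get_records` loop rather than a hand-built list.

```python
import json
from collections import defaultdict


def update_paths(paths, records):
    """Fold a batch of Kinesis-style records into per-session click lists."""
    for record in records:
        event = json.loads(record["Data"])  # Data is JSON-encoded bytes
        paths[event["session_id"]].append((event["ts"], event["page_url"]))
    return paths


def click_sequence(paths, session_id):
    """Return the pages a session visited, ordered by event timestamp."""
    return [url for _, url in sorted(paths.get(session_id, []))]


if __name__ == "__main__":
    # Hand-built batch standing in for records read from a shard.
    batch = [
        {"Data": json.dumps({"session_id": "s1", "page_url": "/home", "ts": 1}).encode()},
        {"Data": json.dumps({"session_id": "s2", "page_url": "/ads/7", "ts": 2}).encode()},
        {"Data": json.dumps({"session_id": "s1", "page_url": "/cart", "ts": 3}).encode()},
    ]
    paths = update_paths(defaultdict(list), batch)
    print(click_sequence(paths, "s1"))
```

The layout logic would then read the current sequence for a session while the visitor is still on the site, which is the part the S3 + EMR batch design in option A cannot do.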