Which type of cluster should you identify?

Note: This question is part of a series of questions that use the same scenario. For your convenience, the
scenario is repeated in each question. Each question presents a different goal and answer choices, but the text
of the scenario is exactly the same in each question in this series.
You have an initial dataset that contains the crime data from major cities.
You plan to build training models from the training data. You plan to automate the process of adding more data
to the training models and to constantly tune the models by using the additional data, including data that is
collected in near real-time. The system will be used to analyze event data gathered from many different
sources, such as Internet of Things (IoT) devices, live video surveillance, and traffic activities, and to generate
predictions of an increased crime risk at a particular time and place.
You have an incoming data stream from Twitter and an incoming data stream from Facebook, which are
event-based only, rather than time-based. You also have a time interval stream every 10 seconds.
The data is in a key/value pair format. The value field represents a number that defines how many times a
hashtag occurs within a Facebook post, or how many times a Tweet that contains a specific hashtag is
retweeted.
You must use the appropriate data storage, stream analytics techniques, and Azure HDInsight cluster types for
the various tasks associated with the processing pipeline.
You are designing the real-time portion of the input stream processing. The input will be a continuous stream of
data and each record will be processed one at a time. The data will come from an Apache Kafka producer.
You need to identify which HDInsight cluster to use for the final processing of the input data. This will be used to
generate continuous statistics and real-time analytics. The latency to process each record must be less than
one millisecond and tasks must be performed in parallel.
Which type of cluster should you identify?

A. Apache Storm
B. Apache Hadoop
C. Apache HBase
D. Apache Spark

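Explanation:
The reference below is the Apache Storm overview, which points to answer A. Storm processes each record
individually as it arrives and runs tasks in parallel across the cluster, which matches the per-record,
low-latency requirement. Spark Streaming, by contrast, processes data in micro-batches.
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-storm-overview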
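
To illustrate the processing model the scenario describes (key/value records handled one at a time, with statistics updated continuously), here is a minimal plain-Python sketch. It does not use the Storm or Kafka APIs; the class `RunningStats`, the method `process_record`, and the sample records are hypothetical and stand in for whatever the real pipeline would use.

```python
from collections import defaultdict


class RunningStats:
    """Keeps a running count, total, and mean per key, one record at a time."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def process_record(self, key, value):
        # Update the statistics for this key as soon as the record arrives,
        # without waiting for a batch to fill up.
        self.count[key] += 1
        self.total[key] += value
        return self.total[key] / self.count[key]


# Hypothetical sample stream of (hashtag, occurrence count) pairs.
stream = [("#safety", 3), ("#traffic", 5), ("#safety", 7)]

stats = RunningStats()
means = [stats.process_record(key, value) for key, value in stream]
print(means)
```

In a real deployment the per-record update would run inside the stream processor, with records partitioned by key so that many such updates can proceed in parallel.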


