Note: This question is part of a series of questions that use the same scenario. For your convenience, the
scenario is repeated in each question. Each question presents a different goal and answer choices, but the text
of the scenario is exactly the same in each question in this series.
You have an initial dataset that contains the crime data from major cities.
You plan to build training models from the training data. You plan to automate the process of adding more data
to the training models and to constantly tune the models by using the additional data, including data that is
collected in near real-time. The system will be used to analyze event data gathered from many different
sources, such as Internet of Things (IoT) devices, live video surveillance, and traffic activities, and to generate
predictions of an increased crime risk at a particular time and place.
You have an incoming data stream from Twitter and an incoming data stream from Facebook, which are eventbased only, rather than time-based. You also have a time interval stream every 10 seconds.
The data is in a key/value pair format. The value field represents a number that defines how many times a
hashtag occurs within a Facebook post, or how many times a Tweet that contains a specific hashtag is
retweeted.
You must use the appropriate data storage, stream analytics techniques, and Azure HDInsight cluster types for
the various tasks associated to the processing pipeline.
You are planning a storage strategy for a large amount of analytic data used for the crime data analytics
system. The initial data load involves over 100 billion records, and more than two billion records will be added
daily.
You already created an Apache Hadoop cluster in HDInsight premium.
You need to implement the storage strategy to meet the following requirements:
The storage capacity must support 50 TB.
The storage must be optimized for Hadoop.
The data must be stored in its native format.
Enterprise-level security based on Active Directory must be supported.
What should you create?
A.
a virtual machine (VM) by using the Data Science Virtual Machine template for Windows that has premium
storage, a G-series size, and uses Microsoft SQL Server 2016 to store the data
B.
an Azure Data Lake Analytics service by using Azure PowerShell
C.
an Azure Data Lake Store account by using the Azure portal
D.
an Azure Blob storage account by using the Azure portal
Explanation:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-portal
More new 2018 70-775 Exam Dumps can be viewed here: http://www.hpdumps.com/?s=70-775&searchsubmit=Search