QUESTION
You are building an Azure Machine Learning experiment.
You need to transform 47 numeric columns into a set of 10 linearly uncorrelated features.
Which module should you add to the experiment?
A. Principal Component Analysis
B. K-Means Clustering
C. Normalize Data
D. Group Data into Bins
Answer: A
QUESTION
Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in this series.
You plan to create a predictive analytics solution for credit risk assessment and fraud prediction in Azure Machine Learning. The Machine Learning workspace for the solution will be shared with other users in your organization. You will add assets to projects and conduct experiments in the workspace.
The experiments will be used for training models that will be published to provide scoring from web services.
The experiment for fraud prediction will use Machine Learning modules and APIs to train the models and will predict probabilities in an Apache Hadoop ecosystem.
You plan to configure the resources for part of a workflow that will be used to preprocess data from files stored in Azure Blob storage. You plan to use Python to preprocess and store the data in Hadoop.
You need to get the data into Hadoop as quickly as possible.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Create an Azure virtual machine (VM), and then configure MapReduce on the VM.
B. Create an Azure HDInsight Hadoop cluster.
C. Create an Azure virtual machine (VM), and then install an IPython Notebook server.
D. Process the files by using Python to store the data to a Hadoop instance.
E. Create the Machine Learning experiment, and then add an Execute Python Script module.
Answer: BDE
QUESTION
Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question.
You need to use only one percent of an Apache Hive data table by conducting random sampling by groups.
Which module should you use?
A. Execute Python Script
B. Tune Model Hyperparameters
C. Normalize Data
D. Select Columns in Dataset
E. Import Data
F. Edit Metadata
G. Clip Values
H. Clean Missing Data
Answer: A
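Random sampling by groups means drawing the sample separately within each group, so that every group stays represented. The following is a minimal sketch of that idea in plain Python, of the kind that could be adapted inside a Python script step. The function name `sample_by_group`, the `region` column, and the data are hypothetical and used only for illustration; it does not read from Hive.

```python
import random
from collections import defaultdict

def sample_by_group(rows, key, fraction=0.01, seed=0):
    """Return roughly `fraction` of the rows from each group, chosen at random."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        # Keep at least one row per group so small groups are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 1,000 rows for each of three regions.
rows = [{"region": r, "amount": i}
        for r in ("east", "west", "north")
        for i in range(1000)]
sample = sample_by_group(rows, key="region")
```

With these inputs, each region contributes 10 of its 1,000 rows, so the sample holds one percent of the table while preserving the group proportions.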
QUESTION
You are building an Azure Machine Learning solution for an online retailer.
When a customer selects a product, you need to recommend products that the customer might like to purchase at the same time. The recommendation should be based on what other customers purchased when they purchased the same product.
Which model should you use?
A. Collaborative filtering
B. Boosted Decision Tree Regression model
C. Two-Class Boosted Decision Tree
D. K-Means Clustering
Answer: A
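One simple form of collaborative filtering counts how often products were bought together and recommends the most frequent companions. The sketch below illustrates that item-to-item idea only; it is not the recommender that Azure Machine Learning provides, and the function names and baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(baskets):
    """Count, for each product, how often every other product shares a basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicate items within a single purchase.
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts

def recommend(counts, product, n=2):
    """Return up to n products most often purchased together with `product`."""
    return [item for item, _ in counts[product].most_common(n)]

# Hypothetical purchase history.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "mouse", "dock"],
    ["laptop", "bag"],
    ["phone", "case"],
]
counts = build_cooccurrence(baskets)
top = recommend(counts, "laptop")
```

Here `top` lists the mouse first and the bag second, because other customers bought them alongside the laptop three times and twice respectively.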
QUESTION
You plan to use Azure Machine Learning to develop a predictive model.
You plan to include an Execute Python Script module.
What capability does the module provide?
A. Outputting a file to a network location.
B. Performing interactive debugging of a Python script.
C. Saving the results of a Python script run in a Machine Learning environment to a local file.
D. Visualizing univariate and multivariate summaries by using Python code.
Answer: D
QUESTION
You have an Azure Machine Learning experiment. You discover that a model makes many errors on a production dataset but only a few errors on the training data.
What is the cause of the errors?
A. overfitting
B. generalization
C. underfitting
D. a simple predictor
Answer: A
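The pattern in this question (low training error, higher error on new data) can be reproduced with a toy example. The sketch below assumes a hypothetical one-dimensional dataset with two mislabeled training points; a model that memorizes the training set fits the noise perfectly and then repeats it on unseen data, while a simpler rule does not.

```python
def true_label(x):
    """The underlying pattern: class 1 when x is at least 10."""
    return 1 if x >= 10 else 0

def error_rate(predict, data):
    """Fraction of (x, y) pairs that the model gets wrong."""
    return sum(1 for x, y in data if predict(x) != y) / len(data)

# Training data with two deliberately mislabeled points (label noise).
noisy = {3, 15}
train = [(x, (1 - true_label(x)) if x in noisy else true_label(x))
         for x in range(20)]
# Unseen data drawn from the same pattern, without noise.
test = [(x + 0.4, true_label(x + 0.4)) for x in range(20)]

def memorizer(x):
    """Overfit model: copy the label of the nearest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def threshold(x):
    """Simple model: a single cut at 10."""
    return 1 if x >= 10 else 0

memorizer_errors = (error_rate(memorizer, train), error_rate(memorizer, test))
threshold_errors = (error_rate(threshold, train), error_rate(threshold, test))
```

The memorizer has zero training error but misclassifies the unseen points nearest the two noisy examples. The threshold rule misses the two noisy training points and makes no errors on the unseen data.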
Principal Component Analysis
・The module analyzes your data and creates a reduced feature set that captures most of the variance in the dataset in a smaller number of features. The new features (principal components) are linearly uncorrelated with one another.
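To illustrate what "linearly uncorrelated features" means, here is a minimal pure-Python sketch of principal component analysis using power iteration with deflation. It is an illustration under stated assumptions, not the implementation used by the Azure module; the function names and the synthetic three-column data are hypothetical, and in practice a library routine would normally be used.

```python
import random

def pca_scores(rows, n_components, iterations=200):
    """Project rows onto their leading principal components."""
    n, d = len(rows), len(rows[0])
    # Center each column, then build the sample covariance matrix.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)
    components = []
    for _ in range(n_components):
        # Power iteration converges to the dominant eigenvector of cov.
        v = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation removes the found direction so the next pass finds a new one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * r[j] for j in range(d)) for c in components]
            for r in centered]

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data: three strongly correlated columns, reduced to two features.
rng = random.Random(1)
rows = []
for _ in range(200):
    a, b, c = rng.gauss(0, 3), rng.gauss(0, 1), rng.gauss(0, 0.2)
    rows.append([a + b, a - b, a + c])

scores = pca_scores(rows, n_components=2)
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
```

The input columns share the factor `a` and are therefore correlated, whereas `covariance(pc1, pc2)` is zero up to rounding error. The same idea scales to reducing 47 columns to 10 components.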
K-Means Clustering
・K-means is one of the simplest and best-known unsupervised learning algorithms. It can be used for a variety of machine learning tasks, such as detecting abnormal data, clustering text documents, and analyzing a dataset before applying other classification or regression methods.
Normalize Data
・The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information.
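As a small illustration of putting columns on a common scale, the sketch below applies two widely used rescalings, min-max and z-score. The function names and sample values are hypothetical, and both functions assume the column is not constant (otherwise the divisor would be zero).

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical columns measured on very different scales.
incomes = [20000, 35000, 50000, 80000]
ages = [20, 35, 50, 80]

scaled_incomes = min_max(incomes)
scaled_ages = min_max(ages)
standardized_ages = z_score(ages)
```

After min-max scaling both columns become `[0.0, 0.25, 0.5, 1.0]`: the relative spacing of the values is preserved, but the difference in magnitude between the two columns is gone.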
Group Data into Bins
・The module can be used to group numbers or change the distribution of continuous data. You can customize how the bin edges are set and how values are apportioned into the bins.
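Two common ways to set bin edges are equal-width bins and quantile (equal-count) bins. The sketch below shows both in plain Python; the function names and the sample ages are hypothetical, and the module itself offers further options that are not shown here.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def quantile_bins(values, n_bins):
    """Assign values to bins by rank so each bin holds about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical continuous column.
ages = [18, 22, 25, 31, 40, 47, 58, 66, 78]
by_width = equal_width_bins(ages, 3)
by_count = quantile_bins(ages, 3)
```

Equal-width binning gives `[0, 0, 0, 0, 1, 1, 2, 2, 2]` because the younger ages cluster in the first 20-year interval, while quantile binning gives `[0, 0, 0, 1, 1, 1, 2, 2, 2]` with three values per bin. Note that this rank-based version can place tied values in different bins.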