You are performing exploratory analysis of files that are encoded in a complex proprietary format. The format
requires disk-intensive access to several dependent files in HDFS.
You need to build an Azure Machine Learning model by using a canopy clustering algorithm. You must ensure
that changes to proprietary file formats can be maintained by using the least amount of effort.
Which Machine Learning library should you use?
A. MicrosoftML
B. scikit-learn
C. SparkR
D. Mahout
Correct Answer: D
HDFS: Hadoop Distributed File System
Apache Mahout is one of many Hadoop-related projects at Apache. Its mission is to build a scalable machine learning and data mining library.
In other words, Mahout provides data science tools useful for detecting meaningful patterns in given data sets that are stored in HDFS (Hadoop Distributed File System).
Mahout is a key component of Microsoft Azure HDInsight: a scalable machine learning library that provides a number of algorithms built on the Hadoop platform.
Mahout's implementations center on the three "C-pillars" of machine learning:
Collaborative filtering (aka recommendation),
Clustering, and
Classification.
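To illustrate the clustering pillar, here is a minimal, self-contained sketch of the canopy clustering algorithm named in the question. This is an illustrative Python implementation of the general technique, not Mahout's actual (Java/Hadoop) code; the point values and thresholds `t1`/`t2` are hypothetical.

```python
import math

def canopy_clustering(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).

    Repeatedly picks a remaining point as a canopy center, adds every point
    within t1 to that canopy, and removes from further consideration every
    point within t2. Returns a list of (center, members) tuples.
    """
    assert t1 > t2, "loose threshold t1 must exceed tight threshold t2"
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)       # pick the next available point as a center
        members = [center]
        kept = []
        for p in remaining:
            d = math.dist(center, p)    # Euclidean distance
            if d < t1:
                members.append(p)       # within the loose radius: join this canopy
            if d >= t2:
                kept.append(p)          # outside the tight radius: may seed/join others
        remaining = kept
        canopies.append((center, members))
    return canopies

# Hypothetical 2-D points forming three loose groups
points = [(0, 0), (0.5, 0.5), (5, 5), (5.5, 5), (10, 10)]
canopies = canopy_clustering(points, t1=3.0, t2=1.0)
print(len(canopies))  # → 3
```

Canopies may overlap; in practice they serve as a cheap pre-partitioning step so that a more expensive clusterer (e.g. k-means) only compares points inside the same canopy.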