You are performing exploratory analysis of files that are encoded in a complex proprietary format. The format
requires disk-intensive access to several dependent files in HDFS.
You need to build an Azure Machine Learning model by using a canopy clustering algorithm. You must ensure
that changes to proprietary file formats can be maintained by using the least amount of effort.
Which Machine Learning library should you use?
A. MicrosoftML
B. scikit-learn
C. SparkR
D. Mahout
Correct Answer: D
HDFS: Hadoop Distributed File System
Apache Mahout is one of many Hadoop-related projects at Apache. Its mission is to build a scalable machine learning and data mining library.
In other words, Mahout provides data science tools useful for detecting meaningful patterns in given data sets that are stored in HDFS (Hadoop Distributed File System).
Mahout is a key component of Microsoft Azure HDInsight: a scalable machine learning library that provides a number of algorithms built on the Hadoop platform.
Mahout's implementations center on the three "C-pillars" of machine learning:
Collaborative filtering (aka recommendation),
Clustering, and
Classification.
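To illustrate the clustering pillar, here is a minimal, self-contained sketch of the canopy clustering algorithm named in the question. This is an illustrative Python implementation of the general technique, not Mahout's actual (Java/Hadoop) code; the point values and thresholds `t1`/`t2` are hypothetical.

```python
import math

def canopy_clustering(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).

    Repeatedly picks a remaining point as a canopy center, adds every point
    within t1 to that canopy, and removes from further consideration every
    point within t2. Returns a list of (center, members) tuples.
    """
    assert t1 > t2, "loose threshold t1 must exceed tight threshold t2"
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)       # pick the next available point as a center
        members = [center]
        kept = []
        for p in remaining:
            d = math.dist(center, p)    # Euclidean distance
            if d < t1:
                members.append(p)       # within the loose radius: join this canopy
            if d >= t2:
                kept.append(p)          # outside the tight radius: may seed/join others
        remaining = kept
        canopies.append((center, members))
    return canopies

# Hypothetical 2-D points forming three loose groups
points = [(0, 0), (0.5, 0.5), (5, 5), (5.5, 5), (10, 10)]
canopies = canopy_clustering(points, t1=3.0, t2=1.0)
print(len(canopies))  # → 3
```

Canopies may overlap; in practice they serve as a cheap pre-partitioning step so that a more expensive clusterer (e.g. k-means) only compares points inside the same canopy.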