Which algorithm should the data mining model use?

You design a Business Intelligence (BI) solution by using SQL Server 2008. The solution contains a SQL Server 2008 Analysis Services (SSAS) database. A measure group in the database contains log entries of manufacturing events. These events include accidents, machine failures, production capacity metrics, and other activities. You need to implement a data mining model that meets the following requirements:
-Predict the frequency of different event types.
-Identify short-term and long-term patterns.
Which algorithm should the data mining model use?

A. the Microsoft Time Series algorithm
B. the Microsoft Decision Trees algorithm
C. the Microsoft Linear Regression algorithm
D. the Microsoft Logistic Regression algorithm

Explanation:
Tip: "Predict the frequency" = "Time Series"

Microsoft Time Series Algorithm
Microsoft Time Series is used to address a common business problem: accurate forecasting. This algorithm is often used to predict future values, such as rates of sale for a particular product. Most often the inputs are continuous values. To use this algorithm, your source data must contain at least one column marked as Key Time. Any predictable columns must be of type Continuous. You can select one or more inputs as predictable columns when using this algorithm.
Time series source data can also contain an optional Key Sequence column.
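As an illustrative sketch (not part of the quoted text), raw log entries could be aggregated into one count per event type per period before being used as time series input. The sample entries, the monthly grain, and the function name `monthly_counts` are assumptions made for this example only.

```python
from collections import Counter
from datetime import date

# Hypothetical log entries: (event date, event type). Invented for illustration.
log_entries = [
    (date(2008, 1, 3), "accident"),
    (date(2008, 1, 17), "machine failure"),
    (date(2008, 1, 29), "machine failure"),
    (date(2008, 2, 5), "machine failure"),
    (date(2008, 2, 20), "accident"),
]


def monthly_counts(entries):
    """Return {(year, month): Counter of event types} for the given entries."""
    counts = {}
    for day, event_type in entries:
        period = (day.year, day.month)  # plays the role of the Key Time value
        counts.setdefault(period, Counter())[event_type] += 1
    return counts


series = monthly_counts(log_entries)
for period in sorted(series):
    print(period, dict(series[period]))
```

Each event type then yields one numeric series over time, which is the shape of input the paragraph above describes: a time key plus continuous predictable columns.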
Function
The ARTxp algorithm has proved to be very good at short-term prediction. The ARIMA algorithm is much better at longer-term prediction. By default, the Microsoft Time Series algorithm blends the results of the two algorithms to produce the best prediction for both the short and long term.
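The idea of blending a short-term and a long-term forecast can be sketched as follows. This is only an illustration: the quoted text does not say how SSAS weights the two algorithms, so the linear weighting, the function name `blend_forecasts`, and the sample numbers are all assumptions.

```python
def blend_forecasts(short_term, long_term):
    """Blend two forecasts, shifting weight toward long_term as the horizon grows.

    Assumption: weights change linearly with the forecast step. This is not
    a description of the actual SSAS blending formula.
    """
    if len(short_term) != len(long_term):
        raise ValueError("forecasts must cover the same horizon")
    steps = len(short_term)
    blended = []
    for i, (s, l) in enumerate(zip(short_term, long_term)):
        # Weight on the short-term model falls from 1.0 at the first step
        # to 0.0 at the last step.
        w_short = 1.0 if steps == 1 else 1.0 - i / (steps - 1)
        blended.append(w_short * s + (1.0 - w_short) * l)
    return blended


# Hypothetical forecasts from a short-term and a long-term model.
short_model = [10.0, 12.0, 14.0]
long_model = [20.0, 20.0, 20.0]
print(blend_forecasts(short_model, long_model))  # [10.0, 16.0, 20.0]
```

The first predicted step relies entirely on the short-term model and the last step entirely on the long-term model, with a mix in between.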
Microsoft Decision Trees Algorithm
Microsoft Decision Trees is probably the most commonly used algorithm, in part because of its flexibility (decision trees work with both discrete and continuous attributes) and also because of the richness of its included viewers. It's quite easy to understand the output via these viewers. This algorithm is used both to view and to predict. It is also used (usually in conjunction with the Microsoft Clustering algorithm) to find deviant values. The Microsoft Decision Trees algorithm processes input data by splitting it into recursive (related) subsets. In the default viewer, the output is shown as a recursive tree structure.
If you are using discrete data, the algorithm identifies the particular inputs that are most closely correlated with particular predictable values, producing a result that shows which columns are most strongly predictive of a selected attribute. If you are using continuous data, the algorithm uses standard linear regression to determine where the splits in the decision tree occur.
Clicking a node displays detailed information in the Mining Legend window. You can configure the view using the various drop-down lists at the top of the viewer, such as Tree, Default Expansion, and so on. Finally, if you've enabled drillthrough on your model, you can display the drillthrough information: either columns from the model or (new to SQL Server 2008) columns from the mining structure, whether or not they are included in this model.
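For discrete data, the step of finding the input most strongly predictive of a selected attribute can be sketched as below. This is a minimal illustration that assumes an entropy-based (information gain) score; the quoted text does not state which scoring method SSAS uses. The sample rows and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting rows on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_split(rows, attributes, target):
    """Return the attribute whose split gives the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical manufacturing records, invented for illustration.
rows = [
    {"shift": "day", "machine": "A", "failed": "no"},
    {"shift": "day", "machine": "B", "failed": "no"},
    {"shift": "night", "machine": "A", "failed": "yes"},
    {"shift": "night", "machine": "B", "failed": "yes"},
]
print(best_split(rows, ["shift", "machine"], "failed"))  # shift
```

A full tree would apply `best_split` again to each resulting subset, which is the recursive splitting the text describes.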
Microsoft Linear Regression Algorithm
Microsoft Linear Regression is a variation of the Microsoft Decision Trees algorithm, and works like classic linear regression: it fits the best possible straight line through a series of points (the sources being at least two columns of continuous data). This algorithm calculates all possible relationships between the attribute values and produces more complete results than other (non-data-mining) methods of applying linear regression. In addition to a key column, you can use only columns of the continuous numeric data type. Another way to understand this is that it disables splits. You use this algorithm to visualize the relationship between two continuous attributes. For example, in a retail scenario, you might want to create a trend line between physical placement locations in a retail store and rate of sale for items. The algorithm result is similar to that produced by any other linear regression method in that it produces a trend line. Unlike most other methods of calculating linear regression, the Microsoft Linear Regression algorithm in SSAS calculates all possible relationships between all input dataset values to produce its results. This differs from other methods of calculating linear regression, which generally use progressive splitting techniques between the source inputs.
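Fitting a straight line through a series of points can be sketched with ordinary least squares. This is a generic textbook illustration, not taken from the quoted text and not the SSAS implementation; the shelf-position and sales figures are invented.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: shelf position versus units sold per day.
shelf_position = [1.0, 2.0, 3.0, 4.0]
units_sold = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(shelf_position, units_sold)
print(slope, intercept)  # 2.0 1.0
```

The resulting slope and intercept define the trend line between the two continuous attributes, as in the retail example above.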

(Smart Business Intelligence Solutions with Microsoft SQL Server 2008, Copyright 2009 by Kevin Goff and Lynn Langit)


