You design a Business Intelligence (BI) solution by using SQL Server 2008. Your solution includes a data mining structure that uses SQL Server 2008 Analysis Services (SSAS) as its data source. The measure groups use 100 percent multidimensional online analytical processing (MOLAP) storage.
You need to provide detailed information on the training and test data to ensure the accuracy of the mining model. You also need to minimize the time required to create the training and test data. Which two tasks should you perform? (Each correct answer presents part of the solution. Choose two.)
A.
Perform cross-validation queries to the test and training data.
B.
Create a new mining structure that has a holdout value.
C.
Create a SQL Server 2008 Integration Services (SSIS) package that partitions test and training datasets and merges case and nested tables.
D.
Use a Sort Data Flow transformation.
E.
Use an ORDER BY clause in the Data Flow source query. Define a SortKeyPosition ordinal key for the appropriate output column.
Explanation:
Tip: "MOLAP … accuracy … minimize the time" = "cross-validation" … "new mining structure"
Cross Validation
The cross validation tool was added specifically to address requests from enterprise customers. Keep in mind that cross validation does not require separate training and testing datasets. You can use testing data, but you won't always need to. Eliminating the need for holdout (testing) data can make cross validation more convenient for validating data mining models.
Cross validation works by automatically separating the source data into partitions of equal size. It then performs iterative testing against each partition and shows the results in a detailed output grid. The number of partitions is set by the Fold Count parameter on the Cross Validation tab of the Mining Accuracy Chart tab in BIDS. The default value is 10, which produces 10 partitions (folds). If you're using temporary mining models to cross validate in Excel 2007, 10 is also the maximum number of folds; in BIDS, the maximum is 256. Of course, a greater number of folds means more processing overhead.
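The partitioning described above is ordinary k-fold cross validation. The following minimal Python sketch shows the idea under stated assumptions: it is not SSAS's internal implementation, the function names `make_folds` and `fold_splits` are hypothetical, and the shuffle-then-deal strategy is just one common way to build folds. The fold limits are the values quoted in the text. When the case count is not divisible by the fold count, the partitions can only be equal to within one case.

```python
# Illustrative k-fold partitioning sketch; not SSAS's internal algorithm.
import random
from typing import List, Tuple

DEFAULT_FOLD_COUNT = 10      # default Fold Count quoted in the text
BIDS_MAX_FOLDS = 256         # maximum quoted for BIDS
EXCEL_TEMP_MAX_FOLDS = 10    # maximum quoted for Excel 2007 temporary models


def make_folds(n_cases: int, fold_count: int = DEFAULT_FOLD_COUNT,
               max_folds: int = BIDS_MAX_FOLDS, seed: int = 0) -> List[List[int]]:
    """Shuffle case indices and deal them round-robin into fold_count partitions.

    Round-robin dealing keeps partition sizes within one case of each other.
    """
    # k-fold needs at least two partitions; the upper bound mirrors the tool limit.
    if not 2 <= fold_count <= max_folds:
        raise ValueError(f"fold_count must be between 2 and {max_folds}")
    if fold_count > n_cases:
        raise ValueError("fold_count cannot exceed the number of cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::fold_count] for i in range(fold_count)]


def fold_splits(folds: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """Pair each fold (test set) with the union of the other folds (training set)."""
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    folds = make_folds(1000)             # default Fold Count of 10
    print([len(f) for f in folds])       # ten folds of 100 cases each
    train, test = fold_splits(folds)[0]  # first iteration: fold 0 is the test set
    print(len(train), len(test))         # 900 100
```

Each iteration trains on k - 1 folds and tests on the remaining one, so every case is used for testing exactly once. This also explains why more folds mean more processing: the model is rebuilt once per fold.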
You can also implement cross validation by using the newly introduced stored procedures, such as SystemGetCrossValidationResults. One reason to use the new cross-validation capability is that it's a quick way to perform validation with multiple mining models as source inputs.
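Validating several mining models together is useful mainly because every model is scored on the same partitions, so their per-fold results can be compared directly. The Python sketch below illustrates that idea under stated assumptions: the toy cases, the two "models" (`majority_trainer`, `threshold_trainer`), and `cross_validate` are all hypothetical stand-ins, not the SSAS stored procedures or their output format.

```python
# Illustrative sketch: cross-validate several candidate models on the same folds.
# The toy data, model names, and trainer functions are hypothetical.
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Case = Tuple[float, int]                        # (input value, 0/1 label)
Predictor = Callable[[float], int]
Trainer = Callable[[Sequence[Case]], Predictor]


def make_folds(n_cases: int, fold_count: int, seed: int = 0) -> List[List[int]]:
    """Shuffle case indices and deal them round-robin into fold_count partitions."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::fold_count] for i in range(fold_count)]


def cross_validate(cases: Sequence[Case], trainers: Dict[str, Trainer],
                   fold_count: int = 10) -> Dict[str, List[float]]:
    """Return per-fold accuracy for every model, all scored on identical folds."""
    folds = make_folds(len(cases), fold_count)
    scores: Dict[str, List[float]] = {name: [] for name in trainers}
    for i, test_idx in enumerate(folds):
        train = [cases[k] for j, fold in enumerate(folds) if j != i for k in fold]
        test = [cases[k] for k in test_idx]
        for name, trainer in trainers.items():
            predict = trainer(train)  # retrain from scratch on this fold's training set
            hits = sum(predict(x) == y for x, y in test)
            scores[name].append(hits / len(test))
    return scores


def majority_trainer(train: Sequence[Case]) -> Predictor:
    """Always predict the more frequent training label (ties go to 0)."""
    positives = sum(y for _, y in train)
    label = int(2 * positives > len(train))
    return lambda x: label


def threshold_trainer(train: Sequence[Case]) -> Predictor:
    """Predict 1 above the midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: int(x > cut)


if __name__ == "__main__":
    cases = [(float(v), int(v >= 50)) for v in range(100)]
    results = cross_validate(cases, {"majority": majority_trainer,
                                     "threshold": threshold_trainer})
    for name, per_fold in results.items():
        print(f"{name}: mean accuracy {mean(per_fold):.2f}")
```

Because the folds are built once and shared, a difference in the per-fold scores reflects the models rather than a lucky or unlucky split.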
Note: Cross validation cannot be used to validate models built with the Time Series or Sequence Clustering algorithms. This is logical if you think about it: both algorithms depend on sequences, and if the data were partitioned for testing, the validity of the sequences would be violated.
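To see why sequence-dependent models are excluded, consider what fold partitioning does to ordered data. The Python sketch below is illustrative only: the algorithm labels, the 24-month series, and the helper names are hypothetical, and the folds are dealt without shuffling. Even so, the test fold is full of gaps, and the training set contains observations from after the test period. A shuffled split would scatter the data even further.

```python
# Illustrative sketch: why fold-based partitioning does not suit sequence models.
# The algorithm labels, helper names, and monthly series are hypothetical.
from typing import List, Tuple

SEQUENCE_ALGORITHMS = {"Time Series", "Sequence Clustering"}


def ensure_cross_validation_supported(algorithm: str) -> None:
    """Refuse to build folds for algorithms whose cases depend on order."""
    if algorithm in SEQUENCE_ALGORITHMS:
        raise ValueError(f"cross validation is not supported for {algorithm} models")


def fold_split(series: List[float], fold_count: int, fold: int) -> Tuple[List[int], List[int]]:
    """Deal positions round-robin into folds; return (train positions, test positions)."""
    test = list(range(fold, len(series), fold_count))
    train = [p for p in range(len(series)) if p % fold_count != fold]
    return train, test


def trains_on_future(train: List[int], test: List[int]) -> bool:
    """True if any training observation comes after the earliest test observation."""
    return max(train) > min(test)


if __name__ == "__main__":
    monthly_sales = [float(m) for m in range(24)]  # 24 observations in time order
    train, test = fold_split(monthly_sales, fold_count=4, fold=0)
    print(test)                            # [0, 4, 8, 12, 16, 20]: gaps in the sequence
    print(trains_on_future(train, test))   # True: the model would see "later" months
    ensure_cross_validation_supported("Decision Trees")  # passes silently
```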
(Smart Business Intelligence Solutions with Microsoft SQL Server 2008, Copyright 2009 by Kevin Goff and Lynn Langit)