You are the lead developer for a SQL Server 2008 data warehousing project. The source database for the project is an online transaction processing (OLTP) system. The OLTP system executes 4,000 transactions every minute during business hours. The OLTP system records only the date and time of insertion of a new row and not for the updates of existing rows.
You plan to design an extract, transform, and load (ETL) process for the project that populates a data warehouse from the source database.
The ETL process must be configured in the following manner:
-To run after business hours
-To capture new rows and existing rows that have been modified
You need to ensure that only new rows or modified rows from the database tables are processed by the ETL process.
What should you do?
A.
Configure the data warehouse database to support the Type I Slowly Changing Dimension transformation.
B.
Configure the data warehouse database to support the Type II Slowly Changing Dimension transformation.
C.
Configure the Change Data Capture feature on all the source database tables that will be processed by the ETL process.
D.
Configure the Change Data Capture feature on all the data warehouse database tables that will be processed by the ETL process.
Explanation:
Tip: "only new rows or modified" = "Change Data Capture … source database"Change Data Capture
One of the biggest challenges of the Extract, Transform, and Load (ETL) process is determining which records need to be extracted from the source data and loaded into the data mart. For smaller dimensional tables that are not used to populate slowly changing dimensions, we may choose to truncate the target table and refill it with all of the data from the source with every load
There are several methods for determining which data has changed since the last extract. They include:
c Adding create and last update fields to the database table
c Adding flag fields to indicate when records have been extracted
c Creating triggers or stored procedures to replicate changes to change capture tables
If our source data is coming from a SQL Server 2008 database, we have a new feature to make this process much easier. That feature is known as change data capture (CDC).
The transaction information is converted into a more readily usable format and stored in a change table. One change table is created for each table that is being tracked by change data capture.
(McGraw-Hill – Delivering Business Intelligence with Microsoft SQL Server 2008 (2009))http://msdn.microsoft.com/en-us/library/bb522489.aspx
Change Data Capture
Change data capture is designed to capture insert, update, and delete activity applied to SQL Server tables, and to make the details of the changes available in an easily consumed relational format. The change tables used by change data capture contain columns that mirror the column structure of a tracked source table, along with the metadata needed to understand the changes that have occurred.
Change data capture is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.
Change data capture provides information about DML changes on a table and a database. By using change data capture, you eliminate expensive techniques such as user triggers, timestamp columns, and join queries.