Note: New for version 22.214.171.1245: The introduction of the Memory Optimized property. This is a significant change and we encourage all users to please read the Memory Optimized Property section before configuring the component.
Dimension Merge SCD Transform
|Transform Icon||Transform Description|
|The Dimension Merge SCD Transform is a custom Data Flow component used within SSIS to handle slowly changing dimension processing commonly used in Data Warehouses.|
While SSDT/BIDS provides a standard SCD component, users may experience limitations such as:
- Slow performance
- Re-running the SCD Wizard can destroy parts of the data flow
- Cannot be used against all data providers
- Difficulty in troubleshooting the standard component’s decision making abilities
Task Factory’s Dimension Merge SCD Transform maintains the functionality of the original SCD component while also providing:
- Better performance
- Non-destructive editing capabilities
- The ability to use a variety of data providers
- Improved UI that empowers users to customize the component’s configuration and view/troubleshoot decision-making
- Input/Output auditing
- Improved logging capabilities
When using the Dimension Merge SCD Transform, users begin by connecting two of the following:
|Existing Dimension||This input provides data from a data warehouse's dimension table and must include the housekeeping columns (SCD2, effective, and expiry).|
Note: Although optional, it's recommended that users sort this input on the business key.
|Source System||This input provides data from the source OLTP system needed to compare and insert/update to the data warehouse's dimension table. Source system and existing dimension datatypes must match.|
Note: Although optional, it's recommended that users sort this input on the business key.
|Special Members||(Optional) In typical dimensions, special or unknown members may be used to link fact table rows to NULL or unknown dimension values. The special members input allows users to specify and maintain these special members in a data warehouse dimension table. (Special member rows should be treated as Type 1 changes to avoid duplicate versions of unknown values.) Like the previous two inputs, it's recommended the Special Members input also be sorted on the business key.|
As of version 126.96.36.1992, the Dimension Merge Slowly Changing Dimensions component includes a new property called Memory Optimized. This new property was introduced due to the retention of a large number of records to remain in memory in the original behavior. This caused memory usage of the package to grow significantly over the package execution time. By requiring the inputs to be sorted, this new model is able to process and remove rows from memory much earlier thus reducing the overall memory footprint and speeding up execution. While there is no setting within the user-interface, users can configure this option by right-clicking the Dimension Merge component and selecting Properties.
Note: the following additional information about the Memory Optimized property:
- When set to false, the component operates as it did in the past. Upgraded packages have this property set to False by default.
- New packages have this property set to True and have to meet certain restrictions (listed below). Otherwise users encounter errors.
- If the inputs are not marked as sorted, the component uses the original behavior, even if Memory Optimized is set to True.
- If the property is set to True, all input rows must be sorted by the Business Key using a binary sort order. Example:
Note: the ORDER BY and COLLATE Latin1_general_bin clauses on this product dimension. This should be used when sorting within the source query and setting the Is Sorted/Sort Key properties in the Advanced Editor. When using a file as input then users will need to add the SSIS sort component between the source and Dimension Merge component. Because this is a significant processing change, it's recommended that users test their packages with both settings before deployment to ensure performance is optimized based on their dataset and configuration.
Existing Dimension Input Column Definitions Tab
From this tab, users identify how each column in the existing dimension participates in the SCD transform. Columns can be defined as one of the following:
Note: You may see an error at the bottom stating there is no Business Key selected, the Business Key can be selected from the SCD Column Type column.
|Not Used||Declares the column to not be used in the SCD processing.|
|Business Key||Identifies the business key column.|
|Surrogate Key||Identifies the surrogate key column.|
|SCD0 (Send To Invalid Input)||Identifies a column for Type 0 changes. Any change is sent to the Invalid Input's output.|
|SCD0 (Ignore Changes)||Identifies a column for Type 0 changes. Any change is ignored and sent to the Unchanged output.|
|SCD1||Identifies a column for Type 1 changes (dimension is overwritten with new values).|
|SCD2||Identifies a column for Type 2 changes (retains historical data).|
|SCD2 Current Record||A boolean column that can be used to identify the current record among related historical data.|
|Audit Column - Last Changed||A column that can be used to identify the moment a row was last changed.|
|Audit Column - Row Added||A column that can be used to identify a row that has been added.|
|Audit Column - SCD1 Update||A column that can be used to identify an SCD1 updated row.|
|Audit Column - SCD2 Change||A column that can be used to identify a row that has undergone an SCD2 change.|
|Inferred Member Indicator||A DT_WSTR column used to identify fact table references to a dimension that hasn't been loaded.|
Column Mapping Tab
In the Column Mapping tab, users can specify the mapping between the existing dimension and source system.
Row Change Detection Tab
In the Row Change Detection tab, users can define how each column detects changes including case sensitivity, leading/trailing space, and columns with NULL values.
Note: By default, the Dimension Merge SCD redirects columns containing NULL values to the Invalid Input output unless that column is configured as Nullable.
SCD2 Date Handling Tab
In the SCD2 Date Handling tab, users are given a variety of options for handling SCD2 dates including:
- Assigning now and expiry to system or user-created variables.
- The ability to select a granularity that ranges from day to millisecond (whose precision is also configurable).
- The ability to define a fixed-input expiry date.
Important: Users should pay careful attention to the example located to the right of the last record's expiry date. If the expiry column located in your dimension does not match the format of the example, this can cause unchanged rows to be sent to the New output due to the mismatch.
For example: setting the granularity to millisecond with a last record's expiry date set to MS SQL DateTime Max requires a datetime entry of 9999/12/31 23:59:59.997. If a developer mistakenly uses .999, this causes a mismatch and sends unchanged records to the New output.
Surrogate Key Handling Tab
In this tab, users are given the option to manage surrogate key assignments to new rows or pass through the existing surrogate key and let the SQL identity column manage the assignment.
Note: It's recommended the surrogate key assignment be managed somewhere else (first option) for optimal performance.
Inferred Member Behavior Tab
In the Inferred Member Behavior tab, users assign a single alpha-numeric character to define True and False values.
Output Column Selection Tab
In the Output Column Selection Tab, users can select which outputs to attach to the component and which columns are included/excluded in the output.
The final step in configuring the Dimension Merge SCD is to attach all of the outputs configured in the Output Column Selection tab. The following is a list of all available outputs as well as a brief description:
Note: Once outputs are attached, re-open the Dimension Merge SCD Transform to dismiss warnings referencing unattached outputs. Keep in mind that warnings remain for any output that has been configured in the Output Column Selection Tab that does not have the Don't warn that this output isn't attached button selected.
|Auditing||The Auditing output processes Input/output counter information configured in the Auditing tab. Users who do not wish to include the Auditing output can dismiss the warning by selecting the Auditing output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|Deleted||The Deleted output manages the records that are present and active within the Existing Dimension, but no longer appear within the Source System data stream. Records passed down the Deleted output can either be expired or deleted from the Existing Dimension. Expiration/deletion from the Existing Dimension are based on the Surrogate Key column(s). Users who desire their Deleted output to be expired may redirect their deleted rows to the Expired SCD2 with SCD1 Changes output. Additionally, users who do not wish to include the Deleted output can dismiss the warning by selecting the Deleted output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|Expired SCD2 with SCD1 Changes||The Expired SCD2 with SCD1 Changes output expires current SCD2 records that are being updated in the existing dimension. Records passed down the Expired SCD2 with SCD1 Changes are expired from the Existing Dimension based on the Surrogate Key column(s). Users who do not wish to include the Expired SCD2 with SCD1 Changes output can dismiss the warning by selecting the Expired SCD2 with SCD1 Changes output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|Invalid Input||The Invalid Input output handles any invalid rows that match the configured rules found on the Row Change Detection tab (such as NULL values in columns not configured as Nullable). Users who do not wish to include the Invalid Input output can dismiss the warning by selecting the Invalid Input output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|New||The New output handles records that are not present within the Existing Dimension, but found within the Source System. Any record passed down the New output needs to be inserted into the Existing Dimension. Users who do not wish to include the New output can dismiss the warning by selecting the New output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|New SCD2||The New SCD2 output works along side the Expired SCD2 with SCD1 Changes output to insert a new record for historical records. Users who do not wish to have an additional output can redirect the New SCD2 output to the New output. Users who do not wish to include the New SCD2 output can dismiss the warning by selecting the New SCD2 output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|Statistics||The Statistics output processes statistical information for the current package execution such as the number of Source System Input records, Existing Dimension Input records, and Interval Start/End times. Users who do not wish to include the Statistics output can dismiss the warning by selecting the Statistics output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|Unchanged||The Unchanged output is an optional output that outputs all rows that have no changes between the Existing Dimension and the Source System data stream, or rows, that do not appear in the Source System data stream and are expired in the Existing Dimension. Users who do not wish to include the Unchanged output can dismiss the warning by selecting the Unchanged output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
|Updated SCD1||The Updated SCD1 output records that contain updated values to non-historical columns. Users who do not wish to include the Updated SCD1 output can dismiss the warning by selecting the Updated SCD1 output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.|
In the Auditing tab, users can set variables to display dimension, source system, and output counters that can be analyzed post-execution.
The Logging tab enables the component to provide users with more descriptive information of the execution process viewed in the Execution Results/Progress tab within SSDT/BIDS.
In this tab, users can alter performance settings including:
|Warning Suppression||Dismisses warnings when circumstances require a condition be ignored (such as not sorting input business keys).|
|Timeout||Developers can define a time (in seconds) to end execution for debugging.|
|Threading||Sets the number of threads available in two pools - one for matching keys and the other for processing rows. Users can maximize processing speed by configuring the component as follows:|
|Hashing||Security feature that allows users to determine which hash algorithm to use during processing.|
|Use Memory Optimized Mode||Sets the Memory Optimized property. For more information, see Memory Optimized Property.|
|Guid Key Sort Algorithm||Identifies the method in which the input data is sorted when using Guid as relationship keys. (This option only appears when Guid keys are used.) The following methods can be used:|