Task Factory Dimension Merge SCD Transform

Note:  New for version 4.2.0.405: The introduction of the Memory Optimized property. This is a significant change and we encourage all users to please read the Memory Optimized Property section before configuring the component.

Dimension Merge SCD Transform

Transform IconTransform Description
Task Factory Dimension Merge SCD Transform IconThe Dimension Merge SCD Transform is a custom Data Flow component used within SSIS to handle slowly changing dimension processing commonly used in Data Warehouses.

While SSDT/BIDS provides a standard SCD component, users may experience limitations such as:

  • Slow performance
  • Re-running the SCD Wizard can destroy parts of the data flow
  • Cannot be used against all data providers
  • Difficulty in troubleshooting the standard component’s decision making abilities

Task Factory’s Dimension Merge SCD Transform maintains the functionality of the original SCD component while also providing:

  • Better performance
  • Non-destructive editing capabilities
  • The ability to use a variety of data providers
  • Improved UI that empowers users to customize the component’s configuration and view/troubleshoot decision-making
  • Input/Output auditing
  • Improved logging capabilities

Setup

Setup

When using the Dimension Merge SCD Transform, users begin by connecting two of the following:

Connector Description
Existing DimensionThis input provides data from a data warehouse's dimension table and must include the housekeeping columns (SCD2, effective, and expiry).

Note:  Although optional, it's recommended that users sort this input on the business key.
Source SystemThis input provides data from the source OLTP system needed to compare and insert/update to the data warehouse's dimension table. Source system and existing dimension datatypes must match.

Note:  Although optional, it's recommended that users sort this input on the business key.
Special Members(Optional) In typical dimensions, special or unknown members may be used to link fact table rows to NULL or unknown dimension values. The special members input allows users to specify and maintain these special members in a data warehouse dimension table. (Special member rows should be treated as Type 1 changes to avoid duplicate versions of unknown values.) Like the previous two inputs, it's recommended the Special Members input also be sorted on the business key.

Task Factory Input Output Selection window

Memory Optimized Property

Task Factory Dimension Merge Slowly Changing Dimension Memory Optimized

As of version 4.2.0.402, the Dimension Merge Slowly Changing Dimensions component includes a new property called Memory Optimized. This new property was introduced due to the retention of a large number of records to remain in memory in the original behavior. This caused memory usage of the package to grow significantly over the package execution time. By requiring the inputs to be sorted, this new model is able to process and remove rows from memory much earlier thus  reducing the overall memory footprint and speeding up execution. While there is no setting within the user-interface, users can configure this option by right-clicking the Dimension Merge component and selecting Properties.

Note:   the following additional information about the Memory Optimized property:

  • When set to false, the component operates as it did in the past. Upgraded packages have this property set to False by default.
  • New packages have this property set to True and have to meet certain restrictions (listed below). Otherwise users encounter errors.
  • If the inputs are not marked as sorted, the component uses the original behavior, even if Memory Optimized is set to True.
  • If the property is set to True, all input rows must be sorted by the Business Key using a binary sort order. Example:
    Task Factory Binary sort order example

Note:  the ORDER BY and COLLATE Latin1_general_bin clauses on this product dimension. This should be used when sorting within the source query and setting the Is Sorted/Sort Key properties in the Advanced Editor. When using a file as input then users will need to add the SSIS sort component between the source and Dimension Merge component. Because this is a significant processing change, it's recommended that users test their packages with both settings before deployment to ensure performance is optimized based on their dataset and configuration.

Existing Dimension

Existing Dimension Input Column Definitions Tab

From this tab, users identify how each column in the existing dimension participates in the SCD transform. Columns can be defined as one of the following:

Note:  You may see an error at the bottom stating there is no Business Key selected, the Business Key can be selected from the SCD Column Type column.

ColumnDescription
Not UsedDeclares the column to not be used in the SCD processing.
Business KeyIdentifies the business key column.
Surrogate KeyIdentifies the surrogate key column.
SCD0 (Send To Invalid Input)Identifies a column for Type 0 changes. Any change is sent to the Invalid Input's output.
SCD0 (Ignore Changes)Identifies a column for Type 0 changes. Any change is ignored and sent to the Unchanged output.
SCD1Identifies a column for Type 1 changes (dimension is overwritten with new values).
SCD2Identifies a column for Type 2 changes (retains historical data).
SCD2 Current RecordA boolean column that can be used to identify the current record among related historical data.
Audit Column - Last ChangedA column that can be used to identify the moment a row was last changed.
Audit Column - Row AddedA column that can be used to identify a row that has been added.
Audit Column - SCD1 UpdateA column that can be used to identify an SCD1 updated row.
Audit Column - SCD2 ChangeA column that can be used to identify a row that has undergone an SCD2 change.
Inferred Member IndicatorA DT_WSTR column used to identify fact table references to a dimension that hasn't been loaded.

Task Factory Dimension Merge SCD Existing Dimension Input Column Definitions

Column Mapping

Column Mapping Tab

In the Column Mapping tab, users can specify the mapping between the existing dimension and source system.
Task Factory Dimension Merge SCD Column Mapping

Row Change

Row Change Detection Tab

In the Row Change Detection tab, users can define how each column detects changes including case sensitivity, leading/trailing space, and columns with NULL values. 

Note:  By default, the Dimension Merge SCD redirects columns containing NULL values to the Invalid Input output unless that column is configured as Nullable.

Task Factory Dimension Merge SCD Row Change Detection

SCD2 Date

SCD2 Date Handling Tab

In the SCD2 Date Handling tab, users are given a variety of options for handling SCD2 dates including:

  • Assigning now and expiry to system or user-created variables.
  • The ability to select a granularity that ranges from day to millisecond (whose precision is also configurable).
  • The ability to define a fixed-input expiry date.

Important:  Users should pay careful attention to the example located to the right of the last record's expiry date. If the expiry column located in your dimension does not match the format of the example, this can cause unchanged rows to be sent to the New output due to the mismatch. 

For example: setting the granularity to millisecond with a last record's expiry date set to MS SQL DateTime Max requires a datetime entry of 9999/12/31 23:59:59.997. If a developer mistakenly uses .999, this causes a mismatch and sends unchanged records to the New output.

Task Factory Dimension Merge SCD SCD 2 Date Handling

Surrogate Key

Surrogate Key Handling Tab

In this tab, users are given the option to manage surrogate key assignments to new rows or pass through the existing surrogate key and let the SQL identity column manage the assignment. 

Note:  It's recommended the surrogate key assignment be managed somewhere else (first option) for optimal performance.

Task Factory Dimension Merge SCD Surrogate Key Handling

Inferred Member

Inferred Member Behavior Tab

In the Inferred Member Behavior tab, users assign a single alpha-numeric character to define True and False values.

Task Factory Dimension Merge SCD Inferred Member Behaviour

Output Column

Output Column Selection Tab

In the Output Column Selection Tab, users can select which outputs to attach to the component and which columns are included/excluded in the output.

Task Factory Dimension Merge SCD Output Column Selection

The final step in configuring the Dimension Merge SCD is to attach all of the outputs configured in the Output Column Selection tab. The following is a list of all available outputs as well as a brief description:

Note:  Once outputs are attached, re-open the Dimension Merge SCD Transform to dismiss warnings referencing unattached outputs. Keep in mind that warnings remain for any output that has been configured in the Output Column Selection Tab that does not have the Don't warn that this output isn't attached button selected.

OutputDescription
AuditingThe Auditing output processes Input/output counter information configured in the Auditing tab. Users who do not wish to include the Auditing output can dismiss the warning by selecting the Auditing output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
DeletedThe Deleted output manages the records that are present and active within the Existing Dimension, but no longer appear within the Source System data stream. Records passed down the Deleted output can either be expired or deleted from the Existing Dimension. Expiration/deletion from the Existing Dimension are based on the Surrogate Key column(s). Users who desire their Deleted output to be expired may redirect their deleted rows to the Expired SCD2 with SCD1 Changes output. Additionally, users who do not wish to include the Deleted output can dismiss the warning by selecting the Deleted output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
Expired SCD2 with SCD1 ChangesThe Expired SCD2 with SCD1 Changes output expires current SCD2 records that are being updated in the existing dimension. Records passed down the Expired SCD2 with SCD1 Changes are expired from the Existing Dimension based on the Surrogate Key column(s). Users who do not wish to include the Expired SCD2 with SCD1 Changes output can dismiss the warning by selecting the Expired SCD2 with SCD1 Changes output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
Invalid InputThe Invalid Input output handles any invalid rows that match the configured rules found on the Row Change Detection tab (such as NULL values in columns not configured as Nullable). Users who do not wish to include the Invalid Input output can dismiss the warning by selecting the Invalid Input output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
NewThe New output handles records that are not present within the Existing Dimension, but found within the Source System. Any record passed down the New output needs to be inserted into the Existing Dimension. Users who do not wish to include the New output can dismiss the warning by selecting the New output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
New SCD2The New SCD2 output works along side the Expired SCD2 with SCD1 Changes output to insert a new record for historical records. Users who do not wish to have an additional output can redirect the New SCD2 output to the New output. Users who do not wish to include the New SCD2 output can dismiss the warning by selecting the New SCD2 output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
StatisticsThe Statistics output processes statistical information for the current package execution such as the number of Source System Input records, Existing Dimension Input records, and Interval Start/End times. Users who do not wish to include the Statistics output can dismiss the warning by selecting the Statistics output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
UnchangedThe Unchanged output is an optional output that outputs all rows that have no changes between the Existing Dimension and the Source System data stream, or rows, that do not appear in the Source System data stream and are expired in the Existing Dimension. Users who do not wish to include the Unchanged output can dismiss the warning by selecting the Unchanged output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.
Updated SCD1The Updated SCD1 output records that contain updated values to non-historical columns. Users who do not wish to include the Updated SCD1 output can dismiss the warning by selecting the Updated SCD1 output found on the Output tab, select the Exclude All button, and ensure that the Don't warn that this output isn't attached button is selected.

Task Factory Dimension Merge Slowly Changing Dimension Package

Auditing

Auditing Tab

In the Auditing tab, users can set variables to display dimension, source system, and output counters that can be analyzed post-execution.

Task Factory Dimension Merge SCD Auditing

Logging

Logging Tab

The Logging tab enables the component to provide users with more descriptive information of the execution process viewed in the Execution Results/Progress tab within SSDT/BIDS.

Task Factory Dimension Merge SCD Logging

Performance

Performance Tab

In this tab, users can alter performance settings including:

OptionDescription
Warning SuppressionDismisses warnings when circumstances require a condition be ignored (such as not sorting input business keys).
TimeoutDevelopers can define a time (in seconds) to end execution for debugging.
ThreadingSets the number of threads available in two pools - one for matching keys and the other for processing rows. Users can maximize processing speed by configuring the component as follows:

  • Disabled - Sets the value to 1.
  • Automatically manage thread counts - Value is set to a number based on the processor cores of the machine it is being run on. Note:  This option should be considered for smaller environments.
  • Manually specify thread count per workload type - Should be considered when running on an environment with multiple core processors. Additionally, the automatic option doesn't take into account other processes or packages that might be running, therefore, manually specifying the thread count might be a better option for larger or busy environments.
HashingSecurity feature that allows users to determine which hash algorithm to use during processing.
Use Memory Optimized ModeSets the Memory Optimized property. For more information, see Memory Optimized Property.
Guid Key Sort AlgorithmIdentifies the method in which the input data is sorted when using Guid as relationship keys. (This option only appears when Guid keys are used.) The following methods can be used:

  • SQL Server Order By - Performed by using an ORDER BY clause within a query. Additionally, the IsSorted property needs to be set to true and the key sort order set. These are found in the sources Advanced Editors (eg Ole DB Source and ADO.Net Source.)
  • Sort Transform - Configured when using the SSIS native Sort Transform component between the source(s) and Dimension Merge SCD transform. (Sorting is done by adding an ORDER BY clause, however, the IsSorted and SortKeyPosition should not be set within the advanced editor.)
  • .NET Guid - Configured when using .NET Guid sorting within an upstream Script Task.
  • String - Sort Guids as strings. Note:  Does not work with Memory Optimized mode set to true.


Task Factory Dimension Merge SCD Performance

Videos

Additional Videos

Dimension Merge SCD - Configure Native SCD Example


Dimension Merge SCD - How to use it


Dimension Merge SCD - Intro to Sample Package


Dimension Merge SCD - Intro to test data


Dimension Merge SCD - T-SQL Merge Example