Friday, May 27, 2011

SSIS Lookup and Conditional Split tasks to load data incrementally

Microsoft integration service slowly changing dimensions out of box task can be used to implement SCD Type 1 and SCD Type 2. The main challenge ETL developer faces is around selecting right approach in loading data incrementally. We have to consider the aspects like – amount of data, whether to implement Type – 1 or Type – 2 of SCD.

There are lot of ways to load data incrementally like -

  • SSIS Component (Lookup, conditional split etc)
  • Third party components
  • Using SQL MERGE statement

I have plan to cover above three implementation approach in my coming blogs..today will talk about SCD using Lookup, Conditional Split tasks. We will create a SSIS package using these tasks to -

  • Figure out whether incoming value in the pipe already exists in database.
  • Lookup column data has been changed or same as one in the database.

First Create a table and populate some records.

CREATE TABLE [DBO].[TTB_STUDENT](
ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY
,
NAME VARCHAR(100
) NULL,
AGE
INT
)
GO
INSERT INTO
[DBO].[TTB_STUDENT]
VALUES
('Elaine', 30
),
(
'Kevin', 42
),
(
'Robin', 22
),
(
'Alice', 15
)
GO
Select above table and see the result - 
SELECT * FROM [DBO].[TTB_STUDENT]
image
Find below the package and the tasks we used. 
image 
We are using script component task as Source (you will be prompted with options like source, target etc when you drag a script component tasks on package designer) to create source for the downstream transformation tasks. Set the output name as MyOutput and add 2 columns as under. 
image 
Go back to Script tab and click on Edit Script… button and place following code snippet. We are adding some student rows to test all 3 scenarios.




public override voidCreateNewOutputRows()
{
/*
Lets add some student rows such that we cover
New student, Student with same age and student with
different age.
*/

//Existing student with different age
MyOutputBuffer.AddRow();
MyOutputBuffer.StudentName = "Elaine";
MyOutputBuffer.Age = 33;

//Existing student with same age
MyOutputBuffer.AddRow();
MyOutputBuffer.StudentName = "Kevin";
MyOutputBuffer.Age = 42;

//New student
MyOutputBuffer.AddRow();
MyOutputBuffer.StudentName = "Cameron";
MyOutputBuffer.Age = 55;
}



I will just run the package at this point by adding data viewer for you to check dataset that we have built to be used as source for lookup task (note that you have to connect script component output to a task before you run the package).



image



Drag a lookup tasks on the designer and set editor properties as under. Make sure you set error output for lookup match output to redirect row. I am placing the screen shot for you to be able to connect well.



Set the link between available input and lookup column as under. Here we are using student name for the equijoin and lookup column is Age.



image



This is how the error output should be set so that row is redirected and it can be used in the downstream transformation.



image



From lookup task we will have two outputs. First output will be for newly added students (from the source) and second output will be for existing students.



Newly added students flow we can simply connect to underline database table. Note that I have used row count tasks and store in a variable which can be used for any auditing or emailing purposes. However this is totally depends on your requirement and shall be altered per that.



Second flow (existing student with same and different age), we will connect to conditional split task for us to be able to split 2 outputs - 1. Student with same age 2. Student with different age. Note that student with same age, we don’t need to take any action. Student with different age we will connect to OLE DB Command SSIS task through UNION ALL task. This is how your conditional split task should be setup -



image



This is how your OLE DB Command task should be setup. In the Connection Manager tab set the connection you used to connect to your relational database. In the Component Properties tab set SqlCommand properties with the SQL Update statement (Shown in the dialog). Note that In the Column Mapping tab you have to specify parameters we have used in the update.

Component Properties tab -



image
Column Mappings tab -




image
Finally run the package and observe the data viewers for the 3 outputs. Remember we are not taking any action if it is existing student with same age as in database table. - image



As you could make out by looking the ETL flow, Elaine went to OLE DB update, Kevin went to Age is same flow (no change in age) and Cameron went to OLE DB Insert.



Let’s select from the table again -
SELECT * FROM [DBO].[TTB_STUDENT]
image

You can do the same using out of box SCD tasks however it comes with some drawbacks and this is one of the best alternative approach. Hope the article will be helpful for you to understand and compare with other available solutions to load data incrementally. Request you to leave any feedback or comments you guys have.



Bye for now, will be back with more. Enjoy long weekend!

No comments:

Post a Comment