Friday, May 27, 2011

SSIS Lookup and Conditional Split tasks to load data incrementally

Microsoft integration service slowly changing dimensions out of box task can be used to implement SCD Type 1 and SCD Type 2. The main challenge ETL developer faces is around selecting right approach in loading data incrementally. We have to consider the aspects like – amount of data, whether to implement Type – 1 or Type – 2 of SCD.

There are lot of ways to load data incrementally like -

  • SSIS Component (Lookup, conditional split etc)
  • Third party components
  • Using SQL MERGE statement

I have plan to cover above three implementation approach in my coming blogs..today will talk about SCD using Lookup, Conditional Split tasks. We will create a SSIS package using these tasks to -

  • Figure out whether incoming value in the pipe already exists in database.
  • Lookup column data has been changed or same as one in the database.

First Create a table and populate some records.

CREATE TABLE [DBO].[TTB_STUDENT](
ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY
,
NAME VARCHAR(100
) NULL,
AGE
INT
)
GO
INSERT INTO
[DBO].[TTB_STUDENT]
VALUES
('Elaine', 30
),
(
'Kevin', 42
),
(
'Robin', 22
),
(
'Alice', 15
)
GO
Select above table and see the result - 
SELECT * FROM [DBO].[TTB_STUDENT]
image
Find below the package and the tasks we used. 
image 
We are using script component task as Source (you will be prompted with options like source, target etc when you drag a script component tasks on package designer) to create source for the downstream transformation tasks. Set the output name as MyOutput and add 2 columns as under. 
image 
Go back to Script tab and click on Edit Script… button and place following code snippet. We are adding some student rows to test all 3 scenarios.




public override voidCreateNewOutputRows()
{
/*
Lets add some student rows such that we cover
New student, Student with same age and student with
different age.
*/

//Existing student with different age
MyOutputBuffer.AddRow();
MyOutputBuffer.StudentName = "Elaine";
MyOutputBuffer.Age = 33;

//Existing student with same age
MyOutputBuffer.AddRow();
MyOutputBuffer.StudentName = "Kevin";
MyOutputBuffer.Age = 42;

//New student
MyOutputBuffer.AddRow();
MyOutputBuffer.StudentName = "Cameron";
MyOutputBuffer.Age = 55;
}



I will just run the package at this point by adding data viewer for you to check dataset that we have built to be used as source for lookup task (note that you have to connect script component output to a task before you run the package).



image



Drag a lookup tasks on the designer and set editor properties as under. Make sure you set error output for lookup match output to redirect row. I am placing the screen shot for you to be able to connect well.



Set the link between available input and lookup column as under. Here we are using student name for the equijoin and lookup column is Age.



image



This is how the error output should be set so that row is redirected and it can be used in the downstream transformation.



image



From lookup task we will have two outputs. First output will be for newly added students (from the source) and second output will be for existing students.



Newly added students flow we can simply connect to underline database table. Note that I have used row count tasks and store in a variable which can be used for any auditing or emailing purposes. However this is totally depends on your requirement and shall be altered per that.



Second flow (existing student with same and different age), we will connect to conditional split task for us to be able to split 2 outputs - 1. Student with same age 2. Student with different age. Note that student with same age, we don’t need to take any action. Student with different age we will connect to OLE DB Command SSIS task through UNION ALL task. This is how your conditional split task should be setup -



image



This is how your OLE DB Command task should be setup. In the Connection Manager tab set the connection you used to connect to your relational database. In the Component Properties tab set SqlCommand properties with the SQL Update statement (Shown in the dialog). Note that In the Column Mapping tab you have to specify parameters we have used in the update.

Component Properties tab -



image
Column Mappings tab -




image
Finally run the package and observe the data viewers for the 3 outputs. Remember we are not taking any action if it is existing student with same age as in database table. - image



As you could make out by looking the ETL flow, Elaine went to OLE DB update, Kevin went to Age is same flow (no change in age) and Cameron went to OLE DB Insert.



Let’s select from the table again -
SELECT * FROM [DBO].[TTB_STUDENT]
image

You can do the same using out of box SCD tasks however it comes with some drawbacks and this is one of the best alternative approach. Hope the article will be helpful for you to understand and compare with other available solutions to load data incrementally. Request you to leave any feedback or comments you guys have.



Bye for now, will be back with more. Enjoy long weekend!

Wednesday, May 25, 2011

Unpivot data transformation in SSIS

I get sales product data in an excel spreadsheet. This file holds yearly sales data of all the products. Following is the excel template business users are using to capture sales data month to month. The requirement is to unpivot the sales data and store in a table with appropriate snapshot. Find below file template with sample data -

image

Find below table structure where above file has to be imported using package.

CREATE TABLE [DBO].[TTB_PRODUCT_SALES](
[PRODUCT_NAME] NVARCHAR(100
) NOT NULL,
[SALES_AMT] DECIMAL(15,4
) NULL,
[SNAPSHOT_DATE] VARCHAR(50
) NULL
)

Next step would be to create a SSIS package to import data from excel file into the database table recursively such that all these months are loaded. Note that we can do this using unpivot transformation task.
Steps to create package - 


  • Go to your integration service project in your solution and add new SSIS package.

  • Add a data flow task under Control Flow tab as under -
    image

  • Double click on the above task to go to the data flow tab. Next, right click on Connection Managers tab and select Excel from the connection manager list, click on Add… button and select sales excel as your source. Make sure you checked First row has column names in the Excel Connection Manager.
    image

  • You can add data viewer and run the package to view current state of dataset. It will be like this  -
    image

  • Add Unpivot transformation tasks to unpivot sales columns into rows. Double click on Unpivot transformation task and editor as under -
    image

    As you see we have to pass through Product as we want to keep product and transform other columns (months like Jan, Feb etc) into rows. Meaning, month columns will be converted into rows for each product. Set Destination Column to SALES_AMT to store unpivoted value in the database table. Set Pivot key value column name to “PKeyMonth”. This variable will hold value of the pivot key value. In our example we have put month name as key (Jan, Feb etc). Add a data viewer to check incoming rows and variable value -
    image

  • Let’s add a script component task to read “PkeyValue” value and convert it into snapshot date. Select Transform in the Select Script Component Type dialog. In the Script Transformation Editor go to Inputs and Outputs tab and add a output column name SnapshotDate. Next switch to Script tab and place following script to assign snapshot date value to newly added variable. We will use this variable to set snapshot in the database table.

    public override void
    Input0_ProcessInputRow(Input0Buffer Row)
    {
    string varSnapShotDate = string.Empty;
    switch (Row.PKeyMonth)
    {
    case "Jan": varSnapShotDate = "2011-01-31"; break;
    case "Feb": varSnapShotDate = "2011-02-28"; break;
    case "Mar": varSnapShotDate = "2011-03-31"; break;
    case "Apr": varSnapShotDate = "2011-04-30"; break;
    case "May": varSnapShotDate = "2011-05-31"; break;
    case "Jun": varSnapShotDate = "2011-06-30"; break;
    case "Jul": varSnapShotDate = "2011-07-31"; break;
    case "Aug": varSnapShotDate = "2011-08-31"; break;
    case "Sep": varSnapShotDate = "2011-09-30"; break;
    case "Oct": varSnapShotDate = "2011-10-31"; break;
    case "Nov": varSnapShotDate = "2011-11-30"; break;
    case "Dec": varSnapShotDate = "2011-12-31"; break;
    }
    Row.SnapshotDate = varSnapShotDate;
    }

  • Your SSIS Package data flow tab should look like this. Run the package and query database table
    (
    SELECT * FROM [DBO].[TTB_PRODUCT_SALES]
    ).
    image

Monday, May 23, 2011

Filter dimension member using MDX VBA function

If you are passing month time member in your MDX to slice cube and need to get quarter number for some calculation, you can try something like this. Assume a case where you are showing monthly measure value on the report but the quarterly target. In this case you will need to derive quarter number from month parameter passed in the MDX and define a calculated member for target.

Find MDX example below  –

WITH 
MEMBER
[INVESTMENT PRODUCT].[PRODUCT NAME].SYKESPRODUCT
AS
Aggregate
(
{
Filter
(
NonEmpty
(
[INVESTMENT PRODUCT].[PRODUCT NAME].[PRODUCT NAME].
MEMBERS
,[Measures].[ONCE DONE SURVEYS]
)
,
VBA!Left
(
VBA!Ltrim([INVESTMENT PRODUCT].[PRODUCT NAME].CurrentMember.Name)
,5
)
= "SYKES"
)
}
)
MEMBER [INVESTMENT PRODUCT].[PRODUCT NAME].OTHERPRODUCT
AS
Aggregate
(
Filter
(
NonEmpty
(
[INVESTMENT PRODUCT].[PRODUCT NAME].[PRODUCT NAME].MEMBERS,
,[Measures].[ONCE DONE SURVEYS]
)
,
VBA!Left
(
VBA!Ltrim([INVESTMENT PRODUCT].[PRODUCT NAME].CurrentMember.Name)
,5
)
<> "SYKES"
)
)
SELECT
{[Measures].[ONCE DONE SURVEYS]}
ON COLUMNS
,{
{
[INVESTMENT PRODUCT].[PRODUCT NAME].SYKESPRODUCT
,[INVESTMENT PRODUCT].[PRODUCT NAME].OTHERPRODUCT
}
*
[TIME].[YQMD].[Month].&[2010-11-01T00:00:00]
}
ON ROWS
FROM
[YOUR CUBE];

Thursday, May 19, 2011

MDX - Get quarter number from current month time member

If you are passing month time member in your MDX to slice cube and need to get quarter number for some calculation, you can try something like this. Assume a case where you are showing monthly measure value on the report but the quarterly target. In this case you will need to derive quarter number from month parameter passed in the MDX and define a calculated member for target.

Find MDX example below –

WITH 
//Find quarter number from time member
MEMBER MEASURES.[QUARTER NUMBER]
AS
Cint(Round(YTD().Count + 1) / 3)
MEMBER [Measures].[TARGET VALUE]
AS
MEASURES.[QUARTER NUMBER] * [Measures].[TARGET] / 4
SELECT
{
MEASURES.[QUARTER NUMBER]
,[Measures].[TARGET VALUE]
}
ON COLUMNS
,{StrToSet(@ParamTimeHier)}
ON ROWS
FROM
(
SELECT
StrToSet(@ParamTimeHier)
ON COLUMNS
FROM
[YOUR CUBE]
);

MDX - Prorating target for current month or quarter

In the report, if you need to prorate your measure value based on the month or quarter you are showing, you can do something like the example explained below.

Example. You might get yearly target of say 400 million dollar and based on the quarter you are in, might want prorate the value as under –

Quarter Number

Prorated $ Value

Quarter 1

100

Quarter 2

200

Quarter 3

300

Quarter 4

400

MDX snippet –

WITH
MEMBER MEASURES.MONTHNUMBE AS
Month(Cdate(Left(Right([TIME].[YQMD].CurrentMember.UniqueName,20),10)))

MEMBER MEASURES.PRORATED_TARGET AS
CASE
WHEN MEASURES.MONTHNUMBER = 1 THEN 1 * [MEASURES].[TARGET] / 4
WHEN MEASURES.MONTHNUMBER = 4 THEN 2 * [MEASURES].[TARGET] / 4
WHEN MEASURES.MONTHNUMBER = 7 THEN 3 * [MEASURES].[TARGET] / 4
ELSE 4 * [MEASURES].[TARGET] / 4
END

SELECT
{
MEASURES.PRORATED_TARGET
,MEASURES.MONTHNUMBER
} ON COLUMNS
,
NonEmpty
(
StrToMember(@ParamTime).Parent.Lag(3) : StrToMember(@ParamTime).Parent
,[Measures].[TARGET]
)
DIMENSION PROPERTIES
MEMBER_CAPTION
,MEMBER_UNIQUE_NAME
,PARENT_UNIQUE_NAME
,LEVEL_NUMBER
ON ROWS
FROM [YOUR_CUBE];

In case of you want your prorating to be done monthly, meaning like –

Month Number

Prorated $ Value

January

1*400/12

February

2*400/12

…….

---

December

12*400/12

Replace above calculated member as under -

MEMBER MEASURES.PRORATED_TARGET AS
YTD().Count*[Measures].[TPM FY TARGET]/12