SQL Server Business Intelligence


Friday, March 31, 2006

TransactionOption In SSIS

I had a 3 simple packages get called from master package. First package populates time dimension (stored proc), second one populates transaction type dimension(stored proc) and the final one populates date dimension.

I set the TransactionOption = Required in the Master package, every thing else (package & component) set TransactionOption = Supported.

I have been testing transactions, basically I made time and transaction type dimension to fail and the package did roll back. I am stuck in the date dimension.Within Date dimension I got a stored procedure which populates calendar date attributes into Wrk_Date table. Then I have a data flow task which reads from thats working table, but it cant access it. I tried running SP_WHO2 command, and the status is SUSPENDED, being blocked by id -2.

I was going through step by step and identified the reason for locks. During step 2, if date table exist in stage database it TRUNCATES the table else it creates the table. Truncate statement seems to hold the lock and it is not releasing it further down the line. I have managed to get the rollback to work after removing truncate.

Therefore conclusion is you can't have Truncate and cant populate the same table in a single transaction in SSIS.


Monday, March 20, 2006

Most Powerful Sort Component So Far

As everyone knows the SSIS sort component out of the box is not known for its higher performance. There are couple of third party Sort components available.

ExtraSort component was available for a while. It is a freeware. You can access ExtraSort here. Ivolva Digital is behind this component.

During last few days another Sort component was made available. This one is called NSort. You can download the trial version here. Ordinal Technology Corp is behind this component. This is not memory hungry as it uses temporary files. It accepts multiple inputs and provides multiple outputs.


Wednesday, March 15, 2006

Single Source File With Multiple Files

One of the requirments I came across today was interesting. Each outlet has a specific menu file. Therefore we receive aournd 2000 menu files. Each menu file from system does have 8 different file types. I need to shread the data send 8 files out to different destinations. First 2 characters identifies the file type.

I have written a script component to do this. Script Component is the source. I have attached the sample package and sample text file I have used. In this example I have used 2 types of files just to prove the point that it works. You can download sample package from here.


Tuesday, March 14, 2006

Null field in Fixed Length File Which Should be an Integer Column

The project I am currently working get source files in a fixed length format. Some fields allowed to have nulls, and some are integer fields.

I ran into a problem if I define the field as integer column in the connection manager. The reason for that is that those fields which supposed to be nulls, are not really nulls, they are spaces. To overcome this I did the following

* Define the column as DB_STR in the flat file connection manager.
* Once the data is read from the data source I added Derived Column in the next step.
* I added a new derived column with the following expression
(DT_UI4)((TRIM([GrpId]) == "") ? NULL(DT_WSTR,0) : [GrpId])

The above statement will TRIM the spaces out from a column called GrpId and if GrpId is equal to “”, then it will set it to be null, else it will assign the value of GrpId. Finally (DT_UI4) statement will convert the return result set to Four Byte Unsigned Integer.


Date Source File in US Format. Database Locale is Set to British. Need to Get Surrogate Key?

If you are receiving the date source file with US format (MM/DD/YYYY). It may sound odd, but in the project I am currently working on I had to overcome this issue. The date dimension is stored in the table using DD/MM/YYYY format. You wish to get Surrogate Key using Lookup feature within SSIS.

Source File As shown below:
Col1 Col2 Col3 Col4
1 USD 01/01/2006 0:00 0.99980004
1 USD 01/02/2006 0:00 1.000900811
1 USD 01/21/2006 0:00 0.99960016

I would suggest you define the column3 as STRING in the flat file connection manager. Once you have defined the data type you can use Derived Column component to convert the string into the correct format as shown below :
(DT_DBTIMESTAMP)(SUBSTRING([Column 2],4,2) + "/" + SUBSTRING([Column 2],1,2) + "/" + SUBSTRING([Column 2],7,4))

You can download the sample package from here.

Try to avoid using DATEPART. If you wish to use DATEPART then you would have to set the source field (in this example Column 2 need to be defined as DATETIMESTAMP instead of STRING). The statement below will give you similar result, but there is GOTCHA.

(DT_STR,2,1252)DATEPART("dd",[Column 2]) + "/" + (DT_STR,2,1252)DATEPART("mm",[Column 2]) + "/" + (DT_STR,5,1252)DATEPART("yyyy",[Column 2])

The above script will produce the following result set.

1 USD 01/01/2006 0:00 0.99980004 01/01/2006 - This is right
1 USD 01/02/2006 0:00 1.000900811 01/02/2006 - This is NOT right
1 USD 01/21/2006 0:00 0.99960016 21/01/2006 - This is right

First of January worked fine. Second of January didn't work fine. Twenty first worked fine. Therefore be aware when you use it.

If you receive date in the following format 01-JAN-2006 then you can use the following syntax to convert it to date. You would need to define the source as String.
(DT_DBTIMESTAMP)(SUBSTRING([Column 2],1,2) + "/" + SUBSTRING([Column 2],4,3) + "/" + SUBSTRING([Column 2],8,2))


Thursday, March 09, 2006

Moving Files - Source to Working Dir

The project currently I am working on had the following directory structure.

C:\Data\SourceFiles - This dir will have source files.
C:\Data\ArchiveFiles - Once the data is loaded move it from working to archive dir.
C:\Data\WorkingFiles - Move source files to working dir.
C:\Data\ControlFiles - Single control file per dimension.

One of the dimension we have is payment type which is made out of 2 files, such as paytype and paytygrp. Each time we receive files with different names, but first 8 characters of the file is fixed. Some of our dimension has upto 24 different files, therefore we decided to use a control file with the delimiter of "-". In the control file we have source file name and final source file name.

One of my collegue(Rastko) wrote a script to move the files from source directory to working directory. You can download the entire script here.