SQL Server Business Intelligence

SQL SERVER 2005 & 2008 IS THE MAIN THEME OF THIS BLOG

Saturday, December 31, 2005

Package Configurations - DTSConfig File

SSIS supports 5 different types of sources to use the setting the properties of objects and they are :

  1. XML Configuration File
  2. Environment Variable
  3. Registry Entry
  4. Parent Package Variable
  5. SQL Server

I have used XML configuration file to do all the SQL Server connection settings. In the future if you wish to include more connections / settings you can add to this configuration file. If you continue to use this method, you can execute SSIS Packages in different environments without making any changes to the packages. Environments.dtsconfig is the only file you would have to change between environments.

Configuration file is located at C:\Config\Enironments.dtsconfig

This configuration file should exist at the same place in all environments. During runtime SSIS package will pickup the setting from that location and update the connection settings according to the XML configuration file.

XML configuration file will look like the one in the link below:
http://www.smart-bi-solutions.com/BlogStuff/31122005_01.jpg

When you migrate the SSIS packages from one environment (Dev) to the other (UAT), you would need to do the following to the configuration file :

  1. Copy “Enironments.dtsconfig” from your dev environment to UAT environment. Please make sure that you place this configuration file in the same location as development.
  2. Edit “Enironments.dtsconfig” using WordPad or something similar.
  3. Rename Data Source values. For example if your development environment is called “Beast\SQL2005” and UAT environment is called “Sql2005\Uat”. You need to search and replace the value for data source.
  4. Save the file.

Thanks
Sutha

MSDTC - What a pain

Yes I admit that I am not an expert in MSDTC, but still so difficult to figure this damn thing. If Windows Firewall is ON it wont work. I didnt have that turned on, but I have had Norton on my machine, which would stop MSDTC working as well. If you get this kind of message please try to disable your firewall and try before anythign else. You will get the message below even though you have configured MSDTC correctly and started successfully.

------------------------------
Error: The SSIS Runtime has failed to start the distributed transaction due to error 0x8004D01B "The Transaction Manager is not available.". The DTC transaction failed to start. This could occur because the MSDTC Service is not running.
------------------------------

Useful posts when you try to resolve MSDTC issue :
http://blogs.conchango.com/jamiethomson/archive/2005/11/02/2339.aspx
http://blogs.conchango.com/paulmcmillan/archive/2005/10/17/2277.aspx
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q293799
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q306843

It is advisable to run MSDTC with DTC Logon Account such as "NT AUTHORITY\NETWORK SERVICE". Also select "Allow Inbound" and "Allow Outbound" under Transaction Manager Communication.

Thanks
Sutha

Friday, December 30, 2005

Creating Random Data From Given Dimensions

I had to create millions of rows dummy fact table for testing purpose. I have got valid dimensions, but not fact. I had a requirement to create facts using random products (roughly 160 products across whole day). The fact table stored data to the minute. Each shop in average have 2000 transactions / day.

Each day shop in average sells 160 products. This is populated into a Tmp table as shown below:

SELECT TOP 160 productid
INTO TmpProduct
FROM dim_product
ORDER BY NEWID()

I have written couple of SQL statements to achieve this. They produced right result but unfortunately they are slow for my requirement. Therefore we have to do in C#.

Option 1
---------
SELECT TOP 2000 RANK() OVER (order by newid()) as RankRec,
dateid,
timeid,
shopid,
managerid,
TmpProduct.productid,
1 as Qty,
(timeid*rand()) as Amount
FROM dim_date
CROSS JOIN dim_shop
CROSS JOIN dim_time
CROSS JOIN dim_manager
CROSS JOIN dim_product
INNER JOIN TmpProduct
ON dim_product.productid = TmpProduct.productid
WHERE shopid = 39
AND dateid = 1
AND dim_shop.shopno = dim_manager.shopno


Option 2
---------

SELECT TOP 2000
dateid,
timeid,
shopid,
managerid,
TmpProduct.productid,
1 as Qty,
(timeid*rand()) as Amount
FROM dim_date
CROSS JOIN dim_shop
CROSS JOIN dim_time
CROSS JOIN dim_manager
CROSS JOIN dim_product
INNER JOIN TmpProduct
ON dim_product.productid = TmpProduct.productid
WHERE shopid = 39
AND dateid = 1
AND dim_shop.shopno = dim_manager.shopno
order by newid()

I am posting this for me as it will come handy during future projects.

Thanks
Sutha

Wednesday, December 28, 2005

Random data

I had a request from a client to use random products (for example 100 products per customer) when I create a dummy set of data for testing purpose. I didn't realise that there is a simple way to get random set of products being returned at each time, until I Googled it.

SELECT TOP 100 ProductId
FROM Dim_Products
ORDER BY NEWID()

NEWID() returns a UniqueIdentifier every time it runs and therefore it will return different product list each time.

Very useful when you create random data.

I am sure most people knew this, but I learnt this yesterday. So I thought I would post it as it would be useful for me in the future.

Thanks
Sutha

Tuesday, December 27, 2005

Auditing

As part of best practice I have kept all the auditing details in a separate database. I have given database name as Audit. There are couple of tables, one for auditing of SSIS package and the other for stored procedure auditing.

SSIS Audit table structure

CREATE TABLE dbo.SSIS_Audit
(
LoadId int NOT NULL,
PackageName varchar(100) NOT NULL,
TaskName varchar(100) NOT NULL,
[Type] varchar(100) NOT NULL,
MachineName varchar(50) NOT NULL,
UserName varchar(50) NOT NULL,
NoOfRowsRead int NOT NULL CONSTRAINT DF_MBA_Audit_NoOfRowsProcessed DEFAULT ((0)),
NoOfRowsInserted int NOT NULL CONSTRAINT DF_MAB_Audit_NoOfRowsInserted DEFAULT ((0)),
NoOfRowsUpdated int NOT NULL CONSTRAINT DF_MAB_Audit_NoOfRowsUpdated DEFAULT ((0)),
StartTime datetime NOT NULL,
PackageDuration int NULL
)
GO

Stored Procedure Audit table structure


CREATE TABLE dbo.Error_Audit
(
ErrorNumber numeric(18, 0) NOT NULL,
ErrorSeverity numeric(18, 0) NOT NULL,
ErrorState numeric(18, 0) NOT NULL,
ErrorProcedure varchar(60) NOT NULL,
ErrorLine numeric(18, 0) NOT NULL,
ErrorMessage varchar(250) NOT NULL,
ErrorDate datetime NOT NULL
)
GO


Within SSIS package I used SQL Expression to achieve auditing. I have used Event Handler wherever possible (On Pre-Execute / On Post-Execute) and sometimes I have used Execute SQL Task to achieve this. I couldn’t use event handler on package itself as it is triggering twice. The reason it triggers twice is “it will fire the event handler scoped to the task AND the event handler scoped to the package. So your event actually gets handled twice”. Therefore I use Execute SQL Task to log audit details at the beginning and end of the package to log auditing details.

Example expression to populate SSIS audit table

"INSERT INTO SSIS_Audit (LoadId,PackageName, TaskName, Type, MachineName, UserName, NoOfRowsRead, NoOfRowsInserted, PackageDuration, StartTime)
VALUES ('"+(DT_STR,50,1252)@[User::LocalLoadId] +"',
'"+ @[System::PackageName] + "', '" + @[System::TaskName] + "',
'Package Completed', '" + @[System::MachineName] + "',
'" + @[System::UserName] + "', '"+(DT_STR,50,1252)@[User::RowsRead] +"','"+(DT_STR,50,1252)@[User::RowsInserted] +"',"+(DT_STR,4,1252)DATEDIFF("ss",@[System::StartTime],getdate()) + ", getdate() )"

When the above expression is evaluated it will display something similar to this as SQLStatement

INSERT INTO SSIS_Audit (LoadId,PackageName, TaskName, Type, MachineName, UserName, NoOfRowsRead, NoOfRowsInserted, PackageDuration, StartTime)
VALUES
( '0','Pkg_Fct_Summary_Initial_Load', 'End SSIS Pkg Audit Log',
'Package Completed', 'BEAST', 'BEAST\Sutha', '0', '0',14, getdate()
)


Within stored procedure I have used the following stored procedure to call if anything fails. This Stored procedure will log it into a table which I have specified above (Error_Audit)

CREATE PROCEDURE dbo.prc_LogErrorInfo
AS
BEGIN TRY
INSERT INTO Error_Audit
VALUES
(
ERROR_NUMBER(),
ERROR_SEVERITY() ,
ERROR_STATE() ,
ERROR_PROCEDURE() ,
ERROR_LINE() ,
ERROR_MESSAGE(),
GETDATE()
)
END TRY

BEGIN CATCH
SELECT ERROR_NUMBER() AS ErrorNumber,
ERROR_SEVERITY() AS ErrorSeverity,
ERROR_STATE() AS ErrorState,
ERROR_PROCEDURE() AS ErrorProcedure,
ERROR_LINE() AS ErrorLine,
ERROR_MESSAGE() AS ErrorMessage,
GETDATE() AS ErrorDate

END CATCH

As usual comments welcome.

Thanks
Sutha

LoadId

I thought I would explain the load id mechanism I have been using in my current project. I have stored couple of tables within "Stage" database. Decision was made to put it in the Stage database, against Audit database as these are not auditing tables.

The structure of the tables as follows:

CREATE TABLE dbo.Ctl_Load_Summary
(
LoadId int NOT NULL,
StartDate datetime NOT NULL,
EndDate datetime NULL,
Duration int NOT NULL CONSTRAINT DF_Ctl_Load_Summary_Duration DEFAULT ((0))
)
GO

CREATE TABLE dbo.Ctl_Load_Batch_State
(
LoadId int NOT NULL,
InProgress varchar(1) NOT NULL
)
GO

"Ctl_Load_Batch_State" table always hold single record, which is controlled by a stored procedure (Code is below). This table will hold last successful load id OR current load id which is in progress. If it is in progress it will have the current load id and ‘Y’ as Progress flag. If the load is successful it will have the load id and ‘N’ as progress flag. During any unforeseen problems if the SSIS package fails during data warehouse load progress flag will remain as ‘Y’. When we rerun the package it will pick the same load id as the transaction within the package would have rolled the data back.


“Ctl_Load_Summary” table holds the summary details of the load. For example when we load data everyday it will have the loaded we have used for the day along with when the package started and ended. The duration of the package is calculated in seconds and stored in the duration column. This is mainly done to meet the client’s requirement.

Commets welcome.

Thanks
Sutha


Stored Procedure To Retrieve Load ID

-- =============================================
-- Author : Sutha Thiru
-- Create date : 12/12/2005
-- Description : This SP will return the latest .
-- =============================================
ALTER PROCEDURE [dbo].[prc_get_loadid] (@loadid INT OUTPUT)
AS

SET NOCOUNT ON

DECLARE @inprogress VARCHAR(1)
DECLARE @statecount INT
DECLARE @summarycount INT

SET @loadid = 0
SET @summarycount = 0
SET @statecount = 0

BEGIN TRY

BEGIN TRAN
SELECT @summarycount = count(*)
FROM Ctl_Load_Summary

SELECT @statecount = count(*)
FROM Ctl_Load_Batch_State

IF @summarycount <> 0 and @statecount = 0
BEGIN
RETURN 0
END
ELSE
SELECT @loadid = loadid,
@inprogress = inprogress
FROM Ctl_Load_Batch_State

IF @inprogress = 'Y'
BEGIN
set @loadid = @loadid
END
ELSE
SET @loadid = @loadid+1
IF @statecount = 0
BEGIN
INSERT
INTO Ctl_Load_Batch_State
VALUES (@loadid,'Y')
IF @@ERROR = 0
BEGIN
set @loadid = @loadid
END
END
ELSE
BEGIN
UPDATE Ctl_Load_Batch_State
SET loadid = @loadid,
inprogress ='Y'
IF @@ERROR = 0
BEGIN
set @loadid = @loadid
END
END
COMMIT TRAN

RETURN @loadid
END TRY

BEGIN CATCH
ROLLBACK TRAN
--EXEC Audit.dbo.prc_LogErrorInfo -- This calls another SP for auditing
END CATCH

Tuesday, December 13, 2005

Execute SQL Task - Retrieving LoadId & Assigning To User Variable

During daily load of data warehouse I wanted to get the LoadID from a table for that load and assign it to a user variable. I wanted to execute a stored procedure which will return an output value which will be the LoadID. I wanted to assign it to a variable.

The SQLStatement would be “EXEC prc_get_loadid ? OUTPUT”. You can see from the link below:
http://www.smart-bi-solutions.com/BlogStuff/13122005_01.jpg

From the link below below you can see that I am assigning the output value from stored procedure to user defined variable. Please note that the variables are case sensitive within SSIS.
http://www.smart-bi-solutions.com/BlogStuff/13122005_02.jpg

Now I assigned the output (LoadId) from a stored procedure to user variable, I can use that variable anywhere within the calling SSIS (parent / child) package which handles the daily load.

Thanks
Sutha

Friday, December 09, 2005

Auditing Data Flow Task

I would like to audit number of rows read from source, inserted, updated etc. You can use out of the box Row Count component to get the number of rows processed at a given point and time and assign it to a variable as Jamie explained well in his post.

I was talking to Mark and said it would be nice if we could have something to do this using an Event Handler, which means we don’t have to have multiple variables in a package etc. We discussed it and the wiz-kid wrong a component which lets me do what I wanted to achieve. Using this component I can get the number of rows processed into a single variable at a given time and fire an event handler to insert it into an audit table. You can see what Mark has done here.

I will do a blog soon, with simple demo explaining all the details.

Thanks
Sutha

SQL Server 2005

While I was working for Reuters I had the opportunity to have access to SQL Server 2005 as Reuters was one of the Technology Adoption Programme (TAP) partners. I have been working with SQL Server 2005 over 2 years now.

I have concentrated mainly on SQL Server Integration Services (SSIS), but have spend time in understanding new features with SQL Server database as well as SQL Server Analysis Service (SSAS).

Thanks
Sutha

Lets Give it a go....

I have been working with SQL Server for nearly 8 years now. I have concentrated on BI front all these years. I never bothered to setup my own blog.

After all these years couple of my collegues (Steve & Mark) nagging me for a long time, I have decided to go ahead with my own blog today.

I am setting this as a knowledge base for me more than anything else. If I get to share some valuable information during that process it would be bonus. There are some great blogs out there as you all aware.

I am ready to rock..... Lets start blogging....

Thanks
Sutha