SQL Server Business Intelligence

SQL SERVER 2005 & 2008 IS THE MAIN THEME OF THIS BLOG

Monday, February 20, 2006

SSIS - Clustered Service

As many of you know, SSIS is not cluster-aware, but there is a workaround for it.

Kirk has written a brilliant post on it, with help from Ranjeeta and Nick. You can access the post here.

Thanks
Sutha

Wednesday, February 15, 2006

More New SSIS Transforms

I posted a couple of transforms last week, but a few more have appeared on the web. They are developed and provided by the SSIS team at Microsoft. Here is the full list (so far).

http://www.microsoft.com/downloads/details.aspx?familyid=E603BDE7-44BB-409A-890F-ED94A20B6710&displaylang=en - CalendarTransform
http://www.microsoft.com/downloads/details.aspx?familyid=91E24D1D-ACF0-45B1-A9A5-9EA1C7BD0463&displaylang=en - UnDoubleOut
http://www.microsoft.com/downloads/details.aspx?familyid=C16F11AD-150A-4091-B3A2-83D21D3E0973&displaylang=en - Regex
http://www.microsoft.com/downloads/details.aspx?familyid=19DF22A2-7A5E-4E09-B447-B9DC503D407A&displaylang=en - RTrimPlus
http://www.microsoft.com/downloads/details.aspx?familyid=2A1686D7-2EAC-4CA3-8F5F-AE78D86C0D8F&displaylang=en - NullDetector
http://www.microsoft.com/downloads/details.aspx?familyid=FC4DE21D-9C5B-4B1D-AEEF-CE43AADAB4E4&displaylang=en - SeeBuffer
http://www.microsoft.com/downloads/details.aspx?familyid=0E4BBA52-CC52-4D89-8590-CDA297FF7FBD&displaylang=en - UnpackDecimal
http://www.microsoft.com/downloads/details.aspx?familyid=9E56417E-23D1-4FD3-8D6D-61314FAA2DE3&displaylang=en - CodePageConvert

I am sure there will be more coming on the market soon.

Thanks
Sutha

Tuesday, February 14, 2006

Performance When Loading Data into Oracle Using SSIS

It is well known that neither the Microsoft OLE DB Provider for Oracle nor Oracle's own OLE DB provider performs well during data loads. They are extremely slow.

You can see various people having difficulties with these providers:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=130786&SiteID=1
http://forums.microsoft.com/msdn/showpost.aspx?postid=117014&siteid=1
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=124921&SiteID=1&mode=1

Donald has written a blog post about the stance within Microsoft. You might find what you are about to read surprising:
http://www.sqljunkies.com/WebLog/donald_farmer/archive/2005/03/13/8819.aspx

The solution is provided by a company called Persistent Systems. The numbers for their Oracle Bulk Load Connector are impressive; see how they compare for yourself at http://www.persistentsys.com/products/ssisoracleconn/ssisoracleconn.htm

I am putting all of this here for my future reference.

Thanks
Sutha

Monday, February 13, 2006

SQL2005 Profiler Trace Definitions

While working on a Proof of Concept (PoC), I have learnt a lot about SQL Profiler and Performance Monitor.

There are six trace definition files available out of the box, located at C:\Program Files\Microsoft SQL Server\90\Tools\Profiler\TraceDefinitions\1033. These definition files are in XML format, and there are definitions for SQL Server, Analysis Services and Integration Services.

This XML gives you the textual definition of any SQL Profiler event class number you have captured.
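To show the idea of looking an event class number up in one of these definition files, here is a minimal Python sketch. Note that the XML structure below is a simplified stand-in, not the actual schema of the shipped trace definition files:

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a trace definition file. The real files under
# ...\Profiler\TraceDefinitions\1033 use a richer schema; this sketch only
# illustrates mapping an event class number to its textual name.
SAMPLE = """
<TraceDefinition>
  <EventCategory Name="TSQL">
    <Event Id="12" Name="SQL:BatchCompleted"/>
    <Event Id="13" Name="SQL:BatchStarting"/>
  </EventCategory>
</TraceDefinition>
"""

def event_name(xml_text, event_id):
    """Return the textual name for a Profiler event class number, or None."""
    root = ET.fromstring(xml_text)
    for event in root.iter("Event"):
        if event.get("Id") == str(event_id):
            return event.get("Name")
    return None

print(event_name(SAMPLE, 12))  # SQL:BatchCompleted
```

Pointing the same lookup at the real definition files would just mean adjusting the element and attribute names to match their actual schema.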

Thanks
Sutha

Incremental Cube Processing

AS2005 will NOT build aggregations automatically when you process a partition incrementally. Previously, partition processing loaded the data and also built the aggregations for that partition; this has changed in AS2005. You can process a partition incrementally using the "Process Incremental" option, but if you want to build the indexes and aggregations, you have to do that separately using the "Process Index" option.

If you wish, you can put these two in 2 separate steps within SSIS after data is loaded.

It is all about how quickly you want the data presented to the users. If you run Process Incremental first, the data becomes available to users straight away, and you can then build the aggregations while users are querying it. This gives you maximum availability, but performance suffers while the aggregations and indexes are being built. The UI in SSMS does not support running "Process Incremental" and "Process Index" in one single transaction, but you can achieve that with a simple XMLA script, as shown here:
http://www.smart-bi-solutions.com/BlogStuff/incremental_process.xmla

This example uses Adventure Works. ProcessAdd and ProcessIndexes are the key elements to look out for in the XMLA document.
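The shape of such a script is roughly the following sketch: a Batch with Transaction="true" wrapping a ProcessAdd command and a ProcessIndexes command against the same partition. The object IDs below are illustrative (based on the Adventure Works sample); substitute the actual IDs from your own database, and note that a real ProcessAdd usually also carries a binding to the incremental source data:

```xml
<!-- Sketch: ProcessAdd then ProcessIndexes on one partition, in one transaction.
     All IDs are illustrative. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       Transaction="true">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
      <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
      <PartitionID>Internet_Sales_2004</PartitionID>
    </Object>
    <Type>ProcessAdd</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
      <MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
      <PartitionID>Internet_Sales_2004</PartitionID>
    </Object>
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
```

Because both commands sit inside one Batch with Transaction="true", they commit together, which is exactly what the SSMS UI will not let you do.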

Thanks
Sutha

Sunday, February 12, 2006

Can you load 74 million rows in 2 Minutes using SSIS (Bulk Insert Task)?

The simple answer is “Yes”. I have loaded 74 million rows in just under 2 minutes.

I was working on a Proof of Concept (PoC) where I had to load about 10 billion rows into a fact table. Each file had 6,170,000 rows, representing a single day's transactions across the outlets (~3,000). As it was a PoC, we generated each day's fact data into text files and RARed them up. I had to load this data very rapidly into the table; only then could we process the cube and continue with the rest of the testing.

We created the database at its maximum required size, so it would not have to grow during the data load. The SQL Server table was partitioned by month: 61 partitions in total (60 partitions for 5 years' worth of data, plus 1 empty one at the end). We spread the database across three filegroups and built the table without an index (having an index would slow down the data load). The data was loaded on an 8-way Unisys ES7000 with 32 GB of memory and 1.5 GHz processors.

I used 12 parallel Foreach tasks to load the data, each loading 4 files (24,680,000 rows) into a single filegroup. You can see what I have done below (there are 12 of these):
http://www.smart-bi-solutions.com/BlogStuff/09_02_06_Pic1.JPG

To get the best out of the Bulk Insert Task, we applied the following settings:
- Set the code page to 1252 (ANSI - Latin I).
- Set the batch size to 1,000,000 (this varies depending on available memory).
- Under Options, leave Check Constraints unchecked (the default).
- Check Table Lock (this is not a physical lock on the table).
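The task settings above correspond to a plain T-SQL BULK INSERT along these lines (table and file names are invented for illustration):

```sql
-- Sketch of the equivalent T-SQL; dbo.FactTransactions and the file path
-- are illustrative names, not the ones from the PoC.
BULK INSERT dbo.FactTransactions
FROM 'D:\Load\day_001.txt'
WITH (
    CODEPAGE  = '1252',   -- ANSI - Latin I, matching the task's code page
    BATCHSIZE = 1000000,  -- tune to available memory
    TABLOCK               -- bulk-update table lock, not a physical lock
    -- CHECK_CONSTRAINTS omitted: constraints are not checked by default,
    -- matching the unchecked Check Constraints option on the task
);
```

Leaving CHECK_CONSTRAINTS out and specifying TABLOCK are the two settings that matter most for minimally logged, high-throughput loads.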

Once the whole data set was loaded, we built a clustered index on DateId, which took 5 hours to build on the ES7000. Dropping a clustered index on this fact table took around 1 hour.

If you have any questions on this feel free to contact me on sutha@smart-bi-solutions.com.

Thanks
Sutha

Monday, February 06, 2006

Calendar Transform and UnDoubleOut

There are a couple of SSIS components that can be downloaded from the Microsoft site for free.

UnDoubleOut is an SSIS dataflow component that removes qualifiers from quoted text, either in place or via the creation of a new output column. It removes the quotes that bracket input strings, if present, and replaces doubled quotes with single quotes inside the string.
http://www.microsoft.com/downloads/details.aspx?familyid=91E24D1D-ACF0-45B1-A9A5-9EA1C7BD0463&displaylang=en

CalendarTransform is an SSIS dataflow transform component that generates standard calendar attributes.
http://www.microsoft.com/downloads/details.aspx?familyid=E603BDE7-44BB-409A-890F-ED94A20B6710&displaylang=en

Thanks
Sutha

Thursday, February 02, 2006

How can you validate XML Schema against incoming XML document?

For example, suppose you receive customer data as XML files to load into your data warehouse using SSIS. The initial file you received was in a perfect format, so you generated a schema from it using "Generate XSD", saved it to a given location, and mapped the XML Source output to a destination table in SQL Server. Everything ran through fine during the first week. In the second week, the XML file sent by the source system has changed. You want to validate the XML source against the XML schema, and if there is a difference between the source file and the schema, you want the package to fail. How can you do that?

There is a Control Flow component called the XML Task that can help you do that. The XML Task supports the following operations: Diff, XSLT, XPath, Merge, Patch and Validate.
http://www.smart-bi-solutions.com/BlogStuff/020206_Pic1.jpg

Validate is the OperationType you would use to validate the XML source file against the XML schema. Once you have selected the OperationType, make sure XSD is selected as the ValidationType (the default). Be aware that FailOnValidationFail is set to False by default; if you leave it at that default, the package will NOT fail even when the XML and XSD do not match. Therefore, change FailOnValidationFail to True if you want your package to fail.
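To make the scenario concrete, here is a sketch of the kind of mismatch Validate will catch. The element names are invented for illustration: a schema that requires an Email element per customer, and a second-week file that no longer supplies it.

```xml
<!-- customers.xsd (illustrative): every Customer must contain Name and Email -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Customer" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name"  type="xs:string"/>
              <xs:element name="Email" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Second week's customers.xml (illustrative): Email is missing, so the
     Validate operation reports a failure, and with FailOnValidationFail set
     to True the package fails as intended -->
<Customers>
  <Customer>
    <Name>Jane Doe</Name>
  </Customer>
</Customers>
```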

Thanks
Sutha