SSIS - Clustered Service
Kirk did a brilliant post with the help from Ranjeeta and Nick. You can access the post here.
Thanks
Sutha
SQL SERVER 2005 & 2008 IS THE MAIN THEME OF THIS BLOG
The simple answer is “Yes”. I have loaded 74 million rows in just under 2 minutes.
I was working on a Proof of Concept (PoC), where I had to load about 10 billion rows into a fact table. Each file had 6,170,000 rows and they belong to single day transactions in the outlets (~3000). As it was PoC we generated each day’s fact data into text files and RAR them up. I had to load this data very rapidly into the table and then only we can process the cube and continue with the rest of the testing.
We created the database with the maximum possible size, in other words it does not have to grow during data load. SQL Server Table was partitioned by month. We had 61 partitions (60 partitions for 5 years worth of data + 1 empty in the end). We also set the database to go across 3 file groups. We built the table without an index (I know that having an index will slow down the data load). Loaded the data using 8 Way Unisys ES7000 with 32GB of memory with 1.5GHz Processor.
I have used 12 parallel Foreach tasks to load data. I was loading 4 files (24,680,000 rows) into a single file group. You can see what I have done below (12 of those):
http://www.smart-bi-solutions.com/BlogStuff/09_02_06_Pic1.JPG
To get the best out of the Bulk Insert Task, we did the following settings.
Set the Code Page to 1252 (ANSI - Latin I)
Set the batch size to 1,000,000. Varies depending on the memory.
Set Options – Uncheck Check Constraints (default)
- Check Table Lock. It is not physical lock on the table.
Once the whole data set was loaded we built a clustered index on DateId, which took on ES7000 5 hours to build. It took around 1 hour to drop clustered index on this fact table.
If you have any questions on this feel free to contact me on sutha@smart-bi-solutions.com.
Thanks
Sutha