SQL Server Business Intelligence

SQL SERVER 2005 & 2008 ARE THE MAIN THEME OF THIS BLOG

Sunday, February 12, 2006

Can you load 74 million rows in 2 Minutes using SSIS (Bulk Insert Task)?

The simple answer is “Yes”. I have loaded 74 million rows in just under 2 minutes.

I was working on a Proof of Concept (PoC) where I had to load about 10 billion rows into a fact table. Each file had 6,170,000 rows, representing a single day's transactions across roughly 3,000 outlets. As it was a PoC, we generated each day's fact data into text files and compressed them with RAR. I had to load this data very rapidly into the table; only then could we process the cube and continue with the rest of the testing.

We created the database at its maximum required size up front, so it would not have to grow during the data load. The SQL Server table was partitioned by month, giving 61 partitions (60 partitions for 5 years' worth of data plus 1 empty partition at the end). We also spread the database across 3 filegroups. We built the table without an index (I know that having an index will slow down the data load). The data was loaded on an 8-way Unisys ES7000 with 32GB of memory and 1.5GHz processors.
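As a rough sketch, the monthly partitioning described above could be set up like this in SQL Server 2005. All object, filegroup, and column names here are illustrative (not the actual PoC schema), and the function is shortened to a few months for readability:

```sql
-- Partition function: each boundary value starts a new month (RANGE RIGHT).
-- The real PoC had 60 monthly boundaries, giving 61 partitions.
CREATE PARTITION FUNCTION pfMonthly (int)
AS RANGE RIGHT FOR VALUES (20010101, 20010201, 20010301);

-- Map partitions onto the three filegroups round-robin
-- (one filegroup per partition: 3 boundaries = 4 partitions).
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly
TO (FG1, FG2, FG3, FG1);

-- Create the fact table as a heap (no indexes) on the partition
-- scheme, so the bulk load pays no index-maintenance cost.
CREATE TABLE dbo.FactTransactions
(
    DateId   int   NOT NULL,  -- yyyymmdd surrogate date key
    OutletId int   NOT NULL,
    Amount   money NOT NULL
) ON psMonthly (DateId);
```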

I used 12 Foreach tasks running in parallel to load the data, with each one loading 4 files (24,680,000 rows) into a single filegroup. You can see what I have done below (there were 12 of those):
http://www.smart-bi-solutions.com/BlogStuff/09_02_06_Pic1.JPG

To get the best out of the Bulk Insert Task, we used the following settings:
- Set the Code Page to 1252 (ANSI - Latin I).
- Set the batch size to 1,000,000 (the best value varies with available memory).
- Under Options, leave Check Constraints unchecked (the default).
- Check Table Lock. This requests a bulk-update (BU) lock for the duration of the load, which enables minimally logged inserts; it does not hold an exclusive physical lock on the table.
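For reference, the same settings expressed as a plain T-SQL BULK INSERT would look roughly like this (the table name and file path are placeholders):

```sql
BULK INSERT dbo.FactTransactions
FROM 'D:\PoC\Day001.txt'
WITH (
    DATAFILETYPE  = 'char',
    ROWTERMINATOR = '\n',
    CODEPAGE      = '1252',   -- ANSI - Latin I
    BATCHSIZE     = 1000000,  -- one commit per 1,000,000 rows
    TABLOCK                   -- bulk-update lock; enables minimal logging
    -- CHECK_CONSTRAINTS omitted: constraints are not checked by default
);
```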

Once the whole data set was loaded we built a clustered index on DateId, which took 5 hours to build on the ES7000. Dropping the clustered index on this fact table took around 1 hour.
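Building and later dropping that clustered index would look something like this (index, table, and partition scheme names are illustrative):

```sql
-- Build the clustered index only after the load completes; building it
-- beforehand would force sorted inserts and slow the bulk load badly.
CREATE CLUSTERED INDEX IX_FactTransactions_DateId
ON dbo.FactTransactions (DateId)
ON psMonthly (DateId);  -- keep the index aligned with the partitions

-- Dropping it returns the table to a heap.
DROP INDEX IX_FactTransactions_DateId ON dbo.FactTransactions;
```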

If you have any questions on this feel free to contact me on sutha@smart-bi-solutions.com.

Thanks
Sutha

2 Comments:

  • At 2:07 AM, Blogger Sutha Thiru said…

    Make sure the database recovery model is set to Simple.

    Another gotcha is that when you create a PK via Management Studio, it creates the PK with a clustered index by default, which will affect performance big time when you load millions of rows.
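    In T-SQL, those two fixes look roughly like this (the database, table, and key column names are placeholders):

```sql
-- Simple recovery keeps the transaction log from ballooning during
-- bulk loads and permits minimally logged operations.
ALTER DATABASE PoCWarehouse SET RECOVERY SIMPLE;

-- Declare the primary key NONCLUSTERED explicitly, so the clustered
-- index slot stays free until after the load.
ALTER TABLE dbo.FactTransactions
ADD CONSTRAINT PK_FactTransactions
PRIMARY KEY NONCLUSTERED (TransactionId);
```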

    Mark Hill investigated some of the performance issues I had earlier this week and identified these.

    Thanks Mark.

    Sutha

     
  • At 5:17 AM, Blogger Fuchsia Software Technologies Private Limited said…

    Hi !!
    I'm using SQL Server 2000 and I'm trying to insert 8 million records into a table.
    And there is only one column.
    I'm using the following code

    SET @SQL = ' BULK INSERT MP_Input_Table FROM ' + CHAR(39) + @InputFilePath + CHAR(39) + '
    WITH (
    DATAFILETYPE = ''char'',
    ROWTERMINATOR = ''\n'',
    CODEPAGE = ''1252'',
    BATCHSIZE = 100000,
    TABLOCK ) '

    EXEC(@SQL)


    But it's taking me around 2 minutes to insert the data, whereas it took you only 2 minutes for 74 million rows.
    Can you please tell me where I'm going wrong?
    Thanks and Regards
    Edward Anil Joseph

     
