help importing large flat file into relational tables

Last Post 23 Jan 2006 10:01 AM by mwesch. 1 reply.
MacT66
New Member
23 Jan 2006 07:27 AM
I am trying to find the best (fastest) way to import large text files into SQL Server 2000 relational tables. The database uses the simple recovery model. The total size of the data to be loaded is about 150 GB.

source:

Multiple source files with the same layout: year1.txt, year2.txt, year15.txt, etc. The files are kept separate to keep each one small. Each file has about 6 million rows.

Each .txt file has 30 columns delimited by "|".

Destination:

2 SQL Server tables: a parent table with an ID (IDENTITY) primary key and a 3-column unique index (alternate key), and a detail table with a DETAIL_ID (IDENTITY) primary key and a foreign key ID back to the parent table. The detail table has both char and decimal columns. Both destination tables are empty.

So, for each source file, I need to convert some columns to decimal and separate the data into parent and detail tables.

Here is one way to do this:

For each source file:

* Use DTS to bulk import into a working SQL table (no indexes), with every column as a char data type and no DTS conversions.
* Create an index on the 3 columns that match the parent table.
* Run an insert query into the parent table selecting the distinct 3 columns.
* Run an insert query that joins the parent table with the working table and inserts into the detail table. The query would include the primary key (identity) from the parent table and convert some char columns to decimal (see the sketch after this list).
* Clean up the SQL working table by truncating it and removing the index.
* Process the next source file.
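Roughly, the two insert-query steps above would look like the T-SQL below. The table and column names (Staging, Parent, Detail, KeyA, KeyB, KeyC, SomeChar, Amount) are made up for illustration; only the shape of the queries matters.

-- Parent load: distinct alternate-key values from the staging table.
INSERT INTO Parent (KeyA, KeyB, KeyC)
SELECT DISTINCT s.KeyA, s.KeyB, s.KeyC
FROM Staging s;

-- Detail load: join back to the parent to pick up its identity value,
-- converting char columns to decimal on the way in.
INSERT INTO Detail (ParentID, SomeChar, Amount)
SELECT p.ID,
       s.SomeChar,
       CONVERT(decimal(18, 2), s.Amount)
FROM Staging s
JOIN Parent p
  ON p.KeyA = s.KeyA
 AND p.KeyB = s.KeyB
 AND p.KeyC = s.KeyC;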

To make it fast, I want to minimize database logging and data type conversions. This is why I drop and create indexes separately from the data import.
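For example, something like the following BULK INSERT (instead of the DTS bulk import step) could load one file; with the simple recovery model, an empty unindexed staging table, and TABLOCK, the load should be minimally logged. The path and table name are only placeholders.

-- Bulk load one pipe-delimited source file into the staging table.
BULK INSERT Staging
FROM 'D:\loads\year1.txt'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n',
    TABLOCK
);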

Would it be faster to use DTS transformations to convert the source char columns to decimal while importing into the working SQL table, instead of using SQL after the import to convert the data types? I'm guessing that since DTS transforms on a row-by-row basis, it would take longer than a single SQL insert statement with conversions.

Is there a better way to do this in DTS, so I can go from a single source file to the 2 related SQL Server tables and do the data conversion all inside DTS?

Speed is the primary concern. I tested loading one source file of 6 million rows into the SQL working table and querying it into the 2 related tables, and it took about 40 minutes. Is this a reasonable amount of time for this task? The server has 4 processors and 2 GB of RAM on a RAID 5 controller.

As I process these loads, the 2 related tables will get larger and larger, which will probably slow down the final insert into the detail table.

Thank you for any suggestions.
mwesch
New Member
23 Jan 2006 10:01 AM
You are correct that you should not do transformations in DTS. The bulk load process will go much faster if you load the data as is.

One big factor you didn't mention that could be a performance issue is the database file size. If you know you're loading a lot of data, don't let auto-grow happen on its own; it will slow down the load process considerably. Resize the database files in advance to the expected size.
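Something along these lines, just as a sketch; the database name, logical file names, and sizes are made up and would need to match what sp_helpfile reports for your database.

-- Grow the data and log files up front so auto-grow doesn't fire
-- in the middle of the bulk loads.
ALTER DATABASE LoadDB
MODIFY FILE (NAME = LoadDB_Data, SIZE = 200000MB);

ALTER DATABASE LoadDB
MODIFY FILE (NAME = LoadDB_Log, SIZE = 20000MB);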

If you could get the files in XML format, the SQLXML bulk load process can build the table relationships on the fly, although I don't recall how the performance of DTS vs. SQLXML compares.