I am trying to find the best (fastest) way to import large text files into SQL Server 2000 relational tables. The database uses the simple recovery model. The total size of the data to be loaded is about 150 GB.
There are multiple source files with the same layout (year1.txt, year2.txt, year15.txt, etc.); the files are kept separate to keep their size small. Each file has about 6 million rows.
Each .txt file has 30 columns delimited by "|".
There are 2 SQL Server tables: a parent table with an ID (IDENTITY) primary key and a 3-column unique index (alternate key), and a detail table with a DETAIL_ID (IDENTITY) primary key and a foreign key (ID) to the parent table. The detail table has both char and decimal columns. Both destination tables are empty.
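For reference, a rough sketch of the two destination tables (table and column names here are made up; the real tables have many more columns):

```sql
-- Hypothetical schema sketch, not the real DDL.
CREATE TABLE Parent (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    KeyCol1 CHAR(10) NOT NULL,
    KeyCol2 CHAR(10) NOT NULL,
    KeyCol3 CHAR(10) NOT NULL,
    CONSTRAINT AK_Parent UNIQUE (KeyCol1, KeyCol2, KeyCol3)  -- alternate key
);

CREATE TABLE Detail (
    DETAIL_ID INT IDENTITY(1,1) PRIMARY KEY,
    ID INT NOT NULL REFERENCES Parent (ID),  -- FK to parent
    SomeChar CHAR(20) NULL,
    SomeAmount DECIMAL(18,4) NULL            -- converted from char in the source
);
```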
So, for each source file, I need to convert some columns to decimal and split the data into the parent and detail tables.
Here is one way to do this:
For each source file:
* use a DTS bulk import into a working SQL table (no indexes), with every column as char data type, and do no DTS conversions
* create an index on the 3 columns that match the parent table's alternate key
* run an insert query into the parent table selecting the distinct 3 columns
* run an insert query that joins the parent table with the working table and inserts into the detail table. The query includes the primary key (identity) from the parent table and converts some char columns to decimal.
* clean up the working table by truncating it and dropping the index
* process the next source file
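Sketched in T-SQL for one file (table and column names are placeholders; BULK INSERT stands in here for the DTS bulk import task):

```sql
-- 1) Bulk load into the all-char, no-index working table
--    (eligible for minimal logging under the simple recovery model).
BULK INSERT Work FROM 'C:\data\year1.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);

-- 2) Index the 3 key columns to speed up the joins below.
CREATE INDEX IX_Work_Key ON Work (KeyCol1, KeyCol2, KeyCol3);

-- 3) Distinct keys into the parent table.
INSERT INTO Parent (KeyCol1, KeyCol2, KeyCol3)
SELECT DISTINCT KeyCol1, KeyCol2, KeyCol3
FROM Work;

-- 4) Join back to pick up the identity values and convert char -> decimal.
INSERT INTO Detail (ID, SomeChar, SomeAmount)
SELECT p.ID, w.SomeChar, CONVERT(DECIMAL(18,4), w.SomeAmount)
FROM Work w
JOIN Parent p ON p.KeyCol1 = w.KeyCol1
             AND p.KeyCol2 = w.KeyCol2
             AND p.KeyCol3 = w.KeyCol3;

-- 5) Reset the working table for the next file.
TRUNCATE TABLE Work;
DROP INDEX Work.IX_Work_Key;
```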
To make it fast, I want to minimize database logging and datatype conversions. This is why I drop and create the indexes separately from the data import.
Would it be faster to use DTS transformations to convert the source char columns to decimal while importing into the working table, instead of converting the datatypes with SQL after the import? I'm guessing that since DTS transforms on a row-by-row basis, it would take longer than a single set-based SQL insert statement with conversions.
Is there a better way to do this in DTS, so I can go from a single source file to the 2 related SQL Server tables and do the data conversion all inside DTS?
Speed is the primary concern. I tested loading one source file of 6 million rows into the working table and querying it into the 2 related tables, and it took about 40 minutes. Is that a reasonable amount of time for this task? The server has 4 processors and 2 GB of RAM on a RAID 5 controller.
As I process these loads, the 2 related tables will get larger and larger, which will probably slow down the final insert into the detail table.
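One thing I realize: once the parent table is no longer empty (from the second file on), the distinct insert into the parent has to skip keys that are already there, or it will violate the unique index. Something like this (names are placeholders again):

```sql
-- Only insert keys not already present in Parent.
INSERT INTO Parent (KeyCol1, KeyCol2, KeyCol3)
SELECT DISTINCT w.KeyCol1, w.KeyCol2, w.KeyCol3
FROM Work w
WHERE NOT EXISTS (
    SELECT 1
    FROM Parent p
    WHERE p.KeyCol1 = w.KeyCol1
      AND p.KeyCol2 = w.KeyCol2
      AND p.KeyCol3 = w.KeyCol3
);
```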
Thank you for any suggestions.