I "inherited" some legacy software for manipulating large text files, currently written in Visual FoxPro (version 7, specifically).
I'm exploring options for replacing these programs with something newer that will be easier to modify and maintain. However, I'm unable to get close to FoxPro's performance when importing large text files.
For example, I'm testing with a ~1 GB text file containing ~110,000 records, each with 213 fields (each row is 9,247 characters long).
Using Foxpro:
? SECONDS()
USE new
APPEND FROM test.dat TYPE sdf
USE
? SECONDS()
This import to a database is completed in just under 10 seconds on my computer.
Using C#/SQLite (with FileHelpers and System.Data.SQLite), the quickest I have been able to get this import to complete is over a minute. I've tried optimizing this as best I can using suggestions from this question (wrapping the inserts in transactions, etc.). Realistically, a minute to import a 1 GB file wouldn't seem bad if I weren't comparing it to 10 seconds in FoxPro.
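For reference, the changes I applied from that thread look roughly like this sketch (not my exact code; the data-source path is a placeholder, and the PRAGMA values shown are just the ones I experimented with):

```csharp
using System.Data.SQLite;

public static class BulkLoadSketch
{
    // Sketch of the bulk-load setup from the linked question: relax
    // durability with PRAGMAs and wrap all inserts in one transaction.
    // "test.db" is a placeholder, not my real database path.
    public static void Load()
    {
        using (var conn = new SQLiteConnection("Data Source=test.db;Version=3"))
        {
            conn.Open();
            using (var pragma = new SQLiteCommand(
                "PRAGMA synchronous = OFF; PRAGMA journal_mode = MEMORY;", conn))
            {
                // Trade crash-safety for speed during the bulk load only.
                pragma.ExecuteNonQuery();
            }
            using (var tx = conn.BeginTransaction())
            using (var cmd = new SQLiteCommand(conn))
            {
                // ... build and run the per-row INSERTs via cmd here ...
                tx.Commit();
            }
        }
    }
}
```
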
The approximate breakdown of time spent during the ~1 minute is:
FileHelpers reading and parsing the file: 12 seconds.
Building SQL commands from the FileHelpers objects: 15 seconds.
Running individual ExecuteNonQuery()'s: 24 seconds.
Committing transactions (every 5,000 records): 12 seconds.
Compared to the linked thread, my inserts per second are much slower, but my records have 213 fields vs. 7, so this is expected. Broken down by fields per second, I'm at approximately 360,000 vs. the thread's 630,000. Measured by throughput, the other poster is at ~2.24 MB/s and I'm at ~15.4 MB/s. So I think my performance is comparable to the other poster's, and there likely isn't much more optimization I can do.
Why is this so much slower than the FoxPro import (5-6x slower)? Is this just apples to oranges, and should I accept the slower speed as a trade-off for the other benefits I get from using newer tech?
EDIT:
Here is some test code I've run with random data that shows similar speeds. Am I implementing the transactions correctly? ExecuteNonQuery() is taking a significant amount of time. I've built all of the SQL commands with simple, non-parameterized, most likely inefficient string manipulation, but at this point I'm not overly concerned with that (I'll be reading from files, and this approach was much quicker than using Entity Framework).
using System;
using System.Collections.Generic;
using System.Data.SQLite;
using System.Diagnostics;

public class Program
{
    public static void Main(string[] args)
    {
        TestSqlite(100, 20000);
    }

    public static void TestSqlite(int numColumns, int numRows)
    {
        Console.WriteLine("Starting Import.... columns: " + numColumns.ToString() + " rows: " + numRows.ToString());

        var conn = new SQLiteConnection(@"Data Source=C:\vsdev\SqliteTest\src\SqliteTest\ddic.db;Version=3");
        conn.Open();
        var cmd = new SQLiteCommand(conn);

        cmd.CommandText = "DROP TABLE IF EXISTS test";
        cmd.ExecuteNonQuery();

        // Build the CREATE TABLE statement with numColumns TEXT columns.
        string createCmd = "CREATE TABLE 'test'('testId' INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ";
        for (var i = 0; i < numColumns; i++)
        {
            createCmd += "'test" + i.ToString() + "' TEXT, ";
        }
        createCmd = createCmd.Substring(0, createCmd.Length - 2);
        createCmd += ")";
        cmd.CommandText = createCmd;
        cmd.ExecuteNonQuery();

        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();

        var transaction = conn.BeginTransaction();
        int lineCount = 0;
        long startTime;
        long endTime;
        long totalStringTime = 0;
        long totalAddTime = 0;
        long totalSaveTime = 0;
        string command;

        for (var l = 0; l < numRows; l++)
        {
            // Build the INSERT statement.
            startTime = stopWatch.ElapsedMilliseconds;
            command = CreateRandomInsert("test", numColumns);
            endTime = stopWatch.ElapsedMilliseconds;
            totalStringTime += endTime - startTime;

            // Execute the query.
            startTime = stopWatch.ElapsedMilliseconds;
            cmd.CommandText = command;
            cmd.ExecuteNonQuery();
            endTime = stopWatch.ElapsedMilliseconds;
            totalAddTime += endTime - startTime;

            // Commit every ~5000 records, then start a fresh transaction.
            if (lineCount > 5000)
            {
                lineCount = 0;
                startTime = stopWatch.ElapsedMilliseconds;
                transaction.Commit();
                transaction.Dispose();
                transaction = conn.BeginTransaction();
                cmd.Dispose();
                cmd = new SQLiteCommand(conn);
                endTime = stopWatch.ElapsedMilliseconds;
                totalSaveTime += endTime - startTime;
                Console.Write('.');
            }
            lineCount += 1;
        }

        // Commit the final partial batch.
        startTime = stopWatch.ElapsedMilliseconds;
        transaction.Commit();
        transaction.Dispose();
        endTime = stopWatch.ElapsedMilliseconds;
        totalSaveTime += endTime - startTime;

        Console.WriteLine('.');
        Console.WriteLine("String time: " + totalStringTime.ToString());
        Console.WriteLine("ExecuteNonQuery time: " + totalAddTime.ToString() + ", per 1000 records: " + (totalAddTime / (numRows / 1000)).ToString());
        Console.WriteLine("Commit time: " + totalSaveTime.ToString() + ", per 1000 records: " + (totalSaveTime / (numRows / 1000)).ToString());

        stopWatch.Stop();
        TimeSpan ts = stopWatch.Elapsed;
        string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
            ts.Hours, ts.Minutes, ts.Seconds,
            ts.Milliseconds / 10);
        Console.WriteLine(" in " + elapsedTime);

        cmd.Dispose();
        conn.Close();
    }

    public static string CreateRandomInsert(string TableName, int numColumns)
    {
        // Random GUID strings stand in for real field data.
        List<string> nameList = new List<string>();
        List<string> valueList = new List<string>();
        for (var i = 0; i < numColumns; i++)
        {
            nameList.Add("test" + i.ToString());
            valueList.Add(Guid.NewGuid().ToString());
        }
        return CreateSql(TableName, nameList, valueList);
    }

    public static string CreateSql(string TableName, List<string> names, List<string> values)
    {
        // Deliberately simple string concatenation -- see note above.
        string textCommand = "";
        textCommand += "INSERT INTO " + TableName + " (";
        foreach (var nameVal in names)
        {
            textCommand += nameVal + ", ";
        }
        textCommand = textCommand.Substring(0, textCommand.Length - 2);
        textCommand += ") VALUES (";
        foreach (var val in values)
        {
            textCommand += "'" + val + "', ";
        }
        textCommand = textCommand.Substring(0, textCommand.Length - 2);
        textCommand += ");";
        return textCommand;
    }
}
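For comparison, a parameterized variant of CreateSql would look something like the sketch below (CreateParameterizedSql is a hypothetical helper I haven't benchmarked; the idea is to build the command text once with placeholders and rebind values per row instead of concatenating literals):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParameterizedSqlSketch
{
    // Hypothetical counterpart to CreateSql: the command text is built
    // once with @p0..@p{n-1} placeholders instead of per-row literals.
    public static string CreateParameterizedSql(string tableName, List<string> names)
    {
        string cols = string.Join(", ", names);
        string parms = string.Join(", ", names.Select((name, i) => "@p" + i));
        return "INSERT INTO " + tableName + " (" + cols + ") VALUES (" + parms + ");";
    }
}
```

Inside the row loop, one command would then be reused: set cmd.CommandText to the generated text once before the loop, and for each row call cmd.Parameters.Clear(), add each value with cmd.Parameters.AddWithValue("@p" + i, value), and run cmd.ExecuteNonQuery().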