Using C#, I'm trying to insert rows from a CSV file (~55,000 records) into a PostgreSQL table with 350 columns. I'm simply building each INSERT statement from the header and the record, both of which are comma separated, like this:
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
INSERT INTO public.table(field1,fields2,field3, ..... ,field350) VALUES(?,?,?,?,?,.......,?);
....etc
I tried batching: gathering 1,000 statements and executing them in a transaction, but that takes up to ~3 seconds per 1,000 records (roughly like the sketch below).
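A minimal sketch of that batching attempt, assuming Npgsql; BuildInsertStatement is a hypothetical stand-in for my statement-building code that returns one "INSERT INTO public.table(...) VALUES (...);" string per CSV line (my actual code differed in details):

using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Npgsql;

static async Task BatchInsertAsync(NpgsqlConnection connection, string header, IEnumerable<string> lines, int batchSize = 1000)
{
    var sb = new StringBuilder();
    var count = 0;

    foreach (var line in lines)
    {
        sb.Append(BuildInsertStatement(header, line)); // hypothetical helper
        if (++count == batchSize)
        {
            // One transaction and one round trip per batch of 1,000 statements.
            await using var tx = await connection.BeginTransactionAsync();
            await using var cmd = new NpgsqlCommand(sb.ToString(), connection, tx);
            await cmd.ExecuteNonQueryAsync();
            await tx.CommitAsync();
            sb.Clear();
            count = 0;
        }
    }

    if (sb.Length > 0)
    {
        // Flush the remaining statements that didn't fill a whole batch.
        await using var tx = await connection.BeginTransactionAsync();
        await using var cmd = new NpgsqlCommand(sb.ToString(), connection, tx);
        await cmd.ExecuteNonQueryAsync();
        await tx.CommitAsync();
    }
}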
I also tried to follow the example from Microsoft here, in which I call ExecuteNonQuery() on each line but commit the transaction after 1,000 records and then begin a new transaction; this also takes up to ~3 seconds per 1,000 records. Something like this:
foreach (var line in ReadCsvFileToList(dataFile))
{
    try
    {
        if (firstLine)
        {
            // The first CSV line is the header; remember it and skip to the data rows.
            header = line;
            firstLine = false;
            continue;
        }
        else
        {
            // Turn the raw CSV line into a VALUES list: empty fields become NULL,
            // single quotes are escaped, and double quotes become single quotes.
            var formattedLine = line.Replace("\"\"", "NULL").Replace("'", "''").Replace("\"", "'");
            var commandText = $"INSERT INTO public.table({header.Replace('/', '_')}) VALUES ({formattedLine})";
            command.CommandText = commandText;
            await command.ExecuteNonQueryAsync();
            round++;
        }

        if (round == 1000) // batch size
        {
            await transaction.CommitAsync();
            Console.WriteLine("batch committed to DB at: " + DateTime.Now.ToString("ddMMyyyy hh:mm:ss"));
            round = 0;
            transaction = connection.BeginTransaction();
        }
    }
    catch (Exception)
    {
        // Roll back the open transaction before tearing down the connection.
        if (transaction != null)
        {
            await transaction.RollbackAsync();
        }
        await connection.CloseAsync();
        await connection.DisposeAsync();
        throw;
    }
}
Any ideas on how to optimize this further?