
I need to read large CSV files and insert them into SQL. My idea was to use a StreamReader and read the file line by line, because if I store the whole content in a variable, the program crashes. So that's what I thought:

Dim list As String
Try
    Dim MyFile As String = "C:\Test.txt"
    Using fs As FileStream = File.Open(MyFile, FileMode.Open, FileAccess.ReadWrite, FileShare.None) 'file is opened in a protected mode
        firstline = fs.ReadLine 'treat the first line as column names
        rest = fs.ReadLine 'the rest as data
        Do While (Not rest Is Nothing) 'read the complete file
            list.Add(rest)

            Filestream.TextFieldType = FileIO.FieldType.Delimited
            Filestream.SetDelimiters(";")
        Loop
    End Using
Catch
    ResultBlock.Text = "File not readable"
End Try

I wrote list.Add(rest), which is actually a bad idea because the content then gets stored in a variable, but I need to read and insert line by line into a SQL database, which seems to be pretty complicated. Does anyone have an idea how I could handle that?

Mike J
  • You are mixing TextFieldParser elements with FileStream. `TextFieldType` is not a member of FileStream, so that will not compile. FileStream is a type, so you can't reference it that way. If you have a file to import, use TextFieldParser or OleDB. – Ňɏssa Pøngjǣrdenlarp May 07 '16 at 23:00

2 Answers


If you can't read the file into memory because it's too big, then what you need is some sort of buffer that holds the records in memory and writes them to the database when the list reaches a certain size.

If you really want to keep it manageable then the reader, the writer, and the buffer should all be completely separate from each other. That sounds like more work because it's more classes, but it's actually simpler because each class only does one thing.

I would create a class that represents the item you're reading from the file, with a property for each field. For example, if each line in the file represents a person with a name and an employee number, create a class like

public class Person
{
    public string FirstName {get;set;}
    public string LastName {get;set;}
    public string EmployeeNumber {get;set;}
}

You'll need a buffer. The job of the buffer is to have items put into it, and flush to a writer when it reaches its maximum size. Perhaps like this:

public interface IBuffer<T>
{
    void AddItem(T item);
}

public interface IWriter<T>
{
    void Write(IEnumerable<T> items);
}

public class WriterBuffer<T> : IBuffer<T>
{
    private readonly IWriter<T> _writer;
    private readonly int _maxSize;
    private readonly List<T> _buffer;

    public WriterBuffer(IWriter<T> writer, int maxSize)
    {
        _writer = writer;
        _maxSize = maxSize;
        _buffer = new List<T>();
    }

    public void AddItem(T item)
    {
        _buffer.Add(item);
        if(_buffer.Count >= _maxSize)
        {
            _writer.Write(_buffer);
            _buffer.Clear();
        }
    }    
}
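
One gap worth noting in this sketch: when reading finishes, the buffer will usually still hold a final batch smaller than _maxSize, and as written those items would never reach the database. A small Flush method, which is my addition here rather than part of the original sketch, lets the caller push out that remainder:

// Hypothetical addition to WriterBuffer<T>: call this once after the
// reader has finished so the last partial batch is not lost.
public void Flush()
{
    if (_buffer.Count > 0)
    {
        _writer.Write(_buffer);
        _buffer.Clear();
    }
}

The caller would invoke Flush after the reader completes, which means either holding a reference to the concrete WriterBuffer<T> or adding Flush to the IBuffer<T> interface.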

Then, your reader class doesn't know about the writer at all. All it knows is that it writes to the buffer.

public class PersonFileReader
{
    private readonly string _filename;
    private readonly IBuffer<Person> _buffer;

    public PersonFileReader(string filename, IBuffer<Person> buffer)
    {
        _filename = filename;
        _buffer = buffer;
    }

    public void ReadFile()
    {
        //Reads from file.
        //Creates a new Person for each record
        //Calls _buffer.AddItem(person) for each Person.
    }       
}
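
For illustration, here is a minimal sketch of what ReadFile might look like, assuming a semicolon-delimited file whose first line is a header and whose fields come in the order first name, last name, employee number (all assumptions on my part). StreamReader lives in System.IO, and note that a plain String.Split won't handle quoted fields the way TextFieldParser would:

public void ReadFile()
{
    using (var reader = new StreamReader(_filename))
    {
        reader.ReadLine(); // skip the header line

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Assumes ';'-delimited fields in a fixed order.
            var fields = line.Split(';');
            _buffer.AddItem(new Person
            {
                FirstName = fields[0],
                LastName = fields[1],
                EmployeeNumber = fields[2]
            });
        }
    }
}

The writer below is the buffer's other collaborator: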

public class PersonSqlWriter : IWriter<Person>
{
    private readonly string _connectionString;

    public PersonSqlWriter(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void Write(IEnumerable<Person> items)
    {
        //Writes the list of items to the database 
        //using _connectionString;
    }
}
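
And a sketch of what Write might look like, using a parameterized INSERT inside a transaction (SqlConnection and SqlCommand are in System.Data.SqlClient; the People table and its column names are assumptions). For very large loads, SqlBulkCopy, covered in the other answer, would be faster:

public void Write(IEnumerable<Person> items)
{
    using (var connection = new SqlConnection(_connectionString))
    {
        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            foreach (var item in items)
            {
                // One parameterized INSERT per item, all in one transaction.
                using (var command = new SqlCommand(
                    "INSERT INTO People (FirstName, LastName, EmployeeNumber) " +
                    "VALUES (@first, @last, @number)",
                    connection, transaction))
                {
                    command.Parameters.AddWithValue("@first", item.FirstName);
                    command.Parameters.AddWithValue("@last", item.LastName);
                    command.Parameters.AddWithValue("@number", item.EmployeeNumber);
                    command.ExecuteNonQuery();
                }
            }
            transaction.Commit();
        }
    }
}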

The result is that each of these classes does only one thing. You can use them separately from the others and test them separately from the others. That applies the Single Responsibility Principle. No one class is too complicated because each one has only one responsibility. It also applies the Dependency Inversion principle. The reader doesn't know what the buffer does. It just depends on the interface. The buffer doesn't know what the writer does. And the writer doesn't care where the data comes from.

Now the complexity is in creating the objects. You need a file name, a connection string, and a maximum buffer size. That means something like

var filename = "your file name";
var maxBufferSize = 50;
var connectionString = "your connection string";

var reader = new PersonFileReader(
    filename,
    new WriterBuffer<Person>(
        new PersonSqlWriter(connectionString),
        maxBufferSize));

Your classes are simpler, but wiring them all together has gotten a little more complicated. That's where dependency injection comes in. It manages this for you. I won't go into that yet because it might be information overload. But if you mention what sort of application this is - web, WCF service, etc., then I might be able to provide a concrete example of how a dependency injection container like Windsor, Autofac, or Unity can manage this for you.

This was all new to me several years ago. At first it just looked like more code. But it actually makes it easier to write small, simple classes, which in turn makes building complex applications much easier.

Scott Hannen

Have a look at the links below:
BulkCopy: How can I insert 10 million records in the shortest time possible?
This one contains code samples: http://www.sqlteam.com/article/use-sqlbulkcopy-to-quickly-load-data-from-your-client-to-sql-server

You can also use the SQL Server Import and Export Wizard (https://msdn.microsoft.com/en-us/library/ms141209.aspx?f=255&MSPPError=-2147217396).
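
For reference, the core of the SqlBulkCopy approach looks roughly like this (a minimal sketch; SqlBulkCopy is in System.Data.SqlClient, and the destination table and column names are assumptions):

// Load one batch of CSV rows into a DataTable, then push the whole
// batch to SQL Server in a single bulk operation.
var table = new DataTable();
table.Columns.Add("FirstName", typeof(string));
table.Columns.Add("LastName", typeof(string));
table.Columns.Add("EmployeeNumber", typeof(string));
// ... fill table.Rows from the CSV, e.g. table.Rows.Add(f0, f1, f2) ...

using (var bulkCopy = new SqlBulkCopy("your connection string"))
{
    bulkCopy.DestinationTableName = "dbo.People"; // assumed table name
    bulkCopy.BatchSize = 10000;
    bulkCopy.WriteToServer(table);
}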

Alex