
I have an application that (among other features) stores PDF documents in a byte field in a database. Entity Framework is used for all data access and is handled through a repository class. The EF container is stored in the repository class and persists as long as the repository object does.

I store each document one at a time into the database. I know this is not as efficient as inserting a batch at a time, but I have to do additional processing after each document has been inserted.

What I cannot figure out is why this application uses so much memory, which slows it down considerably. I push in about 5,000 PDFs at a time. The first 500 or so insert very quickly, and then everything slows to a crawl. At that point the memory usage of this console application is around 1.5 GB.

Here is the repository method call. The SaveChanges() method just calls the container's SaveChanges and returns true/false depending on the result.

public bool AddDocument(Document document)
{
    dataContainer.Documents.Add(document);

    return SaveChanges();
}
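The wrapper itself is trivial; it is roughly this shape (error handling simplified):

public bool SaveChanges()
{
    try
    {
        // The container's SaveChanges() returns the number of affected
        // rows; failures surface as exceptions.
        dataContainer.SaveChanges();
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}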

The document class is...

public partial class Document
{
    public Document()
    {
        this.Name = "";
        this.Filename = "";
    }

    public int Id { get; set; }
    public string Name { get; set; }
    public string Filename { get; set; }
    public byte[] Data { get; set; }
}

I have used the ANTS Memory Profiler and found that the memory skyrockets during the .Add(document) call. I think I am getting hit with some lazy loading that is populating the Documents collection.

How can I stop this application's memory usage from running out of control, in the hope that it will also speed up the inserts?

Trey Gourley
  • Have you looked at the generated SQL? I'd be curious to see what's going on under the hood. – Botonomous Jan 03 '14 at 19:16
  • You may need to GC after each call. Can you dispose (free memory) your objects? – Delphi.Boy Jan 03 '14 at 19:17
  • Maybe you can perform a "Performance Analysis" in Visual Studio? This helps you identify which method(s) or object(s) take a lot of memory. – Odrai Jan 03 '14 at 19:23
  • I have tried dispose, but after the first insert it throws the error, "The ObjectContext instance has been disposed and can no longer be used for operations that require a connection." – Trey Gourley Jan 03 '14 at 19:30

1 Answer


You could try some standard tricks to improve bulk insert performance; they apply, and usually help a lot, whether or not your entity contains a binary field:

  • Don't call SaveChanges after each call to DbSet<T>.Add, but only once after adding "a few" items (around 100 is a good starting point, but you have to test to find the best value)
  • Disable automatic change detection by setting AutoDetectChangesEnabled = false
  • Dispose the context after saving "a few" items (again, around 100) and create a new one

All three pieces together might look like the following sketch.
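This is a minimal, untested outline assuming a DbContext-based model; the context class name DocumentContext, the block size, and the helper names are placeholders rather than code from the question:

// assumes: using System.Collections.Generic; using System.Data.Entity;
//
// Inserts documents in blocks, applying all three tricks: batched
// SaveChanges, disabled automatic change detection, and a fresh context
// per block so the change tracker is released regularly.
public static void BulkInsertDocuments(IEnumerable<Document> documents)
{
    const int blockSize = 100; // test different values to find the sweet spot

    DocumentContext context = null;
    try
    {
        context = CreateContext();

        int count = 0;
        foreach (var document in documents)
        {
            context.Documents.Add(document);

            if (++count % blockSize == 0)
            {
                // Save the block, then replace the context so the tracked
                // Document entities (and their byte[] Data) can be collected.
                context.SaveChanges();
                context.Dispose();
                context = CreateContext();
            }
        }

        context.SaveChanges(); // save the last, possibly partial, block
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }
}

private static DocumentContext CreateContext()
{
    var context = new DocumentContext();
    context.Configuration.AutoDetectChangesEnabled = false;
    return context;
}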

Slauma
  • What about using asynchronous approaches? – Travis J Jan 03 '14 at 19:30
  • @TravisJ: I have no clue if it would help. I'm currently a bit behind the frontline of the most recent features of EF 6. But it's certainly worth a try... – Slauma Jan 03 '14 at 19:38
  • @Slauma Have you run into issues with calling Dispose on a persistent container? Per the comment on the question, I tried calling Dispose after SaveChanges, but after the first insert it threw the exception, "The ObjectContext instance has been disposed and can no longer be used for operations that require a connection." – Trey Gourley Jan 03 '14 at 19:43
  • 1
    @TreyGourley: With "persistent container" you mean just a DbContext instance, right? Well, of course you can't dispose the context and then continue calling methods on it that need to interact with the database. You have to instantiate a new context to continue, basically working blockwise: first 100 items with the first context instance, second 100 items with the second context instance, and so on... Wrapping all blocks into a `TransactionScope` ensures that everything or nothing is stored in a single DB transaction. – Slauma Jan 03 '14 at 19:51
  • @Slauma I know it has been a couple of days, but I was able to incorporate your suggestions into the insertion code. It worked very well. I struggled against the persistent DbContext being used in the class object; I basically had to make sure I was always working with a fresh context, but it dramatically reduced the memory usage and runs much faster. Thanks! – Trey Gourley Jan 09 '14 at 14:54
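For completeness, the blockwise-plus-TransactionScope pattern Slauma describes in the comments might look like this sketch (BulkInsertDocuments is the placeholder helper outlined after the answer's bullet list; the documents variable and the timeout value are illustrative assumptions):

using System.Transactions;

// Wrap all blocks in a single TransactionScope so either every document
// is committed or none are. The default scope timeout (1 minute) is easy
// to exceed with thousands of inserts, so raise it explicitly.
var options = new TransactionOptions { Timeout = TimeSpan.FromMinutes(10) };

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    BulkInsertDocuments(documents); // blockwise inserts, as sketched above
    scope.Complete(); // without Complete(), everything rolls back on dispose
}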