
TL;DR: Images converted to Base64 strings have a huge RAM footprint in the large object heap.

I have some code in a Windows service that consumes product images uploaded by our users, standardizes them into a web-grade format (users upload 10 MB bitmaps), and does some other things, like resizing them into a square and adding whitespace padding.
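
The resize-and-pad step is done by `Utility.Framework.ImageToFixedSize`, whose source is not shown. A minimal sketch of the geometry such a helper probably computes, assuming it preserves the aspect ratio and centers the image on a square canvas (the `SquareFit` class is hypothetical, and unlike the real helper it may upscale small images):

```csharp
using System;

// Hypothetical helper: scales a width x height image (aspect ratio preserved)
// so its longer side fills a size x size square, and centers it there.
// The leftover area on the shorter axis becomes whitespace padding.
public static class SquareFit
{
    public static (int Width, int Height, int OffsetX, int OffsetY) Fit(int width, int height, int size)
    {
        if (width <= 0 || height <= 0 || size <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        // Scale factor that makes the longer side exactly `size` pixels.
        double scale = (double)size / Math.Max(width, height);
        int scaledWidth = (int)Math.Round(width * scale);
        int scaledHeight = (int)Math.Round(height * scale);

        // Center the scaled image; the remainder on each side is padding.
        int offsetX = (size - scaledWidth) / 2;
        int offsetY = (size - scaledHeight) / 2;
        return (scaledWidth, scaledHeight, offsetX, offsetY);
    }
}
```

In `System.Drawing` terms, these values would feed `Graphics.Clear(Color.White)` followed by `Graphics.DrawImage(img, offsetX, offsetY, width, height)` on a 1200x1200 canvas.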

It then converts them to a Base64 string to upload them into our hosting environment via REST. The environment requires it to be done this way; I cannot use URLs. When I do this, the strings get stored on the large object heap and the program's RAM usage skyrockets over time.

How do I get around this issue?

Here is the code:

```csharp
private void HandleDocuments(IBaseProduct netforumProduct, MagentoClient client, bool isChild)
{
    if (netforumProduct.Documents == null) { return; }

    for (int idx = 0; idx < netforumProduct.Documents.Count; idx++)
    {
        JToken document = netforumProduct.Documents[idx]["Document"];
        if (document == null) { continue; }

        string fileName = document["URL"].ToString();

        // Skip photos on child products (the only identifier is part of the url string)
        if (fileName.ToLower().Contains("photo") && isChild) { continue; }

        using (HttpClient instance = new HttpClient {BaseAddress = client.NetforumFilesBaseAddress})
        {
            string trimStart = fileName.TrimStart('.');

            string base64String;

            using (Stream originalImageStream = instance.GetStreamAsync("iweb" + trimStart).Result)
            {
                using (MemoryStream newMemoryStream = new MemoryStream())
                {
                    using (Image img = Image.FromStream(originalImageStream))
                    {
                        using (Image retImg = Utility.Framework.ImageToFixedSize(img, 1200, 1200))
                        {
                            retImg.Save(newMemoryStream, ImageFormat.Jpeg);
                        }
                    }

                    newMemoryStream.Position = 0;

                    byte[] bytes = newMemoryStream.ToArray();
                    base64String = Convert.ToBase64String(bytes);
                }
            }

            // MediaGalleryEntry is a simple class with a few string properties
            MediaGalleryEntry mge = new MediaGalleryEntry
            {
                label = "Product_" + netforumProduct.Code + "_image_" + idx,
                content = new MediaGalleryContent
                {
                    base64_encoded_data = base64String,
                    name = "Gallery_Image_" + idx
                },
                file = trimStart
            };

            this.media_gallery_entries.Add(mge);
        }
    }
}
```

It's not the best code ever, and probably not highly optimized, but it's the best I can do.

CarComp
  • Well, a 10 MB bitmap may become a 1 MB JPEG, which in turn becomes a 1.3 MB base64 string. At `this.media_gallery_entries.Add(mge)` you keep a reference to this string, so it can't be garbage collected. Is that your issue? – CodeCaster Oct 21 '19 at 12:30
  • Yes, this is exactly the problem. I am really unsure how to dispose of this string once I've POSTed it to the web service. – CarComp Oct 21 '19 at 12:32
  • @CarComp Just stop referencing it and the GC will **eventually** collect it. No special disposal should be necessary, as it is only a string. – Christopher Oct 21 '19 at 12:35
  • Depending on how you construct the web requests, building a file (that is, by streaming to it, not by constructing a string and then writing that) and uploading it may circumvent this (or even bypassing the file and streaming directly to the URL, but this may be more complicated depending on the API). This means not using `Convert` but something that supports streams [like `ToBase64Transform`](https://stackoverflow.com/a/57820500/4137916). – Jeroen Mostert Oct 21 '19 at 12:37
  • @Jeroen Mostert This is a great option; unfortunately, it's not something I can do. The REST API requires that I use Base64. I wish it worked this way. – CarComp Oct 21 '19 at 12:44
  • You may mean the programmatic API offered (or generated) that calls the REST API works with strings, but this probably does not mean it's impossible to write a new API that doesn't have this flaw. The whole thing with REST is that it's straightforward, so any language can consume the API. Ultimately it all ends up as a stream of bytes over TCP, so there is not (cannot be) a hard requirement to construct big strings in C# from the physical endpoint alone. It may be as simple as giving `MediaGalleryEntry` a property of a `Stream` type. – Jeroen Mostert Oct 21 '19 at 12:47
  • The REST API is from Magento. I know how to make it use a stream, but this RAM problem doesn't warrant me using resources to develop a new module to consume the image this way. – CarComp Oct 21 '19 at 12:50
  • There are things like this I could implement https://github.com/olivertar/m2_api_product_images – CarComp Oct 21 '19 at 13:42
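
Jeroen Mostert's streaming suggestion can be sketched with `ToBase64Transform` inside a `CryptoStream`: bytes are encoded chunk by chunk into any output stream (a `FileStream` or a request body stream, for example), so no single large string is ever built. This is a minimal sketch under the assumption that the HTTP layer can accept a stream; the `StreamingBase64` class is hypothetical, and building the surrounding Magento JSON payload is not covered.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: Base64-encodes `input` into `output` chunk by chunk,
// without materializing the whole encoded string in memory.
public static class StreamingBase64
{
    public static void EncodeTo(Stream input, Stream output)
    {
        using (var transform = new ToBase64Transform())
        // leaveOpen: true so disposing the CryptoStream does not close `output`.
        using (var base64 = new CryptoStream(output, transform, CryptoStreamMode.Write, leaveOpen: true))
        {
            input.CopyTo(base64);
            // Disposing the CryptoStream flushes the final, padded block.
        }
    }
}
```

The output is the same text that `Convert.ToBase64String` would produce, just written incrementally.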

1 Answer


"TL;DR: Images converted to Base64 strings have a huge RAM footprint in the large object heap."

Yes, that is true. All images are huge in memory. Compression only applies to storage and transfer: when an image is loaded into memory, for display or further processing, all compression has to be undone. This is a common pitfall for people working with images.
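
A rough sketch of the arithmetic behind "all images are huge": a decoded bitmap needs width × height × bytes per pixel, no matter how small the JPEG file was. The `DecodedSize` helper is hypothetical and ignores row padding (stride) and object overhead, so treat its result as a lower bound.

```csharp
// Hypothetical helper: approximate memory needed by a decoded bitmap.
// Compression (JPEG, PNG) shrinks the file, not the decoded pixels.
public static class DecodedSize
{
    public static long Bytes(int width, int height, int bitsPerPixel) =>
        (long)width * height * bitsPerPixel / 8;
}
```

For the 1200x1200 image produced in the question's code, `DecodedSize.Bytes(1200, 1200, 32)` gives 5,760,000 bytes (about 5.8 MB) at 32 bits per pixel, several times the size of the resulting JPEG.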

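"It then converts them to a Base64 string to upload them into our hosting environment via REST. The environment requires it to be done this way; I cannot use URLs. When I do this, the strings get stored on the large object heap and the program's RAM usage skyrockets over time."

Base64 is inefficient, but it is not the main cost here: it inflates the data by about a third (4 characters for every 3 bytes), so a 1 MB JPEG becomes a string of roughly 1.33 million characters, or about 2.7 MB, since .NET strings use 2 bytes per character.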
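
The size estimate above can be checked with a small sketch. `Base64Footprint` is a hypothetical helper; the 85,000-byte large object heap threshold is the documented .NET default, and the string-size estimate ignores the string object's header and length field.

```csharp
using System;

// Hypothetical helper estimating the memory cost of Base64-encoding a payload.
public static class Base64Footprint
{
    // Base64 emits 4 characters for every 3 input bytes, padding the last group.
    public static long EncodedLength(long byteCount) => 4 * ((byteCount + 2) / 3);

    // .NET strings are UTF-16: 2 bytes per character (object overhead ignored).
    public static long ApproximateStringBytes(long byteCount) => 2 * EncodedLength(byteCount);

    // Objects of 85,000 bytes or more are allocated on the large object heap.
    public static bool LandsOnLargeObjectHeap(long byteCount) =>
        ApproximateStringBytes(byteCount) >= 85_000;
}
```

For a 1,000,000-byte JPEG, `EncodedLength` returns 1,333,336 characters (matching `Convert.ToBase64String(...).Length`), which is about 2.67 MB of string data and far above the LOH threshold.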

The big question is whether you are really seeing an issue here, or only misreading the memory footprint. @CodeCaster figured out that you keep a reference (which is a real problem, and one of the few ways you can get a memory leak in .NET at all), but even once you drop those references, the string will still stay in memory for some time.
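
In the question's code, the fix would be to stop referencing the entries once they have been uploaded, for example by clearing `media_gallery_entries` after a successful POST (an assumption, since the upload code is not shown). The debugging-only sketch below uses a `WeakReference` to show that a large Base64 string survives a full collection while a list holds it and becomes collectible once the list lets go. `ReferenceDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Debugging-only demonstration of reference retention.
public static class ReferenceDemo
{
    // NoInlining keeps the string's only local reference inside this frame,
    // so the caller's stack cannot accidentally keep it alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AddLargeString(List<string> holder)
    {
        string base64 = Convert.ToBase64String(new byte[1_000_000]);
        holder.Add(base64);
        return new WeakReference(base64);
    }

    public static bool IsAliveAfterFullCollect(WeakReference weak)
    {
        // Forced collections are for diagnosis only, not production code.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

While `holder` contains the string it is still alive after a full collection; after `holder.Clear()` it is not.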

.NET uses garbage-collected memory management. That approach has one drawback: while the GC collects, other threads accessing the managed heap may have to be paused. As a result, the GC is, for lack of a better term, very lazy about running. From a performance standpoint, the ideal would be for it to run only once, at application shutdown. The things that get it to run earlier are:

  • calls to `GC.Collect()`, which should generally not be used in production code, only for debugging a suspected reference leak
  • memory pressure: allocations exceeding a generation's budget (large LOH allocations count toward this), and ultimately the danger of an `OutOfMemoryException`
  • some of the alternative GC modes, particularly server GC
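
To see which GC mode a service actually runs under, and to deal with LOH fragmentation left behind by freed large strings, a debugging-only sketch using `GCSettings` might look like this. `GcDiagnostics` is a hypothetical name. Server GC itself is switched on through configuration (`<ServerGarbageCollection>true</ServerGarbageCollection>` in an SDK-style project, or `<gcServer enabled="true"/>` under `<runtime>` in `app.config` on .NET Framework), not from code.

```csharp
using System;
using System.Runtime;

// Hypothetical diagnostics helper; not meant for routine production use.
public static class GcDiagnostics
{
    // Reports the GC flavour, latency mode, and how many full (gen 2) GCs ran.
    public static string Describe() =>
        $"ServerGC={GCSettings.IsServerGC}, LatencyMode={GCSettings.LatencyMode}, Gen2Collections={GC.CollectionCount(2)}";

    // The LOH is not compacted by default. This requests one compaction
    // during the next full blocking collection, then forces that collection.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
```

Logging `Describe()` periodically shows whether full collections are actually happening; if they are and memory still climbs, a lingering reference is the more likely culprit.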

All I can tell you is that it will run eventually, and I do not think you necessarily need to know exactly when.

Milo
Christopher
  • Yes, this basically sums it up. It's an inherent drawback of doing it this way. Manipulating images uses RAM, and there's no fix-all solution. – CarComp Oct 21 '19 at 12:46