5

I have an XML file which result from the export of a database (Oracle 11g Unicode) table. This table has a BLOB field which represent a file. The file could be a very large one.

So in the case where I have a very large file a get in the XML a very large string representation of that file.

I have to get the bytes of this string in order to insert the file in another database instance.

A this point the XML is charged and i have then a string representing the file.

What I've done is this:

Encoding.Unicode.GetBytes(stringFileRepresentation);

But I'm getting an OutOfMemoryException.

If I do this:

Encoding.Unicode.GetBytes(stringFileRepresentation.ToCharArray());

I get also an OutOfMemoryException.

I tried too to do this before decoding the string:

var chars = stringFileRepresentation.ToCharArray();
Encoding.Unicode.GetBytes(chars);

And I getting the OutOfMemoryException when calling ToCharArray().

So I guess is a problem when processing the string.

Then I'm trying the following method that I found here event if I'm not sure I have to conservate the encoding of the string:

byte[] bytes = new byte[str.Length * sizeof(char)];
Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);

But I'm getting also an OutOfMemoryException in instantiating the bytes variable.

Now, I ran OutOfOptions and I don't know what to do.

Community
  • 1
  • 1
sabotero
  • 3,935
  • 3
  • 25
  • 41
  • 1
    How would you write the blob to the database after getting the byte array? Is there a streaming option? (Ideally, you shouldn't need to have the whole file in memory even as a string...) – Jon Skeet Nov 21 '13 at 15:24
  • 2
    Alternatively work out a way to get the two databases to talk to each other directly. – samjudson Nov 21 '13 at 15:29
  • I started looking at ODP.NET and I dont see a streaming option. It almost seems like you'll need to break this out into chunks on your own and simply append the field in the database. Maybe write a SPROC that will append the bytes to the current field and loop calling it inserting chunks until you have inserted the entire object – Wjdavis5 Nov 21 '13 at 15:44
  • I found this : "All ODP.NET LOB objects inherit from the .NET Stream class to provide generic Stream operations." Here : http://docs.oracle.com/html/B14164_01/featLOBsupp.htm – Wjdavis5 Nov 21 '13 at 15:46
  • @samjudson I'm working in an _admin_ application to manage several database instance so they export some data today in an XML file and they re-import this data in other database other day. The generated XML file contain much more information than this file I'm talking here about. I don't know how I could work out another way to do this than treating the generated XML file. Thank you though. – sabotero Nov 21 '13 at 15:55
  • @JonSkeet I'm going to explore what you say. I'm actually loading the file in memory with an `XmlReader` implementation. To manage the database connection I'm using _Devart_ and I think they execute the insert statement directly. – sabotero Nov 21 '13 at 15:58
  • 1
    In addition to all the valid points mentioned above, read this: http://blogs.msdn.com/b/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx. You don't mention how "very large" the file is, but at some point both the string and the byte array need to be in memory. If they're each 1 GB or more, and you are running in a 32 bit process, you will run out of memory space. In fact, you will with files much smaller than that. – Kris Vandermotten Nov 21 '13 at 16:10

1 Answers1

5

Since you've already got the original full string in memory, you can use a StringReader to buffer through it:

This will get the text into a file. You can use a similar technique to write to a different stream instead of a file.

using (var sr = new StringReader(fileContents))
{
    using (var outputWriter = new StreamWriter(@"C:\temp\output.txt"))
    {
        char[] buffer = new char[10];
        int numChars;
        while ((numChars = sr.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            outputWriter.Write(buffer, 0, numChars);
        }
    }
}

EDIT

Writing to something other than a file is pretty similar - for example, suppose you wanted to write directly to a stream (doesn't matter what kind of stream - could be a MemoryStream, HttpResponse stream, FileStream, etc.):

using (var sr = new StringReader(fileContents))
{
    using (var outputStream = GetSomeStreamFromSomewhere())
    {
        char[] buffer = new char[10];
        int numChars;
        while ((numChars = sr.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            char[] temp = new char[numChars];
            Array.Copy(buffer, 0, temp, 0, numChars);
            byte[] byteBuffer = Encoding.UTF8.GetBytes(temp);
            outputStream.Write(byteBuffer, 0, byteBuffer.Length);
        }
    }
}
Joe Enos
  • 36,707
  • 11
  • 72
  • 128
  • The string comes already from a file, I don't find useful to put it again in a file. Thank you though! – sabotero Nov 21 '13 at 16:45
  • I don't know what you're doing with the contents one you have them. Hopefully you've got some method of writing it in chunks to its final destination. In that case, instead of `StreamWriter`, you would open some other stream attached to its final destination, then convert the char buffer into a small byte array, and write that byte array to the stream. .NET streams are mostly interchangeable, so the code would be similar. – Joe Enos Nov 21 '13 at 17:20
  • I'm trying to write the content in an oracle database, with Devart in between. – sabotero Nov 21 '13 at 17:25
  • Not sure how ODP.NET or Devart handles writing large streams into a binary field. I would expect it allows you to use a stream, in which case you would just need to create that stream and use my second example. If you can't use a stream, you may have to find some kind of workaround. – Joe Enos Nov 21 '13 at 17:33
  • 1
    BTW, that buffer length should be a lot bigger than 10 - that was just what I had in my sample. 1024 or 4096 or 8192 seem like common buffer lengths in real life. – Joe Enos Nov 21 '13 at 17:35
  • Hey you just saved me some time, upvote! Copy Paste Edit time =) I'm doing the same thing but its for a PL/1 interface that wants char 4000 and I just need to mimic the output so I keep it as an array of char – JPK May 13 '14 at 14:19