I am trying to retrieve a CDATA section from an XML document. The XML is formatted like this:

<Configuration>
    <ConfigItem>
        <Key>Hello World</Key>
        <Value><![CDATA[For the value we have a large chunk of XAML stored in a CDATA section]]></Value>
    </ConfigItem>
</Configuration>

I want to retrieve the XAML from the CDATA section. My code so far is as follows:

XmlDocument document = new XmlDocument();
document.Load("Configuration.xml");

XmlCDataSection cDataNode = (XmlCDataSection) document.SelectSingleNode("//*[local-name()='Value']").ChildNodes[0];

String cdata = cDataNode.Data;

However, the cdata string is truncated and incomplete. I guess this is because the actual CDATA content is too large to fit in a string object.

What's the correct way to do this?

EDIT:

So my original assumption that the string was too long was incorrect. The actual problem is that my CDATA contains a nested CDATA section. From reading online, the proper way to escape the nested CDATA terminator is to use ]]]]><![CDATA[>, which this XML does, but when I select the node the content seems to end at the wrong place.

Matthew Wilson
  • How large is the CDATA section? – DGibbs Aug 01 '14 at 11:14
  • In terms of characters it is 3186923 – Matthew Wilson Aug 01 '14 at 11:16
  • You shouldn't rely on the CData being there anyway - just read the `InnerText` property of the element. `string` is limited to about two billion characters, so if that has anything to do with your problem, you're doing something very wrong :D – Luaan Aug 01 '14 at 11:16
  • Theoretical limit for a .NET string is ~1GB (2B unicode chars), practical probably even less. In fact, no object in .NET can be larger than 2GB, even on a 64-bit machine. You can at best use a stream, and then read and process the data on the fly, without ever creating that huge single object in memory. But it would surprise me to see such a large XML file in the first place. **(edit)** I didn't see your comment, but obviously 3MB would fit in a string. – Groo Aug 01 '14 at 11:17
  • Why does the cdata object not contain the full cdata section from the document? – Matthew Wilson Aug 01 '14 at 11:19
  • Based on the comments above it's obviously not a limit with the String object in .NET... – Matthew Wilson Aug 01 '14 at 11:23
  • What are you doing with the CDATA section once you've retrieved it? How are you viewing it to come to the conclusion that not all of it is being displayed? Also, your CDATA declaration looks a bit odd to me, usually it's done like this: `` – DGibbs Aug 01 '14 at 11:25
  • I am printing it to the console window and have also inspected the length when debugging, the xml above was just an example for easy reading, updated for accuracy – Matthew Wilson Aug 01 '14 at 11:28
  • Looking at the MSIL for the `XmlDocument`, I don't see that it ever checks or truncates CDATA length. What's the exact length of the node? You could use pastebin or something similar to paste the example and see what happens. – Groo Aug 01 '14 at 11:28
  • @Groo Seconded. Add the XML to pastebin – DGibbs Aug 01 '14 at 11:32
  • @Matthew: I just created a test app and it works correctly for a CDATA block of that length. So the problem is probably with its content (make sure that it doesn't contain [invalid characters](http://stackoverflow.com/a/2784200/69809), for example). Are you sure that the XAML stored inside this section doesn't contain a nested CDATA block or something? **(edit)** Well, `XmlDocument` would complain with an exception, so that's not it. – Groo Aug 01 '14 at 11:38
  • It does contain a nested CDATA block! So it is detecting the cdata escape for the nested cdata and stopping there? Is it ok to have CDATA nested like this? – Matthew Wilson Aug 01 '14 at 12:33
  • @Groo - "no object in .NET can be larger than 2GB..." - not true since [``](http://msdn.microsoft.com/en-us/library/hh285054(v=vs.110).aspx) appeared: "On 64-bit platforms, enables arrays that are greater than 2 gigabytes (GB) in total size." – Damien_The_Unbeliever Aug 01 '14 at 12:51
  • @Damien: that's good to know, thanks. It probably defaults to `false` for a good reason, there are rare use cases for such large in-memory objects. – Groo Aug 01 '14 at 12:53

1 Answer

When there are nested CDATA sections, you need to piece the data back together. At present, you select only ChildNodes[0] and ignore all of the other children. You'll probably find that ChildNodes[1] contains some plain text, ChildNodes[2] contains another CDATA section, and so on.

You need to visit all of these children, extract the data from the CDATA sections, and concatenate everything to get the effective "text" content of the Value element.
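As a rough sketch of that concatenation (not a definitive implementation): the class name `CDataReader`, the method name `ReadValue`, and the inline sample XML are all made up for illustration, and the sample assumes the nested terminator was escaped with `]]]]><![CDATA[>` as described in the question.

```csharp
using System;
using System.Text;
using System.Xml;

public static class CDataReader
{
    // Hypothetical helper: concatenates every CDATA and text child of the
    // given element, in document order, into one string.
    public static string ReadValue(XmlNode element)
    {
        var builder = new StringBuilder();
        foreach (XmlNode child in element.ChildNodes)
        {
            if (child is XmlCDataSection cdata)
            {
                builder.Append(cdata.Data);   // content of one CDATA section
            }
            else if (child is XmlText text)
            {
                builder.Append(text.Value);   // plain text between sections
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Assumed sample: the inner "]]>" is escaped as "]]]]><![CDATA[>",
        // which splits the content across two CDATA sections.
        const string xml =
            "<Configuration><ConfigItem><Key>Hello World</Key>" +
            "<Value><![CDATA[outer <![CDATA[inner]]]]><![CDATA[> tail]]></Value>" +
            "</ConfigItem></Configuration>";

        var document = new XmlDocument();
        document.LoadXml(xml);

        XmlNode value = document.SelectSingleNode("//*[local-name()='Value']");

        // prints: outer <![CDATA[inner]]> tail
        Console.WriteLine(CDataReader.ReadValue(value));

        // InnerText performs the same concatenation for this element.
        Console.WriteLine(value.InnerText);
    }
}
```

For an element whose children are only text and CDATA nodes, the `InnerText` property suggested in the comments should give the same result; the explicit loop is only useful if you want to treat the individual child nodes differently.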

Damien_The_Unbeliever