I've read through the similar post about the % character, but it seems those other issues can be solved in the header line. Are there certain characters that are not allowed in XML, or do I need to format the document another way? (In my case, the "=" character gives me trouble when I try to read the document in C#.)

The post "Name cannot begin with the character ' '" is also similar, but that issue was likewise fixed in the header.

XElement nodes = XElement.Load(filename);  

The structure of the XML is below:

<?xml version="1.0" encoding="utf-8"?>
<offer>
  <data id="Salary">
    <ocrstring>which is equal to $60,000.00 if working 40 hours per week</ocrstring>
    <rule>.*(([+-]?\$[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}))</rule>
    <output></output>
  </data>
  <data id="Hours">
    <ocrstring></ocrstring>
    <rule>"(?<=working).*?(?=hours)"</rule>    <!-- Error Occurring Here -->
    <output>bob</output>
  </data>
  <data id="Location">
    <ocrstring></ocrstring>
    <rule>Regex2</rule>
    <output>LongWindingRoad222</output>
  </data>
</offer>

[Screenshot: error when parsing the XML]

How can I parse the XML document without getting the "Name cannot begin with the '=' character" error?
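A minimal C# sketch that reproduces the failure on just the offending `<rule>` line, using `XElement.Parse` instead of loading a file. It also checks the same text with `<` written as the predefined entity `&lt;`, which suggests the `=` itself is harmless and the `<` in front of it is what the parser objects to. The class and method names (`RuleParseDemo`, `TryParse`) are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class RuleParseDemo
{
    // Returns null if the fragment parses, otherwise the XmlException message.
    public static string TryParse(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }

    public static void Main()
    {
        // The <rule> line from the question: the '<' of "(?<=" is read as the start
        // of a tag, and '=' cannot be the first character of a tag name.
        string raw = "<rule>\"(?<=working).*?(?=hours)\"</rule>";

        // The same text with '<' written as the predefined entity &lt;.
        string escaped = "<rule>\"(?&lt;=working).*?(?=hours)\"</rule>";

        Console.WriteLine(TryParse(raw) ?? "parsed");     // an XmlException message
        Console.WriteLine(TryParse(escaped) ?? "parsed"); // parsed
    }
}
```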

Jason Aller
William Humphries
  • I think it's actually a side effect of the preceding "less than" (` – Phil Brubaker Jun 18 '20 at 17:09
  • Where did you get this XML? The problem has to be solved where it originates: the XML is invalid as generated, so you need to fix the way it is created. – Alexander Petrov Jun 18 '20 at 17:23
  • The error is due to the less-than sign (open angle bracket) in the string. That bracket is reserved in XML for tags, and when it appears in inner text it must be escaped as &lt; – jdweng Jun 18 '20 at 17:41
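Following the suggestion to fix the XML where it is created: if the file is produced from C#, one option is to build it with LINQ to XML, which escapes `<` and `&` in text content when the tree is serialized, so the pattern never has to be escaped by hand. This is a sketch under that assumption; `BuildOfferXml` and `MakeData` are hypothetical names, and the element layout simply mirrors the question's `<data>` structure.

```csharp
using System;
using System.Xml.Linq;

public static class BuildOfferXml
{
    // Builds one <data> element. The rule string is passed in unmodified;
    // escaping of '<' happens when the tree is serialized.
    public static XElement MakeData(string id, string ocr, string rule, string output)
    {
        return new XElement("data",
            new XAttribute("id", id),
            new XElement("ocrstring", ocr),
            new XElement("rule", rule),
            new XElement("output", output));
    }

    public static void Main()
    {
        var offer = new XElement("offer",
            MakeData("Hours", "", "\"(?<=working).*?(?=hours)\"", "bob"));

        // The serialized rule text contains (?&lt;=working) rather than a raw '<'.
        string xml = offer.ToString();
        Console.WriteLine(xml);

        // Parsing the serialized text gives back the original, unescaped pattern.
        XElement reloaded = XElement.Parse(xml);
        Console.WriteLine(reloaded.Element("data").Element("rule").Value);
    }
}
```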

1 Answer


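You need to use CDATA sections for all the <rule> elements. Inside a CDATA section the parser treats characters such as < and & as literal text rather than markup, so the regular expressions can be stored unchanged.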

What does <![CDATA[]]> in XML mean?


<?xml version="1.0" encoding="utf-8"?>
<offer>
    <data id="Salary">
        <ocrstring>which is equal to $60,000.00 if working 40 hours per week</ocrstring>
        <rule><![CDATA[.*(([+-]?\$[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}))]]></rule>
        <output></output>
    </data>
    <data id="Hours">
        <ocrstring></ocrstring>
        <rule><![CDATA["(?<=working).*?(?=hours)"]]></rule>
        <!-- Error Occurring Here -->
        <output>bob</output>
    </data>
    <data id="Location">
        <ocrstring></ocrstring>
        <rule>Regex2</rule>
        <output>LongWindingRoad222</output>
    </data>
</offer>
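A C# sketch of reading the CDATA version back. The document is inlined as a trimmed string (no XML declaration, only the Salary and Hours entries) so the example is self-contained; loading the full file with `XElement.Load(filename)` should give the same tree. `ReadOfferRules` and `GetField` are hypothetical helper names. `(string)element` returns the text without the CDATA markers, and the Salary rule is then applied to its `ocrstring` with `Regex.Match`.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ReadOfferRules
{
    // Trimmed copy of the answer's document; "" is an escaped quote in a C# verbatim string.
    public const string OfferXml = @"<offer>
  <data id=""Salary"">
    <ocrstring>which is equal to $60,000.00 if working 40 hours per week</ocrstring>
    <rule><![CDATA[.*(([+-]?\$[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}))]]></rule>
  </data>
  <data id=""Hours"">
    <rule><![CDATA[""(?<=working).*?(?=hours)""]]></rule>
  </data>
</offer>";

    // Returns the text of the child element `field` of the <data> with the given id,
    // or null if there is no such <data> or no such child.
    public static string GetField(XElement offer, string id, string field)
    {
        foreach (XElement data in offer.Elements("data"))
        {
            if ((string)data.Attribute("id") == id)
                return (string)data.Element(field);
        }
        return null;
    }

    public static void Main()
    {
        XElement offer = XElement.Parse(OfferXml);

        string ocr = GetField(offer, "Salary", "ocrstring");
        string rule = GetField(offer, "Salary", "rule");
        Match m = Regex.Match(ocr, rule);
        Console.WriteLine(m.Groups[1].Value); // $60,000.00

        // The double quotes inside the Hours CDATA section are read back as
        // literal characters, so they become part of that pattern.
        Console.WriteLine(GetField(offer, "Hours", "rule"));
    }
}
```

Note that CDATA only changes how the text is stored; whether the quotes around the Hours pattern belong there is a separate question about the rule itself.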
Jason Aller
Yitzhak Khabinsky