
I am using a SAX parser to parse the following piece of data, whose "Description" attribute contains HTML content. But I am getting the error "The value of attribute "Description" associated with an element type "null" must not contain the '<' character".

How can I make the SAX parser ignore this tag during XML processing?

<Thread ThreadID="22" Title="google"
                    Description="<a href="http://google.com/">http://google.com/</a>"
                    DisplayName="Sam" LoginID="hjaja" UserEmailID="abx@ers"
                    UserSapCode="12345"
                    IsAnonymous="Yes" CreatedDate="2015-04-29T21:56:04.943" ReplyCount="0"
                    ViewCount="0" PopularityPoints="0" LastUpdatedBy="" LastPostDate="" />

Thanks in advance.

vijay
  • You'll have to either escape the HTML and its XML control characters (`<`, `>`, and `"`) in the Description, or not include it, since it's breaking the XML structure by including XML-like data. The parser is seeing `Description=" – Shotgun Ninja Jun 11 '15 at 14:50
    The XML is broken anyway. You shouldn't try to get a parser to "ignore" broken XML; you should be fixing whatever's producing the bad XML in the first place. E.g. you're asking "how do I pry out these bullets inside me" instead of "how do I stop getting shot". – Marc B Jun 11 '15 at 14:50

2 Answers


I really think you should take a look at this post (HTML code inside XML) to see how other people recommended tackling this problem.

Community
crigore

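No XML parser can parse this data, because it does not conform to the XML format: a literal `<` is not allowed inside an attribute value, and the unescaped `"` ends the value early. Please refer to the XML specification.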

There are two ways you can solve this:

  1. Change the source format

Change the source so that it produces proper XML. You can still include HTML by escaping the special characters with these entities:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
  2. Change the target algorithm

The second way is to write your own parsing algorithm for your case.

The answer is almost always the first one.
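
To illustrate the first option, here is a minimal sketch in Java. It assumes you control the code that generates the XML. The class name `EscapeAttributeDemo` and the helpers `escapeXml` and `readDescription` are made up for this example and are not part of any library. It escapes the HTML before placing it in the attribute, then reads the attribute back with the standard JAXP SAX parser, which hands the handler the unescaped HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EscapeAttributeDemo {

    // Hypothetical helper: replaces the five XML special characters
    // with their predefined entities.
    static String escapeXml(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: parses the document with SAX and returns the
    // Description attribute of the Thread element (null if absent).
    static String readDescription(String xml) throws Exception {
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("Thread".equals(qName)) {
                    // The parser has already resolved the entities here.
                    result[0] = attrs.getValue("Description");
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        String html = "<a href=\"http://google.com/\">http://google.com/</a>";
        // Escape on the producing side, before the value goes into the attribute.
        String xml = "<Thread ThreadID=\"22\" Title=\"google\" Description=\""
                + escapeXml(html) + "\" />";
        System.out.println(xml);
        System.out.println(readDescription(xml));
    }
}
```

The consumer needs no extra decoding step: the escaping only exists in the serialized XML, and the value returned by `attrs.getValue` is the original HTML string.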

Uday Shankar