
I have a problem with a SAX XML parser. I want to parse an XML file that is evidently not well-formed (I get an ExpatParser$ParseException: At line 5, column 169: not well-formed (invalid token)). I know what is wrong, but the XML file is not created by me, so I can't change it.

Now I want to handle that error in my DefaultHandler, but none of error(), fatalError(), or warning() is invoked.

Can I somehow interrupt the parsing process, tell the parser what to do with that piece of invalid XML, and continue parsing?

Thanks, JPM

jpm
  • If I were you I would write some sort of cleanup code that you pass the XML through before the SAX parser... or tell your source to fix their XML already if at all possible, because it would take them all of three seconds for a minor syntax error. – Robert Massaioli Apr 28 '11 at 22:11
  • I have exactly the same problem: http://stackoverflow.com/questions/5673423/saxparser-fails-when-responce-contains-hindi-or-other-special-characters – Vaibhav Jani Apr 29 '11 at 03:48
  • This is a bit like life giving you lemons; the SAX Parser cannot make apple juice with lemons. For the record this is the appropriate response to the guy that is giving you the lemons: "I don't want your damn lemons! What the hell are these?! Demand to see life's manager! Make life rue the day it thought it could give Cave Johnson lemons! Do you know who I am? I'm the man who's gonna burn your house down! WITH THE LEMONS! I'm gonna get my engineers to invent a combustible lemon that BURNS YOUR HOUSE DOWN!" (Portal 2) – Robert Massaioli Apr 29 '11 at 08:05
  • http://stackoverflow.com/questions/4574710/xml-parsing-from-non-xml-document/4575099#4575099 – Mads Hansen May 03 '11 at 01:58

1 Answer


I would guess that this SAXParseException is a fatal error that the SAX parser cannot recover from. In that case you probably need to fix up the bad tag before trying to parse it (as Robert suggests in his comment).
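To illustrate why the parser cannot simply carry on, here is a minimal sketch that feeds a malformed snippet to a standard JAXP SAX parser and catches the resulting exception. The class name `FatalErrorDemo`, the helper `firstFatalError`, and the sample input are made up for illustration, and it assumes the exception from the question is a `SAXParseException` (or a subclass), as guessed above. The parser on Android may report the error differently from the JDK one used here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class FatalErrorDemo {

    /**
     * Returns "line:column" of the first well-formedness error,
     * or null if the input parses cleanly. (Hypothetical helper.)
     */
    static String firstFatalError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // Parsing has stopped here; the exception only says where it broke.
            return e.getLineNumber() + ":" + e.getColumnNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        // Unquoted attribute value: not well-formed, so parse() throws.
        System.out.println(firstFatalError("<a b=c/>"));
        // Quoted attribute value: parses without error.
        System.out.println(firstFatalError("<a b=\"c\"/>"));
    }
}
```

The position in the exception is useful mainly for locating the bad spot so it can be repaired before parsing again.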

You might want to look into using a Java regex to fix up the known badness in the XML, e.g. see "Regex for quoting unquoted XML attributes".

For the record, I am not advocating using regex to actually parse XML!
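As a rough sketch of the clean-up-then-parse idea, the following assumes (purely for illustration, since the question does not say what the actual defect is) that the known badness is unquoted attribute values. The class `XmlCleanup`, its methods, and the sample document are hypothetical. The pattern is deliberately crude: it assumes attribute values contain no whitespace, quotes, or slashes, and that no `name=value` text appears inside element content or already-quoted values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlCleanup {

    // Assumed defect: attribute values without quotes, e.g. <item id=42>.
    // Group 1: whitespace plus attribute name; group 2: a value that does not start with a quote.
    private static final Pattern UNQUOTED_ATTR =
        Pattern.compile("(\\s[\\w:.-]+)=([^\\s\"'<>/=]+)");

    /** Wraps unquoted attribute values in double quotes. */
    static String quoteAttributes(String xml) {
        return UNQUOTED_ATTR.matcher(xml).replaceAll("$1=\"$2\"");
    }

    /** Parses the (already repaired) XML with SAX and collects element names in document order. */
    static List<String> parseElementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    names.add(qName);
                }
            });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String broken = "<items><item id=42 name=\"a\">x</item></items>";
        String fixed = quoteAttributes(broken);   // repair first
        System.out.println(fixed);
        System.out.println(parseElementNames(fixed)); // then parse as usual
    }
}
```

The regex only patches one known defect in the raw text; the actual parsing is still left to the SAX parser.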

Dan J
  • Thanks Dan and Robert, I guess I will do that. Since the XML is quite simple, maybe I can parse it manually... I have to work on something else first. But I think one of those ways will solve my problem (and I still have hope we can get the source to invest the 2 seconds to fix their XML :-) ) Thanks, JPM – jpm Apr 29 '11 at 14:12