I was a bit surprised that using a double slash for comments seems to be valid XML.

The following parses correctly with Python and xml.etree.ElementTree and under xmllint --format:

<root>
    <child1>text1</child1>
    <child2></child2> //this is a valid comment
    <child3></child3>
</root>

I first thought that this would be treated as a text node of the root element, but trying it in Python 3 proved me wrong:

>>> import xml.etree.ElementTree as ET
>>> r=ET.parse("test.xml").getroot()
>>> r.text
'\n    '
>>> child2=r[1]
>>> child2.text
>>> ET.tostring(child2)
b'<child2 /> //this is a valid comment\n    ' 

Can someone point me to the part of the spec where this is allowed?

kjhughes
navidof

3 Answers

XML Explanation

No, comments can only be <!-- comment --> in XML. You're seeing //this is a valid comment as text, which is allowed between elements in mixed content. You could just as easily have left out the //.

Python ElementTree Explanation

ET.tostring(e) returns e.tail (the text that follows e's end tag) as part of its string representation of e. This can be confusing, since most people would expect ET.tostring(e) to serialize only the element e itself and not the text node that follows it. But since e.tail is part of ET's element data structure, I suppose ET's designers felt justified in including e.tail too.
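To make the tail behaviour concrete, here is a minimal sketch that rebuilds the question's document from a string and inspects child2. The helper `tostring_without_tail` is a hypothetical name of my own, not part of ElementTree; it is just one way to serialize an element without its trailing text.

```python
import copy
import xml.etree.ElementTree as ET

XML = """<root>
    <child1>text1</child1>
    <child2></child2> //this is a valid comment
    <child3></child3>
</root>"""

root = ET.fromstring(XML)
child2 = root[1]

# child2 has no text and no children; the "//..." text is stored in its tail.
print(repr(child2.text))
print(repr(child2.tail))
print(len(child2))


def tostring_without_tail(elem):
    """Serialize elem alone (hypothetical helper, not an ElementTree API)."""
    clone = copy.deepcopy(elem)  # work on a copy so the original tail is untouched
    clone.tail = None
    return ET.tostring(clone)


print(ET.tostring(child2))            # includes the tail
print(tostring_without_tail(child2))  # element only
```

Working on a deep copy keeps the parsed tree unchanged, so serializing the whole document afterwards still includes the trailing text.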

kjhughes

This is not a valid comment but rather a text-node of the <root> element.

<child2></child2> //this is a valid comment

would be seen as

...element-node("child2"), text-node(" //this is a valid comment\n"), element-node("child3")...

What you want is

<child2></child2> <!-- this is a valid comment -->

which would translate to a real XML comment node:

...element-node("child2"), comment-node(" this is a valid comment "), element-node("child3")...

(I omitted empty text-nodes for simplicity.)
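The node sequences above can be checked with a DOM parser. This is a minimal sketch using Python's xml.dom.minidom on a compact one-line document of my own (not the question's file), so that no whitespace-only text nodes appear; `child_node_names` is a hypothetical helper name.

```python
from xml.dom import minidom

# Compact sample (an assumption for illustration): one "//" text run and one real comment.
SAMPLE = (
    "<root><child2></child2> //not a comment "
    "<!-- a real comment --><child3></child3></root>"
)


def child_node_names(xml_text):
    """Return the DOM node names of the document element's direct children."""
    doc = minidom.parseString(xml_text)
    return [node.nodeName for node in doc.documentElement.childNodes]


doc = minidom.parseString(SAMPLE)
for node in doc.documentElement.childNodes:
    # nodeName is '#text' for text nodes and '#comment' for comment nodes.
    print(node.nodeName, "parent:", node.parentNode.nodeName)

print(child_node_names(SAMPLE))
```

In the DOM view, the `//` text is a sibling of child2 whose parent is root, while ElementTree stores the same text as child2.tail.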

zx485
  • No, it's not seen as a text node under root. Try it in Python: it's seen as part of the child2 node. – navidof Jul 07 '17 at 18:28
  • Try `xmllint --sax a.xml`. Its result shows that the characters occur after the `child2` element. – zx485 Jul 07 '17 at 18:39
  • Interesting! So if these characters do not belong to child2, why does ET.tostring(child2) print them? Would it be a problem in xml.etree? – navidof Jul 07 '17 at 18:44
  • @navidof: It's because `ET.tostring(child2)` is returning `child2.tail` as part of its string representation of `child2`. This is admittedly confusing, but it does not mean that `child2.tail` is a child of `child2`. See [**my answer**](https://stackoverflow.com/a/44978161/290085) for further details. – kjhughes Jul 09 '17 at 23:36
<!--This is a valid comment-->

You need to write the comment this way, the same way comments are written in HTML.

james31rock
  • Well, I know that one; I want to know why the other form also works. I couldn't find anything in the XML spec related to that, but I may have looked in the wrong place. – navidof Jul 07 '17 at 18:04
  • No other form should work. There is CDATA, but it's different: https://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean. The fact that Python parses it doesn't mean it's correct as a comment; it is treated as text under the root node, as zx485 points out. – james31rock Jul 07 '17 at 18:07
  • My tests show the contrary. I think Python respects the standard; it wouldn't parse invalid XML. So this is, for some reason, valid XML. – navidof Jul 07 '17 at 18:37
  • It's not invalid XML, it's just not a comment. "//this is a valid comment" is not a comment; it's text. – james31rock Jul 07 '17 at 22:58