141

Given this simplified data format:

<a>
    <b>
        <c>C1</c>
        <d>D1</d>
        <e>E1</e>
        <f>don't select this one</f>
    </b>
    <b>
        <c>C2</c>
        <d>D2</d>
        <e>E1</e>
        <g>don't select me</g>
    </b>
    <c>not this one</c>
    <d>nor this one</d>
    <e>definitely not this one</e>
</a>

How would you select all the Cs, Ds and Es that are children of B elements?

Basically, something like:

a/b/(c|d|e)

In my own situation, instead of just a/b/, the query leading up to selecting those C, D, E nodes is actually quite complex so I'd like to avoid doing this:

a/b/c|a/b/d|a/b/e

Is this possible?

Keavon
nickf

4 Answers

224

One correct answer is:

/a/b/*[self::c or self::d or self::e]

Do note that this

a/b/*[local-name()='c' or local-name()='d' or local-name()='e']

is both too long and incorrect. This XPath expression also selects elements in any namespace whose local name is c, d or e, such as:

OhMy:c

NotWanted:d 

QuiteDifferent:e
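
To make the namespace difference concrete, here is a minimal Python sketch. Python's standard-library `xml.etree.ElementTree` implements only a subset of XPath (no `self::` axis and no `local-name()`), so the sketch selects `b/*` and applies the two name tests in plain Python rather than in XPath. The sample document, the `other` prefix and the `urn:example:other` URI are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "other" prefix and its URI are illustrative only.
DOC = """
<a xmlns:other="urn:example:other">
    <b>
        <c>C1</c>
        <other:c>namespaced C</other:c>
        <d>D1</d>
        <f>don't select this one</f>
    </b>
</a>
"""

WANTED = {"c", "d", "e"}


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}local"; drop the "{uri}" part.
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(DOC)        # root is the <a> element
children = root.findall("b/*")   # every element child of every <b>

# In the spirit of *[self::c or self::d or self::e] in XPath 1.0:
# the whole expanded name must match, so only no-namespace elements qualify.
by_expanded_name = [el.text for el in children if el.tag in WANTED]

# In the spirit of the local-name() predicate: the namespace is ignored.
by_local_name = [el.text for el in children if local_name(el.tag) in WANTED]

print(by_expanded_name)
print(by_local_name)
```

With a full XPath 1.0 engine, the first list corresponds to what the `self::` predicate selects and the second to what the `local-name()` predicate selects.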
Dimitre Novatchev
  • 2
    'or' does not work on a for-each, you would need to use a vertical line instead '|' – Guasqueño Nov 26 '15 at 17:08
  • 8
    @Guasqueño, `or` is a logical operator -- it operates on two Boolean values. The XPath **union** operator `|` operates on two sets of nodes. These are quite different and there are specific use cases for each of them. Using `|` **can** solve the original problem, but it results in a longer, more complex XPath expression that is harder to understand. The simpler expression in this answer, which uses the `or` operator, produces the wanted node-set and *can* be specified in the "select" attribute of an `<xsl:for-each>` XSLT operation. Just try it. – Dimitre Novatchev Nov 26 '15 at 17:22
  • the local-name() method works for me because I don't care about namespace :) It might be better to update your answer to say that this option is only incorrect if you care about namespaces. :) – Jonathan Benn Aug 01 '18 at 15:58
  • 4
    @JonathanBenn , Anyone who "doesn't care about namespaces" actually doesn't care about XML, and doesn't use XML. The use of `local-name()` is only correct if we want to select all elements with that local name, regardless of the namespace the element is in. This is a very rare case -- in general people do care about the differences between: `kitchen:table` and `sql:table`, or between `architecture:column`, `sql:column`, `array:column`, `military:column` – Dimitre Novatchev Aug 01 '18 at 16:14
  • 3
    @DimitreNovatchev you make a good point. I'm using XPath for HTML inspection, which is an edge case where the namespace is not so important... – Jonathan Benn Aug 01 '18 at 17:07
  • 2
    That is super. Where did you come up with that? – Keith Tyler Jan 10 '19 at 02:42
  • @KeithTyler, Yes, XPath is a beautiful language! XPath 2.0 and XPath 3 (3.0 and 3.1) -- even more – Dimitre Novatchev Jan 10 '19 at 03:31
47

You can avoid the repetition with a name test in a predicate instead:

a/b/*[local-name()='c' or local-name()='d' or local-name()='e']

Contrary to Dimitre's antagonistic opinion, the above is not incorrect in a vacuum where the OP has not specified the interaction with namespaces. The self:: axis is namespace restrictive, local-name() is not. If the OP's intention is to capture c|d|e regardless of namespace (which I'd suggest is even a likely scenario given the OR nature of the problem) then it is "another answer that still has some positive votes" which is incorrect.

You can't be definitive without definition, though I'm quite happy to delete my answer as genuinely incorrect if the OP clarifies his question such that I am incorrect.

the Tin Man
annakata
  • 3
    Speaking as a 3rd party here -- personally, I find Dimitre's suggestion to be the better practice except in cases where the user has explicit (and good) reason to care about tag name irrespective of namespace; if anyone did this against a document in which I was mixing differently-namespaced content (presumably intended to be read by a different toolchain), I would consider their behavior very inappropriate. That said, the argument is -- as you suggest -- a bit unbecoming. – Charles Duffy Oct 17 '10 at 20:43
  • 4
    Exactly what I was looking for. XML namespaces the way they are used in real life are an unholy mess. For lack of being able to specify something like /a/b/(*:c|*:d|*:e) your solution is exactly what is needed. Purists can argue all they want but users don't care that the app breaks because whatever generated their input file screwed up the namespaces. They just want it to work. – Ghostrider May 26 '12 at 05:02
  • 7
    I have only the vaguest idea what the difference would be between these two answers and nobody has bothered to explain. What does "namespace restrictive" mean? If I use `local-name()`, does that mean it would match tags with any namespace? If I use `self::`, what namespace would it have to match? How would I match only `OhMy:c`? – meustrus Jan 09 '14 at 20:11
16

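Why not a/b/(c|d|e)? I just tried it with the Saxon XML library (wrapped up nicely with some Clojure goodness), and it seems to work. Note that Saxon implements XPath 2.0, where a parenthesized expression is allowed as a path step; XPath 1.0 processors reject this syntax. abc.xml is the doc described by the OP.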

(require '[saxon :as xml])
(def abc-doc (xml/compile-xml (slurp "abc.xml")))
(xml/query "a/b/(c|d|e)" abc-doc)
=> (#<XdmNode <c>C1</c>>
    #<XdmNode <d>D1</d>>
    #<XdmNode <e>E1</e>>
    #<XdmNode <c>C2</c>>
    #<XdmNode <d>D2</d>>
    #<XdmNode <e>E1</e>>)
Pavel Repin
-1

Not sure if this helps, but with XSL, I'd do something like:

<xsl:for-each select="a/b">
    <xsl:value-of select="c"/>
    <xsl:value-of select="d"/>
    <xsl:value-of select="e"/>
</xsl:for-each>

and won't this XPath select all children of B nodes:

a/b/*
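
A quick way to check what `a/b/*` returns is to run it against the sample document. This is a minimal Python sketch using the standard-library `xml.etree.ElementTree`; because the parsed root is already `<a>`, the path is written as `b/*`. The name filter at the end is plain Python, added only for comparison, and is not an XPath feature.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, condensed onto fewer lines.
DOC = """<a>
    <b><c>C1</c><d>D1</d><e>E1</e><f>don't select this one</f></b>
    <b><c>C2</c><d>D2</d><e>E1</e><g>don't select me</g></b>
    <c>not this one</c>
    <d>nor this one</d>
    <e>definitely not this one</e>
</a>"""

root = ET.fromstring(DOC)  # root is the <a> element

# Equivalent of a/b/* evaluated from the document: every child of every <b>.
all_children = [el.tag for el in root.findall("b/*")]

# The same node list narrowed to the three wanted names.
wanted = [el.tag for el in root.findall("b/*") if el.tag in {"c", "d", "e"}]

print(all_children)
print(wanted)
```

The first list includes the unwanted `f` and `g` elements, so the wildcard needs a name filter of some kind.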
Calvin
  • Thanks Calvin, but I'm not using XSL, and there are actually more elements underneath B which I don't want to select. I'll update my example to be clearer. – nickf Apr 06 '09 at 15:43
  • Oh, well in that case annakata seems to have the solution. – Calvin Apr 06 '09 at 15:51