It seems that `contains(text(), 'TARGET_STRING')` works fine with `html_nodes`, and so does the "or operator" `|`, but the two do not work together in the same predicate.

Reproducible example:

library(rvest)

html <- "<a>a</a><p>abc</p>"

xp <- "//*[self::a|self::b]" # or operator works
xp2 <- "//*[contains(text(),'abc')]" # contains text works

# but they don't work together
xp3 <- "//*[self::a|contains(text(),'abc')]"

html_nodes(x = read_html(html), xpath = xp)
html_nodes(x = read_html(html), xpath = xp2)

# this one fails
html_nodes(x = read_html(html), xpath = xp3)
Tlatwork

1 Answer

Simply replace `|` with `or` to make it work:

//*[self::a or contains(text(),'abc')]
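As a minimal sketch of the difference, assuming the same `rvest` setup and HTML snippet as in the question (the variable names `doc`, `hits_or` and `hits_union` are only illustrative): `or` is the boolean operator and belongs inside a predicate, whereas `|` expects a node-set on both sides, and `contains()` returns a boolean rather than a node-set. If a union is really wanted, it can go between two complete paths instead.

```r
library(rvest)

# Same snippet as in the question; read_html() wraps it in <html><body>
doc <- read_html("<a>a</a><p>abc</p>")

# Boolean `or` inside the predicate: keep a node if it is an <a>
# or if its first text node contains 'abc'
hits_or <- html_nodes(doc, xpath = "//*[self::a or contains(text(), 'abc')]")
html_name(hits_or)
# "a" "p"

# Union `|` between two complete location paths (both sides are node-sets)
hits_union <- html_nodes(doc, xpath = "//a | //*[contains(text(), 'abc')]")
html_name(hits_union)
# "a" "p"
```

Both forms select the same two nodes here; the `or` version is shorter and keeps everything in a single predicate.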
DonnyFlaw
  • Oh wow, I could have tried that one. Do you have an idea why the "|" operator does not work here? P.S.: I can accept in three minutes. – Tlatwork Jan 19 '21 at 15:10
  • @Tlatwork, note that `|` is not actually an **OR** operator, but the [**union set operator**](http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc30020_1251/html/xmlb/xmlb32.htm) – DonnyFlaw Jan 19 '21 at 15:12
  • hah ok, if you read carefully https://stackoverflow.com/questions/5350666/xpath-or-operator-for-different-nodes (my source) it is also actually mentioned in the comments. Thank you! – Tlatwork Jan 19 '21 at 15:14