
I have the following DOM structure:

<h3 class="popover-title">
 <div class="popup-title">
   <div class="title-txt">Associated Elements&nbsp;&nbsp(5)</div>
 </div>
</h3>

I am trying to write an XPath that identifies the title "Associated Elements" under the h3 tag.

When my XPath is

//div[contains(@class, popover)]//h3[contains(.,'Associated Elements')]

the element is identified.

However, when my XPath is

//div[contains(@class, popover)]//h3[contains(text(),'Associated Elements')]

the element is not identified. As per my understanding, the dot (.) is a replacement for text(), so why is the element not identified when I use text()?

However, for another DOM structure:

<h3 class="popover-title">
   <a class="btn-popover" href="#">x</a>
   "Associated Elements"
</h3>

both of these XPath expressions:

//div[contains(@class, popover)]//h3[contains(text(),'Associated Elements')]

and

//div[contains(@class, popover)]//h3[contains(.,'Associated Elements')]

work fine.

Can someone please explain the behaviour of the dot (.) in both of these scenarios?

Is there a better way to write an XPath that holds good for both examples? Please suggest.

DebanjanB
  • Both of the `xpath` shouldn't match the first html block. You need to replace `//div` with `//h3` – Guy Sep 26 '19 at 10:49

2 Answers


This answer is based on the XML Path Language (XPath) Version 1.0 specification.


contains(string, string)

The function boolean contains(string, string) returns true if the first argument string contains the second argument string, and otherwise returns false. As an example:

//h3[contains(.,'Associated Elements')]

Text Nodes

Character data is grouped into text nodes. As much character data as possible is grouped into each text node. The string-value of a text node is the character data. A text node always has at least one character of data. In the below example, text() selects all text node children of the context node:

//h3[text()='Associated Elements']

In your use case, the text in the HTML is Associated Elements&nbsp;&nbsp(5). It contains &nbsp;, the non-breaking space (NBSP, also called a fixed space or hard space): a space at which a line cannot be broken by word wrap. In HTML, &nbsp; also lets you create multiple consecutive spaces that are rendered on the page instead of being collapsed into one.


Analyzing your code trials

Your first code trial with:

//h3[contains(.,'Associated Elements')]

locates the element because the string-value of h3 (the concatenated text of all of its descendants) contains the partial text Associated Elements.

Your second code trial with:

//h3[contains(text(),'Associated Elements')]

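fails because text() selects only the text nodes that are direct children of h3. In the first structure those are whitespace only; the text Associated Elements (followed by the &nbsp; characters and the count) sits in a nested div, so it never reaches contains().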
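
To make the difference concrete, here is a minimal Python sketch that imitates the two XPath 1.0 conversions without an XPath engine. It is an illustration under assumptions, not how Selenium or a browser evaluates XPath: the helper names (string_value, first_text_child, dot_contains, text_contains) are hypothetical, &nbsp; is written as the literal character U+00A0 because the XML parser does not know that entity, and the second sample is written without whitespace between h3 and a so that the title is the first text-node child.

```python
from xml.dom import minidom

# First structure from the question; U+00A0 stands in for &nbsp; (assumption).
FIRST = (
    '<h3 class="popover-title">\n'
    ' <div class="popup-title">\n'
    '   <div class="title-txt">Associated Elements\u00a0\u00a0(5)</div>\n'
    ' </div>\n'
    '</h3>'
)

# Second structure, written without whitespace before <a> (assumption).
SECOND = (
    '<h3 class="popover-title">'
    '<a class="btn-popover" href="#">x</a>Associated Elements'
    '</h3>'
)


def string_value(node):
    """XPath 1.0 string-value of a node: all descendant text, concatenated."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def first_text_child(element):
    """What string(text()) yields in XPath 1.0: the first text-node child, or ''."""
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return ""


def dot_contains(markup, needle):
    """Imitates h3[contains(., needle)] for the root h3 of the markup."""
    h3 = minidom.parseString(markup).documentElement
    return needle in string_value(h3)


def text_contains(markup, needle):
    """Imitates h3[contains(text(), needle)] for the root h3 of the markup."""
    h3 = minidom.parseString(markup).documentElement
    return needle in first_text_child(h3)


if __name__ == "__main__":
    for name, markup in (("first", FIRST), ("second", SECOND)):
        print(name,
              "dot:", dot_contains(markup, "Associated Elements"),
              "text():", text_contains(markup, "Associated Elements"))
```

In the first structure the first text-node child of h3 is only the line break and indentation before the nested div, which is why the text() variant finds nothing there. If the second structure had whitespace between the opening h3 tag and the a element, the same would happen to it under strict XPath 1.0 rules.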



DebanjanB
  • ``fails the element contains some more characters in addition to the text Associated Elements`` don't you mean ``fails the element contains some more elements in addition to the text Associated Elements`` ? – Moshe Slavin Sep 26 '19 at 22:15
  • Thanks @MosheSlavin Updated the verbatim for more brevity. – DebanjanB Sep 26 '19 at 22:20


The text() in contains(text(),'Associated Elements') is a node test that matches all of the text nodes that are children of the context node; it returns a node-set. Because contains() expects a string, that node-set is converted to a string, and in XPath 1.0 the conversion uses only the first node in document order.

text() isn't a function but a node test. It is used to select all text-node children of the context node. So, if the context node is an element named x, then text() selects all text-node children of x.

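When you use contains(., 'Associated Elements'), the context node itself (here the h3 element) is passed to the function. Its string-value is the concatenation of all of its descendant text nodes, so the text is matched even when it sits inside nested elements.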
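
A related idiom moves contains() into a predicate on text(), as in //h3[text()[contains(., 'Associated Elements')]], so that every text-node child is tested instead of only the first. The Python sketch below imitates that difference on the second structure from the question, formatted with its original line breaks. The helper names (text_children, first_text_contains, any_text_contains) are hypothetical and the code only mimics the XPath 1.0 rules; it is not an XPath engine.

```python
from xml.dom import minidom

# Second structure from the question, including its line breaks and indentation.
MARKUP = (
    '<h3 class="popover-title">\n'
    '   <a class="btn-popover" href="#">x</a>\n'
    '   "Associated Elements"\n'
    '</h3>'
)


def text_children(element):
    """Text-node children of the element, in document order (what text() selects)."""
    return [c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE]


def first_text_contains(element, needle):
    """Imitates contains(text(), needle): only the first text node is used."""
    texts = text_children(element)
    return bool(texts) and needle in texts[0]


def any_text_contains(element, needle):
    """Imitates text()[contains(., needle)]: each text node is tested in turn."""
    return any(needle in t for t in text_children(element))


h3 = minidom.parseString(MARKUP).documentElement

if __name__ == "__main__":
    print("first text node only:", first_text_contains(h3, "Associated Elements"))
    print("any text node:", any_text_contains(h3, "Associated Elements"))
```

This per-node predicate still looks only at direct text children, so it would not match the first structure, where the title is inside nested div elements. For an expression that covers both structures, contains(., 'Associated Elements') on the h3 remains the simpler choice.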

Note: copied and edited from this and this post.

Mate Mrše