I am trying to extract mixed content (text that sits alongside child elements) using Capybara. I managed it with Nokogiri, but I wonder why the same approach does not work with Capybara.

require 'nokogiri'

doc = Nokogiri::HTML("<h1><em>Name</em>A Johnson </h1>")
puts doc.at_xpath("//h1/text()").content

This works, but when I try the same XPath selector in Capybara, it fails.

visit('http://stackoverflow.com')
puts find(:xpath, "//h1/text()").text

It raises this error:

[remote server] file:///tmp/webdriver-profile20120915-8089-kxrvho/extensions/fxdriver@googlecode.com/components/driver_component.js:6582:in `unknown': The given selector //h1/text() is either invalid or does not result in a WebElement. The following error occurred: (Selenium::WebDriver::Error::InvalidSelectorError)
[InvalidSelectorError] The result of the xpath expression "//h1/text()" is: [object Text]. It should be an element.

How can I extract this text?

Иван Бишевац
  • What the error message says is that the `.text` property is defined only for element objects -- not for text objects. I'd try something like: `find(:xpath, "//h1").text` – Dimitre Novatchev Sep 15 '12 at 15:04
  • I understand that it doesn't return an element. But your solution `find(:xpath, "//h1").text` will not work well for `<h1><em>Name</em>A Johnson </h1>`: it will extract `NameA Johnson`, but I want just `A Johnson`. Using Nokogiri, as in the example above, `doc.at_xpath("//h1/text()").content` works. Maybe I should first extract the element with Capybara's find and then pass it to a Nokogiri element, so I could write something like `hr = find("/hr"); nokogiri::Element(hr).at_xpath("./text()").content`, but I am not sure how.
    – Иван Бишевац Sep 15 '12 at 15:16
  • If you have a proper XPath implementation, what you need is: `//h1/text()`. So it is logical to try to evaluate this XPath expression with a compliant implementation (Nokogiri). – Dimitre Novatchev Sep 15 '12 at 15:20
  • I don't know how to do this. Can you give me an idea of how to do it? – Иван Бишевац Sep 15 '12 at 15:26
  • Nope, I don't know anything about Ruby and these tools. – Dimitre Novatchev Sep 15 '12 at 15:32
  • Been trying to find something that works but this seems like a pretty tough problem. This might be relevant: http://stackoverflow.com/questions/4071937/how-do-i-get-the-html-in-an-element-using-capybara But I couldn't find a way to do what you want. – Chris Salzberg Sep 15 '12 at 15:43

2 Answers

Capybara requires a driver, and the XPath is executed by that driver. From your error message, it is clear you are using selenium-webdriver, which uses the browser's native XPath implementation where available. For IE, it uses its own.

You appear to be using a combination where the XPath implementation is not fully compliant. You can try to change the driver or browser, but if you really want to use Nokogiri to extract content, you should be able to do the following:

require 'nokogiri'

# Parse the HTML Capybara currently has loaded, then query it with Nokogiri.
doc = Nokogiri::HTML(page.html)
puts doc.at_xpath("//h1/text()").content
Mark Thomas

I do not believe Capybara or Selenium-Webdriver has any support for directly accessing text nodes. However, if you do not want to use Nokogiri, you can use selenium-webdriver to execute JavaScript.

You can do this (in Capybara using Selenium-Webdriver):

element = page.find('h1').native
puts page.driver.browser.execute_script("return arguments[0].childNodes[1].textContent", element)
#=> A Johnson 
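
The hard-coded index `childNodes[1]` depends on the text being the second child node. A hedged variation, assuming the same Capybara plus Selenium setup, filters the child nodes by type (`nodeType === 3` is a text node) instead. The helper name `own_text_via_js` is hypothetical; the only driver call it relies on is the `execute_script` call shown above.

```ruby
# Hypothetical helper: `page` is assumed to be a Capybara session backed by
# selenium-webdriver, and `css` a selector for the element of interest.
def own_text_via_js(page, css)
  # JavaScript run in the browser: join only the text nodes that are
  # direct children of the element passed in as arguments[0].
  script = <<~JS
    var nodes = arguments[0].childNodes, out = [];
    for (var i = 0; i < nodes.length; i++) {
      if (nodes[i].nodeType === 3) { out.push(nodes[i].textContent); }
    }
    return out.join('');
  JS

  element = page.find(css).native
  page.driver.browser.execute_script(script, element).strip
end
```

It would be called as `own_text_via_js(page, 'h1')` from inside a test that has already visited the page.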
Justin Ko