How to scrape a table with rvest and xpath?

Question

Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

Here is the one targeted by the code below. The link and XPath are already included in the code:

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
  html_table()
valuation <- valuation[[1]]

I get the following error:

Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated")

Thanks in advance.

that's not an error, it's a warning. your code will still run with that warning. — SymbolixAU, Mar 01 '16 at 00:16

SymbolixAU · Accepted Answer · 2021-04-26T03:59:54.740

That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div classes "column" and "data lastcolumn".

So you can do something like

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation_col <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="column"]')
    
valuation_data <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="data lastcolumn"]')

Or even

url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="section"]')

That should get you most of the way there.
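
To turn the matched nodes into something table-like, one option is to extract their text and pair labels with values by position. The sketch below is only an illustration: it runs against a small inline HTML snippet that is assumed to resemble the page's structure (the tags, labels and numbers are made up, not taken from marketwatch.com), and the names page, labels and values are hypothetical. Only the class names "column" and "data lastcolumn" come from the page as described above.

```r
library(rvest)

# Hypothetical markup, assumed to mimic the structure of the page.
# The tags, labels and numbers are invented for illustration.
page <- read_html('
<div id="maincontent">
  <div class="section">
    <p class="column">P/E Current</p>
    <p class="data lastcolumn">15.20</p>
  </div>
  <div class="section">
    <p class="column">Price to Sales Ratio</p>
    <p class="data lastcolumn">1.10</p>
  </div>
</div>')

# Text of the label nodes
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

# Text of the value nodes
values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Pairing by position only makes sense if both vectors line up
stopifnot(length(labels) == length(values))

valuation <- data.frame(metric = labels, value = values,
                        stringsAsFactors = FALSE)
print(valuation)
```

On the real page, replace the inline snippet with read_html(url) and check that the two node sets have the same length before combining them, since the live markup may differ from what is assumed here.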

Please also read their terms of use - particularly 3.4.

edited Apr 26 '21 at 03:59

answered Mar 01 '16 at 00:30


How do you find the XPath? Is there a tool for finding it? Can you add it to the answer? – userJT Mar 08 '19 at 15:40

Right-click on the element and select 'Inspect'. Then just read the HTML. – SymbolixAU Mar 08 '19 at 19:04
