0

I am very new to R and have managed to scrape some data from a website for a project that is due. I now need to reformat the data into a dataframe but can't seem to figure out how to do this. Here is what I have:

library('rvest')
library('xml2')
url<- 'https://usedfirst.com/cars/ford/escape/'
webpage<- read_html(url)
resale_price_html<-html_nodes(webpage,'#DepreciationTableSmall')
resale_price<-html_text(resale_price_html)
resale_price<-gsub("\n\t","", resale_price)
resale_price<-gsub("\t","", resale_price)
head(resale_price)

[1] "YearPrice% Paid% LeftNUV ®2019$27,075100%100%02018$19,27272.9%91.67%18.772017$18,29567.17%83.33%16.162016$16,90961.02%75%13.982015$14,65752.82%66.67%13.852014$13,17647.41%58.33%10.922013$11,33138.61%50%11.392012$9,24929.26%41.67%12.412011$8,33225.59%33.33%7.742010$7,47022.01%25%2.992009$6,73819.48%16.67%-2.812008$6,05217.76%8.33%-9.432007$5,17012.87%0%-12.87"

The data should look like a table with 5 columns:

Year Price    %Paid   %Left   NUV
2019 $27,075   100%   100%   
2018 $19,272   72.9%  91.67%  18.77

and so on...

Can anyone please help me out with this??

Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
Lisa G
  • 3
  • 1

1 Answers1

1

It is simpler to use html_table in this case to get data in a dataframe.

library(rvest)

url<- 'https://usedfirst.com/cars/ford/escape/'
temp <- url %>% read_html %>% html_nodes('#DepreciationTableSmall') %>% html_table()
temp[[1]]

#   Year   Price % Paid % Left  NUV ®
#1  2019 $27,075   100%   100%   0.00
#2  2018 $19,272  72.9% 91.67%  18.77
#3  2017 $18,295 67.17% 83.33%  16.16
#4  2016 $16,909 61.02%    75%  13.98
#5  2015 $14,657 52.82% 66.67%  13.85
#6  2014 $13,176 47.41% 58.33%  10.92
#7  2013 $11,331 38.61%    50%  11.39
#8  2012  $9,249 29.26% 41.67%  12.41
#9  2011  $8,332 25.59% 33.33%   7.74
#10 2010  $7,470 22.01%    25%   2.99
#11 2009  $6,738 19.48% 16.67%  -2.81
#12 2008  $6,052 17.76%  8.33%  -9.43
#13 2007  $5,170 12.87%     0% -12.87

There is also another table on the page, if you want to extract that it is better to do

temp <- url %>% read_html %>% html_table()

and then use temp[[1]] to get first table and temp[[2]] to get second one.

Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
  • Sorry - one more thing. When I try to export this to excel I have been using the write.xlsx command but that doesn't seem to be working in this case. Any ideas? – Lisa G Nov 28 '19 at 01:52
  • @LisaG What do you mean by not working ? Does it give you any error or writes wrong data into excel. Can you try using `writexl::write_xlsx(temp[[1]], "data.xlsx")` ? – Ronak Shah Nov 28 '19 at 01:59
  • Never mind - I can now see that you named the dataframe temp so I was able to do it. I was still using resale_price from my code in the write.xlsx command. It looks great! Thanks again! – Lisa G Nov 28 '19 at 01:59