
I'm using this guide as an example to scrape the times at which posts were published to Reddit.

It says to use the SelectorGadget tool to avoid having to learn other languages, so that's what I did.

Although the page on old.reddit.com shows 100 posts (so 100 different times should be recorded), my code only extracts 25 time values. Here's what my code looks like:

library(rvest)
library(xml2)    # xml_attrs()
library(dplyr)   # bind_rows()

url <- 'https://old.reddit.com/'

rawdata <- read_html(url)

rawtime <- html_nodes(rawdata, '.live-timestamp')
  # ".live-timestamp" was obtained using the Chrome extension "SelectorGadget"

finalresult <- bind_rows(lapply(xml_attrs(rawtime),
                                function(x) data.frame(as.list(x), stringsAsFactors = FALSE)))
When I open your old.reddit link I see 25 posts, so I think the same thing happens when the link is opened from R. You should look into multi-page scraping: https://stackoverflow.com/a/36683564/7118188 – Peter D Jan 31 '19 at 08:14
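
For reference, a minimal, untested sketch of that multi-page approach with rvest; it assumes old.reddit.com still exposes its pagination link inside a span.next-button element, which may change:

library(rvest)

url <- 'https://old.reddit.com/'
all_times <- character(0)

for (i in 1:4) {                                  # 4 pages x 25 posts ~ 100 timestamps
  page  <- read_html(url)
  times <- html_attr(html_nodes(page, '.live-timestamp'), 'datetime')
  all_times <- c(all_times, times)

  next_url <- html_attr(html_node(page, 'span.next-button a'), 'href')
  if (is.na(next_url)) break                      # no further pages
  url <- next_url
}

length(all_times)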

2 Answers


Alternatively, you could use PRAW to get the information from Reddit. It's a different approach to your problem, but it might work.

https://praw.readthedocs.io/en/latest/

You can also find more help in the subreddit r/redditdev.

Carles Borredá

You need to be logged in or use the ?limit=100 parameter in order to get 100 items in a listing.

See the API documentation for more information:

limit: the maximum number of items desired (default: 25, maximum: 100)
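
Applied to the code in the question, that just means adding the parameter to the URL; a minimal, untested sketch:

library(rvest)
library(xml2)    # xml_attrs()
library(dplyr)   # bind_rows()

url <- 'https://old.reddit.com/?limit=100'         # request up to 100 items per listing

rawdata <- read_html(url)
rawtime <- html_nodes(rawdata, '.live-timestamp')

finalresult <- bind_rows(lapply(xml_attrs(rawtime),
                                function(x) data.frame(as.list(x), stringsAsFactors = FALSE)))

nrow(finalresult)   # should now be close to 100 instead of 25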

justcool393