
I'm using this guide as an example to scrape the times at which posts were published to Reddit.

It says to use the SelectorGadget tool to avoid having to learn other languages, so that's what I did.

Although the page on old.reddit.com shows 100 posts (so 100 different times should be recorded), my code only extracts 25 time values. Here's what my code looks like:

library(rvest)
library(xml2)    # xml_attrs()
library(dplyr)   # bind_rows()

url <- 'https://old.reddit.com/'

rawdata <- read_html(url)

rawtime <- html_nodes(rawdata, '.live-timestamp')
  # ".live-timestamp" was obtained using the Chrome extension "SelectorGadget"

finalresult <- bind_rows(lapply(xml_attrs(rawtime),
                                function(x) data.frame(as.list(x), stringsAsFactors = FALSE)))
When I open your old.reddit link I see 25 posts, so I think the same thing happens when the link is opened from R. You should look into multi-page scraping: https://stackoverflow.com/a/36683564/7118188 – Peter D Jan 31 '19 at 08:14
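
For reference, a minimal, untested sketch of that multi-page approach with rvest; it assumes old.reddit.com still exposes its pagination link inside a span.next-button element, which may change:

library(rvest)

url <- 'https://old.reddit.com/'
all_times <- character(0)

for (i in 1:4) {                                  # 4 pages x 25 posts ~ 100 timestamps
  page  <- read_html(url)
  times <- html_attr(html_nodes(page, '.live-timestamp'), 'datetime')
  all_times <- c(all_times, times)

  next_url <- html_attr(html_node(page, 'span.next-button a'), 'href')
  if (is.na(next_url)) break                      # no further pages
  url <- next_url
}

length(all_times)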

2 Answers


Alternatively, you could use PRAW to get the information from Reddit. It's a different approach to your problem, but it might work.

https://praw.readthedocs.io/en/latest/

You can also find more help in the subreddit r/redditdev.

Carles Borredá

You need to be logged in or use the ?limit=100 parameter in order to get 100 items in a listing.

See the API documentation for more information:

limit: the maximum number of items desired (default: 25, maximum: 100)
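
Applied to the code in the question, that just means adding the parameter to the URL; a minimal, untested sketch:

library(rvest)
library(xml2)    # xml_attrs()
library(dplyr)   # bind_rows()

url <- 'https://old.reddit.com/?limit=100'         # request up to 100 items per listing

rawdata <- read_html(url)
rawtime <- html_nodes(rawdata, '.live-timestamp')

finalresult <- bind_rows(lapply(xml_attrs(rawtime),
                                function(x) data.frame(as.list(x), stringsAsFactors = FALSE)))

nrow(finalresult)   # should now be close to 100 instead of 25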

justcool393