7

I want to obtain the links to the atms listed on this page: https://coinatmradar.com/city/345/bitcoin-atm-birmingham-uk/

Would I need to do something about the 'load more' button at the bottom of the page?

I have been using the selector tool you can download for chrome to select the CSS path.

I've written the below code block and it only seems to retrieve the first ten links.

library(rvest)

base <- "https://coinatmradar.com/city/345/bitcoin-atm-birmingham-uk/"
base_read <- read_html(base)
atm_urls <- html_nodes(base_read, ".place > a")
all_urls_final <- html_attr(atm_urls, "href" )
print(all_urls_final)

I expected to be able to retrieve all links to the atms listed in the area but my R code has not done so.

Any help would be great. Sorry if this is a really simple question.

gersht
  • 4,777
  • 1
  • 5
  • 14
Jackb001
  • 81
  • 4

2 Answers2

11

You should give RSelenium a try. I'm able to get the links with the following code:

# install.packages("RSelenium")
library(RSelenium)
library(rvest)

# Download binaries, start driver, and get client object.
rd <- rsDriver(browser = "firefox", port = 4444L)
ffd <- rd$client

# Navigate to page.
ffd$navigate("https://coinatmradar.com/city/345/bitcoin-atm-birmingham-uk/")

# Find the load button and assign, then send click event.
load_btn <- ffd$findElement(using = "css selector", ".load-more .btn")
load_btn$clickElement()

# Wait for elements to load.
Sys.sleep(2)

# Get HTML data and parse
html_data <- ffd$getPageSource()[[1]]
html_data %>% 
    read_html() %>% 
    html_nodes(".place a:not(.operator-link)") %>% 
    html_attr("href")

#### OUTPUT ####

#  [1] "/bitcoin_atm/5969/bitcoin-atm-shitcoins-club-birmingham-uk-bitcoin-embassy/"                   
#  [2] "/bitcoin_atm/7105/bitcoin-atm-general-bytes-northampton-costcutter/"                           
#  [3] "/bitcoin_atm/4759/bitcoin-atm-general-bytes-birmingham-uk-costcutter/"                         
#  [4] "/bitcoin_atm/2533/bitcoin-atm-general-bytes-birmingham-uk-londis-# convenience/"                 
#  [5] "/bitcoin_atm/5458/bitcoin-atm-general-bytes-coventry-agg-african-restaurant/"                  
#  [6] "/bitcoin_atm/711/bitcoin-atm-general-bytes-coventry-bigs-barbers/"                             
#  [7] "/bitcoin_atm/5830/bitcoin-atm-general-bytes-telford-bpred-lion-service-station/"               
#  [8] "/bitcoin_atm/5466/bitcoin-atm-general-bytes-nottingham-24-express-off-licence/"                
#  [9] "/bitcoin_atm/4615/bitcoin-atm-general-bytes-northampton-costcutter/"                           
# [10] "/bitcoin_atm/4841/bitcoin-atm-lamassu-worcester-computer-house/"                               
# [11] "/bitcoin_atm/3150/bitcoin-atm-bitxatm-leicester-keshs-wines-and-newsagents-braustone/"         
# [12] "/bitcoin_atm/2948/bitcoin-atm-bitxatm-coventry-nisa-local/"                                    
# [13] "/bitcoin_atm/4742/bitcoin-atm-bitxatm-birmingham-uk-custcutter-coventry-road-hay-mills/"       
# [14] "/bitcoin_atm/4741/bitcoin-atm-bitxatm-derby-michaels-drink-store-alvaston/"                    
# [15] "/bitcoin_atm/4740/bitcoin-atm-bitxatm-birmingham-uk-nisa-local-crabtree-# hockley/"              
# [16] "/bitcoin_atm/4739/bitcoin-atm-bitxatm-birmingham-uk-nisa-local-subway-boldmere/"               
# [17] "/bitcoin_atm/4738/bitcoin-atm-bitxatm-birmingham-uk-ashtree-convenience-store/"                
# [18] "/bitcoin_atm/4737/bitcoin-atm-bitxatm-birmingham-uk-nisa-local-finnemore-road-bordesley-green/"
# [19] "/bitcoin_atm/3160/bitcoin-atm-bitxatm-birmingham-uk-costcutter/" 
gersht
  • 4,777
  • 1
  • 5
  • 14
3

When you click show more the page does an XHR POST request for more results using an offset of 10 (suggesting results come in batches of 10) from current set. You can mimic this so long as you have the followings params in the post body (I suspect only the bottom 3 are essential)

'direction' : 1
'sort' : 1
'offset' : 10
'pagetype' : 'city'
'pageid' : 345

And the following request header is required (at least in Python implementations)

'X-Requested-With' : 'XMLHttpRequest'

You send that correctly and you will get a response containing the additional content. Note: content is wrapped in ![CDATA[]] as instruction that content should not be interpreted as xml - you will need to account for that by extracting content within for parsing.

The total number of atms is returned from original page you have and with css selector

.atm-number

You can split on &nbsp; and take the upper bound value from the split and convert to int. You then can calculate each offset required to meet that total (being used in a loop as consecutive offset param until total achieved) e.g. 19 results will be 2 requests total with 1 request at offset 10 for additional content.

QHarr
  • 72,711
  • 10
  • 44
  • 81