My goal is to get the links to all Kaggle challenges together with their titles. I am using the rvest library for this, but I am not getting far: the node sets come back empty once I am a few divs deep.
I am trying to do this for the first challenge first; I should then be able to apply the same approach to every other entry. The XPath of the first entry is:
/html/body/div[1]/div[2]/div/div/div[2]/div/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/a
My idea was to get the link via html_attr( , "href") once I am in the right tag.
My attempt so far:
library(rvest)
url = "https://www.kaggle.com/competitions"
kaggle_html = read_html(url)
kaggle_text = html_text(kaggle_html)
kaggle_node <- html_nodes(kaggle_html, xpath = "/html/body/div[1]/div[2]/div/div/div[2]/div/div/div[2]/div[2]/div/div/div[2]/div/div/div[1]/a")
html_attr(kaggle_node, "href")
I can't get past a certain div. The following snippet shows the last node I can access:
node <- html_nodes(kaggle_html, xpath="/html/body/div[1]/div[2]/div")
html_attrs(node)
Once I go one step further with html_nodes(kaggle_html, xpath = "/html/body/div[1]/div[2]/div/div"), the node set is empty.
I think the issue is that Kaggle builds the list dynamically and loads more entries the further I scroll down, so the entries may not be part of the HTML that read_html() downloads.
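To illustrate what I suspect is happening, here is a minimal sketch on a made-up page. The HTML string and the id "site-content" are assumptions for illustration only, not Kaggle's real markup. The container div exists in the static source, but the links that a script would add in the browser do not, so the deeper XPath returns an empty node set.

```r
library(rvest)

# Hypothetical static page: the container is empty because, by assumption,
# a script would fill it in the browser after the page has loaded.
static_html <- '<html><body>
  <div id="site-content"></div>
  <script>/* fills the list at runtime */</script>
</body></html>'

page <- read_html(static_html)

# The container itself is present in the static source.
container <- html_nodes(page, xpath = "//div[@id='site-content']")
length(container)

# Nothing below it exists in the static source, so this node set is empty
# and html_attr() has nothing to extract.
links <- html_nodes(page, xpath = "//div[@id='site-content']//a")
length(links)
html_attr(links, "href")
```

If the same pattern holds for the real page, no XPath applied to the result of read_html() would reach the entries, however it is written.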
(I am aware that I can use %>%. I am saving every step in its own variable so that I can inspect each one more easily and learn how it works.)