Questions tagged [xml2]

xml2 is an R package which makes it easy to work with HTML and XML from R. It is part of the tidyverse.

It leverages the C library libxml2 and the API is somewhat inspired by jQuery.

For more information, check out the package's GitHub repository.
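A minimal sketch of that jQuery-like "select nodes, then extract from them" workflow follows. It assumes xml2 is installed. The documents, element names and the id attribute are made up for illustration. It uses read_xml(), xml_find_all(), xml_find_first(), xml_attr() and xml_text() on XML, and read_html() for HTML.

```r
library(xml2)

# Illustrative XML document; element and attribute names are hypothetical
doc <- read_xml('<people>
  <person id="1"><name>John</name></person>
  <person id="2"><name>Robert</name></person>
</people>')

# Select all <person> nodes with an XPath expression (returns an xml_nodeset)
people <- xml_find_all(doc, "//person")

# Pull an attribute and the text of a child node from every matched node
ids <- xml_attr(people, "id")
person_names <- xml_text(xml_find_first(people, "./name"))

print(ids)           # [1] "1" "2"
print(person_names)  # [1] "John"   "Robert"

# The same functions work on HTML parsed with read_html()
page <- read_html("<p>Hello <b>world</b></p>")
bold_text <- xml_text(xml_find_first(page, "//b"))
print(bold_text)     # [1] "world"
```

xml_find_first() returns one result per input node, so person_names stays aligned with ids even when some <person> lacks a <name>. Missing children yield NA rather than being dropped.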

231 questions
9 votes, 4 answers

Python/R: generate dataframe from XML when not all nodes contain all variables?

Consider the following XML example:
library(xml2)
myxml <- read_xml(' John tennis golf python Robert
ℕʘʘḆḽḘ
8 votes, 1 answer

Parsing XML in R: Incorrect namespaces

I have a bunch of XML files and an R script that reads their content into a data frame. However, I now have files that I want to parse as usual, but something in their namespace definition prevents me from picking out their values…
nikopartanen
7 votes, 4 answers

How to save and read output of read_html as an RDS file?

Objects can be saved and read like so:
# Save as file
saveRDS(iris, "mydata.RDS")
# Read back in
readRDS("mydata.RDS")
But this doesn't seem to work for objects made with xml2::read_html(). Example:
library(rvest)
someobject <-…
stevec
7 votes, 2 answers

problems reading big XML file with xml2 package and trying to create a working closure

I am using the xml2 package to read a huge XML file into memory, and the command fails with the following error:
Error: Char 0x0 out of allowed range [9]
My code looks like the following:
library(xml2)
doc <- read_xml('~/Downloads/FBrf.xml')
The…
drmariod
6 votes, 4 answers

Can R read html-encoded emoji characters?

Question
My question, explained below, is: How can R be used to read a string that includes HTML emoji codes? I'd like to: (1) represent the emoji symbol (e.g., as a Unicode symbol) in the parsed string, OR (2) convert it…
J L
6 votes, 4 answers

R rvest: could not find function "xpath_element"

I am trying to simply replicate the example of rvest::html_nodes(), yet I encounter an error:
library(rvest)
ateam <- read_html("http://www.boxofficemojo.com/movies/?id=ateam.htm")
html_nodes(ateam, "center")
Error in do.call(method,…
Matifou
5 votes, 1 answer

xml_find_all function from xml2 package (R) does not find relevant nodes

I am using the xml2 package in R to access XML data, and found that it behaves differently on different xml_documents. On this toy example:
library(xml2)
doc <- read_xml( "
dertomtom
5 votes, 1 answer

How to configure the curl package in R with default web proxy settings?

I'm using R in a commercial environment where external connectivity all goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication. I already have code that will configure the RCurl…
djb72
5 votes, 1 answer

R {xml_node} to plain text while preserving the tags?

I'd like to do exactly what xml2::xml_text() or rvest::html_text() do, but preserve the tags instead of replacing, e.g., <br> with \n. The objective is to, e.g., scrape a web page, extract the nodes I want, and store the plain HTML in a variable, much…
Harold Cavendish
5 votes, 1 answer

R and xml2: how to read text that is not in child nodes and read information even if a node is missing

I use R and its package xml2 to parse an HTML document. I extracted a piece of an HTML file, which looks like this: text <- ('

1First previous

GegznaV
4 votes, 1 answer

In R, use rvest and xml2 to extract JSON object from a