
I am trying to convert an XML document to a flat data frame using xml2.

Here's some sample code, with the schema section removed. I'm trying to extract all of the "Events" nodes:

library(xml2)

test_xml <- as_xml_document(
'<Root>
  <xs:schema xmlns="address.com" xmlns:mstns="address.com" id="id">
  </xs:schema>
    <NewDataSet xmlns="address.com">
      <Events>
        <VAR1>3119496</VAR1>
        <VAR2>3119496</VAR2>
        <VAR3>text</VAR3>
      </Events>
      <Events>
        <VAR1>3119496</VAR1>
        <VAR2>3119496</VAR2>
        <VAR3>text</VAR3>
      </Events>
    </NewDataSet>
</Root>'
)

And here's a picture of my RStudio when I use read_xml("file_path") %>% View(): [screenshot not included]

Based on this I would expect something like the following to work...

xml_df <- test_xml %>%
    xml_child(2) %>%
    xml_find_all("//Events") %>% 
    map_df(~ { xml_attrs(.x) %>% as.list() } ) 

...but it doesn't. My guess is that the problem is with my xpath in xml_find_all, but I'm not sure. Any help would be really appreciated!

EDIT: Given that the first answer did not work (before I added in the namespaces) I am guessing that the namespaces in the new example are causing an issue.

2 Answers


Perhaps you are looking for something like this:

test_xml %>% 
  xml_find_all(xpath = "//Events") %>% 
  as_list() %>% 
  lapply(function(x) as.data.frame(t(unlist(x)))) %>% 
  {do.call(rbind, .)}

#>      VAR1    VAR2
#> 1 3119496 3119496
#> 2 3119496 3119496
Allan Cameron
  • Thanks - this does work for the sample data I provided, but doesn't work for my actual data. My reprex must have not captured my actual data well enough. I've edited my original post to hopefully correct that. – Chad Peltier Nov 20 '20 at 21:17

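For anyone who comes across this question in the future: the problem above was caused by the namespaces in the XML. The Events nodes sit inside a default namespace (xmlns="address.com"), so an unprefixed XPath such as "//Events" does not match them.

Combining Allan's answer above with the response here, I needed to either strip the namespaces first with xml_ns_strip(), or use a namespace-prefixed XPath as in the code below, to turn my XML into a data frame.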

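library(xml2)
library(purrr)  # provides map_df() and %>%; map_df() also needs dplyr installed

xml_df <- test_xml %>%
  # d1 is the prefix xml_ns() assigns to the document's default namespace
  xml_find_all(xpath = "//d1:Events", xml_ns(.)) %>%
  as_list() %>%
  map_df(~ unlist(.))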
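
For the other option mentioned in this answer, xml_ns_strip(), a minimal sketch might look like the following. It assumes a cut-down copy of the question's sample: the xs:schema element is left out because its xs prefix is not declared in the snippet. The names doc, events and rows are only illustrative, and base R is used for the final step so that nothing beyond xml2 is needed.

```r
library(xml2)

# Cut-down copy of the sample from the question (assumption: xs:schema omitted).
doc <- read_xml(
'<Root>
  <NewDataSet xmlns="address.com">
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
    <Events><VAR1>3119496</VAR1><VAR2>3119496</VAR2><VAR3>text</VAR3></Events>
  </NewDataSet>
</Root>')

# While the default namespace is in place, an unprefixed XPath finds no nodes.
n_before <- length(xml_find_all(doc, "//Events"))

# xml_ns_strip() removes default namespaces; it modifies doc in place.
xml_ns_strip(doc)
events <- xml_find_all(doc, "//Events")

# Build one single-row data frame per <Events> node:
# child element names become column names, child text becomes the values.
rows <- lapply(events, function(ev) {
  kids <- xml_children(ev)
  as.data.frame(as.list(setNames(xml_text(kids), xml_name(kids))),
                stringsAsFactors = FALSE)
})

# Stack the rows into the final data frame (all columns are character).
xml_df <- do.call(rbind, rows)
```

Every column comes back as character, so something like as.numeric(xml_df$VAR1) would still be needed for numeric fields. Stripping is the simpler route when the namespaces carry no meaning for you; the prefixed XPath above leaves the document untouched.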