0

I am trying to build a data frame by extracting outage data from an XML file and associating each outage with a particular meter. A simplified example of the data is laid out below:

  <MeterReadings Irn="311" Source="Remote">
       <Meter MeterIrn="311" IsActive="true" /> 
       <ConsumptionData>
       </ConsumptionData>
       <IntervalData>
           <Reading TimeStamp="2016-10-13" />
       </IntervalData>
       <EventData>
           <EventSpec Type="Outage Detected from Interval Data" Category="Full Power Outage / Restoration" />
           <Event TimeStamp="2014-10-31 14:17:40" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:16:20" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:16:16" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:15:12" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:12:00" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data">
           </Event>
       </EventData>
  </MeterReadings>

What I want is to setup a data frame that will include the meter number in the first column and the time of each outage in the second column.

I have tried using the following expressions:

    outage.inv <- data.frame(xpathSApply(doc, '//Event[contains(@EventInfo, "Outage detected from Interval Data")]/ancestor::MeterReadings', xmlGetAttr, "Irn"))
    outage.df <- data.frame(xpathSApply(doc, '//MeterReadings/EventData/EventSpec[@Type="Outage Detected from Interval Data"]/following-sibling::Event', xmlGetAttr, "TimeStamp"))
    outage.inv <- cbind(outage.inv, outage.df)

But the first expression only pulls the meter number once, so the total number of variables does not match. In this case 1 meter number and 5 outage times. Is there a way to have the ancestor attribute pulled for each occurrence of the attribute with a descendant?

I have checked the following answers, but have not been able to figure it out.

XPath to select element based on childs child value

R: How to get parent attributes and node values at the site time?

Any help will be much appreciated.

Community
  • 1
  • 1

2 Answers2

0

An alternate approach.

Here's the data:

txt <- '  <MeterReadings Irn="311" Source="Remote">
       <Meter MeterIrn="311" IsActive="true" />
       <ConsumptionData>
       </ConsumptionData>
       <IntervalData>
           <Reading TimeStamp="2016-10-13" />
       </IntervalData>
       <EventData>
           <EventSpec Type="Outage Detected from Interval Data" Category="Full Power Outage / Restoration" />
           <Event TimeStamp="2014-10-31 14:17:40" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:16:20" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:16:16" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:15:12" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data.">
           </Event>
           <Event TimeStamp="2014-10-31 14:12:00" DiscoveredAt="2014-11-01 12:05:28" Source="Event Log" EventInfo="Outage detected from Interval Data">
           </Event>
       </EventData>
  </MeterReadings>'

We can work on the records in a different way:

library(xml2)
library(purrr)
library(dplyr)

doc <- read_xml(txt)

xml_find_all(doc, "//MeterReadings") %>%
  map_df(function(x) {
    meter <- xml_attr(x, "Irn")
    xml_find_all(x, "//Event[contains(@EventInfo, 'Outage')]") %>% 
      map_df(function(y) {
        data_frame(
          meter=meter,
          timestamp=xml_attr(y, "TimeStamp"),
          discovered_at=xml_attr(y, "DiscoveredAt")
        )
      })
  })

which generates:

## # A tibble: 5 × 3
##   meter           timestamp       discovered_at
##   <chr>               <chr>               <chr>
## 1   311 2014-10-31 14:17:40 2014-11-01 12:05:28
## 2   311 2014-10-31 14:16:20 2014-11-01 12:05:28
## 3   311 2014-10-31 14:16:16 2014-11-01 12:05:28
## 4   311 2014-10-31 14:15:12 2014-11-01 12:05:28
## 5   311 2014-10-31 14:12:00 2014-11-01 12:05:28
hrbrmstr
  • 71,487
  • 11
  • 119
  • 180
  • Awesome! I made a few tweaks as it was pulling all the meters and including all timestamps for each. Modified solutions is as follows: – Bill H. Oct 14 '16 at 13:22
0

Modified answer to filter the meters and timestamps so that it did not repeat all timestamps for all meters:

outage.df <- xml_find_all(doc, "//MeterReadings[EventData/Event[contains(@EventInfo, 'Outage')]]") %>%
map_df(function(x) {
  meter <- xml_attr(x, "Irn")
  xml_find_all(x, paste("//MeterReadings[@Irn=",meter,"]/EventData/Event[contains(@EventInfo, 'Outage')]")) %>% 
    map_df(function(y) {
      data_frame(
        meter=meter,
        timestamp=xml_attr(y, "TimeStamp"),
        discovered_at=xml_attr(y, "DiscoveredAt")
      )
    })
})