
Suppose that I wanted to prune a tree consisting of a hierarchy of nested lists in R, based on some particular criterion. I can do this "easily" enough using lapply:

# Based on an example from the networkD3 documentation
# https://christophergandrud.github.io/networkD3/

URL <- paste0(
  "https://cdn.rawgit.com/christophergandrud/networkD3/",
  "master/JSONdata//flare.json")

flare <- jsonlite::fromJSON(URL, simplifyDataFrame = FALSE)

# Leaf nodes have a "size" attribute. Let's say we want to 
# prune all the nodes with size < 5000.

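prune <- function(tree) {
  if ("children" %in% names(tree)) {
    # Internal node: prune each child, then drop the ones that came back NULL.
    p <- lapply(tree$children, prune)
    pp <- p[!vapply(p, is.null, logical(1))]
    # Rebuild the node with only its name and surviving children.
    copied_tree <- list()
    copied_tree$name <- tree$name
    copied_tree$children <- pp
    return(copied_tree)
  } else if (tree$size < 5000) {
    # Leaf below the threshold: signal removal to the caller.
    return(NULL)
  }
  # Leaf at or above the threshold: keep it unchanged.
  return(tree)
}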

pruned <- prune(flare)

In R for Data Science, Hadley Wickham discusses a number of scenarios in which purrr can replace the apply family of functions for handling hierarchical data. However, these examples seem to deal either with singly nested lists, or with specific nodes of deeply nested lists.

Is there a way to use purrr to accomplish recursive tasks such as the one discussed above?

David Bruce Borenstein
  • Maybe this previous attempt of mine is relevant? http://stackoverflow.com/a/39869503/6197649 – Aurèle Jan 10 '17 at 17:07
  • The issue is that I want to preserve the tree structure, except for pruning. I thought of creating a delimited node path (like xpath), then flattening, and finally reconstructing the hierarchy, but this seemed harder and more inelegant than just using lapply. – David Bruce Borenstein Jan 10 '17 at 17:09
  • `purrr::map` is a drop-in replacement for `lapply` (with some extras), but it won't really change what you're doing here. You might check out `rapply`, which _is_ recursive, but which can be a little finicky to get to work right. – alistaire Jan 10 '17 at 21:52
  • @alistaire is right: my answer addresses the `purrr` part of the question, not the 'recursive' part. I think it is by design that `purrr` lacks such a recursive feature, because of "safety" concerns – Aurèle Jan 10 '17 at 23:41
  • See https://blog.rstudio.org/2016/01/06/purrr-0-2-0/ where Hadley says: "Base R has unlist(), but it’s dangerous because it always succeeds". I think it applies to `rapply()` as well. I'll see if I can think of a `rapply()` solution... – Aurèle Jan 10 '17 at 23:45
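
To make the distinction in the comments above concrete, here is a minimal sketch on a hypothetical nested list (`nested` and the other variable names are made up for illustration). It suggests that `map()` and `lapply()` both stop at the top level, while `rapply()` descends to the non-list leaves. Because `rapply()` hands the function only the leaf values, it does not obviously offer a way to drop a whole node, which is what pruning needs.

```r
library(purrr)

# Hypothetical nested list, used only for illustration.
nested <- list(a = 1, b = list(c = 2, d = list(e = 3)))

# map() and lapply() visit only the top-level elements and agree with each other.
top_map    <- map(nested, is.list)
top_lapply <- lapply(nested, is.list)
same_top   <- identical(top_map, top_lapply)

# rapply() applies the function to every non-list leaf;
# how = "list" keeps the original nesting.
doubled <- rapply(nested, function(x) x * 2, how = "list")
str(doubled)
```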

1 Answer

library(purrr)
library(magrittr)  # provides %<>%, which purrr does not re-export
prune_2 <- function(tree) {
  # print(tree$name)
  # print(map_lgl(tree$children, ~ "size" %in% names(.x)))
  tree$children %<>%  
    map_if(~ "children" %in% names(.x), prune_2) %>% 
    discard(~ if ("size" %in% names(.x)) .x$size < 5000 else FALSE)
  tree
}
pruned_2 <- prune_2(flare)
identical(pruned, pruned_2)
# [1] TRUE
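
For readers who want to try the idea without downloading `flare.json`, here is a minimal sketch on a hypothetical toy tree. The names `toy`, `prune_toy`, and `min_size` are assumptions introduced for illustration, not part of the code above. It follows the same `map_if()` plus `discard()` pattern, but uses plain assignment instead of `%<>%`, so only `purrr` is needed.

```r
library(purrr)

# Hypothetical toy tree mimicking the flare.json layout: internal nodes
# have "name" and "children"; leaves have "name" and "size".
toy <- list(
  name = "root",
  children = list(
    list(name = "a", size = 10000),
    list(name = "b", size = 100),
    list(
      name = "c",
      children = list(
        list(name = "c1", size = 7000),
        list(name = "c2", size = 20)
      )
    )
  )
)

prune_toy <- function(tree, min_size = 5000) {
  # Recurse only into children that are themselves internal nodes.
  kept <- map_if(
    tree$children,
    ~ "children" %in% names(.x),
    ~ prune_toy(.x, min_size)
  )
  # Drop leaves whose size is below the threshold.
  tree$children <- discard(
    kept,
    ~ "size" %in% names(.x) && .x$size < min_size
  )
  tree
}

pruned_toy <- prune_toy(toy)
str(pruned_toy)
```

As in the code above, an internal node whose leaves are all pruned is kept with an empty `children` list rather than being removed.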
Aurèle
  • This is very elegant (and impressive)! Can you tell me what the tilde does here? I'm fuzzy on how this symbol is used--I know it's used in statistical models and ggplot facets, but I don't know how it's interpreted by R. – David Bruce Borenstein Jan 11 '17 at 00:13
  • Thank you. The tilde creates a formula, which is a versatile piece of syntax in R. Here formulas are used as a shortcut for anonymous functions, with `.x` and `.y` as implicit arguments. See the `purrr` README or `help(map)` for instance: `~ .x + 1` is equivalent to `function(.x) .x + 1` – Aurèle Jan 11 '17 at 00:43
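
The equivalence described in the last comment can be checked directly. This is a small sketch with made-up inputs (`x`, `add_one`, and the other variable names are illustrative); `as_mapper()` is the `purrr` function that turns a formula into an ordinary function.

```r
library(purrr)

x <- list(1, 2, 3)

# The formula shortcut and the explicit anonymous function give the same result.
with_formula  <- map(x, ~ .x + 1)
with_function <- map(x, function(.x) .x + 1)
same_result   <- identical(with_formula, with_function)

# as_mapper() exposes the function that purrr builds from a formula.
add_one <- as_mapper(~ .x + 1)
answer  <- add_one(41)

# With two inputs, .x and .y refer to the first and second argument.
sums <- map2_dbl(c(1, 2), c(10, 20), ~ .x + .y)
```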