I am trying to tabulate results from a decision tree fitted with rpart. The code I am using is below so it can be copied and pasted.

ss <- 100
set.seed(123)
x1 <- relevel(as.factor(sample(1:4,ss, replace=TRUE)), ref="4")
x11 <- ifelse(x1==1,1,0)
x12 <- ifelse(x1==2,1,0)
x13 <- ifelse(x1==3,1,0)
x2 <- relevel(as.factor(sample(1:3,ss, replace=TRUE)), ref="3")
x21 <- ifelse(x2==1,1,0)
x22 <- ifelse(x2==2,1,0)
x3 <- relevel(as.factor(sample(1:2,ss, replace=TRUE)), ref="2")
x31<- ifelse(x3==1,1,0)
y <- relevel(as.factor(sample(1:2,ss, replace=TRUE)), ref="2")
y1 <- ifelse(y==1,1,0)

n1  <- relevel(as.factor(sample(1:4,ss, replace=TRUE)), ref="4")
n11 <- ifelse(n1==1,1,0)
n12 <- ifelse(n1==2,1,0)
n13 <- ifelse(n1==3,1,0)
n2 <- relevel(as.factor(sample(1:3,ss, replace=TRUE)), ref="3")
n21  <- ifelse(n2==1,1,0)
n22 <- ifelse(n2==2,1,0)
n3 <- relevel(as.factor(sample(1:2,ss, replace=TRUE)), ref="2")
n31<- ifelse(n3==1,1,0)

xbeta <- -0.667-0.167*x11 + 0.167*x12 + 0.333*x13 + x21 -1.333*x22+ x31 + 0.667*y1 +0*n11+0*n12+0*n13+ 0*n21 + 0*n22 + 0*n31 - 1.333*y1*x21+ y1*x22 -1.333*y1*x31
p <- exp(xbeta)/(1+exp(xbeta))
R<- rbinom(ss,1,p)

library(rpart)
fit <- rpart(R ~ x1+x2+x3+n1+n2+n3+y, method="class")

Then, to look at the plotted tree, I am using

plot(fit, uniform=TRUE, main="Classification Tree")
text(fit, use.n=TRUE, all=TRUE, cex=.8)

In my actual code, all of this sits inside a for loop, since I am simulating 100 such datasets. I have left that out here for simplicity.

After running printcp(fit), I know how to extract the "variables actually used in tree construction" and tabulate them, so that I get a count of how many times each variable was selected. The issue is that I also want to capture potential interactions between x2 and y, and between x3 and y, and tabulate the number of times these interactions appear.

To that end, looking at the diagram of the tree (using plot(fit)), every time y is an IMMEDIATE sub-branch of either x2 or x3, I want to record that in a vector. I say immediate sub-branch because if, hypothetically, x2 splits into n3 and n3 then branches into y, I would not count that as a two-way interaction of x2 and y. However, if x2 branches directly into y, I do want to count that as a two-way interaction between x2 and y.

I tried using path.rpart for this, but it does not seem to help in tracking whether x2 or x3 branches immediately into y. I would then like to tabulate how often there are x2*y interactions and how often there are x3*y interactions.

1 Answer

Here's a function that can extract parent/child pairs from the classification tree.

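getparentchildpairs <- function(fit) {
    # Keep only the split (non-leaf) nodes; the row names of fit$frame are node numbers.
    varnodes <- subset(fit$frame, var != "<leaf>", select="var")
    varnodes$var <- as.character(varnodes$var)
    # The children of node b are numbered 2*b and 2*b+1; collect those that are splits.
    cp <- Map(function(a,b) {varnodes$var[rownames(varnodes) %in% c(2*b, 2*b+1)]}, 
        varnodes$var, as.numeric(rownames(varnodes)))
    # Drop parents without split children and stack the rest into child/parent rows.
    setNames(stack(Filter(length, cp)), c("child","parent"))
}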

You would use it by just passing in the fit:

fit <- rpart(R ~ x1+x2+x3+n1+n2+n3+y, method="class")
getparentchildpairs(fit)

#   child parent
# 1    x3     x2
# 2    x1     x3
# 3    n1     x1

You could interpret these pairs as "interactions" if you like.
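To tabulate how often y sits directly under x2 or x3 across simulated datasets, one option is to check each pairs table for the parent/child combination of interest and sum the results. The following is only a minimal sketch: has_pair() is a hypothetical helper, not part of rpart, and the two small data frames are made-up stand-ins for what getparentchildpairs() would return inside the loop.

```r
# Hypothetical helper (not part of rpart): TRUE if `child` appears
# immediately below `parent` in a table shaped like getparentchildpairs() output.
has_pair <- function(pairs, parent, child) {
    any(as.character(pairs$parent) == parent &
        as.character(pairs$child) == child)
}

# Made-up pairs tables standing in for two simulated fits.
pairs1 <- data.frame(child = c("x3", "y", "n1"), parent = c("x2", "x3", "x1"))
pairs2 <- data.frame(child = c("y", "x1"), parent = c("x2", "y"))
pairs_list <- list(pairs1, pairs2)

# Count the fits in which each interaction of interest appears.
counts <- c(
    x2_y = sum(vapply(pairs_list, has_pair, logical(1), parent = "x2", child = "y")),
    x3_y = sum(vapply(pairs_list, has_pair, logical(1), parent = "x3", child = "y"))
)
counts
```

In the real simulation, pairs_list would be filled with getparentchildpairs(fit) for each dataset. A tree with fewer than two splits has no parent/child pairs, and getparentchildpairs() may then fail inside stack(), so wrapping the call in tryCatch() and treating a failure as "no interaction" is probably worthwhile.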

MrFlick