
I am trying to build a decision tree with the following dataset:

RESULT EXPG_HOME R_HOME_3DAY
 1      1.321   0.20
 2      1.123   0.30
 1      0.762   0.26

If I try this:

library(rpart)
library(rattle)  # fancyRpartPlot() comes from the rattle package
tree <- rpart(RESULT ~ EXPG_HOME, df, method="class")
fancyRpartPlot(tree)

It works. But when I try:

tree <- rpart(RESULT ~ R_HOME_3DAY, df, method="class")
fancyRpartPlot(tree)

I get the following error:

Error in apply(model$frame$yval2[, yval2per], 1, function(x) x[1 + x[1]]) : 
dim(X) must have a positive length

Any thoughts on what is going wrong here?

Both EXPG_HOME and R_HOME_3DAY are numeric.

And this is the distribution of the relevant variable:

> table(df$R_HOME_3DAY)

      0         0.1 0.133333333 0.166666667         0.2 0.233333333 
     21          65          14          10         194          53 
0.266666667         0.3 0.333333333 0.366666667         0.4 0.433333333 
     63         248         107         185         369         169 
0.466666667         0.5 0.533333333 0.566666667         0.6 0.633333333 
    334         351         184         382         317         213 
0.666666667         0.7 0.733333333 0.766666667         0.8 0.833333333 
    336         251         112         217          92          64 
0.866666667         0.9 0.933333333 
     83          20           5 
Frank Gerritsen

2 Answers


The problem is that you didn't get a tree, just a root node :)

> tree <- rpart(RESULT ~ EXPG_HOME, df, method="class")
> fancyRpartPlot(tree)
Error in apply(model$frame$yval2[, yval2per], 1, function(x) x[1 + x[1]]) : 
  dim(X) must have a positive length
> plot(tree)
Error in plot.rpart(tree) : fit is not a tree, just a root
> tree
n= 3 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 3 1 1 (0.6666667 0.3333333) *
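Since a root-only fit has a one-row frame, you can check for that before plotting. The sketch below refits the three-row sample from the question, guards the plot call, and also shows why the message mentions dim(X): selecting columns from a one-row matrix drops it to a plain vector, which apply() rejects. has_splits() is a hypothetical helper, the matrix m is a made-up stand-in for a one-row model$frame$yval2, and the plotting branch assumes rattle's fancyRpartPlot() is available.

```r
library(rpart)

# Three-row sample from the question; RESULT is treated as a class label
df <- data.frame(
  RESULT    = factor(c(1, 2, 1)),
  EXPG_HOME = c(1.321, 1.123, 0.762)
)
tree <- rpart(RESULT ~ EXPG_HOME, data = df, method = "class")

# Hypothetical helper: a fit with no splits has a one-row frame (root only)
has_splits <- function(fit) nrow(fit$frame) > 1L

if (has_splits(tree)) {
  rattle::fancyRpartPlot(tree)  # assumes the rattle package is installed
} else {
  message("rpart returned only a root node; nothing to plot")
}

# Why the error mentions dim(X): selecting columns from a one-row matrix
# drops it to a plain vector, and apply() requires a dim attribute.
# m is a made-up stand-in for a one-row model$frame$yval2.
m <- matrix(c(1, 2, 1, 0.667, 0.333, 1), nrow = 1)
msg <- tryCatch(apply(m[, 2:3], 1, sum), error = conditionMessage)
msg  # "dim(X) must have a positive length"
```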
fishtank

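What is happening is that, under rpart's default settings, the independent variable does not provide enough information to justify any split, so the tree never grows past the root. The rpart package restricts growth through default stopping rules, not just a depth cap: a node is only considered for splitting if it has at least minsplit observations, every leaf must contain at least minbucket observations, and a split is only kept if it decreases the overall lack of fit by at least a factor of cp. The following is from ?rpart.control: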

rpart.control(minsplit = 20, 
              minbucket = round(minsplit/3), 
              cp = 0.01, 
              maxcompete = 4, 
              maxsurrogate = 5, 
              usesurrogate = 2, 
              xval = 10,
              surrogatestyle = 0, 
              maxdepth = 30, ...)
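To see how the defaults above can leave you with a bare root, here is a small sketch on hypothetical toy data (10 rows, perfectly separable at x = 5.5). With the default minsplit = 20, a 10-row node is never considered for splitting; loosening minsplit and minbucket lets rpart find the obvious split.

```r
library(rpart)

# Hypothetical toy data: 10 rows, classes separate cleanly at x = 5.5
toy <- data.frame(
  x = 1:10,
  y = factor(rep(c("a", "b"), each = 5))
)

# Defaults: minsplit = 20 exceeds the 10 rows, so no split is attempted
fit_default <- rpart(y ~ x, data = toy, method = "class")
nrow(fit_default$frame)  # 1 (root only)

# Loosened: nodes with at least 2 rows may split, leaves may hold 1 row
fit_loose <- rpart(y ~ x, data = toy, method = "class",
                   control = rpart.control(minsplit = 2, minbucket = 1))
nrow(fit_loose$frame)  # 3 (root plus two leaves)
```

Note that minbucket is set explicitly here; if you only pass minsplit, rpart derives minbucket as round(minsplit/3).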

So, you may want to loosen the control parameters as follows:

tree <- rpart(RESULT ~ EXPG_HOME, df, method="class",
              control = rpart.control(minsplit = 1, 
                                      minbucket = 1, 
                                      cp = 0.001))

This will most likely result in a tree with many nodes. From there, you can tune the parameters to get a tree of reasonable size.
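One common way to tune this systematically (cost-complexity pruning, a technique the answer does not itself describe) is to grow a deliberately large tree with loose controls, then prune it back to the cp value with the lowest cross-validated error (the xerror column of cptable). The sketch below uses simulated, hypothetical data shaped loosely like the question's R_HOME_3DAY; set.seed() is there because the cross-validation folds are random.

```r
library(rpart)

set.seed(1)
# Hypothetical data: one numeric predictor on a 0-1 grid and a two-class
# RESULT that depends on it noisily
n <- 500
sim <- data.frame(R_HOME_3DAY = round(runif(n), 1))
sim$RESULT <- factor(ifelse(sim$R_HOME_3DAY + rnorm(n, sd = 0.3) > 0.5, 1, 2))

# Grow a large tree with loose controls
big <- rpart(RESULT ~ R_HOME_3DAY, data = sim, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0.001))

# Pick the cp with the lowest cross-validated error and prune back to it
best_cp <- big$cptable[which.min(big$cptable[, "xerror"]), "CP"]
pruned <- prune(big, cp = best_cp)

printcp(pruned)
```

A more conservative variant picks the simplest tree whose xerror is within one xstd of the minimum; either way, the pruned tree is never larger than the one you grew.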

remykarem