
In my dataset, Leakage takes two values, 1 and 0. Only about 300 rows have the value 1; the rest of the 569378 rows have the value 0. I suspect this is why I get only a root node in the rpart result.

How can I solve this?

fm.pipe <- Leakage ~ PipeAge + PipePressure

> printcp(CART.fit)

Regression tree:
rpart(formula = fm.pipe, data = Data)

Variables actually used in tree construction:
character(0)

Root node error: 299.84/569378 = 0.00052661

n= 569378 

         CP nsplit rel error xerror xstd
1 0.0033246      0         1      0    0
ישו אוהב אותך
user3172776

3 Answers

There may not be a way to "solve" this, if the independent variables do not provide enough information to grow the tree. See, for example, the help for rpart.control: "Any split that does not decrease the overall lack of fit by a factor of cp is not attempted." You could try loosening the control parameters, but there's no guarantee that will result in the tree growing beyond a root.

CART.fit <- rpart(formula=fm.pipe, data=Data, control=rpart.control(minsplit=2, minbucket=1, cp=0.001))
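
To see what loosening the controls does in practice, here is a minimal sketch on simulated data. The data frame `sim`, its size, and the assumed link between `PipeAge` and `Leakage` are all hypothetical stand-ins for the question's `Data`; only the `rpart` and `rpart.control` calls mirror the code above.

```r
library(rpart)

set.seed(1)
n <- 20000

# Hypothetical stand-in for the question's Data (not the real dataset)
sim <- data.frame(
  PipeAge      = runif(n, 0, 50),
  PipePressure = runif(n, 1, 10)
)

# Rare 0/1 outcome; the dependence on PipeAge is an assumption for illustration
sim$Leakage <- rbinom(n, 1, plogis(-8 + 0.08 * sim$PipeAge))

fm.sim <- Leakage ~ PipeAge + PipePressure

# Fit once with the default controls and once with loosened controls
fit_default <- rpart(fm.sim, data = sim)
fit_loose   <- rpart(fm.sim, data = sim,
                     control = rpart.control(minsplit = 2, minbucket = 1, cp = 0.001))

# Each row of $frame is one node, so this compares the tree sizes
nrow(fit_default$frame)
nrow(fit_loose$frame)
```

If the loosened fit still has a single node, that supports the point above: the predictors may simply not carry enough signal to justify a split.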
Jean V. Adams

I'm not sure I understand your row length issue, but here's what that error typically means:

rpart uses constraints to build a decision tree. Here are the default values, from the docs:

rpart.control(minsplit = 20, minbucket = round(minsplit/3), cp = 0.01, 
      maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10,
      surrogatestyle = 0, maxdepth = 30, ...)
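
If you want to confirm these defaults in your own installation, a quick check is to call `rpart.control()` with no arguments and inspect the returned list. This is only a small sketch; the variable name `defaults` is mine.

```r
library(rpart)

# rpart.control() with no arguments returns the default settings as a list
defaults <- rpart.control()

# Show the three settings that most often stop a tree at the root
defaults[c("minsplit", "minbucket", "cp")]
```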

You need to loosen these constraints. As @JeanVAdams said, start with the bare minimum:

rpart(formula=fm.pipe, data=Data, 
      control=rpart.control(minsplit=1, minbucket=1, cp=0.001))

Your first result will probably have far too many nodes, so you will have to gradually tighten these constraints until you get a reasonably sized tree.
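
An alternative to refitting by hand with tighter settings is to grow the large tree once and then prune it back using the complexity table. The sketch below does this on the `kyphosis` data that ships with rpart, not on the question's data. Picking the CP with the lowest cross-validated error is just one common heuristic, and the names `big`, `best_cp` and `pruned` are mine.

```r
library(rpart)

set.seed(42)  # cross-validation folds are random, so fix the seed

# Grow a deliberately oversized classification tree
big <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0.001))

# Inspect how the cross-validated error (xerror) changes with tree size
printcp(big)

# Pick the CP value with the smallest cross-validated error and prune to it
best_cp <- big$cptable[which.min(big$cptable[, "xerror"]), "CP"]
pruned  <- prune(big, cp = best_cp)

# Compare the number of nodes before and after pruning
nrow(big$frame)
nrow(pruned$frame)
```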


If you're still confused, here's an example:

Let's say you are looking at grocery store data, and you want to see a tree of the most popular hours to shop. There are only 24 hours, right? So there are only 24 possible values for the independent variable. Rpart has a condition that says

"There must be at least 20 things in a node for me to split it."

This means your node can't even split once. Even if you have 15 billion rows, there are only 24 possible ways to split it. It's probably more complex than this, but this is a good place to start.

I actually was looking at this exact issue (shoppers by hour), and I had to leave my constraints at the lowest possible level in order to get a tree at all:

rpart(formula=fm.pipe, data=Data, control=rpart.control(minsplit=1, minbucket=1, cp=0.001))

Travis Heeter

My dataset contains only 14 rows. Try using the following code:

dtm<-rpart(playtennis~., weathe_train, method="class", minsplit=2, minbucket=1)
Brian Tompsett - 汤莱恩