I am using a decision tree model for the first time and am not sure whether the output I got is what I should expect. The dataset has over 700 predictor variables.

I used the rpart package and ran the following call:

library(rpart)
data_rpart <- rpart(GOOD ~ ., data = data_Train)

The output shows only 2 key predictor variables (transaction in 24 months and visit in 12 months) coming up, as shown below:

Rule number: 4 [GOOD=0.0735211267605634 cover=10650 (72%)]
     trans_24mth< 4.5
     trans_24mth< 2.5

Rule number: 5 [GOOD=0.214780600461894 cover=2165 (15%)]
     trans_24mth< 4.5
     trans_24mth>=2.5

Rule number: 7 [GOOD=0.511111111111111 cover=990 (7%)]
     trans_24mth>=4.5
     visit_12mth>=10.5

Rule number: 6 [GOOD=0.307862679955703 cover=903 (6%)]
     trans_24mth>=4.5
     visit_12mth< 10.5

From an earlier logistic regression model fitted in SAS, I know these variables are relevant to the model.

My question is whether we can control the number of variables that show up in the model. Right now, only 2 of the 700 variables appear. Is there a way to make rpart use more variables in the rules? The tree above uses only the transaction and visit variables as predictors, but I would also like to see whether the demographic/psychographic variables in the dataset play any role in separating good from bad. Thanks in advance for your help.
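To make the question concrete, here is a minimal sketch of the kind of experiment I have in mind, not a definitive recipe. It assumes that loosening the stopping rules through `rpart.control` (in particular a lower complexity parameter `cp` than the default 0.01) lets more splits, and possibly more variables, survive. The data frame `toy`, the columns `age` and `region`, and the helper `used_vars` are hypothetical stand-ins, since the real `data_Train` is not shown here.

```r
library(rpart)

# Hypothetical stand-in for data_Train: a 0/1 response plus a few predictors.
set.seed(1)
n <- 2000
toy <- data.frame(
  trans_24mth = rpois(n, 3),
  visit_12mth = rpois(n, 8),
  age         = round(runif(n, 18, 80)),  # hypothetical demographic variable
  region      = factor(sample(c("N", "S", "E", "W"), n, replace = TRUE))
)
p <- plogis(-3 + 0.5 * toy$trans_24mth + 0.05 * toy$visit_12mth + 0.01 * toy$age)
toy$GOOD <- rbinom(n, 1, p)

# Default fit: rpart.control() uses cp = 0.01, so weak splits are pruned away.
fit_default <- rpart(GOOD ~ ., data = toy)

# Same model with a lower cp, which keeps splits that improve the fit less.
fit_loose <- rpart(GOOD ~ ., data = toy,
                   control = rpart.control(cp = 0.001))

# Hypothetical helper: names of the variables actually used in splits.
used_vars <- function(fit) {
  setdiff(unique(as.character(fit$frame$var)), "<leaf>")
}

print(used_vars(fit_default))
print(used_vars(fit_loose))

# Importance scores also credit variables that only act as surrogate splits.
print(fit_loose$variable.importance)

# Cross-validated error per cp value, useful for checking for overfitting.
printcp(fit_loose)
```

A larger tree is not necessarily a better one, so the `xerror` column printed by `printcp` is worth checking before reading anything into the extra rules. To look only at the demographic/psychographic variables, the formula could also list them explicitly instead of using `GOOD ~ .`.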

Shankar_m