Questions tagged [decision-tree]

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.

In a given post, the tag may refer either to the graphical decision-support tool or to the decision-tree learning algorithm.

2224 questions
176 votes · 23 answers

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'
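A minimal sketch of one way to do this, assuming scikit-learn ≥ 0.21, where the built-in `sklearn.tree.export_text` helper renders the learned rules as indented if/else text (the feature names below are just illustrative labels):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text walks the fitted tree and prints one "feature <= threshold"
# branch per line, with "class: ..." at each leaf
rules = export_text(
    clf, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]
)
print(rules)
```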
86 votes · 6 answers

Passing categorical data to Sklearn Decision Tree

There are several posts about how to encode categorical data for sklearn decision trees, but the sklearn documentation itself says: Some advantages of decision trees are: (...) Able to handle both numerical and categorical data. Other techniques…
0xhfff
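Despite that documentation note, sklearn's tree implementation only accepts numeric arrays, so categorical columns are commonly encoded first. A minimal sketch with a hypothetical toy DataFrame, using `pandas.get_dummies` for one-hot encoding:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with one categorical column
df = pd.DataFrame({"color": ["red", "green", "red", "blue"],
                   "size": [1, 3, 2, 5],
                   "label": [0, 1, 0, 1]})

# One-hot encode the categorical column; numeric columns pass through
X = pd.get_dummies(df[["color", "size"]], columns=["color"])
clf = DecisionTreeClassifier(random_state=0).fit(X, df["label"])
print(list(X.columns))
```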
50 votes · 1 answer

Decision tree vs. Naive Bayes classifier

I am doing some research on different data mining techniques and came across something I could not figure out. If anyone has any idea, that would be great. In which cases is it better to use a decision tree and in which a Naive Bayes…
Youssef
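Which classifier is better is usually data-dependent, so one practical approach is to compare both empirically with cross-validation. A minimal sketch on a built-in dataset (the dataset choice here is just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each classifier
dt_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
nb_score = cross_val_score(GaussianNB(), X, y, cv=5).mean()
print(dt_score, nb_score)
```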
47 votes · 1 answer

What does `sample_weight` do to the way a `DecisionTreeClassifier` works in sklearn?

I've read in the relevant documentation that: Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value. But,…
makansij
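A minimal sketch of the mechanism, assuming an imbalanced toy dataset: `sample_weight` rescales each sample's contribution to the impurity (gini/entropy) computed at every candidate split, so upweighting the minority class makes its misclassification costlier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: 90 samples of class 0, 10 of class 1
X = np.arange(100).reshape(-1, 1).astype(float)
y = np.array([0] * 90 + [1] * 10)

# Give both classes the same total weight (90 * 1.0 == 10 * 9.0);
# each sample's weight scales its contribution to node impurity
w = np.where(y == 1, 9.0, 1.0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)
```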
45 votes · 3 answers

How do I find which attributes my tree splits on, when using scikit-learn?

I have been exploring scikit-learn, making decision trees with both entropy and gini splitting criteria, and exploring the differences. My question is: how can I "open the hood" and find out exactly which attributes the trees are splitting on at…
tumultous_rooster
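A minimal sketch of one way to inspect this in scikit-learn: the fitted tree's `tree_.feature` array holds, per node, the index of the feature split on (negative for leaves), and `tree_.threshold` holds the split value:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(
    iris.data, iris.target
)

# Internal nodes have feature >= 0; leaves are marked with a negative index
for node, feat in enumerate(clf.tree_.feature):
    if feat >= 0:
        print(node, iris.feature_names[feat], clf.tree_.threshold[node])
```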
44 votes · 11 answers

Visualizing decision tree in scikit-learn

I am trying to design a simple decision tree using scikit-learn in Python (I am using Anaconda's IPython Notebook with Python 2.7.3 on Windows) and visualize it as follows: from pandas import read_csv, DataFrame from sklearn import tree from os…
Ravi
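On modern scikit-learn (≥ 0.21) the visualization no longer needs Graphviz: `sklearn.tree.plot_tree` draws the tree with matplotlib. A minimal sketch (the `Agg` backend and output filename are just choices for headless running):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# plot_tree renders nodes with split rules, impurity, and class counts
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
fig.savefig("tree.png")
```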
44 votes · 1 answer

How do I solve overfitting in random forest of Python sklearn?

I am using the RandomForestClassifier implemented in the Python sklearn package to build a binary classification model. Below are the results of cross-validation: Fold 1 : Train: 164 Test: 40 Train Accuracy: 0.914634146341 Test Accuracy: 0.55 Fold 2 :…
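A train/test accuracy gap like that usually points to overfitting; the usual levers in sklearn are constraining tree growth (`max_depth`, `min_samples_leaf`) and using more trees. A minimal sketch on synthetic data (the parameter values are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Shallower, regularized trees generalize better on small datasets
rf = RandomForestClassifier(n_estimators=200, max_depth=5,
                            min_samples_leaf=5, random_state=0)
scores = cross_val_score(rf, X, y, cv=5)
print(scores.mean())
```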
42 votes · 1 answer

Different decision tree algorithms with comparison of complexity or performance

I am doing research on data mining and, more precisely, decision trees. I would like to know whether there are multiple algorithms to build a decision tree (or just one?), and which is better, based on criteria such as performance, complexity, errors in…
39 votes · 4 answers

Plot Interactive Decision Tree in Jupyter Notebook

Is there a way to plot a decision tree in a Jupyter Notebook such that I can interactively explore its nodes? I am thinking about something like this. This is an example from KNIME. I have found…
r0f1
33 votes · 6 answers

Help Understanding Cross Validation and Decision Trees

I've been reading up on Decision Trees and Cross Validation, and I understand both concepts. However, I'm having trouble understanding Cross Validation as it pertains to Decision Trees. Essentially Cross Validation allows you to alternate between…
chubbsondubs
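The usual resolution of this confusion: cross-validation estimates how the *modelling procedure* generalizes; the model you deploy is then refit on all the data. A minimal sketch of that workflow:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# 10-fold CV: each fold trains a fresh tree and scores it on held-out data,
# giving an estimate of out-of-sample accuracy for this configuration
scores = cross_val_score(clf, X, y, cv=10)

# The final model is trained once on the full dataset
final_model = clf.fit(X, y)
print(scores.mean())
```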
31 votes · 1 answer

How to compute error rate from a decision tree?

Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart() function.
teo6389
30 votes · 2 answers

confused about random_state in decision tree of scikit learn

I'm confused about the random_state parameter; I'm not sure why decision tree training needs randomness. My thoughts: (1) is it related to random forests? (2) is it related to splitting the training/test data set? If so, why not use the train/test split method…
Lin Ma
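It is neither of those: scikit-learn's tree builder shuffles the candidate features at each split, so ties between equally good splits can be broken differently from run to run. Fixing `random_state` makes training reproducible, which a minimal sketch can demonstrate:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same random_state -> identical trees (same split features at every node)
a = DecisionTreeClassifier(random_state=42).fit(X, y)
b = DecisionTreeClassifier(random_state=42).fit(X, y)
print(np.array_equal(a.tree_.feature, b.tree_.feature))
```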
29 votes · 1 answer

How do you access tree depth in Python's scikit-learn?

I'm using scikit-learn to create a Random Forest. However, I want to find the individual depths of each tree. It seems like a simple attribute to have but according to the documentation,…
iltp38
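A minimal sketch of one way to get this: each fitted tree in the forest exposes its depth via `tree_.max_depth` (in scikit-learn ≥ 0.21, `get_depth()` is equivalent):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# estimators_ holds the individual fitted DecisionTreeClassifiers
depths = [est.tree_.max_depth for est in rf.estimators_]
print(depths)
```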
29 votes · 2 answers

Using GridSearchCV with AdaBoost and DecisionTreeClassifier

I am attempting to tune an AdaBoost Classifier ("ABT") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABT and DTC parameters simultaneously, but am not sure how to accomplish this - pipeline shouldn't work,…
GPB
28 votes · 4 answers

how to explain the decision tree from scikit-learn

I have two problems with understanding the result of a decision tree from scikit-learn. For example, this is one of my decision trees: My question is how I can use the tree. The first question is: if a sample satisfies the condition, then…
Student Jack