Questions tagged [gbm]

R package gbm, implementing the Generalized Boosted Regression Models library.

This package implements extensions to Freund and Schapire’s AdaBoost algorithm and Friedman’s gradient boosting machine.

Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and learning-to-rank measures (LambdaMART).

Who's using gbm?

The gbm package is used in examples in Software for Data Analysis by John Chambers.

gbm is also used in Elements of Statistical Learning by Hastie, Tibshirani and Friedman.

Richard A. Berk also uses gbm in his book, Statistical Learning from a Regression Perspective.

Source: gradientboostedmodels

328 questions
4
votes
1 answer

How can I offset exposures in a gbm model in R?

I am trying to fit a gradient boosting machine (GBM) to insurance claims. The observations have unequal exposure so I am trying to use an offset equal to the log of exposures. I tried two different ways: Put an offset term in the formula. This…
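For context on why the log of exposure is the standard offset here: with a Poisson log link, the model fits log(E[claims]) = log(exposure) + f(x), so expected claims scale linearly with exposure. A minimal stdlib sketch of that relationship (the score value is invented for illustration, not taken from the asker's model):

```python
import math

def expected_claims(exposure, score):
    """Poisson log link with a log-exposure offset:
    log(E[claims]) = log(exposure) + score, so
    E[claims] = exposure * exp(score)."""
    return exposure * math.exp(score)

# A policy observed for half a year accrues half the expected claims
# of an identical policy observed for a full year.
full_year = expected_claims(1.0, 0.3)   # exposure = 1 year
half_year = expected_claims(0.5, 0.3)   # exposure = 0.5 year
assert math.isclose(half_year, 0.5 * full_year)
```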
4
votes
1 answer

(R) Plot dendrograms BRT models from gbm.step

(previously posted here, to the wrong sub, with not enough info; it was closed, I edited it, the edits seem to have been deleted, and the post was consigned to purgatory, so apologies for re-posting. I don't know whether the previous post can/should be…
dez93_2000
  • 1,023
  • 14
  • 25
4
votes
2 answers

GBM model generating NA results

I'm trying to run a simple GBM classification model to benchmark performance against random forests and SVMs, but I'm having trouble getting the model to score correctly. It's not throwing an error, but the predictions are all NaN. I'm using the…
TomR
  • 486
  • 8
  • 19
4
votes
1 answer

What does `train.error` actually represent for gbm?

Consider the short R script below. It seems that boost.hitters$train.error does not match up with either the raw residuals or the squared errors of the training set. I could not find documentation on train.error at all, so I am wondering if anyone…
merlin2011
  • 63,368
  • 37
  • 161
  • 279
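On the train.error question: as I understand the gbm internals (worth verifying against the package source), train.error stores the training-set deviance at each boosting iteration, and for distribution="gaussian" that is the mean of the squared residuals of the current fit, not the raw residuals themselves. A toy stdlib sketch of that quantity (numbers invented, not the Hitters data):

```python
# Hedged sketch: gbm's train.error for a Gaussian fit is (as I understand it)
# the mean of squared residuals at each boosting iteration -- the deviance --
# rather than the residuals or total squared error.
y     = [3.0, 5.0, 7.0]   # toy responses
f_hat = [2.5, 5.5, 6.0]   # toy fitted values at some iteration
resid = [yi - fi for yi, fi in zip(y, f_hat)]
deviance = sum(r * r for r in resid) / len(resid)
assert abs(deviance - (0.25 + 0.25 + 1.0) / 3) < 1e-12
```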
3
votes
1 answer

Negative SHAP values in H2O in Python using predict_contributions

I have been trying to compute SHAP values for a Gradient Boosting Classifier in the H2O module in Python. Below is the adapted example from the documentation for the predict_contributions method (adapted from…
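A note on why negative values appear: predict_contributions in H2O returns contributions on the margin (log-odds) scale, so negative entries are expected — they push the prediction toward class 0. The contributions plus the bias term sum to the raw logit, and the sigmoid of that logit is the class-1 probability. A stdlib sketch with invented numbers:

```python
import math

# Hypothetical per-feature contributions on the log-odds scale, plus a
# bias term (all numbers invented for illustration).
contributions = {"age": -0.8, "income": 0.3, "tenure": -0.1}
bias = 0.2

logit = sum(contributions.values()) + bias   # -0.4: net-negative margin
prob = 1.0 / (1.0 + math.exp(-logit))        # sigmoid maps margin to probability
assert prob < 0.5  # net-negative contributions push the prediction below 0.5
```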
3
votes
0 answers

Recipe vs Formula vs X/Y Interface reproducibility for gbm with caret

I have trained the same model on the iris data set to investigate the reproducibility of each method. It seems that there is a discrepancy between models when using all.equal() for the models trained with the recipes interface, but not with the…
JFG123
  • 517
  • 3
  • 11
3
votes
1 answer

Server Error water.exceptions.H2OIllegalArgumentException While Implementing Grid Search using H2O

I am a newbie using H2O. I am trying to run H2OGridSearch with GBM to get my best hyperparameters. I am following the instructions given at the H2O.ai GitHub repo. It worked well when I was trying regression, but now when I am trying classification it…
3
votes
0 answers

Implausible variable importance for GBM survival: constant difference in importance

I have a question about a GBM survival analysis. I'm trying to quantify variable importances for my variables (n=453), in a data set of 3614 individuals. The resulting graph with variable importances looks suspiciously arranged. I have computed…
3
votes
1 answer

Classification Tree Diagram from H2O Mojo/Pojo

This question draws heavily from the solution to this question as a jumping off point. Given that I can use R to produce a mojo model object: library(h2o) h2o.init() airlinedf <-…
RealViaCauchy
  • 217
  • 1
  • 9
3
votes
1 answer

h2o error when run on a subset of the data but runs perfectly on the original data

The error that I am getting is this. The subset [~100k examples] of my data has exactly the same number of columns as the original dataset [400k examples], yet the model runs perfectly on the original dataset and fails on the subset. Traceback (most recent…
YNWA
  • 43
  • 5
3
votes
1 answer

GBM Bernoulli returns no results with NaN

I know this question has been asked multiple times but I've run out of ideas to get the model working. The first 50 rows of the train data: > train[1:25] a b c d e f g h i j k l m 1: 0 148.00 27 16 0 A 0 117 92 0 …
Ankhnesmerira
  • 1,068
  • 8
  • 19
3
votes
1 answer

xgboost error message about numerical variable and label

I use the xgboost function in R, and I get the following error message bst <- xgboost(data = germanvar, label = train$Creditability, max.depth = 2, eta = 1,nround = 2, objective = "binary:logistic") Error in xgb.get.DMatrix(data, label, missing,…
신익수
  • 67
  • 3
  • 7
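On the xgb.get.DMatrix error above: xgboost's binary:logistic objective expects a numeric feature matrix and a numeric 0/1 label vector, and a common cause of this error is passing a data.frame or a factor/character label instead. A stdlib sketch of the label conversion (the level names are invented for illustration):

```python
def to_binary_labels(labels, positive):
    """Map a two-level categorical label to the numeric 0/1 vector
    that a binary:logistic objective expects."""
    return [1 if v == positive else 0 for v in labels]

labels = to_binary_labels(["good", "bad", "good"], positive="good")
assert labels == [1, 0, 1]
```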
3
votes
2 answers

How to reproduce the H2o GBM class probability calculation

I've been using h2o.gbm for a classification problem, and wanted to understand a bit more about how it calculates the class probabilities. As a starting point, I tried to recalculate the class probability of a gbm with only 1 tree (by looking at the…
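One common formulation for reproducing such a probability by hand (hedged — H2O's exact initialization should be checked against its documentation): start from f0, the log-odds of the training base rate, add learn_rate times each tree's leaf value, and map the final margin through the sigmoid. A stdlib sketch:

```python
import math

def gbm_prob(base_rate, leaf_values, learn_rate=0.1):
    """Sketch of a common GBM class-probability formulation (verify
    against H2O's docs): f0 = logit(base rate), plus learn_rate times
    each tree's leaf value, then the sigmoid of the final margin."""
    f = math.log(base_rate / (1.0 - base_rate))  # f0: log-odds of class 1
    for leaf in leaf_values:
        f += learn_rate * leaf
    return 1.0 / (1.0 + math.exp(-f))

# With zero trees the prediction is just the training base rate.
assert abs(gbm_prob(0.3, []) - 0.3) < 1e-12
```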
3
votes
2 answers

Process categorical features when building decision tree models

I was using H2O to build classification models like GBM, DRF and DL. The dataset I have contains a few categorical columns, and if I want to use them as features for building models do I need to manually convert them into dummy variables? I read…
Selena
  • 223
  • 1
  • 2
  • 7
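For the categorical-features question above: H2O's tree models handle categorical columns natively, so manual conversion is generally only needed for libraries that require numeric input. Where it is needed, one-hot (dummy) encoding is the usual manual fallback; a stdlib sketch:

```python
def one_hot(values):
    """One-hot encode a categorical column: one 0/1 indicator per level,
    with levels taken in sorted order for reproducibility."""
    levels = sorted(set(values))
    return [[1 if v == lvl else 0 for lvl in levels] for v in values]

encoded = one_hot(["red", "blue", "red", "green"])
# Levels in sorted order: blue, green, red
assert encoded == [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```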
3
votes
1 answer

Why does gbm() give different results than h2o.gbm() in this minimal example?

Tinkering with gradient boosting and I noticed R's gbm package produces different results than h2o on a minimal example. Why? Data library(gbm) library(h2o) h2o.init() train <- data.frame( X1 = factor(c("A", "A", "A", "B", "B")), X2 =…
Ben
  • 15,465
  • 26
  • 90
  • 157