
I'm trying to use LightGBM for a regression problem (mean absolute error/L1 loss, or something similar like Huber or pseudo-Huber loss) and I primarily want to tune my hyperparameters. LightGBMTunerCV in optuna offers a nice starting point, but after that I'd like to search more in depth (without losing what the automated tuner has learned). Additionally, I'd like to use the mean cross-validation score plus the standard deviation of the cross-validation scores as my metric for ranking models (i.e. I assume that a lower SD is a good sign of more stable performance on unseen data from the same distribution).

I've done something like this:

import optuna
import optuna.integration.lightgbm as lgb

# X and y are prepared earlier
params = {
    "objective": "l1",
    "metric": "l1",
    "verbosity": -1,
    "boosting_type": "gbdt",
}

dtrain = lgb.Dataset(X, label=y)
mystudy = optuna.create_study()
tuner = lgb.LightGBMTunerCV(
    params, dtrain,
    verbose_eval=False,
    time_budget=6000,
    study=mystudy)

tuner.run()
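
After the run, I believe I can get at what the tuner found via its best_score / best_params attributes (if I've read the docs right):

print(tuner.best_score)   # best mean CV l1 found by the tuner
print(tuner.best_params)  # the corresponding LightGBM parameters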

Now I want to do a further search that takes into account these results. If I had no previous results, I might do something like this:

def objective(trial):
    param = {
        'objective': 'l1',
        'metric': 'l1',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
        'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),
        'num_leaves': trial.suggest_int('num_leaves', 2, 512),
        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.1, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.1, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 15),
        'min_child_samples': trial.suggest_int('min_child_samples', 2, 256),
        'seed': 1979
    }

    # insert code for getting X and y ready
    dtrain = lgb.Dataset(X, label=y)

    lcv = lgb.cv(
        param,
        dtrain,
        verbose_eval=False)

    # rank trials by mean CV score plus its standard deviation
    return lcv['l1-mean'][-1] + lcv['l1-stdv'][-1]

optuna.logging.set_verbosity(optuna.logging.WARNING)
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=2500)

I am a bit unclear on how to add the previous results to a study. Not defining a new study would presumably solve that, but it looks like LightGBMTunerCV uses the mean score (not mean + stdv) and there seems to be no easy way to change that. Can one somehow post-process the trials in the study to add the stdv?
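
One idea I've had (untested, and I'm assuming tuner.best_params and Study.enqueue_trial behave the way I think they do) is to leave the tuner's study alone and instead enqueue the tuner's best parameters as the first trial of the follow-up study, so they get re-scored with my mean + stdv metric:

# untested sketch: seed the new study with the tuner's best parameters
tuned = tuner.best_params

# keep only the keys that objective() actually suggests, so the enqueued
# trial matches the search space of the follow-up study
searched = ['lambda_l1', 'lambda_l2', 'num_leaves', 'feature_fraction',
            'bagging_fraction', 'bagging_freq', 'min_child_samples']
seed_params = {k: tuned[k] for k in searched if k in tuned}

study = optuna.create_study(direction='minimize')
study.enqueue_trial(seed_params)  # evaluated first, with the mean + stdv metric
study.optimize(objective, n_trials=2500)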

I've also not seen a clear example of how one would use FixedTrial to re-run the best tuned parameters (even though I understand that should be possible), which could be another approach.
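
For reference, the FixedTrial usage I have in mind would be roughly this (untested; as far as I can tell FixedTrial just evaluates the objective once with fixed values, without recording anything in a study, and it needs a value for every parameter the objective suggests):

from optuna.trial import FixedTrial

# untested sketch: re-evaluate the objective with the tuner's best parameters
# (seed_params from the sketch above), getting back the mean + stdv score
score = objective(FixedTrial(seed_params))
print(score)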

Björn