Python - H2O GLM grid search: how to get the lambda value
I am using H2O (Python) and playing with H2OGridSearch over the alpha values of a GLM (H2OGeneralizedLinearEstimator), using lambda_search=True and k-fold cross-validation.
How can I get the best model's lambda value?
Edit: reproducible example

Data:

34.40 17:1 73:1 127:1 265:1 912:1 1162:1 1512:1 1556:1 1632:1 1738:1
205.10 127:1 138:1 338:1 347:1 883:1 912:1 1120:1 1122:1 1512:1
7.75 66:1 127:1 347:1 602:1 1422:1 1512:1 1535:1 1738:1
8.85 127:1 608:1 906:1 979:1 1077:1 1512:1 1738:1
51.80 127:1 347:1 608:1 766:1 912:1 928:1 952:1 1034:1 1512:1 1610:1 1738:1
110.00 127:1 229:1 347:1 602:1 608:1 1171:1 1512:1 1718:1
8.90 66:1 127:1 205:1 347:1 490:1 589:1 912:1 1016:1 1512:1
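Each row of the data is in svmlight format: the response value comes first, followed by sparse index:value pairs for the non-zero features. A minimal plain-Python parser (not H2O's importer, and handling only the simple "label index:value ..." form used here, without comments or qid fields) makes the layout concrete:

```python
def parse_svmlight_line(line):
    """Split one svmlight row into (label, {feature_index: value})."""
    tokens = line.split()
    label = float(tokens[0])  # the first token is the response value
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")  # each remaining token is "index:value"
        features[int(index)] = float(value)
    return label, features


row = "7.75 66:1 127:1 347:1 602:1 1422:1 1512:1 1535:1 1738:1"
label, features = parse_svmlight_line(row)
print(label)             # 7.75
print(sorted(features))  # [66, 127, 347, 602, 1422, 1512, 1535, 1738]
```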
Save the data in a file called h2o_example.svmlight, then run:
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()

h2o_data = h2o.import_file("h2o_example.svmlight")
cols = h2o_data.columns[1:]
hyper_parameters = {"alpha": [0.0, 0.01, 0.99, 1.0]}
grid = H2OGridSearch(
    H2OGeneralizedLinearEstimator(family="gamma", link="log",
                                  lambda_search=True, nfolds=2,
                                  intercept=True, standardize=False),
    hyper_params=hyper_parameters)
grid.train(y="C1", x=cols, training_frame=h2o_data)
grid_table = grid.get_grid(sort_by="r2", decreasing=True)
best = grid_table.models[0]
best.actual_params["lambda"]
best.actual_params["alpha"]
The last 2 commands fail, giving me this error:

TypeError: 'property' object has no attribute '__getitem__'

Apparently I am using lambda_search in the wrong way. How can I get a single alpha and lambda value for the best model according to a criterion?
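In plain Python, this kind of TypeError is what you get when you index a property through the class instead of through an instance (Python 2 words it as "'property' object has no attribute '__getitem__'", Python 3 as "'property' object is not subscriptable"). The sketch below uses a hypothetical Model class, not H2O's, to show the difference; it is one possible cause to check for, not a confirmed diagnosis of the code above.

```python
class Model:
    """Hypothetical stand-in for a model class; not part of H2O."""

    def __init__(self, params):
        self._params = params

    @property
    def actual_params(self):
        # Read through an instance, this returns the stored dict.
        return self._params


instance = Model({"lambda": 0.01, "alpha": 0.5})
print(instance.actual_params["lambda"])  # 0.01: property read on an instance

try:
    # Read through the class, the attribute is the property object itself,
    # which cannot be indexed.
    Model.actual_params["lambda"]
except TypeError as err:
    print(err)  # Python 3: 'property' object is not subscriptable
```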
Final edit

There are multiple ways of getting lambda (shown below); here are 2 concise ones (note: reproducible code is at the bottom).
If you have lambda_search = True, you can look at the model summary table under the lambda_search column and see the value set for lambda.min, which is the best lambda:
model.summary()['lambda_search']
which will produce a list containing a string similar to:
['nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0']
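The lambda_search cell is a plain string, so using lambda.min as a number means parsing it. A minimal sketch, assuming the "key = value, key = value" layout shown above stays the same (an assumption about H2O's output format, not a documented guarantee); with a real model you would pass in the first element of model.summary()['lambda_search']:

```python
def parse_lambda_search(summary):
    """Turn 'key = value, key = value, ...' into a dict of floats."""
    result = {}
    for part in summary.split(","):
        key, value = part.split("=")
        result[key.strip()] = float(value)  # float() ignores surrounding spaces
    return result


summary = "nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0"
params = parse_lambda_search(summary)
print(params["lambda.min"])  # 0.05261
```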
If you don't use lambda search and don't set a lambda value (or do set one), you can use the summary table:
model.summary()['regularization']
The output looks like:
['Elastic Net (alpha = 0.5, lambda = 0.01289 )']
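The regularization cell is also a string. A minimal regex sketch for pulling alpha and lambda out of it, assuming the "(alpha = ..., lambda = ... )" wording shown above; the pattern also accepts scientific notation such as 1.0E-5 in case small values are printed that way:

```python
import re

_PATTERN = re.compile(
    r"alpha\s*=\s*([-+0-9.eE]+)\s*,\s*lambda\s*=\s*([-+0-9.eE]+)"
)


def parse_regularization(text):
    """Return (alpha, lambda) parsed from an 'Elastic Net (alpha = ..., lambda = ... )' string."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError("unexpected regularization string: %r" % text)
    return float(match.group(1)), float(match.group(2))


alpha, lam = parse_regularization("Elastic Net (alpha = 0.5, lambda = 0.01289 )")
print(alpha, lam)  # 0.5 0.01289
```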
Other options: look at the actual parameters of the model:

best.actual_params['lambda']
best.actual_params['alpha']

where best is the best model in the grid search results.
First edit

To get the best model you can do:

grid_table = grid.get_grid(sort_by='r2', decreasing=True)
best = grid_table.models[0]

Then you can use:
best.actual_params['lambda']
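Sorting with sort_by and decreasing and then taking models[0] amounts to picking the model that ranks first on the chosen metric. A plain-Python sketch of that ranking logic; the results list and its numbers are made up purely for illustration and are not H2O output. For R² higher is better, so decreasing=True; for an error metric such as MSE you would sort ascending instead:

```python
def best_by(results, metric, decreasing=True):
    """Return the entry ranked first when sorted on `metric`."""
    ranked = sorted(results, key=lambda r: r[metric], reverse=decreasing)
    return ranked[0]


# Hypothetical per-model results, made-up numbers just to show the ranking.
results = [
    {"alpha": 0.0, "lambda": 0.5, "r2": 0.10, "mse": 4.0},
    {"alpha": 0.5, "lambda": 0.05, "r2": 0.30, "mse": 2.5},
    {"alpha": 1.0, "lambda": 0.01, "r2": 0.20, "mse": 3.0},
]

best = best_by(results, "r2", decreasing=True)        # higher R^2 is better
print(best["alpha"], best["lambda"])                  # 0.5 0.05
best_mse = best_by(results, "mse", decreasing=False)  # lower MSE is better
```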
fully reproducible example
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the airlines dataset:
# this dataset is used to classify whether a flight will be delayed 'YES' or not "NO"
# original data can be found at http://www.transtats.bts.gov/
airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# convert columns to factors
airlines["Year"] = airlines["Year"].asfactor()
airlines["Month"] = airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines["FlightNum"] = airlines["FlightNum"].asfactor()

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"

# split into train and validation sets
train, valid = airlines.split_frame(ratios=[.8])

# try using the `lambda_` parameter:
# initialize the estimator
airlines_glm = H2OGeneralizedLinearEstimator(family='binomial', lambda_=.0001)

# train the model
airlines_glm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# print the AUC for the validation data
print(airlines_glm.auc(valid=True))

# example of values to grid over for `lambda`
# import the grid search
from h2o.grid.grid_search import H2OGridSearch

# select the values for lambda_ to grid over
hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}

# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space use
# random grid search instead: {'strategy': "RandomDiscrete"}
# initialize the GLM estimator
airlines_glm_2 = H2OGeneralizedLinearEstimator(family='binomial')

# build the grid search with the GLM and the hyperparameters
grid = H2OGridSearch(model=airlines_glm_2, hyper_params=hyper_params,
                     search_criteria={'strategy': "Cartesian"})

# train using the grid
grid.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# sort the grid models by decreasing AUC
grid_table = grid.get_grid(sort_by='auc', decreasing=True)
print(grid_table)

best = grid_table.models[0]
print(best.actual_params['lambda'])
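The comment about Cartesian versus random search comes down to how many models get built. A plain-Python sketch of the two strategies (an illustration of the idea, not how H2O implements them internally): a Cartesian grid builds one model per combination of listed values, while a random discrete search samples a subset of those combinations. The max_models argument here mirrors the idea of capping the number of models and is an assumption of this sketch:

```python
import itertools
import random


def cartesian_grid(hyper_params):
    """Every combination of the listed values, one dict per model."""
    names = sorted(hyper_params)
    value_lists = [hyper_params[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def random_discrete(hyper_params, max_models, seed=42):
    """Sample up to max_models distinct combinations, without replacement."""
    grid = cartesian_grid(hyper_params)
    rng = random.Random(seed)
    return rng.sample(grid, min(max_models, len(grid)))


hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}
print(len(cartesian_grid(hyper_params)))  # 8 models, one per lambda value

both = {'alpha': [0.0, 0.01, 0.99, 1.0], 'lambda': [0.1, 0.01]}
print(len(cartesian_grid(both)))  # 8 models: 4 alphas x 2 lambdas
print(len(random_discrete(hyper_params, max_models=3)))  # 3
```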