python - H2O GLM GridSearch: getting the lambda value of the best model


I am using H2O (Python) and playing with H2OGridSearch over the alpha values of a GLM (H2OGeneralizedLinearEstimator), using lambda_search=True and k-fold cross-validation.

How can I get the best model's lambda value?

Edit: reproducible example

Data:

    34.40 17:1 73:1 127:1 265:1 912:1 1162:1 1512:1 1556:1 1632:1 1738:1
    205.10 127:1 138:1 338:1 347:1 883:1 912:1 1120:1 1122:1 1512:1
    7.75 66:1 127:1 347:1 602:1 1422:1 1512:1 1535:1 1738:1
    8.85 127:1 608:1 906:1 979:1 1077:1 1512:1 1738:1
    51.80 127:1 347:1 608:1 766:1 912:1 928:1 952:1 1034:1 1512:1 1610:1 1738:1
    110.00 127:1 229:1 347:1 602:1 608:1 1171:1 1512:1 1718:1
    8.90 66:1 127:1 205:1 347:1 490:1 589:1 912:1 1016:1 1512:1

Call the file h2o_example.svmlight.
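To avoid retyping the data by hand, the sample file can also be written from Python. This is a minimal sketch using only the standard library; the helper name `write_example` is hypothetical, and the rows are the seven SVMLight observations listed above (target value first, then `index:value` feature pairs).

```python
# The example observations in SVMLight format: "<target> <index>:<value> ..."
ROWS = [
    "34.40 17:1 73:1 127:1 265:1 912:1 1162:1 1512:1 1556:1 1632:1 1738:1",
    "205.10 127:1 138:1 338:1 347:1 883:1 912:1 1120:1 1122:1 1512:1",
    "7.75 66:1 127:1 347:1 602:1 1422:1 1512:1 1535:1 1738:1",
    "8.85 127:1 608:1 906:1 979:1 1077:1 1512:1 1738:1",
    "51.80 127:1 347:1 608:1 766:1 912:1 928:1 952:1 1034:1 1512:1 1610:1 1738:1",
    "110.00 127:1 229:1 347:1 602:1 608:1 1171:1 1512:1 1718:1",
    "8.90 66:1 127:1 205:1 347:1 490:1 589:1 912:1 1016:1 1512:1",
]


def write_example(path="h2o_example.svmlight"):
    """Write the example rows to `path`, one observation per line (hypothetical helper)."""
    with open(path, "w") as f:
        f.write("\n".join(ROWS) + "\n")
    return path


if __name__ == "__main__":
    print(write_example())
```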

Then run:

    import h2o
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator
    from h2o.grid.grid_search import H2OGridSearch

    h2o.init()

    h2o_data = h2o.import_file("h2o_example.svmlight")
    cols = h2o_data.columns[1:]
    hyper_parameters = {"alpha": [0.0, 0.01, 0.99, 1.0]}
    grid = H2OGridSearch(H2OGeneralizedLinearEstimator(family="gamma", link="log", lambda_search=True,
                                                       nfolds=2, intercept=True, standardize=False),
                         hyper_params=hyper_parameters)
    grid.train(y="C1", x=cols, training_frame=h2o_data)
    grid_table = grid.get_grid(sort_by="r2", decreasing=True)
    best = grid_table.models[0]
    best.actual_params["lambda"]
    best.actual_params["alpha"]

The last 2 commands fail, giving me the error:

    TypeError: 'property' object has no attribute '__getitem__'

Apparently, I am using lambda_search the wrong way. How can I get a single alpha and lambda value for the best model according to a given criterion?

Final edit:

There are multiple ways of getting lambda (shown below), but here are 2 concise ones (note: reproducible code is at the bottom).

If you have lambda_search = True, you can look at the model summary table under the lambda_search column and see the value set for lambda.min, which is the best lambda:

    model.summary()['lambda_search']

which will produce a list containing a string similar to:

    ['nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0']
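Because that column holds a plain string, getting lambda.min as a number takes a little parsing. The sketch below assumes the string has exactly the `key = number` layout shown above; `parse_lambda_search` is a hypothetical helper, not part of H2O. With a trained model it would presumably be called as `parse_lambda_search(model.summary()['lambda_search'][0])`, assuming (as the output above suggests) that indexing the summary by column name returns a list.

```python
import re


def parse_lambda_search(summary_value):
    """Turn 'nlambda = 100, lambda.max = 12.733, ...' into a dict of floats."""
    # Each entry looks like "key = number"; keys may contain dots (e.g. lambda.min)
    # and numbers may be negative or use an exponent.
    pairs = re.findall(r"([\w.]+)\s*=\s*(-?[\d.]+(?:[eE][-+]?\d+)?)", summary_value)
    return {key: float(value) for key, value in pairs}


summary_value = "nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0"
params = parse_lambda_search(summary_value)
print(params["lambda.min"])  # 0.05261
```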

If you don't use lambda search, and either don't set a lambda value or set it yourself, you can use the summary table's regularization column:

    model.summary()['regularization']

The output looks like:

    ['Elastic Net (alpha = 0.5, lambda = 0.01289 )']
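The same parsing idea works for the regularization column. This is a sketch written against the elastic-net string shown above; `parse_regularization` is a hypothetical helper, not an H2O function, and treating alpha as optional is an assumption about how other penalty types might be printed.

```python
import re

# Number pattern: optional sign, digits with optional decimal part, optional exponent.
_NUMBER = r"(-?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)"


def parse_regularization(summary_value):
    """Return (alpha, lambda) parsed from a GLM 'regularization' summary string.

    alpha is None if the string does not mention it (an assumption; only the
    elastic-net form is confirmed by the output above).
    """
    lam = re.search(r"lambda\s*=\s*" + _NUMBER, summary_value)
    if lam is None:
        raise ValueError(f"no lambda found in {summary_value!r}")
    alpha = re.search(r"alpha\s*=\s*" + _NUMBER, summary_value)
    return (float(alpha.group(1)) if alpha else None, float(lam.group(1)))


print(parse_regularization("Elastic Net (alpha = 0.5, lambda = 0.01289 )"))  # (0.5, 0.01289)
```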

Other options:

Look at the actual parameters of the model:

    best.actual_params['lambda']
    best.actual_params['alpha']

where best is the best model in the grid search results.
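Depending on the H2O version, GLM's `alpha` and `lambda` entries in `actual_params` may come back as a one-element list rather than a bare number (an assumption worth checking in your own session). A small, hypothetical helper that normalizes either form, and refuses to guess when it is handed several values:

```python
def first_value(param):
    """Return a single number from a parameter that may be a scalar or a list.

    Assumption: H2O GLM may report `alpha`/`lambda` in `actual_params` as a
    one-element list; a longer list is rejected rather than silently picking
    one element.
    """
    if isinstance(param, (list, tuple)):
        if len(param) != 1:
            raise ValueError(f"expected exactly one value, got {param!r}")
        return param[0]
    return param


# With H2O this would be used as, e.g. (not run here):
#   best_lambda = first_value(best.actual_params["lambda"])
#   best_alpha = first_value(best.actual_params["alpha"])
print(first_value([0.05261]), first_value(0.5))  # 0.05261 0.5
```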

First edit:

To get the best model you can do:

    grid_table = grid.get_grid(sort_by='r2', decreasing=True)
    best = grid_table.models[0]

Then you can use:

    best.actual_params['lambda']

Fully reproducible example:

    import h2o
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator
    h2o.init()

    # import the airlines dataset:
    # this dataset is used to classify whether a flight will be delayed 'YES' or not 'NO'
    # the original data can be found at http://www.transtats.bts.gov/
    airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

    # convert columns to factors
    airlines["Year"] = airlines["Year"].asfactor()
    airlines["Month"] = airlines["Month"].asfactor()
    airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
    airlines["Cancelled"] = airlines["Cancelled"].asfactor()
    airlines["FlightNum"] = airlines["FlightNum"].asfactor()

    # set the predictor names and the response column name
    predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
    response = "IsDepDelayed"

    # split into train and validation sets
    train, valid = airlines.split_frame(ratios=[.8])

    # try using the `lambda_` parameter:
    # initialize the estimator
    airlines_glm = H2OGeneralizedLinearEstimator(family='binomial', lambda_=.0001)

    # then train the model
    airlines_glm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

    # print the AUC for the validation data
    print(airlines_glm.auc(valid=True))

    # example of values to grid over for `lambda`
    # import grid search
    from h2o.grid.grid_search import H2OGridSearch

    # select the values for lambda_ to grid over
    hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}

    # this example uses cartesian grid search because the search space is small
    # and we want to see the performance of all models. For a larger search space use
    # random grid search instead: {'strategy': "RandomDiscrete"}
    # initialize the GLM estimator
    airlines_glm_2 = H2OGeneralizedLinearEstimator(family='binomial')

    # build the grid search with the GLM and hyperparameters defined above
    grid = H2OGridSearch(model=airlines_glm_2, hyper_params=hyper_params,
                         search_criteria={'strategy': "Cartesian"})

    # train using the grid
    grid.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

    # sort the grid models by decreasing AUC
    grid_table = grid.get_grid(sort_by='auc', decreasing=True)
    print(grid_table)

    best = grid_table.models[0]
    print(best.actual_params['lambda'])
