python - Calculating gradient norm wrt weights with Keras


I am attempting to calculate the gradient norm with respect to the weights of a neural network with Keras (as a diagnostic tool). Eventually, I want to create a callback for this, but on the way there I have been working on a function that can compute the gradient and return actual values in the form of a numpy array/scalar value (and not just a TensorFlow tensor). The code is as follows:

    import numpy as np
    import keras.backend as K
    from keras.layers import Dense
    from keras.models import Sequential


    def get_gradient_norm_func(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        func = K.function([model.input], [norm])
        return func


    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        print(get_gradient([x]))


    if __name__ == '__main__':
        main()

The code fails on the call to get_gradient(). The traceback is lengthy and involves a lot of shapes, but gives little information on what the correct shape is. How can I correct this?

Ideally, I would like a backend-agnostic solution, but a TensorFlow-based solution is also an option.

    2017-08-15 15:39:14.914388: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
    2017-08-15 15:39:14.914414: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
             [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
    2017-08-15 15:39:14.915026: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1,-1] has negative dimensions
    2017-08-15 15:39:14.915038: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1,-1] has negative dimensions
             [[Node: dense_2_target = Placeholder[dtype=DT_FLOAT, shape=[?,?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
    2017-08-15 15:39:14.915310: W tensorflow/core/framework/op_kernel.cc:1148] Invalid argument: Shape [-1] has negative dimensions
    2017-08-15 15:39:14.915321: E tensorflow/core/common_runtime/executor.cc:644] Executor failed to create kernel. Invalid argument: Shape [-1] has negative dimensions
             [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
    Traceback (most recent call last):
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
        return fn(*args)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
        status, run_metadata)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/contextlib.py", line 89, in __exit__
        next(self.gen)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
             [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "gradientlog.py", line 45, in <module>
        main()
      File "gradientlog.py", line 42, in main
        print(get_gradient([x]))
      File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 2251, in __call__
        **self.session_kwargs)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
        run_metadata_ptr)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
        feed_dict_string, options, run_metadata)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
        target_list, options, run_metadata)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [-1] has negative dimensions
             [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

    Caused by op 'dense_2_sample_weights', defined at:
      File "gradientlog.py", line 45, in <module>
        main()
      File "gradientlog.py", line 39, in main
        model.compile(loss='mse', optimizer='rmsprop')
      File "/home/josteb/sandbox/keras/keras/models.py", line 783, in compile
        **kwargs)
      File "/home/josteb/sandbox/keras/keras/engine/training.py", line 799, in compile
        name=name + '_sample_weights'))
      File "/home/josteb/sandbox/keras/keras/backend/tensorflow_backend.py", line 435, in placeholder
        x = tf.placeholder(dtype, shape=shape, name=name)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1530, in placeholder
        return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1954, in _placeholder
        name=name)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/home/josteb/.local/opt/anaconda3/envs/timeseries/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
        self._traceback = _extract_stack()

    InvalidArgumentError (see above for traceback): Shape [-1] has negative dimensions
             [[Node: dense_2_sample_weights = Placeholder[dtype=DT_FLOAT, shape=[?], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

There are several placeholders related to the gradient computation process in Keras:

  1. Input x
  2. Target y
  3. Sample weights: even if you don't provide them in model.fit(), Keras still generates a placeholder for sample weights, and feeds np.ones((y.shape[0],), dtype=K.floatx()) into the graph during training.
  4. Learning phase: this placeholder is connected to the gradient tensor only if there is a layer that uses it (e.g. Dropout).

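So, in your example, in order to compute the gradients you need to feed x, y and sample_weights into the graph. That is the underlying reason for the error: the function was built with only model.input as its input, so the target and sample-weight placeholders are never fed.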
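The feed values can be sketched as follows. This is a dependency-free illustration that uses plain lists; build_feed is a hypothetical helper, not part of Keras, and with a real model the values would be numpy arrays (with np.ones(len(y)) as the default weights).

```python
def build_feed(x, y, sample_weights=None):
    """Return the feed values in the order inputs, targets, sample weights."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    if sample_weights is None:
        # Same default as described above: a weight of 1.0 for every sample.
        sample_weights = [1.0] * len(y)
    return [x, y, sample_weights]


# Hypothetical toy data: three samples with one feature each.
x = [[0.1], [0.2], [0.3]]
y = [[0.2], [0.4], [0.6]]
feed = build_feed(x, y)
print(feed[2])  # [1.0, 1.0, 1.0]
```

Passing all three values explicitly is what lets the backend function run without an unfed placeholder.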

Inside Model._make_train_function() there are the following lines, which show how to construct the necessary inputs to K.function() in this case:

    inputs = self._feed_inputs + self._feed_targets + self._feed_sample_weights
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        inputs += [K.learning_phase()]

    with K.name_scope('training'):
        ...
        self.train_function = K.function(inputs,
                                         [self.total_loss] + self.metrics_tensors,
                                         updates=updates,
                                         name='train_function',
                                         **self._function_kwargs)

By mimicking this function, you should be able to get the norm value:

    import numpy as np
    import keras.backend as K
    from keras.layers import Dense
    from keras.models import Sequential


    def get_gradient_norm_func(model):
        grads = K.gradients(model.total_loss, model.trainable_weights)
        summed_squares = [K.sum(K.square(g)) for g in grads]
        norm = K.sqrt(sum(summed_squares))
        # Feed inputs, targets and sample weights, as the train function does.
        inputs = model.model._feed_inputs + model.model._feed_targets + model.model._feed_sample_weights
        func = K.function(inputs, [norm])
        return func


    def main():
        x = np.random.random((128,)).reshape((-1, 1))
        y = 2 * x
        model = Sequential(layers=[Dense(2, input_shape=(1,)),
                                   Dense(1)])
        model.compile(loss='mse', optimizer='rmsprop')
        get_gradient = get_gradient_norm_func(model)
        history = model.fit(x, y, epochs=1)
        # Sample weights of all ones match what Keras feeds by default.
        print(get_gradient([x, y, np.ones(len(y))]))


    if __name__ == '__main__':
        main()

Execution output:

    Epoch 1/1
    128/128 [==============================] - 0s - loss: 2.0073
    [4.4091368]

Note that since you're using Sequential instead of Model, model.model._feed_* is required instead of model._feed_*.
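As a sanity check on the value returned, the norm here is the global L2 norm over all weight gradients: square every entry of every gradient tensor, sum the results, and take the square root. Below is a minimal framework-free sketch, assuming the gradients have already been fetched as nested Python lists; global_grad_norm, _flatten and the sample values are hypothetical and for illustration only.

```python
import math


def _flatten(values):
    """Yield every scalar from an arbitrarily nested list of numbers."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from _flatten(v)
        else:
            yield float(v)


def global_grad_norm(grads):
    """Global L2 norm over a list of gradient tensors given as nested lists."""
    # Same structure as K.sqrt(sum(K.sum(K.square(g)) for g in grads)).
    summed_squares = [sum(x * x for x in _flatten(g)) for g in grads]
    return math.sqrt(sum(summed_squares))


# Hypothetical gradients for a 2x1 kernel and a 1-element bias.
example_grads = [[[3.0], [0.0]], [4.0]]
print(global_grad_norm(example_grads))  # 5.0
```

Comparing such a hand-computed value against the backend function's output on a tiny model is one way to check that all gradient tensors are being included.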

