tensorflow - Training neural network by making batch_size increase to avoid shocking
... build graph ...

train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = data.next_batch(batch_size)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
In typical TensorFlow neural network training, we make learning_rate decay while keeping batch_size fixed. I think that making batch_size increase instead can also make the neural network converge and avoid shocking (oscillation of the loss near the end of training). That is my suggestion for training the neural network. Do you think it is useful?
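For concreteness, here is a minimal sketch of what I mean, reusing the graph and the data object from the snippet above; the starting size, the cap and the doubling interval are just illustrative choices:

batch_size = 32          # starting batch size (illustrative)
max_batch_size = 512     # cap so memory use stays bounded (illustrative)

for step in range(1000):
    batch_xs, batch_ys = data.next_batch(batch_size)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # double the batch size every 300 steps instead of decaying learning_rate
    if step > 0 and step % 300 == 0:
        batch_size = min(batch_size * 2, max_batch_size)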
If the descent is noisy:

Increasing batch_size stabilises the fluctuations, because the gradient is averaged over a larger number of samples. The effect of halving learning_rate is similar to that of doubling batch_size, but it is not the same (think vector-wise about how the two updates differ). Halving learning_rate is better from a mathematical point of view; doubling batch_size might(!) be more convenient computationally.
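To see the vector-wise difference, here is an illustrative NumPy sketch (the "true" gradient and the unit-variance noise model are made up for the example). Halving learning_rate shrinks both the signal and the noise of the update by 2, while doubling batch_size keeps the step length and shrinks only the noise, by a factor of sqrt(2):

import numpy as np

rng = np.random.RandomState(0)
true_grad = np.array([1.0, -2.0])

def update_stats(lr, batch, trials=10000):
    per_sample = true_grad + rng.randn(trials, batch, 2)  # noisy per-sample gradients
    updates = lr * per_sample.mean(axis=1)                # SGD update vectors
    return updates.mean(axis=0), updates.std(axis=0)

print(update_stats(lr=0.10, batch=32))  # baseline: step ~0.10*true_grad, noise ~0.10/sqrt(32)
print(update_stats(lr=0.05, batch=32))  # halve learning_rate: step halved, noise halved
print(update_stats(lr=0.10, batch=64))  # double batch_size: step unchanged, noise / sqrt(2)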
In the case of low noise:

Reducing learning_rate is the viable option. If the gradient direction is not noisy, increasing batch_size is not going to change the situation much. A smaller learning_rate is useful here because too big a step can make the gradient direction no longer representative and make you exit the "valley" you are descending into.
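If you go the learning_rate route with the graph in the question, a minimal sketch using tf.train.exponential_decay would look like this (the initial rate and decay numbers are illustrative):

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    0.5,               # initial learning rate (illustrative)
    global_step,
    decay_steps=100,   # shrink the rate every 100 steps (illustrative)
    decay_rate=0.96,
    staircase=True)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    cross_entropy, global_step=global_step)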