tensorflow - Training a neural network by increasing batch_size to avoid oscillation ("shocking")


# ... build graph ...
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = data.next_batch(batch_size)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

In typical TensorFlow neural network training, one makes the learning_rate decay over time. I think that instead making the batch_size increase would also help the neural network converge and avoid oscillation ("shocking"). Is this a useful suggestion for training a neural network?
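For concreteness, here is a minimal, self-contained sketch of what such a schedule could look like, written against the TF 1.x API used in the snippet above. The toy regression data, the starting batch_size of 32, and the doubling interval of 300 steps are all assumptions made for illustration, not part of the original question.

import numpy as np
import tensorflow as tf  # assumes TF 1.x, as in the snippet above

# toy regression data, a stand-in for the real dataset
X_data = np.random.randn(1024, 3).astype(np.float32)
y_data = X_data @ np.array([[1.0], [-2.0], [0.5]], dtype=np.float32)

x = tf.placeholder(tf.float32, [None, 3])
y_ = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([3, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y_))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

batch_size = 32                      # assumed starting value
for step in range(1000):
    if step > 0 and step % 300 == 0:
        batch_size *= 2              # grow the batch instead of shrinking the step
    idx = np.random.choice(len(X_data), batch_size)
    sess.run(train_step, feed_dict={x: X_data[idx], y_: y_data[idx]})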

If the descent is noisy:

Increasing batch_size stabilises the fluctuations, because the gradient is averaged over a higher number of samples.

The effect of halving the learning_rate is similar to that of doubling the batch_size, but not the same (think vector-wise about how the updates differ). Halving the learning_rate is better from a mathematical point of view; doubling the batch_size might(!) be computationally more convenient.
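A small NumPy sketch of that vector-wise point (the gradients g1 and g2 are made-up stand-ins for two minibatch gradient estimates): the two strategies coincide only if the gradient is held fixed; in a real run the second gradient is evaluated after the first update, so the trajectories differ.

import numpy as np

rng = np.random.default_rng(0)
g1 = rng.normal(size=3)   # gradient estimated on minibatch A
g2 = rng.normal(size=3)   # gradient estimated on minibatch B
lr = 0.1

# doubling batch_size: one step with the averaged gradient
step_big_batch = lr * (g1 + g2) / 2

# halving learning_rate: two steps, one per minibatch
step_half_lr = (lr / 2) * g1 + (lr / 2) * g2

# equal here only because g1 and g2 are frozen; with re-evaluated
# gradients the two update sequences are no longer identical
print(np.allclose(step_big_batch, step_half_lr))  # True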

In the case of low noise:

Reducing the learning_rate is the viable option. If the gradient direction is not noisy, increasing the batch_size is not going to change the situation much. A smaller learning_rate is still useful, because a big step can make the gradient direction unrepresentative and make you exit the "valley".
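If reducing the learning_rate is the chosen route, TF 1.x provides a ready-made decay schedule. A minimal sketch follows; the initial rate, decay_steps and decay_rate values are arbitrary choices for illustration, and cross_entropy is the loss from the snippet at the top.

import tensorflow as tf  # TF 1.x API, matching the snippet above

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    learning_rate=0.5,    # initial rate (arbitrary)
    global_step=global_step,
    decay_steps=100,      # decay every 100 steps (arbitrary)
    decay_rate=0.96,
    staircase=True)

# passing global_step makes the optimizer increment it on each update,
# which in turn advances the decay schedule
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    cross_entropy, global_step=global_step)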

