
I'm learning TensorFlow and tried to apply it to the MNIST dataset. My questions are (see attached image):

  • what could cause such output for accuracy (improving and then degrading!) and loss (almost constant!)?
  • the accuracy isn't great, just hovering around 10%

[Accuracy / Loss - TensorBoard]

Despite:

  • a 5-layer network (incl. the output layer), with 200/100/60/30/10 neurons respectively
  • a 0.1 learning rate (which is quite high, I believe)

Is the network not learning at all?

Full code: https://github.com/vibhorj/tf > mnist-2.py

1) Here's how the layers are defined:

K,L,M,N=200,100,60,30
""" Layer 1 """
with tf.name_scope('L1'):
    w1 = tf.Variable(initial_value = tf.truncated_normal([28*28,K],mean=0,stddev=0.1), name = 'w1')
    b1 = tf.Variable(initial_value = tf.truncated_normal([K],mean=0,stddev=0.1), name = 'b1')
""" Layer 2 """
with tf.name_scope('L2'):
    w2 = tf.Variable(initial_value =tf.truncated_normal([K,L],mean=0,stddev=0.1), name = 'w2')
    b2 = tf.Variable(initial_value = tf.truncated_normal([L],mean=0,stddev=0.1), name = 'b2')
""" Layer 3 """
with tf.name_scope('L3'):
    w3 = tf.Variable(initial_value = tf.truncated_normal([L,M],mean=0,stddev=0.1), name = 'w3')
    b3 = tf.Variable(initial_value = tf.truncated_normal([M],mean=0,stddev=0.1), name = 'b3')
""" Layer 4 """
with tf.name_scope('L4'):
    w4 = tf.Variable(initial_value = tf.truncated_normal([M,N],mean=0,stddev=0.1), name = 'w4')
    b4 = tf.Variable(initial_value = tf.truncated_normal([N],mean=0,stddev=0.1), name = 'b4')
""" Layer output """
with tf.name_scope('L_out'):
    w_out = tf.Variable(initial_value = tf.truncated_normal([N,10],mean=0,stddev=0.1), name = 'w_out')
    b_out = tf.Variable(initial_value = tf.truncated_normal([10],mean=0,stddev=0.1), name = 'b_out')
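
(Note: the snippets here and in 2) also use placeholders X and Y that aren't shown in this excerpt; based on the shapes used, they are presumably defined in mnist-2.py along these lines.)

import tensorflow as tf

# Assumed definitions (not part of the excerpt), inferred from the shapes used below:
X = tf.placeholder(tf.float32, [None, 28*28], name='X')  # flattened 28x28 images
Y = tf.placeholder(tf.float32, [None, 10], name='Y')     # one-hot digit labels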

2) Loss function:

Y1 = tf.nn.sigmoid(tf.add(tf.matmul(X,w1),b1), name='Y1')
Y2 = tf.nn.sigmoid(tf.add(tf.matmul(Y1,w2),b2), name='Y2')
Y3 = tf.nn.sigmoid(tf.add(tf.matmul(Y2,w3),b3), name='Y3')
Y4 = tf.nn.sigmoid(tf.add(tf.matmul(Y3,w4),b4), name='Y4')

Y_pred_logits = tf.add(tf.matmul(Y4, w_out),b_out,name='logits')
Y_pred_prob = tf.nn.softmax(Y_pred_logits, name='probs')

error = -tf.matmul(Y, tf.reshape(tf.log(Y_pred_prob), [10,-1]), name='err')
loss = tf.reduce_mean(error, name = 'loss')

3) Optimization function:

opt = tf.train.GradientDescentOptimizer(0.1)
grads_and_vars = opt.compute_gradients(loss)
ctr = tf.Variable(0.0, name='ctr')
z = opt.apply_gradients(grads_and_vars, global_step=ctr)  

4) TensorBoard code:

evt_file = tf.summary.FileWriter('/Users/vibhorj/python/-tf/g_mnist')
evt_file.add_graph(tf.get_default_graph())

s1 = tf.summary.scalar(name='accuracy', tensor=accuracy)
s2 = tf.summary.scalar(name='loss', tensor=loss)
m1 = tf.summary.merge([s1,s2])
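
(The accuracy tensor logged above isn't defined in this excerpt; in mnist-2.py it is presumably something along these lines.)

# Assumed accuracy op (not shown above): fraction of examples whose
# highest-probability prediction matches the one-hot label.
correct = tf.equal(tf.argmax(Y_pred_prob, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name='accuracy')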

5) Run the session (test data is mnist.test.images & mnist.test.labels):

with tf.Session() as sess:
    sess.run(tf.variables_initializer(tf.global_variables()))
    for i in range(300):
       """ calc. accuracy on test data - TENSORBOARD before iteration beings """
       summary = sess.run(m1, feed_dict=test_data)
       evt_file.add_summary(summary, sess.run(ctr))
       evt_file.flush()

       """ fetch train data """        
       a_train, b_train = mnist.train.next_batch(batch_size=100)
       train_data = {X: a_train , Y: b_train}

       """ train """
       sess.run(z, feed_dict = train_data)
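
(test_data, fed to the summary run above, isn't shown either; per the note in 5) it is presumably built from the MNIST test split.)

# Assumed from the description in 5): feed the full test set for the summaries.
test_data = {X: mnist.test.images, Y: mnist.test.labels}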

I'd appreciate any insight into this. I'm completely clueless about how to proceed further (I even tried initializing w & b with random_normal, and played with learning rates [0.1, 0.01, 0.001]).

Cheers!

Vibhor Jain
  • Would you have a Gist or public repo we can check out to find the issue? I had a quick review, but it is time-consuming to go through such code on this page. Review so far: perhaps something with the name scopes. – Eric Platon Nov 16 '17 at 08:19
  • Thanks a lot for looking at it! https://github.com/vibhorj/tf > mnist-2.py – Vibhor Jain Nov 16 '17 at 08:47
  • Basic questions: (1) How does your architecture compare to classic MNIST solutions? (2) Is there a similar, wider topology that trains well? For instance, what happens if you change the second layer from 10 to 30 neurons? These checks will help narrow the problem (eliminate the topology as the problem). – Prune Nov 16 '17 at 17:29
  • There is a public example for TF and MNIST, please read the docs – brown.2179 Nov 17 '17 at 21:47

1 Answer


Please consider:

  1. Initializing biases to zeros
  2. Using ReLU units instead of sigmoid - avoid saturation
  3. Using Adam optimizer - faster learning

I feel that your network is quite large. You could do with a smaller network.

K,L,M,N=200,100,60,30
""" Layer 1 """
with tf.name_scope('L1'):
    w1 = tf.Variable(initial_value = tf.truncated_normal([28*28,K],mean=0,stddev=0.1), name = 'w1')
    b1 = tf.Variable(tf.zeros([K]), name = 'b1')  # biases initialized to zeros but still trainable
""" Layer 2 """
with tf.name_scope('L2'):
    w2 = tf.Variable(initial_value =tf.truncated_normal([K,L],mean=0,stddev=0.1), name = 'w2')
    b2 = tf.Variable(tf.zeros([L]), name = 'b2')
""" Layer 3 """
with tf.name_scope('L3'):
    w3 = tf.Variable(initial_value = tf.truncated_normal([L,M],mean=0,stddev=0.1), name = 'w3')
    b3 = tf.Variable(tf.zeros([M]), name = 'b3')
""" Layer 4 """
with tf.name_scope('L4'):
    w4 = tf.Variable(initial_value = tf.truncated_normal([M,N],mean=0,stddev=0.1), name = 'w4')
    b4 = tf.Variable(tf.zeros([N]), name = 'b4')
""" Layer output """
with tf.name_scope('L_out'):
    w_out = tf.Variable(initial_value = tf.truncated_normal([N,10],mean=0,stddev=0.1), name = 'w_out')
    b_out = tf.Variable(tf.zeros([10]), name = 'b_out')


Y1 = tf.nn.relu(tf.add(tf.matmul(X,w1),b1), name='Y1')
Y2 = tf.nn.relu(tf.add(tf.matmul(Y1,w2),b2), name='Y2')
Y3 = tf.nn.relu(tf.add(tf.matmul(Y2,w3),b3), name='Y3')
Y4 = tf.nn.relu(tf.add(tf.matmul(Y3,w4),b4), name='Y4')

Y_pred_logits = tf.add(tf.matmul(Y4, w_out),b_out,name='logits')

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=Y_pred_logits, name='xentropy'))
opt = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = opt.compute_gradients(loss)
ctr = tf.Variable(0.0, name='ctr', trainable=False)
train_op = opt.minimize(loss, global_step=ctr)

for v in tf.trainable_variables():
  print(v.op.name)  # sanity check: list the trainable variables

with tf.Session() as sess:
    sess.run(tf.variables_initializer(tf.global_variables()))
    for i in range(3000):
       """ calc. accuracy on test data - TENSORBOARD before iteration beings """
       #summary = sess.run(m1, feed_dict=test_data)
       #evt_file.add_summary(summary, sess.run(ctr))
       #evt_file.flush()

       """ fetch train data """
       a_train, b_train = mnist.train.next_batch(batch_size=100)
       train_data = {X: a_train , Y: b_train}

       """ train """
       l = sess.run(loss, feed_dict = train_data)
       print(l)  # print the training loss each step
       sess.run(train_op, feed_dict = train_data)
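
For point 3 (the Adam optimizer), the code above still uses plain gradient descent; a minimal sketch of the swap, not tested here, would be:

# Sketch only: replace GradientDescentOptimizer with Adam (recommendation 3).
# 0.001 is the default learning rate of tf.train.AdamOptimizer.
opt = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = opt.minimize(loss, global_step=ctr)
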
dgumo
  • Thanks a lot for sharing the tips. After 1) initializing bias to 0 and 2) using ReLU activation except on the output layer, the loss is *nan* and there is no change to accuracy (9.79%). I even lowered the learning rate to 0.00001. – Vibhor Jain Nov 16 '17 at 18:55
  • Can you please share your latest code? I can see the loss going down for the code I posted above. – dgumo Nov 16 '17 at 19:07
  • Here you go: https://github.com/vibhorj/tf/blob/master/so-changes/mnist-2-so.py Thanks a lot! Interesting, the code looks very similar, but the loss goes down in yours and not in mine.. could it be because I'm not using the built-in function tf.nn.softmax_cross_entropy_with_logits() for the loss calculation? – Vibhor Jain Nov 17 '17 at 03:10
  • May I ask why you recommend initialising bias to 0? What implications does it have? Most people say to use random values for initialisation.. – Vibhor Jain Nov 17 '17 at 04:01
  • It's customary. For relu units a small positive value might be good since it would ensure that gradients flow initially. – dgumo Nov 17 '17 at 05:38
  • For your NaN loss, the function I used is numerically stable. You can Google for discussion on this. BTW could you kindly consider upvoting if the solution above helped? – dgumo Nov 17 '17 at 05:44
  • Thanks for the clarifications re: ReLU, but I'm still confused why your code works (loss goes down) and mine doesn't; they're almost the same! Could it be because I'm not using the built-in function tf.nn.softmax_cross_entropy_with_logits() for the loss calculation? – Vibhor Jain Nov 17 '17 at 17:32
  • Please see this discussion: https://stackoverflow.com/questions/34240703/difference-between-tensorflow-tf-nn-softmax-and-tf-nn-softmax-cross-entropy-with – dgumo Nov 17 '17 at 18:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/159259/discussion-between-vibhor-jain-and-dgumo). – Vibhor Jain Nov 17 '17 at 20:36