13

I am doing an image semantic segmentation job with UNet. If I set a softmax activation for the last layer like this:

...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
conv10 = (Activation('softmax'))(conv9)
model = Model(inputs, conv10)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False), the training does not converge, even for a single training image.

But if I do not set a softmax activation for the last layer, like this:

...
conv9 = Conv2D(n_classes, (3,3), padding = 'same')(conv9)
model = Model(inputs, conv9)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True), the training converges for one training image.

My ground-truth dataset is generated like this:

import cv2
import numpy as np

X = []
Y = []

im = cv2.imread(impath)
X.append(im)

# Build a one-hot-style label map: one binary mask per class channel.
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):
    mask = cv2.imread(spath, 0)        # read each mask as a single-channel image
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))

Why? Is there something wrong with my usage?

Here is my experiment code on GitHub: https://github.com/honeytidy/unet. You can check it out and run it (it runs on CPU). You can change the Activation layer and the from_logits argument of CategoricalCrossentropy and see what I described.

tidy

4 Answers

10

Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.

You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
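
To see the kind of issue being avoided, here is a toy NumPy sketch (my own illustration, not the actual Keras/TF implementation): applying the softmax first and then taking the log can overflow in float32, while the fused log-softmax form (the log-sum-exp trick) stays finite.

import numpy as np

# Toy illustration only: exp(100) already overflows in float32.
logits = np.array([100.0, 0.0, -100.0], dtype=np.float32)
true_class = 2

# Naive path: softmax first, then log of the selected probability.
probs = np.exp(logits) / np.sum(np.exp(logits))   # overflow -> inf/nan entries
naive_loss = -np.log(probs[true_class])           # inf (plus runtime warnings)

# Fused path: log_softmax(x) = (x - max(x)) - log(sum(exp(x - max(x))))
shifted = logits - logits.max()
log_probs = shifted - np.log(np.sum(np.exp(shifted)))
stable_loss = -log_probs[true_class]              # finite: 200.0

print(naive_loss, stable_loss)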

Shai
  • Yes, it is very likely that numerical stability plays a role here. This has also been mentioned in the source code [documentation](https://github.com/tensorflow/tensorflow/blob/4f50b5dc6426f63a8e70b65d3b9e55ed8f7d38e2/tensorflow/python/keras/losses.py#L437): `Note: Using from_logits=True may be more numerically stable.` – today Aug 01 '19 at 17:04
  • AFAIK Keras handles this by using an epsilon, which can turn off very-badly classified points. – mdaoust Aug 02 '19 at 04:55
0

I guess the problem comes from the softmax activation function. Looking at the docs, I found that softmax is applied to the last axis by default. Can you look at model.summary() and check whether that is what you want?
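
As a side note, here is a minimal sketch (my own toy model, with a made-up input size and n_classes = 4, not the code from the question) of making the softmax axis explicit and verifying the output shape with model.summary():

import tensorflow as tf

inputs = tf.keras.Input(shape=(256, 256, 3))                    # hypothetical input size
x = tf.keras.layers.Conv2D(4, (3, 3), padding='same')(inputs)   # n_classes = 4
outputs = tf.keras.layers.Softmax(axis=-1)(x)                   # class axis made explicit

model = tf.keras.Model(inputs, outputs)
model.summary()   # the last layer should report output shape (None, 256, 256, 4)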

Simon Delecourt
  • From his code it looks like he is stacking binary masks along the image's channel dimension, which is what `CategoricalCrossentropy` would expect. – mdaoust Aug 02 '19 at 05:02
0

For softmax to work properly, you must make sure that (a small verification sketch follows this list):

  • You are using 'channels_last' as Keras default channel config.

    • This means the shapes in the model will be like (None, height, width, channels)
    • This seems to be your case, because you are putting n_classes in the last axis. But it's also strange: since you are using Conv2D, your output Y should have shape (1, height, width, n_classes), not the flattened shape you are using.
  • Your Y has only zeros and ones (not 0 and 255 as usually happens to images)

    • Check that Y.max() == 1 and Y.min() == 0
    • You may need to have Y = Y / 255.
  • Only one class is correct at each pixel (your data does not have more than one mask path/channel with value = 1 at the same pixel).

    • Check that (Y.sum(axis=-1) == 1).all() is True
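
A small sketch of those checks (my own addition, assuming Y is already a NumPy-convertible array or list with the class channels on the last axis):

import numpy as np

Y = np.asarray(Y, dtype=np.float32)

assert Y.min() == 0.0 and Y.max() == 1.0, "labels must be 0/1, not 0/255"
assert (Y.sum(axis=-1) == 1).all(), "exactly one class per pixel (one-hot labels)"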
Daniel Möller
0

from_logits=True signifies that the values the model outputs are not normalized (i.e., they are raw logits); it is used when the model does not have a softmax activation on its output. For example, the model in https://www.tensorflow.org/tutorials/generative/dcgan does not use a softmax activation function. In other words, letting the loss apply the softmax also helps with numerical stability.
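
For example, the two setups below should produce the same loss value; the only difference is where the softmax is applied (the numbers are my own toy values, not from the question's repository):

import tensorflow as tf

y_true = tf.constant([[0.0, 0.0, 1.0]])
logits = tf.constant([[2.0, 1.0, 0.1]])       # raw, unnormalized model outputs

# Option A: softmax inside the model, loss expects probabilities.
probs = tf.nn.softmax(logits)
loss_a = tf.keras.losses.CategoricalCrossentropy(from_logits=False)(y_true, probs)

# Option B: no softmax in the model, loss applies it internally.
loss_b = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(y_true, logits)

print(loss_a.numpy(), loss_b.numpy())         # both ≈ 2.32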

Maheep