Intuition behind U-net vs FCN for semantic segmentation

Question

I don't quite understand the following:

In the proposed FCN for Semantic Segmentation by Shelhamer et al, they propose a pixel-to-pixel prediction to construct masks/exact locations of objects in an image.

In the slightly modified version of the FCN for biomedical image segmentation, the U-net, the main difference seems to be "a concatenation with the correspondingly cropped feature map from the contracting path."

Now, why does this feature make a difference particularly for biomedical segmentation? The main differences I can point out for biomedical images vs other data sets are that biomedical images do not have as rich a set of features defining an object as common everyday objects do, and that the size of the data set is limited. But is this extra feature inspired by these two facts, or by some other reason?

score 13 · Accepted Answer · answered Jun 18 '18 at 14:23


FCN vs U-Net:

FCN

It upsamples only once, i.e. it has only one layer in the decoder.
The original implementation's GitHub repo uses bilinear interpolation to upsample the convolved image, i.e. there is no learnable filter here.
Variants of FCN (FCN-16s and FCN-8s) add skip connections from lower layers to make the output robust to scale changes.

U-Net

multiple upsampling layers
uses skip connections and concatenates instead of adding up
uses learnable weight filters instead of fixed interpolation technique
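The difference between adding and concatenating a skip connection can be seen from the tensor shapes alone. The following is only a rough NumPy sketch, not code from either paper: the helper names (`upsample2x`, `center_crop`, `fuse_add`, `fuse_concat`) and the array sizes are made up for illustration, and nearest-neighbour repetition stands in for whatever upsampling (bilinear or learned) a real network would use.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array.
    Assumption: a stand-in for bilinear or learned upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def center_crop(x, h, w):
    """Crop a (C, H, W) array to (C, h, w) around its centre."""
    top = (x.shape[1] - h) // 2
    left = (x.shape[2] - w) // 2
    return x[:, top:top + h, left:left + w]

def fuse_add(decoder, encoder):
    """FCN-16s/8s style fusion: element-wise sum, channel count unchanged."""
    up = upsample2x(decoder)
    skip = center_crop(encoder, up.shape[1], up.shape[2])
    return up + skip

def fuse_concat(decoder, encoder):
    """U-Net style fusion: stack along the channel axis, channel count grows."""
    up = upsample2x(decoder)
    skip = center_crop(encoder, up.shape[1], up.shape[2])
    return np.concatenate([up, skip], axis=0)

# Hypothetical feature maps: 8 channels each, encoder map slightly larger
# (as happens with unpadded convolutions), so it has to be cropped.
rng = np.random.default_rng(0)
decoder = rng.standard_normal((8, 4, 4))
encoder = rng.standard_normal((8, 10, 10))

added = fuse_add(decoder, encoder)
concatenated = fuse_concat(decoder, encoder)
print(added.shape)         # (8, 8, 8)
print(concatenated.shape)  # (16, 8, 8)
```

With addition, the encoder and decoder information is merged before any later layer sees it. With concatenation, both copies are kept side by side and the following convolutions can learn how to combine them.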


shasvat desai


For whatever reason, VGG16-FCN-8s (see my keras conversion https://github.com/dmitryako/keras_fcn_8s) worked much better for me, i.e. I could not get better results with U-Net. – Dmitry Konovalov Jul 01 '18 at 07:09

Hello. The results will depend on the task we are trying to do and also on the dataset we are using. U-Net was specifically shown to work well with less data [with data augmentation techniques]. In my experience U-Net gives better performance because it has multiple upsampling layers along with more skip connections, which theoretically make it more robust to scale variations than FCN. BTW, what task were you doing and what dataset did you use? Also, can you post a link to your research papers here? – shasvat desai Jul 02 '18 at 13:45

score 1 · Answer 2 · answered May 18 '18 at 19:11


U-Net is built upon J. Long's FCN paper. One difference is that the original FCN paper used the decoder half to upsample the classification scores (i.e. the entire second half of the net has depth C, the number of classes).

U-Net treats the second half as being in feature space and does the final classification at the end.
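To make the "score space vs feature space" point concrete, here is a small pure-Python sketch that only tracks channel depth through the decoder. The function names are hypothetical, and the numbers (21 classes with three upsampling stages for the FCN-style case; a 1024-channel bottleneck, four upsampling stages and 2 classes for the U-Net-style case) are illustrative choices rather than a faithful reproduction of either architecture.

```python
def fcn_decoder_channels(num_classes, num_upsampling_stages):
    """FCN-style: features are reduced to class scores first, so every
    map in the decoder has depth equal to the number of classes."""
    return [num_classes] * (num_upsampling_stages + 1)

def unet_decoder_channels(bottleneck_channels, num_upsampling_stages, num_classes):
    """U-Net-style: the decoder stays in feature space (depth halves at
    each stage here) and only the final 1x1 convolution maps to classes."""
    channels = [bottleneck_channels]
    for _ in range(num_upsampling_stages):
        channels.append(channels[-1] // 2)
    channels.append(num_classes)  # final classification layer
    return channels

print(fcn_decoder_channels(21, 3))        # [21, 21, 21, 21]
print(unet_decoder_channels(1024, 4, 2))  # [1024, 512, 256, 128, 64, 2]
```

The U-Net-style decoder carries many more channels while upsampling, so it has room to refine features at each resolution before committing to a class per pixel.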

Nothing about it is special to biomedical imaging, IMO.


aivision2020


You are right, U-Net is not specifically biomedical; it just fits well for biomedical applications where accuracy (especially in shape) is critical, and U-Net's skip connections help a lot with that. – the-lay Nov 01 '19 at 12:59
