I have trained the following CNN model on a small dataset, and as a result it overfits:

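# Imports assume the Keras bundled with TensorFlow; adjust for standalone Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))

model.add(Conv2D(32, kernel_size=(3,3), padding='same'))

model.add(Flatten())  # flatten the feature maps before the classifier
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])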

The model has a lot of trainable parameters (more than 3 million), which is why I wonder whether I should reduce the number of parameters with additional MaxPooling, as follows:

Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten

Or with additional MaxPooling and Dropout, as follows?

Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling - Dropout - Flatten

I am trying to understand what MaxPooling is really for and whether it can help against overfitting.

Overfitting can happen when your dataset is not large enough to accommodate your number of features. Max pooling applies a max operation over groups of neighbouring features, leaving you with fewer of them. Therefore, max pooling should logically reduce overfitting.
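To make the "smaller number of features" point concrete, here is a minimal pure-Python sketch of 2x2 max pooling with stride 2. The function name max_pool_2x2 and the 4x4 input are made up for illustration; this is not how Keras implements MaxPooling2D internally, only the same idea on a tiny example.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2-D list of numbers.

    Assumes the height and width are both even.
    """
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Collect the four values of the current 2x2 window.
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map (values chosen only for illustration).
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Sixteen values go in and four come out. Applied to a 28x28x32 convolution output like the one in the question, one such pooling step would leave 14x14x32, that is 6,272 values instead of 25,088.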

Dropout reduces reliance on any single feature by ensuring that the feature is not always available. This forces the model to look for different potential hints rather than sticking with just one, which would make it easy to overfit on any apparently good hint. Therefore, dropout should also help reduce overfitting.
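The "not always available" idea can be sketched in a few lines. This is a minimal illustration of inverted dropout at training time, assuming the usual convention (also used by the Keras Dropout layer) that rate is the fraction of values to drop and that survivors are scaled by 1 / (1 - rate). The function name dropout and the sample activations are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout for training.

    Each value is zeroed with probability `rate`; the values that
    survive are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    """
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed so repeated runs give the same masks
activations = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical values

# A different random subset of features is removed on every step.
for step in range(3):
    print(dropout(activations, rate=0.5, rng=rng))
```

Because the mask changes on every call, no downstream unit can count on one particular feature being present, which is the effect described above.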

You should not use max pooling as a way to reduce overfitting. It does have a small effect, but that effect is not enough: max pooling is applied after the convolution, so the features in that layer have already been learned, and all pooling does is reduce the height and width of the output. The next layer then has fewer positions to convolve over, which has only a little effect on overfitting and will not solve it. Pooling is not really recommended for this kind of problem; here are some tips instead:

  1. Reduce the number of parameters, because it is very hard (though not impossible) to find enough data to train 3 million parameters without overfitting.
  2. Use regularization techniques such as dropout (which is very effective, by the way), L2 regularization, etc.
  3. Don't use max pooling for the purpose of reducing overfitting. It is meant to shrink the representation and to make the network a bit more robust to some features; moreover, using it too much will make the network more and more robust to certain kinds of features.
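As a rough illustration of the L2 regularization mentioned in tip 2, the sketch below adds a penalty proportional to the sum of squared weights to the data loss. The names l2_penalty, regularized_loss and lam, as well as all the numbers, are assumptions made for this example; in Keras the same idea is normally expressed through a layer's kernel_regularizer argument rather than written by hand.

```python
def l2_penalty(weights, lam):
    """Return lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Data loss plus the L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)


weights = [0.5, -2.0, 1.0]  # hypothetical weights
data_loss = 0.30            # hypothetical loss on the training batch
lam = 0.01                  # regularization strength (a hyperparameter)

print(regularized_loss(data_loss, weights, lam))
```

Large weights are penalized the most (here the -2.0 contributes 4.0 of the 5.25 total), so the optimizer is pushed toward smaller weights.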

  • Would it also be good to use padding='valid' instead of padding='same'? Would it be better to omit batch normalization? – Code Now Feb 11 '20 at 21:09