I have trained the following CNN model on a relatively small data set, so it overfits:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Dropout, Flatten, Dense)
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])
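
(If I count correctly, almost all of these parameters sit in the Flatten -> Dense(512) connection: after the single 2x2 pooling, Flatten sees 14x14x32 = 6272 values, so that one Dense layer alone has 6272*512 + 512 = 3,211,776 weights.)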

The model has a lot of trainable parameters (more than 3 million), which is why I wonder if I should reduce the number of parameters with additional MaxPooling, like this:

Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten

or with additional MaxPooling and Dropout layers, like this?

Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling - Dropout - Flatten

I am trying to understand the real purpose of MaxPooling and whether it can help against overfitting.
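
For concreteness, the second variant would look roughly like this in Keras (I am keeping the widths and the 0.4 dropout rate from my model above; the rates are just carried over, not tuned):

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))   # 28x28 -> 14x14
model.add(Dropout(0.4))

model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))   # 14x14 -> 7x7
model.add(Dropout(0.4))

model.add(Flatten())   # 7*7*32 = 1568 values, so Dense(512) drops to ~0.8M weights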

– Code Now

2 Answers

Overfitting can happen when your dataset is not large enough to accommodate your number of features. Max pooling uses a max operation to pool sets of features, leaving you with a smaller number of them. Therefore, max-pooling should logically reduce overfitting.
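
To make that concrete: a 2x2 max-pool keeps only the largest value in each 2x2 window, so height and width are halved and everything downstream sees a quarter of the values. A minimal illustration:

import tensorflow as tf

x = tf.constant([[ 1.,  2.,  5.,  6.],
                 [ 3.,  4.,  7.,  8.],
                 [ 9., 10., 13., 14.],
                 [11., 12., 15., 16.]])
x = tf.reshape(x, (1, 4, 4, 1))   # batch, height, width, channels

pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
print(tf.reshape(pooled, (2, 2)))
# [[ 4.  8.]
#  [12. 16.]]  -- each 2x2 window reduced to its maximum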

Drop-out reduces reliance on any single feature by ensuring that the feature is not always available, forcing the model to look for different potential hints rather than just sticking with one -- which would easily allow the model to overfit on any apparently good hint. Therefore, this should also help reduce overfitting.
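
A minimal illustration of this as well: during training, Dropout(0.5) zeroes roughly half of the activations (Keras scales the survivors by 1/(1 - rate) so their expected sum stays the same), while at inference it does nothing:

import tensorflow as tf

x = tf.ones((1, 8))                # eight identical activations
drop = tf.keras.layers.Dropout(0.5)

print(drop(x, training=True))      # e.g. [[2. 0. 2. 0. 0. 2. 2. 0.]] -- random per call
print(drop(x, training=False))     # [[1. 1. 1. 1. 1. 1. 1. 1.]]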

– Kiara Grouwstra

You should NOT use max-pooling in order to reduce overfitting. It does have a small effect on it, but that effect is not enough: you apply max-pooling after the convolutional operations, which means the features in that layer have already been trained. Since max-pooling only reduces the height and width of the output, the next layer's convolutions simply have less input to learn from, so the effect on the overfitting problem is small and won't solve it. Actually, using pooling for this kind of problem is not recommended at all. Here are some tips:

  1. Reduce the number of your parameters, because it is very hard (though not impossible) to find enough data to train 3 million parameters without overfitting.
  2. Use regularization techniques such as Drop-out (which is very effective, by the way) or L2 regularization, etc. (a sketch follows after this list).
  3. DON'T use max pooling for the purpose of reducing overfitting: it is meant to reduce the representation and to make the network a bit more robust to some features; moreover, using it too much will make the network more and more invariant to certain kinds of features.
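
As a sketch of tip 2, L2 regularization can be added directly to a layer in Keras; the 0.001 factor below is only a hypothetical starting point, not a tuned value:

from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

# L2-penalize the weights of the large Dense layer (tune the factor on validation data)
model.add(Dense(512, kernel_regularizer=l2(0.001)))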

Hope that helps!

– Warios
  • Would it also be good to use padding='valid' instead of padding='same'? Would it be better to omit batch normalization? – Code Now Feb 11 '20 at 21:09