I have trained the following CNN model on a rather small data set, and it overfits:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Dropout, Flatten, Dense)
from tensorflow.keras.optimizers import Adam

model = Sequential()
# two 3x3 conv blocks at full 28x28 resolution
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
# single 2x2 pool: 28x28 -> 14x14
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))
# 14*14*32 = 6272 units feed the dense layer
model.add(Flatten())
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])
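To see where the parameters sit, I inspect the model with the standard Keras helpers (model.summary() and count_params()):

model.summary()              # per-layer output shapes and parameter counts
print(model.count_params())  # ~3.2 million, almost all in the Flatten -> Dense(512) weights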
The model has a lot of trainable parameters (more than 3 million), which is why I wonder whether I should reduce the number of parameters with additional MaxPooling, like this:
Conv - BN - Act - MaxPooling - Conv - BN - Act - MaxPooling - Dropout - Flatten
or with an additional MaxPooling and Dropout, like this (see the code sketch after the second layout):
Conv - BN - Act - MaxPooling - Dropout - Conv - BN - Act - MaxPooling - Dropout - Flatten
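Written out, the second variant would look like the sketch below; the Dropout after the first pool is the only difference from the first variant, and the shape comments assume the same 'same' padding and 2x2 pooling as above:

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))  # 28x28 -> 14x14
model.add(Dropout(0.4))                   # omit this line for the first variant
model.add(Conv2D(32, kernel_size=(3,3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))  # 14x14 -> 7x7
model.add(Dropout(0.4))
model.add(Flatten())                      # 7*7*32 = 1568 units instead of 14*14*32 = 6272
model.add(Dense(512))                     # ~0.8M weights here instead of ~3.2M
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer=Adam(), metrics=['accuracy'])

If I am calculating this correctly, pooling twice shrinks the Flatten output from 14*14*32 = 6272 to 7*7*32 = 1568 units, so the first Dense layer drops from about 3.2 million to about 0.8 million weights.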
I am trying to understand the actual purpose of MaxPooling and whether it can help against overfitting.