
I was trying to understand some TensorFlow basics and got stuck while reading the documentation for the 2D max pooling layer: https://www.tensorflow.org/tutorials/layers#pooling_layer_1

This is how max_pooling2d is specified:

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

where conv1 is a tensor with shape [batch_size, image_height, image_width, channels]; concretely, in this case it's [batch_size, 28, 28, 32].

So our input is a tensor with shape: [batch_size, 28, 28, 32].

My understanding of a 2D max pooling layer is that it applies a filter of size pool_size (2x2 in this case) and slides the window by the stride (2 in each direction). This means that both the width and height of the image will be halved, i.e. we will end up with 14x14 pixels per channel (32 channels in total), so the output is a tensor with shape [batch_size, 14, 14, 32].
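To make that reasoning concrete, here is a minimal NumPy sketch of 2x2 max pooling with stride 2 on a channels-last array. It is not TensorFlow's implementation; the helper name max_pool_2x2 and the batch size of 4 are made up for illustration, and it assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(x):
    """Max-pool a [batch, height, width, channels] array with a 2x2 window and stride 2."""
    n, h, w, c = x.shape
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even height and width"
    # Split each spatial axis into (blocks, 2), then take the max inside every 2x2 block.
    blocks = x.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(2, 4))

# Hypothetical stand-in for conv1 with a batch of 4 images.
conv1 = np.random.rand(4, 28, 28, 32).astype(np.float32)
pool1 = max_pool_2x2(conv1)
print(pool1.shape)  # (4, 14, 14, 32)
```

Each channel is pooled independently, so only the two spatial dimensions shrink and the channel count stays at 32.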

However, according to the above link, the shape of the output tensor is [batch_size, 14, 14, 1]:

Our output tensor produced by max_pooling2d() (pool1) has a shape of 
[batch_size, 14, 14, 1]: the 2x2 filter reduces width and height by 50%.

What am I missing here?

How was 32 converted to 1?

They apply the same logic later here: https://www.tensorflow.org/tutorials/layers#convolutional_layer_2_and_pooling_layer_2

but this time it's correct, i.e. [batch_size, 14, 14, 64] becomes [batch_size, 7, 7, 64] (number of channels is the same).
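The same arithmetic can be checked for both pooling layers with a small helper. This is only a sketch: pooled_size and pooled_shape are hypothetical names, and the formula assumes no padding ("valid"), which as far as I know is the default for tf.layers.max_pooling2d.

```python
def pooled_size(size, pool, stride):
    # Output length along one spatial axis without padding:
    # floor((size - pool) / stride) + 1
    return (size - pool) // stride + 1

def pooled_shape(shape, pool=2, stride=2):
    # Pooling only touches height and width; batch and channels pass through unchanged.
    batch, height, width, channels = shape
    return (batch,
            pooled_size(height, pool, stride),
            pooled_size(width, pool, stride),
            channels)

print(pooled_shape((None, 28, 28, 32)))  # (None, 14, 14, 32)
print(pooled_shape((None, 14, 14, 64)))  # (None, 7, 7, 64)
```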

Nikola Stojiljkovic

2 Answers


Yes, a 2x2 max pool with strides=2 halves the width and height and leaves the output depth unchanged. Here is my test code based on your example; the output shape is (?, 14, 14, 32), so maybe something is wrong in the documentation?

#!/usr/bin/env python

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('./MNIST_data/', one_hot=True)

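# Placeholder standing in for the output of the first convolutional layer.
conv1 = tf.placeholder(tf.float32, [None, 28, 28, 32])
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
# Static shape of the pooled tensor; the unknown batch dimension prints as "?".
print(pool1.get_shape())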

The output is:

Extracting ./MNIST_data/train-images-idx3-ubyte.gz
Extracting ./MNIST_data/train-labels-idx1-ubyte.gz
Extracting ./MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ./MNIST_data/t10k-labels-idx1-ubyte.gz
(?, 14, 14, 32)
大宝剑

Nikola, the documentation has since been corrected to match what you expected.

I came across this thread while learning about convolution and pooling. Thank you for your question, which led me to the informative documentation.

Tora