
This seems like a trivial question, but I've been unable to find the answer.

I have batched sequences of images of shape:

[batch_size, number_of_frames, frame_height, frame_width, number_of_channels]

and I would like to pass each frame through a few convolutional and pooling layers. However, TensorFlow's conv2d layer accepts 4D inputs of shape:

[batch_size, frame_height, frame_width, number_of_channels]

My first attempt was to use tf.map_fn over axis=1, but I discovered that this function does not propagate gradients.
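
For concreteness, that attempt looked roughly like the sketch below (the fixed kernel is illustrative, standing in for my actual conv layers, and the transpose is needed because tf.map_fn only iterates over axis 0):

import tensorflow as tf

# 5D input: [batch_size, number_of_frames, frame_height, frame_width, channels]
observations = tf.placeholder(tf.float32, [None, None, 30, 30, 3])

# a fixed kernel defined outside the mapped function
kernel = tf.get_variable('kernel', shape=[3, 3, 3, 5])

# move the frame axis to the front, since tf.map_fn iterates over axis 0
frames_first = tf.transpose(observations, [1, 0, 2, 3, 4])
mapped = tf.map_fn(
    lambda frames: tf.nn.conv2d(frames, kernel, strides=[1, 1, 1, 1], padding='SAME'),
    frames_first)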

My second attempt was to use tf.unstack over the first dimension and then use tf.while_loop. However, my batch_size and number_of_frames are dynamically determined (i.e. both are None), and tf.unstack raises {ValueError} Cannot infer num from shape (?, ?, 30, 30, 3) if num is unspecified. I tried specifying num=tf.shape(self.observations)[1], but this raises {TypeError} Expected int for argument 'num' not <tf.Tensor 'A2C/infer/strided_slice:0' shape=() dtype=int32>.


1 Answer


Since all the frames (number_of_frames) pass through the same convolutional model, you can fold the batch and frame dimensions together and run an ordinary convolution. This can be achieved with tf.reshape, as shown below:


import numpy as np
import tensorflow as tf

# input of shape [batch_size, number_of_frames, frame_height, frame_width, number_of_channels]
x = tf.placeholder(tf.float32, [None, None, 32, 32, 3])

# merge the batch and frame dimensions for the conv input
x_reshaped = tf.reshape(x, [-1, 32, 32, 3])

For a batch of 10 sequences of 5 frames each, x_reshaped will have shape (50, 32, 32, 3).

# define your conv network
y = tf.layers.conv2d(x_reshaped, 5, kernel_size=(3, 3), padding='SAME')
# (50, 32, 32, 5): the layer has 5 filters, so the channel dimension becomes 5

# get back the original batch/frame split (note: reshape y, not x)
out = tf.reshape(y, [-1, tf.shape(x)[1], 32, 32, 5])

The output has the same batch and frame dimensions as the input; with 5 filters, the full shape is (10, 5, 32, 32, 5):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    print(sess.run(out, {x: np.random.normal(size=(10, 5, 32, 32, 3))}).shape)
    # (10, 5, 32, 32, 5)
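
The same trick extends to the pooling layers you mention; the only caveat is that the final reshape must use the spatial sizes after pooling. A minimal sketch, assuming a 2x2 max pool on the 32x32 feature maps above (the 16x16 sizes and the pooling parameters are this sketch's assumptions, not part of the original answer):

# pooling halves the spatial dimensions: (50, 32, 32, 5) -> (50, 16, 16, 5)
pooled = tf.layers.max_pooling2d(y, pool_size=(2, 2), strides=(2, 2))

# reshape back using the post-pooling sizes
out_pooled = tf.reshape(pooled, [-1, tf.shape(x)[1], 16, 16, 5])
# (10, 5, 16, 16, 5)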