I am going through the Udacity Deep Learning Nanodegree and working on the autoencoder mini-project. I do not understand the solution, nor how to check it myself, so this is really two questions.
We start with 28x28 images. These are fed through 3 convolutional layers, each with a padding of 1 and each followed by max pooling that halves the spatial dimensions. What I don't understand is the last pooling step. Two rounds of max pooling give (28/2)/2 = 7, so surely a third round shouldn't be possible, since 7 is odd and cannot be halved evenly. Can someone explain why this works? The code to reproduce it is here:
```python
import torch
import numpy as np
from torchvision import datasets
import torchvision.transforms as transforms

# convert data to torch.FloatTensor
transform = transforms.ToTensor()

# load the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)

# Create training and test dataloaders
num_workers = 0
# how many samples per batch to load
batch_size = 20

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

import torch.nn as nn
import torch.nn.functional as F

# define the NN architecture
class ConvDenoiser(nn.Module):
    def __init__(self):
        super(ConvDenoiser, self).__init__()
        ## encoder layers ##
        # conv layer (depth from 1 --> 32), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        # conv layer (depth from 32 --> 16), 3x3 kernels
        self.conv2 = nn.Conv2d(32, 16, 3, padding=1)
        # conv layer (depth from 16 --> 8), 3x3 kernels
        self.conv3 = nn.Conv2d(16, 8, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)

        ## decoder layers ##
        # transpose layer, a kernel of 2 and a stride of 2 will increase the spatial dims by 2
        self.t_conv1 = nn.ConvTranspose2d(8, 8, 3, stride=2)  # kernel_size=3 to get to a 7x7 image output
        # two more transpose layers with a kernel of 2
        self.t_conv2 = nn.ConvTranspose2d(8, 16, 2, stride=2)
        self.t_conv3 = nn.ConvTranspose2d(16, 32, 2, stride=2)
        # one, final, normal conv layer to decrease the depth
        self.conv_out = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        # add third hidden layer
        x = F.relu(self.conv3(x))
        x = self.pool(x)  # compressed representation

        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        x = F.relu(self.t_conv2(x))
        x = F.relu(self.t_conv3(x))
        # final normal conv layer, output should have a sigmoid applied
        # (torch.sigmoid replaces the deprecated F.sigmoid)
        x = torch.sigmoid(self.conv_out(x))
        return x

# initialize the NN
model = ConvDenoiser()
print(model)
```
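For reference, the size arithmetic in the question can be checked by hand without running the network. The following is a minimal sketch, assuming PyTorch's documented output-size formulas with the default settings used above (dilation of 1, `ceil_mode=False`, no output padding). The helper names `conv_or_pool_out` and `conv_transpose_out` are hypothetical and not part of PyTorch.

```python
def conv_or_pool_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d / MaxPool2d along one spatial dimension.

    Integer division rounds down, which matches ceil_mode=False.
    """
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of ConvTranspose2d along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel


size = 28
sizes = [size]

# encoder: three rounds of (3x3 conv with padding=1, then 2x2 max pool with stride 2)
for _ in range(3):
    size = conv_or_pool_out(size, kernel=3, padding=1)  # conv keeps the size
    size = conv_or_pool_out(size, kernel=2, stride=2)   # pool halves it, rounding down
    sizes.append(size)

# decoder: t_conv1 (kernel 3), then t_conv2 and t_conv3 (kernel 2), all with stride 2
for kernel in (3, 2, 2):
    size = conv_transpose_out(size, kernel=kernel, stride=2)
    sizes.append(size)

print(sizes)  # [28, 14, 7, 3, 7, 14, 28]
```

Under these assumptions the third pooling is allowed on a 7x7 input: the window simply does not cover the last row and column, so the result is 3x3. This also matches the code comment that `t_conv1` uses `kernel_size=3` to get back to 7x7.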
I wanted to understand this by passing a single image through the layers manually and seeing what came out, but this resulted in an error. Can someone explain how I can see the shapes that pass through the layers? The code is a bit messy, but I left it as is so you can see what I tried.
```python
dataiter = iter(train_loader)
images, labels = next(dataiter)
# images = images.numpy()

# get one image from the batch
# img = np.squeeze(images[0])
img = images[0]

# create hidden layer
conv1 = nn.Conv2d(1, 32, 3, padding=1)
# z = torch.from_numpy(images[0])
z1 = conv1(img)
```
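One possible way to inspect the shapes is sketched below. It assumes the error came from the missing batch dimension (the post does not quote the error message), so the image is given a leading batch dimension with `unsqueeze(0)` before it reaches the first layer. A random tensor stands in for an MNIST image so that nothing needs to be downloaded, and the `record` helper and `shapes` list are hypothetical names used only for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Random stand-in for one MNIST image of shape (channels, height, width)
img = torch.rand(1, 28, 28)
# Add a leading batch dimension: (1, 28, 28) -> (1, 1, 28, 28)
x = img.unsqueeze(0)

# Same layer settings as ConvDenoiser in the question (weights are untrained)
conv1 = nn.Conv2d(1, 32, 3, padding=1)
conv2 = nn.Conv2d(32, 16, 3, padding=1)
conv3 = nn.Conv2d(16, 8, 3, padding=1)
pool = nn.MaxPool2d(2, 2)
t_conv1 = nn.ConvTranspose2d(8, 8, 3, stride=2)
t_conv2 = nn.ConvTranspose2d(8, 16, 2, stride=2)
t_conv3 = nn.ConvTranspose2d(16, 32, 2, stride=2)
conv_out = nn.Conv2d(32, 1, 3, padding=1)

shapes = []  # hypothetical log of (step name, tensor shape)


def record(name, t):
    """Store the shape of tensor t under the given step name and return t."""
    shapes.append((name, tuple(t.shape)))
    return t


with torch.no_grad():  # only shapes are of interest, no gradients needed
    x = record("input", x)
    x = record("pool1", pool(F.relu(conv1(x))))
    x = record("pool2", pool(F.relu(conv2(x))))
    x = record("pool3", pool(F.relu(conv3(x))))
    x = record("t_conv1", F.relu(t_conv1(x)))
    x = record("t_conv2", F.relu(t_conv2(x)))
    x = record("t_conv3", F.relu(t_conv3(x)))
    x = record("conv_out", torch.sigmoid(conv_out(x)))

for name, shape in shapes:
    print(f"{name:>8}: {shape}")
```

With these layer settings the shapes should go from `(1, 1, 28, 28)` down to `(1, 8, 3, 3)` after the third pooling and back up to `(1, 1, 28, 28)` at the output.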
Appreciate any insights you can give me.
Thanks,
J