While building a neural network for a classification problem, as far as I know, we need an activation function at the last layer. The tutorial (https://www.tensorflow.org/tutorials/images/cnn?hl=tr) says: "...then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs and a softmax activation." The model is:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models, layers
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# Final layer: 10 outputs, one per class, with no activation argument
model.add(layers.Dense(10))
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
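To make the question concrete, here is a minimal sketch of what I understand from_logits=True to imply: the loss is computed directly from the raw scores (logits) of the last layer, which should give the same value as applying softmax first and then taking the cross-entropy. This is plain Python and not TensorFlow's actual implementation; the function names are hypothetical and the example values are made up.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(probs, label):
    # Loss when the model already outputs probabilities.
    return -math.log(probs[label])

def cross_entropy_from_logits(logits, label):
    # Loss computed straight from raw scores using the log-sum-exp form:
    # log(sum(exp(z))) - z[label]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

logits = [2.0, -1.0, 0.5]  # made-up raw outputs for 3 classes
label = 0                  # made-up true class index

loss_a = cross_entropy_from_probs(softmax(logits), label)
loss_b = cross_entropy_from_logits(logits, label)
print(loss_a, loss_b)      # the two values should agree up to rounding
```

If this reading is right, the softmax is not missing; it is folded into the loss computation during training.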
So my question is: where is the softmax activation function in this model? The same is true in https://www.tensorflow.org/tutorials/images/classification?hl=tr: that is a binary classification problem, and there is no activation function at the last layer either.
Besides, with the model above, which of these methods can I use directly: model.predict_classes(), model.predict(), or model.predict_proba()?
Why, when, and in what kinds of situations would I prefer the structure above over a last layer with activation="softmax"?
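For context, this is a sketch of what I would expect to have to do by hand if model.predict() returns raw scores rather than probabilities (that is an assumption on my part): softmax turns one row of scores into probabilities, and argmax picks the class. It is plain Python with hypothetical helper names and made-up numbers; in TensorFlow I would presumably use tf.nn.softmax and an argmax over the last axis instead.

```python
import math

def logits_to_probs(logits):
    # Softmax: exponentiate (shifted by the max for stability) and normalize.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits):
    # Argmax over raw scores; softmax is monotonic, so the winner is the
    # same whether we look at logits or at probabilities.
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.2, -0.3, 3.1, 0.0]    # made-up raw outputs for one sample
probs = logits_to_probs(logits)   # sums to 1 up to rounding
predicted = predict_class(logits)
print(probs, predicted)
```

Since the predicted class is the same either way, the softmax would only matter when I actually need probabilities.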
Thanks.