
I have built a simple YOLO localization model in Keras like this:

from tensorflow import keras  # assuming the tf.keras API; input_dim and yolo_keras_loss are defined elsewhere

model_layers = [
    keras.layers.Conv2D(32, input_shape=(input_dim, input_dim, 3), kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2),
    keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1, activation='relu'),
    keras.layers.Conv2D(8, kernel_size=(3, 3), strides=1),  # no activation specified, so it is linear
]

model = keras.models.Sequential(model_layers)
model.compile(loss=yolo_keras_loss, optimizer=keras.optimizers.Adam(learning_rate=0.0001))
model.summary()

As you can see, the last layer has no activation specified, so its activation defaults to 'linear'.

But with regard to YOLO's output, all the values (confidence score, bounding box coordinates, and class probabilities) are normalized. So should I use a sigmoid activation function or a linear activation function?

I cannot find the output layer's activation function mentioned in any of the resources on YOLO.

Shubham Panchal
  • I think it should be linear with the same filter size and stride 1. They are normalized because of the previous layers. Maybe you can set the kernel size to 1, like in [here](https://github.com/AlexeyAB/darknet/blob/4c315ea26b56c2bf20ebc240d94386c6e3cc83db/cfg/yolov3.cfg#L772) – Hadi GhahremanNezhad Aug 10 '19 at 19:39
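
A minimal sketch of the final layer that comment suggests (hypothetical, reusing the 8-filter output from the question's model):

# 1x1 kernel, no activation given, so the output stays linear
keras.layers.Conv2D(8, kernel_size=(1, 1), strides=1)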

1 Answer


If you refer to the original paper, they use linear activation for the final layer. In section "2.2. Training" you can find:

> We use a linear activation function for the final layer and all other layers use the following leaky rectified linear activation...
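
Applied to the model in the question, that would mean leaky ReLU on the hidden layers and no activation (i.e. linear) on the final Conv2D. A minimal sketch under those assumptions, not the paper's actual architecture:

from tensorflow import keras  # assuming the tf.keras API

input_dim = 224  # example input size, only for illustration

model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=(3, 3), input_shape=(input_dim, input_dim, 3)),
    keras.layers.LeakyReLU(alpha=0.1),           # leaky ReLU on hidden layers, as in the paper
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Conv2D(8, kernel_size=(3, 3)),  # final layer: no activation, i.e. linear
])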

perl