
I am writing an MLP neural network in C++, and I am struggling with backpropagation. My implementation follows this article closely, but I've done something wrong and can't spot the problem. My Matrix class confirms that there are no mismatched dimensions in any of the Matrix calculations, but the output always seems to approach zero or some variant of infinity. Is this the "vanishing" or "exploding" gradients problem as mentioned here, or is there something else going wrong?

Here is my activation function and its derivative:

double sigmoid(double d) {
    return 1/(1+exp(-d));
}

double dsigmoid(double d) {
    return sigmoid(d) * (1 - sigmoid(d));
}
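
(For reference: since the activated value is already stored in this->layer[i], the derivative could equivalently be computed from that stored output instead of calling sigmoid again. This is just a sketch, the helper name is arbitrary, and it is not something my code currently uses.)

double dsigmoid_from_output(double y) {
    // y is assumed to already be sigmoid(d), so the derivative is y * (1 - y)
    return y * (1 - y);
}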

Here is my training algorithm:

void KNN::train(const Matrix& input, const Matrix& target) {
    this->layer[0] = input;
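    // forward pass: layer[i] = sigmoid(weights[i-1] * layer[i-1]) for every layer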
    for(uint i = 1; i <= this->num_depth+1; i++) {
        this->layer[i] = Matrix::multiply(this->weights[i-1], this->layer[i-1]);
        this->layer[i] = Matrix::function(this->layer[i], sigmoid);
    }
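    // output layer: error delta, its gradient, and the weight update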
    this->deltas[this->num_depth+1] = Matrix::multiply(Matrix::subtract(this->layer[this->num_depth+1], target), Matrix::function(Matrix::multiply(this->weights[this->num_depth], this->layer[this->num_depth]), dsigmoid), true);
    this->gradients[this->num_depth+1] = Matrix::multiply(this->deltas[this->num_depth+1], Matrix::transpose(this->layer[this->num_depth]));
    this->weights[this->num_depth] = Matrix::subtract(this->weights[this->num_depth], Matrix::multiply(Matrix::multiply(this->weights[this->num_depth], this->learning_rate), this->gradients[this->num_depth+1], true));
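    // backpropagate the deltas through the hidden layers, updating each weight matrix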
    for(int i = this->num_depth; i > 0; i--) {
        this->deltas[i] = Matrix::multiply(Matrix::multiply(Matrix::transpose(this->weights[i]), this->deltas[i+1]), Matrix::function(Matrix::multiply(this->weights[i-1], this->layer[i-1]), dsigmoid), true);
        this->gradients[i] = Matrix::multiply(this->deltas[i], Matrix::transpose(this->layer[i-1]));
        this->weights[i-1] = Matrix::subtract(this->weights[i-1], Matrix::multiply(Matrix::multiply(this->weights[i-1], this->learning_rate), this->gradients[i], true));
    }
}

The third argument to Matrix::multiply tells whether to use the Hadamard (element-wise) product instead of the ordinary matrix product (the default is false). this->num_depth is the number of hidden layers.
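
For example, with A and B already constructed elsewhere (the dimensions and the element notation in the comments are only illustrative):

Matrix C = Matrix::multiply(A, B);        // ordinary matrix product: A is (m x n), B is (n x p), C is (m x p)
Matrix H = Matrix::multiply(A, B, true);  // Hadamard product: A and B are the same size, H(i,j) = A(i,j) * B(i,j)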

Adding biases seems to do... something, but the output almost always tends towards zero.

void KNN::train(const Matrix& input, const Matrix& target) {
    this->layer[0] = input;
    for(uint i = 1; i <= this->num_depth+1; i++) {
        this->layer[i] = Matrix::multiply(this->weights[i-1], this->layer[i-1]);
        this->layer[i] = Matrix::add(this->layer[i], this->biases[i-1]);
        this->layer[i] = Matrix::function(this->layer[i], this->activation);
    }
    this->deltas[this->num_depth+1] = Matrix::multiply(Matrix::subtract(this->layer[this->num_depth+1], target), Matrix::function(Matrix::multiply(this->weights[this->num_depth], this->layer[this->num_depth]), this->dactivation), true);
    this->gradients[this->num_depth+1] = Matrix::multiply(this->deltas[this->num_depth+1], Matrix::transpose(this->layer[this->num_depth]));
    this->weights[this->num_depth] = Matrix::subtract(this->weights[this->num_depth], Matrix::multiply(Matrix::multiply(this->weights[this->num_depth], this->learning_rate), this->gradients[this->num_depth+1], true));
    this->biases[this->num_depth] = Matrix::subtract(this->biases[this->num_depth], Matrix::multiply(this->deltas[this->num_depth+1], this->learning_rate * .5));
    for(uint i = this->num_depth+1 -1; i > 0; i--) {
        this->deltas[i] = Matrix::multiply(Matrix::multiply(Matrix::transpose(this->weights[i+1 -1]), this->deltas[i+1]), Matrix::function(Matrix::multiply(this->weights[i-1], this->layer[i-1]), this->dactivation), true);
        this->gradients[i] = Matrix::multiply(this->deltas[i], Matrix::transpose(this->layer[i-1]));
        this->weights[i-1] = Matrix::subtract(this->weights[i-1], Matrix::multiply(Matrix::multiply(this->weights[i-1], this->learning_rate), this->gradients[i], true));
        this->biases[i-1] = Matrix::subtract(this->biases[i-1], Matrix::multiply(this->deltas[i], this->learning_rate * .5));
    }
}
  • On GitHub, you might see that I experimented with the SiLU activation function as well, but the results are the same. – Zachary Oct 23 '20 at 17:24
  • I spotted a bug in my randomize function. Oops! It's fixed now. The output is still the same regardless. – Zachary Oct 24 '20 at 04:25

0 Answers