Backpropagation giving strange values in C++ neural network

Question

I am trying to classify the iris data set with a neural network I wrote from scratch in C++. The data set has 150 rows covering 3 different flowers, with 4 feature columns and a fifth column for the flower type, which I converted to 0, 1 or 2.

Problem: Each epoch runs the network over a training set of 90 rows, split evenly between the 3 flowers (30, 30, 30). For the first few epochs the output values are all very high, like (0.99, 0.99, 0.98), and then they eventually drop to more sensible values. In later epochs (when I run, say, 50 epochs), the output for the correct flower creeps closer and closer to 1.00 while that flower's rows are being processed, then the same happens for the next flower and the one after that, and then the whole process starts over in the next epoch. I expected the outputs to start each epoch already close to 1.0 for the correct flower, which would indicate that the network had learned and the weights had been adjusted properly.

Below is the console output from training. Each epoch runs forward_prop(), back_prop() and then update_network() for every row, and prints the network's output values once at the end. Because the printout happens at the end of the epoch, the actual (target) values are always {0, 0, 1}. When I ran the network for 1000 epochs, the output values never changed after epoch 15. Why is it doing this?

File parsed, weights and bias randomized

Epoch 1
0.97 0.97 0.99
Epoch 2
0.93 0.94 0.99
Epoch 3
0.64 0.70 0.99
Epoch 4
0.27 0.36 0.99
Epoch 5
0.22 0.31 0.99
Epoch 6
0.21 0.30 0.99
Epoch 7
0.21 0.30 0.98
Epoch 8
0.21 0.30 0.98
Epoch 9
0.21 0.30 0.96
Epoch 10
0.21 0.30 0.88
Epoch 11
0.21 0.30 0.66
Epoch 12
0.21 0.30 0.56
Epoch 13
0.21 0.30 0.54
Epoch 14
0.21 0.30 0.53
Epoch 15
0.21 0.30 0.53
completed successfully

End console output.

Example of epoch 9

0.21 0.30 0.98
0.21 0.30 0.98
0.22 0.29 0.98
0.23 0.29 0.98
0.24 0.28 0.98
0.25 0.28 0.98
0.25 0.27 0.98
0.26 0.27 0.98 
0.27 0.27 0.98
0.28 0.26 0.98
0.29 0.26 0.98
0.30 0.26 0.98
0.31 0.26 0.98
0.32 0.25 0.98
0.34 0.25 0.98
0.35 0.24 0.98
0.36 0.24 0.98
0.37 0.24 0.98 
0.38 0.24 0.98
0.40 0.23 0.98
0.41 0.23 0.98
0.42 0.23 0.98
0.43 0.23 0.98
0.44 0.22 0.98
0.45 0.22 0.98
0.46 0.22 0.98 
0.48 0.22 0.98
0.49 0.22 0.98
0.50 0.21 0.98 
0.51 0.21 0.98
0.53 0.20 0.98
0.52 0.21 0.98
0.50 0.22 0.98
0.49 0.23 0.98
0.48 0.24 0.98
0.47 0.24 0.98
0.46 0.25 0.98
0.45 0.26 0.98
0.44 0.27 0.98 
0.43 0.28 0.98
0.42 0.29 0.98
0.42 0.30 0.98
0.41 0.32 0.98 
0.40 0.33 0.98
0.39 0.34 0.98
0.38 0.35 0.98
0.38 0.36 0.98
0.37 0.37 0.98
0.36 0.38 0.98
0.35 0.40 0.98
0.35 0.41 0.98
0.34 0.42 0.98
0.34 0.43 0.98
0.33 0.44 0.98
0.32 0.46 0.98 
0.32 0.47 0.98
0.31 0.48 0.98
0.31 0.49 0.98 
0.30 0.50 0.98
0.30 0.51 0.97
0.30 0.52 0.98
0.29 0.51 0.98
0.29 0.50 0.98
0.28 0.49 0.98
0.28 0.48 0.98
0.27 0.47 0.98
0.27 0.46 0.97 
0.27 0.45 0.98
0.26 0.44 0.98
0.26 0.43 0.98
0.26 0.42 0.98
0.25 0.41 0.98
0.25 0.40 0.98
0.25 0.40 0.98
0.24 0.39 0.98 
0.24 0.38 0.98
0.24 0.37 0.98
0.24 0.37 0.98
0.23 0.36 0.98
0.23 0.35 0.98 
0.23 0.35 0.98
0.23 0.34 0.98
0.22 0.33 0.98
0.22 0.33 0.98
0.22 0.32 0.98
0.22 0.32 0.98
0.21 0.31 0.98
0.21 0.31 0.98
0.21 0.30 0.98 
0.21 0.30 0.98 Epoch 9

In epoch 9, the first 30 rows have an actual value of {1, 0, 0}, the next 30 have {0, 1, 0}, and the last 30 have {0, 0, 1}. Notice how the first two outputs inch closer and closer to the target with each row of data, yet the last output stays the same within the epoch, even though it does not stay the same across all the epochs. This is strange, and I am not sure exactly why it happens.
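The per-row output above is at least consistent with the network chasing whichever class block it is currently being shown, since the rows are always presented in class order. A common thing to try, offered here as an assumption rather than a confirmed fix, is to visit the rows in a different random order each epoch. The sketch below shows one way to do that; `shuffled_order` and `expected_for` are hypothetical helpers that do not exist in the posted class.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the row indices 0..n-1 in a random order.
std::vector<std::size_t> shuffled_order(std::size_t n, std::mt19937& rng)
{
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Hypothetical helper: the target is derived from the row index, so it
// stays attached to the row no matter when the row is visited.
std::vector<double> expected_for(std::size_t row)
{
    std::vector<double> expected{ 0, 0, 0 };
    expected.at(row / 30) = 1; // rows 0-29, 30-59, 60-89 (valid for row < 90)
    return expected;
}
```

Inside the training loop, the row index `i` would then come from `shuffled_order(90, rng)` instead of counting from 0 to 89, with the order regenerated at the start of every epoch.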

So the basic structure of the program is:

main() declares and initializes a Neural_Network object with an input, a hidden and an output layer.

Calling train() then runs one epoch per loop iteration, for the number of epochs specified in the call.

Each epoch runs forward_prop(), back_prop() and finally update_network() for every row; there are also a few variables, such as the vectors holding the expected and actual output values.

The vectors bias, values, weights and errors each hold one kind of value for the whole network, which I found better for readability. Position [0] of the weights vector (the input layer) holds no weights: the input values are multiplied by the weights stored in the hidden layer, and the hidden layer's values by the weights stored in the output layer.

Each node's entry in weights is a vector with one weight per node in the previous layer. Position [0] of that vector is the weight for the node at position [0] in the previous layer.
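The indexing described above can be sketched as follows. This is a minimal illustration that assumes the same { 4, 6, 3 } layer sizes used in the code below; `make_weights` and the `Weights` alias are hypothetical names, and the weights are zero-filled here rather than randomized.

```cpp
#include <cstddef>
#include <vector>

// weights[layer][node][k]: the weight connecting node k of the previous
// layer to `node` of `layer`. Layer 0 (the input layer) has no incoming
// weights, so its per-node vectors are empty.
using Weights = std::vector<std::vector<std::vector<double>>>;

// Hypothetical helper: builds the zero-filled layout for the given sizes.
Weights make_weights(const std::vector<std::size_t>& layers)
{
    Weights weights;
    for (std::size_t i = 0; i < layers.size(); ++i)
    {
        const std::size_t inputs = (i == 0) ? 0 : layers[i - 1];
        weights.emplace_back(layers[i], std::vector<double>(inputs, 0.0));
    }
    return weights;
}
```

With sizes { 4, 6, 3 }, each of the 6 hidden nodes ends up with 4 weights and each of the 3 output nodes with 6, so the innermost index always ranges over the previous layer's nodes.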

#include <iostream>
#include <cstdlib>
#include <iomanip>
#include <cmath>
#include <fstream>
#include <sstream>
#include <vector>
#include <array>
#include <string>
#include <numeric>

class Neural_Network
{
private:
    std::vector<std::array<double, 4>> training_set; // 30 setosa -> 30 versicolor -> 30 virginica
    std::vector<std::vector<double>> values, bias, errors;
    std::vector<std::vector<std::vector<double>>> weights;
    size_t net_size = 0;
    double dot_val(std::vector<double> val, std::vector<double> weights);
    double sigmoid(const double num);
    double random_number();
    double transfer_derivitive(double num);
    void initialize(std::vector<size_t> layers);
    void forward_prop(std::vector<double>& expected);
    void back_prop(std::vector<double> expected);
    void update_network(double l_rate);

public:
    Neural_Network(const std::vector<std::array<double, 4>>& data);
    ~Neural_Network() = default;
    void train(size_t epochs = 1);
    void display();
};

Neural_Network::Neural_Network(const std::vector<std::array<double, 4>>& data) : training_set{ data }
{
    initialize({ 4, 6, 3 });
}

double Neural_Network::dot_val(std::vector<double> val, std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}

double Neural_Network::sigmoid(const double num)
{
    return (1 / (1 + exp(-num)));
}

double Neural_Network::random_number()
{
    return (double)rand() / (double)RAND_MAX;
}

double Neural_Network::transfer_derivitive(double num)
{
    return num * (1 - num);
}

void Neural_Network::display()
{
    std::cout << std::fixed << std::setprecision(2) << "values:\n";
    for (size_t i = 0; i < values.size(); ++i)
    {
        std::cout << "layer " << i << "\n[ ";
        for (size_t j = 0; j < values[i].size(); ++j)
            std::cout << values.at(i).at(j) << " ";
        std::cout << " ]\n";
    }
}

void Neural_Network::initialize(std::vector<size_t> layers)
{
    for (size_t i = 0; i < layers.size(); ++i)
    {
        std::vector<double> v{}, b{}, e{};
        std::vector<std::vector<double>> w{};
        //initializing the nodes in the layers
        for (size_t j = 0; j < layers.at(i); ++j)
        {
            v.push_back(0);
            b.push_back(random_number());
            e.push_back(1);
            std::vector<double> inner_w{};
            if (i != 0)                                    // the input layer has no incoming weights
                for (size_t k = 0; k < layers.at(i - 1); ++k) // one weight per node in the previous layer
                    inner_w.push_back(random_number());    // weight from previous-layer node k to this node
            w.push_back(inner_w);
        }
        values.push_back(v);
        bias.push_back(b);
        errors.push_back(e);
        weights.push_back(w);
        ++net_size;
    }
    std::cout << "initialize network success" << std::endl;
}

void Neural_Network::train(size_t epoch_count)
{
    const size_t count = epoch_count;
    while (epoch_count > 0)
    {
        std::cout << "\nEpoch " << 1 + (count - epoch_count) << std::endl;
        for (size_t i = 0; i < 90; ++i)
        {
            std::vector<double> expected{ 0, 0, 0 };
            if (i < 30)
                expected[0] = 1;
            else if (i < 60)
                expected[1] = 1;
            else if (i < 90)
                expected[2] = 1;
            for (size_t j = 0; j < values[0].size(); ++j) // Initialize input layer values
                values.at(0).at(j) = training_set.at(i).at(j);        // value[0] is the input layer, j is the node
            forward_prop(expected);
            back_prop(expected);
            update_network(0.05);
        }
        display();
        --epoch_count;
    }
}

void Neural_Network::forward_prop(std::vector<double>& expected)
{
    for (size_t i = 1; i < net_size - 1; ++i)                                           // looping through every layer except the first and last
        for (size_t j = 0; j < values.at(i).size(); ++j)                                   // looping through every node in the current non input/output layer
            values.at(i).at(j) = sigmoid(dot_val(values.at(i - 1), weights.at(i).at(j)) + bias.at(i).at(j)); // assigning node j of layer i a sigmoided val that is the dotval + the associated bias
    for (size_t i = 0; i < values.at(net_size - 1).size(); ++i)                            // looping through the output layer
        values.at(net_size - 1).at(i) = sigmoid(dot_val(values.at(net_size - 2), weights.at(net_size - 1).at(i)) + bias.at(net_size - 1).at(i));
}

void Neural_Network::back_prop(std::vector<double> expected) // work backwards from the output layer
{
    std::vector<double> output_errors{};
    for (size_t i = 0; i < errors.at(net_size - 1).size(); ++i) // looping through the output layer
    {
        output_errors.push_back(expected.at(i) - values.at(net_size - 1).at(i));
        errors.at(net_size - 1).at(i) = output_errors.at(i) * transfer_derivitive(values.at(net_size - 1).at(i));
    }                                         // output layer finished
    for (size_t i = net_size - 2; i > 0; i--) // looping through the non output layers backwards
    {
        std::vector<double> layer_errors{};
        for (size_t j = 0; j < errors.at(i).size(); ++j) // looping through the current layer's nodes
        {
            double error = 0;
            for (size_t k = 0; k < weights.at(i + 1).size(); ++k) // looping through the current set of weights
                error += errors.at(i).at(j) * weights.at(i + 1).at(k).at(j);
            layer_errors.push_back(error);
        }
        for (size_t j = 0; j < layer_errors.size(); ++j)
            errors.at(i).at(j) = layer_errors.at(j) * transfer_derivitive(values.at(i).at(j));
    }
}

void Neural_Network::update_network(double l_rate)
{
    for (size_t i = 1; i < net_size; ++i)
    {
        for (size_t j = 0; j < weights.at(i).size(); ++j)
        {
            for (size_t k = 0; k < weights.at(i).at(j).size(); ++k)
                weights.at(i).at(j).at(k) += l_rate * errors.at(i).at(j) * values.at(i - 1).at(j);
            bias.at(i).at(j) += l_rate * errors.at(i).at(j);
        }
    }
}

int main()
{
    std::vector<std::array<double, 4>> data = {
        {5.1, 3.5, 1.4, 0.2},
        {4.9, 3, 1.4, 0.2},
        {4.7, 3.2, 1.3, 0.2},
        {4.6, 3.1, 1.5, 0.2},
        {5, 3.6, 1.4, 0.2},
        {5.4, 3.9, 1.7, 0.4},
        {4.6, 3.4, 1.4, 0.3},
        {5, 3.4, 1.5, 0.2},
        {4.4, 2.9, 1.4, 0.2},
        {4.9, 3.1, 1.5, 0.1},
        {5.4, 3.7, 1.5, 0.2},
        {4.8, 3.4, 1.6, 0.2},
        {4.8, 3, 1.4, 0.1},
        {4.3, 3, 1.1, 0.1},
        {5.8, 4, 1.2, 0.2},
        {5.7, 4.4, 1.5, 0.4},
        {5.4, 3.9, 1.3, 0.4},
        {5.1, 3.5, 1.4, 0.3},
        {5.7, 3.8, 1.7, 0.3},
        {5.1, 3.8, 1.5, 0.3},
        {5.4, 3.4, 1.7, 0.2},
        {5.1, 3.7, 1.5, 0.4},
        {4.6, 3.6, 1, 0.2},
        {5.1, 3.3, 1.7, 0.5},
        {4.8, 3.4, 1.9, 0.2},
        {5, 3, 1.6, 0.2},
        {5, 3.4, 1.6, 0.4},
        {5.2, 3.5, 1.5, 0.2},
        {5.2, 3.4, 1.4, 0.2},
        {4.7, 3.2, 1.6, 0.2},
        {7, 3.2, 4.7, 1.4},
        {6.4, 3.2, 4.5, 1.5},
        {6.9, 3.1, 4.9, 1.5},
        {5.5, 2.3, 4, 1.3},
        {6.5, 2.8, 4.6, 1.5},
        {5.7, 2.8, 4.5, 1.3},
        {6.3, 3.3, 4.7, 1.6},
        {4.9, 2.4, 3.3, 1},
        {6.6, 2.9, 4.6, 1.3},
        {5.2, 2.7, 3.9, 1.4},
        {5, 2, 3.5, 1},
        {5.9, 3, 4.2, 1.5},
        {6, 2.2, 4, 1},
        {6.1, 2.9, 4.7, 1.4},
        {5.6, 2.9, 3.6, 1.3},
        {6.7, 3.1, 4.4, 1.4},
        {5.6, 3, 4.5, 1.5},
        {5.8, 2.7, 4.1, 1},
        {6.2, 2.2, 4.5, 1.5},
        {5.6, 2.5, 3.9, 1.1},
        {5.9, 3.2, 4.8, 1.8},
        {6.1, 2.8, 4, 1.3},
        {6.3, 2.5, 4.9, 1.5},
        {6.1, 2.8, 4.7, 1.2},
        {6.4, 2.9, 4.3, 1.3},
        {6.6, 3, 4.4, 1.4},
        {6.8, 2.8, 4.8, 1.4},
        {6.7, 3, 5, 1.7},
        {6, 2.9, 4.5, 1.5},
        {5.7, 2.6, 3.5, 1},
        {6.3, 3.3, 6, 2.5},
        {5.8, 2.7, 5.1, 1.9},
        {7.1, 3, 5.9, 2.1},
        {6.3, 2.9, 5.6, 1.8},
        {6.5, 3, 5.8, 2.2},
        {7.6, 3, 6.6, 2.1},
        {4.9, 2.5, 4.5, 1.7},
        {7.3, 2.9, 6.3, 1.8},
        {6.7, 2.5, 5.8, 1.8},
        {7.2, 3.6, 6.1, 2.5},
        {6.5, 3.2, 5.1, 2},
        {6.4, 2.7, 5.3, 1.9},
        {6.8, 3, 5.5, 2.1},
        {5.7, 2.5, 5, 2},
        {5.8, 2.8, 5.1, 2.4},
        {6.4, 3.2, 5.3, 2.3},
        {6.5, 3, 5.5, 1.8},
        {7.7, 3.8, 6.7, 2.2},
        {7.7, 2.6, 6.9, 2.3},
        {6, 2.2, 5, 1.5},
        {6.9, 3.2, 5.7, 2.3},
        {5.6, 2.8, 4.9, 2},
        {7.7, 2.8, 6.7, 2},
        {6.3, 2.7, 4.9, 1.8},
        {6.7, 3.3, 5.7, 2.1},
        {7.2, 3.2, 6, 1.8},
        {6.2, 2.8, 4.8, 1.8},
        {6.1, 3, 4.9, 1.8},
        {6.4, 2.8, 5.6, 2.1},
        {7.2, 3, 5.8, 1.6}
    };

    Neural_Network network{ data };
    network.train(1);
    return 0;
}

Edit: the program now uses .at() instead of [] to access std::vector elements.

I hope I made everything clear; if not, let me know.

Note: I originally asked this question on Stack Overflow and was told to move it to codereview.stackexchange. There I was told to move it back to Stack Overflow and reframe the question with more detail. Please don't tell me to move this question a third time. If there is something wrong with the way I am asking, please give me a chance to change it or add information so I can get some help. Please and thank you.

`double output;` -- Did your compiler warn you that this variable is uninitialized? You then use this variable in `dot_val`, thus the results could be anything. Second, there is a [std::inner_product](https://en.cppreference.com/w/cpp/algorithm/inner_product) function that would have prevented this mistake — PaulMcKenzie, Mar 07 '20 at 18:08
Hey @DMS, glad to see you after moving this from Code Review. I think this is a better fit for [politics.se] though. Cheers. (just kidding if it weren't obvious) – JohnFilleau, Mar 07 '20 at 18:15
I did not know about std::inner_product, I will test that out later today or tomorrow when I have time. I also initialized `double output` to 0. Nothing changed, and no my compiler didn't warn me at all, everything is running without warning. — DMS, Mar 07 '20 at 18:19
`while(!in_file.eof())` -- [Please read this as to why this is not correct](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) — PaulMcKenzie, Mar 07 '20 at 18:26
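For reference, the usual alternative to an `eof()` loop is to make the read itself the loop condition. The sketch below is only an illustration: the file-reading code is no longer part of the post, so `read_rows`, the comma delimiter, and the five-column row layout are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader: one row per line, four comma-separated features
// followed by a class label (the comma delimiter is an assumption).
std::vector<std::array<double, 5>> read_rows(std::istream& in)
{
    std::vector<std::array<double, 5>> rows;
    std::string line;
    // The loop condition is the read itself, so a failed read ends the loop
    // instead of producing one extra, half-filled row.
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        std::array<double, 5> row{};
        std::string cell;
        std::size_t col = 0;
        while (col < row.size() && std::getline(fields, cell, ','))
        {
            try { row[col] = std::stod(cell); }
            catch (const std::exception&) { break; } // malformed cell
            ++col;
        }
        if (col == row.size()) // keep only complete rows
            rows.push_back(row);
    }
    return rows;
}
```

Blank lines and lines with too few fields are skipped rather than stored, which is the kind of trailing garbage row an `eof()` loop tends to let through.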

score 1 · Answer 1 · answered Mar 07 '20 at 18:18

One obvious mistake is in dot_val:

double Neural_Network::dot_val(std::vector<double> val,std::vector<double> weights)
{
    double output;  // <-- This is uninitialized
    for (size_t i = 0; i < weights.size(); ++i)
        output += val[i] * weights[i];
    return output;  // <-- Who knows what this will be
}

You are using an uninitialized variable. Either initialize output to 0, or use std::inner_product:

#include <numeric>
//...
double Neural_Network::dot_val(std::vector<double> val,std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}
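As a small illustration of the std::inner_product suggestion, here is a free-function variant that can be tried in isolation. The name `dot`, the const-reference parameters, and the size check are additions for this sketch and are not part of the original class.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical free-function variant of dot_val: takes the vectors by const
// reference (no copies) and rejects mismatched sizes instead of reading past
// the end of the shorter vector.
double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    if (a.size() != b.size())
        throw std::invalid_argument("dot: size mismatch");
    // The initial value 0.0 makes the accumulator a double; a plain 0 would
    // make std::inner_product accumulate in int and truncate the sum.
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

For example, `dot({1.0, 2.0, 3.0}, {4.0, 5.0, 6.0})` evaluates 1*4 + 2*5 + 3*6.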

answered Mar 07 '20 at 18:18

PaulMcKenzie


I made the change using `std::inner_product` and the output is still the same. – DMS Mar 07 '20 at 18:22
My answer pointed out the obvious error of summing up a value with the initial value undetermined. You may have one or more bugs, but without a [mcve], you will be on your own. – PaulMcKenzie Mar 07 '20 at 18:24
Alright, I edited the post, trimmed off any unnecessary code and combined everything into one section. You can run the code snippet in an IDE and it will reproduce the issue. I also hard-coded the training set into a vector of arrays. – DMS Mar 08 '20 at 19:03
You should start using `at()` instead of `[ ]` when accessing your vectors. If you do that, you will see that this line: `weights[i][j][k] += l_rate * errors[i][j] * values[i-1][j];` has `values` accessing an invalid index for `j`. For example: `weights[i][j][k] += l_rate * errors.at(i).at(j) * values.at(i - 1).at(j);` will throw a `std::out_of_range` exception. Also, you should try to get your code to run in Visual Studio Community -- it would have detected this error in the debug runtime. Right now, your code is invoking undefined behavior. – PaulMcKenzie Mar 08 '20 at 23:43
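The index problem in the last comment can be made concrete. In the posted network the hidden layer has 6 nodes but the input layer has only 4 values, so `values.at(i - 1).at(j)` goes out of range for j = 4 and 5. Weight k of node j multiplies the output of node k in the previous layer, so the input term would normally be indexed by k. The sketch below applies that to a single layer. It also shows the textbook delta rule for a hidden layer, which sums the next layer's errors; the posted back_prop multiplies by `errors.at(i).at(j)` instead, and treating that as a second bug is an assumption, not something confirmed in the thread. Both helper names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Per-layer update with the input term indexed by k (previous-layer node)
// rather than j (current-layer node).
void update_layer(std::vector<std::vector<double>>& layer_weights,
                  std::vector<double>& layer_bias,
                  const std::vector<double>& layer_errors,
                  const std::vector<double>& prev_values,
                  double l_rate)
{
    for (std::size_t j = 0; j < layer_weights.size(); ++j)
    {
        for (std::size_t k = 0; k < layer_weights.at(j).size(); ++k)
            layer_weights.at(j).at(k) += l_rate * layer_errors.at(j) * prev_values.at(k);
        layer_bias.at(j) += l_rate * layer_errors.at(j);
    }
}

// Textbook delta rule for one hidden layer: node j collects the errors of
// the next layer's nodes k, each weighted by the connection j -> k, then
// scales the sum by the sigmoid derivative of its own output.
std::vector<double> hidden_errors(const std::vector<std::vector<double>>& next_weights,
                                  const std::vector<double>& next_errors,
                                  const std::vector<double>& layer_values)
{
    std::vector<double> errors(layer_values.size(), 0.0);
    for (std::size_t j = 0; j < layer_values.size(); ++j)
    {
        double sum = 0.0;
        for (std::size_t k = 0; k < next_weights.size(); ++k)
            sum += next_errors.at(k) * next_weights.at(k).at(j);
        errors.at(j) = sum * layer_values.at(j) * (1.0 - layer_values.at(j));
    }
    return errors;
}
```

Because every access goes through `at()`, a wrong index in either helper would throw `std::out_of_range` rather than silently read unrelated memory.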
