I am trying to classify the Iris data set with a neural network I wrote from scratch in C++. The data set has 150 rows covering 3 different flowers, with 4 feature columns and a fifth column for the flower type, which I converted to 0, 1 or 2.
Problem: Whenever I run the network it goes through a training set of 90 rows, split evenly across the 3 flowers (30, 30, 30). In the first epochs the output values are all very high, like (0.99, 0.99, 0.98). After a few epochs they drop to more sensible values. But in the later epochs, when I run say 50 epochs, the output for the correct flower creeps closer and closer to 1.00 while that flower's rows are being processed, then the same happens for the next flower and the one after that, and then the whole process starts over in the next epoch. I expected the outputs to start out close to 1.0 for the correct flower, indicating that the network had learned and the weights were properly adjusted.
Below is the console output from running the epochs (each epoch runs forward_prop(), back_prop() and then update_network() for every row). After each epoch it prints the output values of the network. Because the printing happens at the end of the epoch, the expected values are always {0, 0, 1}. When I ran the network for 1000 epochs, the output values never changed after epoch 15. Why is it doing this?
File parsed, weights and bias randomized
Epoch 1
0.97 0.97 0.99 Epoch 2
0.93 0.94 0.99 Epoch 3
0.64 0.70 0.99 Epoch 4
0.27 0.36 0.99 Epoch 5
0.22 0.31 0.99 Epoch 6
0.21 0.30 0.99 Epoch 7
0.21 0.30 0.98 Epoch 8
0.21 0.30 0.98 Epoch 9
0.21 0.30 0.96 Epoch 10
0.21 0.30 0.88 Epoch 11
0.21 0.30 0.66 Epoch 12
0.21 0.30 0.56 Epoch 13
0.21 0.30 0.54 Epoch 14
0.21 0.30 0.53 Epoch 15
0.21 0.30 0.53 completed successfully
End console output.
Example of the per-row output during epoch 9:
0.21 0.30 0.98
0.21 0.30 0.98
0.22 0.29 0.98
0.23 0.29 0.98
0.24 0.28 0.98
0.25 0.28 0.98
0.25 0.27 0.98
0.26 0.27 0.98
0.27 0.27 0.98
0.28 0.26 0.98
0.29 0.26 0.98
0.30 0.26 0.98
0.31 0.26 0.98
0.32 0.25 0.98
0.34 0.25 0.98
0.35 0.24 0.98
0.36 0.24 0.98
0.37 0.24 0.98
0.38 0.24 0.98
0.40 0.23 0.98
0.41 0.23 0.98
0.42 0.23 0.98
0.43 0.23 0.98
0.44 0.22 0.98
0.45 0.22 0.98
0.46 0.22 0.98
0.48 0.22 0.98
0.49 0.22 0.98
0.50 0.21 0.98
0.51 0.21 0.98
0.53 0.20 0.98
0.52 0.21 0.98
0.50 0.22 0.98
0.49 0.23 0.98
0.48 0.24 0.98
0.47 0.24 0.98
0.46 0.25 0.98
0.45 0.26 0.98
0.44 0.27 0.98
0.43 0.28 0.98
0.42 0.29 0.98
0.42 0.30 0.98
0.41 0.32 0.98
0.40 0.33 0.98
0.39 0.34 0.98
0.38 0.35 0.98
0.38 0.36 0.98
0.37 0.37 0.98
0.36 0.38 0.98
0.35 0.40 0.98
0.35 0.41 0.98
0.34 0.42 0.98
0.34 0.43 0.98
0.33 0.44 0.98
0.32 0.46 0.98
0.32 0.47 0.98
0.31 0.48 0.98
0.31 0.49 0.98
0.30 0.50 0.98
0.30 0.51 0.97
0.30 0.52 0.98
0.29 0.51 0.98
0.29 0.50 0.98
0.28 0.49 0.98
0.28 0.48 0.98
0.27 0.47 0.98
0.27 0.46 0.97
0.27 0.45 0.98
0.26 0.44 0.98
0.26 0.43 0.98
0.26 0.42 0.98
0.25 0.41 0.98
0.25 0.40 0.98
0.25 0.40 0.98
0.24 0.39 0.98
0.24 0.38 0.98
0.24 0.37 0.98
0.24 0.37 0.98
0.23 0.36 0.98
0.23 0.35 0.98
0.23 0.35 0.98
0.23 0.34 0.98
0.22 0.33 0.98
0.22 0.33 0.98
0.22 0.32 0.98
0.22 0.32 0.98
0.21 0.31 0.98
0.21 0.31 0.98
0.21 0.30 0.98
0.21 0.30 0.98 Epoch 9
In epoch 9 the first 30 rows have an expected value of {1, 0, 0}, the next 30 have an expected value of {0, 1, 0} and the last 30 have an expected value of {0, 0, 1}. See how the outputs inch closer to the target with each row of data, yet the third output stays the same within the epoch, even though it does not stay the same across epochs. This is strange and I am not sure why it is doing this.
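To make the row-to-target mapping explicit, here is a minimal sketch of how the expected vector follows from the row index, assuming the rows are ordered in blocks of 30 per flower as described above. The helper name expected_for_row and its parameters are hypothetical and do not appear in my program, which builds the same vector inline in train().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of my program): builds the one-hot expected
// vector for a row, assuming rows are stored in blocks of `per_class` rows.
std::vector<double> expected_for_row(std::size_t row,
                                     std::size_t per_class = 30,
                                     std::size_t classes = 3)
{
    assert(row < per_class * classes); // row must belong to one of the blocks
    std::vector<double> expected(classes, 0.0);
    expected[row / per_class] = 1.0;   // rows 0-29 -> 0, 30-59 -> 1, 60-89 -> 2
    return expected;
}
```

With the defaults, row 45 maps to {0, 1, 0}, which matches the middle block in the epoch 9 listing.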
The basic structure of the program is as follows. main() declares and initializes a Neural_Network object with an input, a hidden and an output layer, then calls train(). train() runs one epoch per loop iteration, for the number of epochs specified in the call. Each epoch runs forward_prop(), back_prop() and finally update_network() for every row. There are also a few variables, such as the vectors for the expected and actual output values.
The vectors bias, values, weights and errors each hold their part of the network state separately, which I found better for readability. The first layer, position [0] of the weights vector, is empty: the input values are multiplied by the weights stored with the hidden layer, and the hidden layer's values are multiplied by the weights stored with the output layer.
Each node's weights are a vector with one entry per node in the previous layer. Position [0] of that vector is the weight for the node at position [0] in the previous layer.
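As an illustration of that layout, here is a minimal sketch that only allocates the weights for a { 4, 6, 3 } network. The names Layer_Weights and make_weight_layout are hypothetical, and the weights are zero-filled instead of randomized to keep the example short. One simplification: in this sketch weights[0] is completely empty, while my initialize() stores one empty vector per input node there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// weights[layer][node][previous-layer node]
using Layer_Weights = std::vector<std::vector<double>>;

// Hypothetical helper (not part of my program): allocates zero-filled
// weights using the layout described above.
std::vector<Layer_Weights> make_weight_layout(const std::vector<std::size_t>& layers)
{
    assert(!layers.empty());
    std::vector<Layer_Weights> weights(layers.size());
    for (std::size_t i = 1; i < layers.size(); ++i)
    {
        // every node in layer i gets one weight per node in layer i - 1
        weights[i].assign(layers[i], std::vector<double>(layers[i - 1], 0.0));
    }
    return weights; // weights[0] stays empty: the input layer has no incoming weights
}
```

For { 4, 6, 3 } this gives 6 vectors of 4 weights for the hidden layer and 3 vectors of 6 weights for the output layer.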
#include <iostream>
#include <cstdlib>
#include <iomanip>
#include <cmath>
#include <fstream>
#include <sstream>
#include <vector>
#include <array>
#include <string>
#include <numeric>
class Neural_Network
{
private:
    std::vector<std::array<double, 4>> training_set; // 30 setosa -> 30 versicolor -> 30 virginica
    std::vector<std::vector<double>> values, bias, errors;
    std::vector<std::vector<std::vector<double>>> weights;
    size_t net_size = 0;
    double dot_val(std::vector<double> val, std::vector<double> weights);
    double sigmoid(const double num);
    double random_number();
    double transfer_derivitive(double num);
    void initialize(std::vector<size_t> layers);
    void forward_prop(std::vector<double>& expected);
    void back_prop(std::vector<double> expected);
    void update_network(double l_rate);
public:
    Neural_Network(const std::vector<std::array<double, 4>>& data);
    ~Neural_Network() = default;
    void train(size_t epochs = 1);
    void display();
};
Neural_Network::Neural_Network(const std::vector<std::array<double, 4>>& data) : training_set{ data }
{
    initialize({ 4, 6, 3 });
}
double Neural_Network::dot_val(std::vector<double> val, std::vector<double> weights)
{
    return std::inner_product(val.begin(), val.end(), weights.begin(), 0.0);
}
double Neural_Network::sigmoid(const double num)
{
    return (1 / (1 + exp(-num)));
}
double Neural_Network::random_number()
{
    return (double)rand() / (double)RAND_MAX;
}
double Neural_Network::transfer_derivitive(double num)
{
    return num * (1 - num);
}
void Neural_Network::display()
{
    std::cout << std::fixed << std::setprecision(2) << "values:\n";
    for (size_t i = 0; i < values.size(); ++i)
    {
        std::cout << "layer " << i << "\n[ ";
        for (size_t j = 0; j < values[i].size(); ++j)
            std::cout << values.at(i).at(j) << " ";
        std::cout << " ]\n";
    }
}
void Neural_Network::initialize(std::vector<size_t> layers)
{
    for (size_t i = 0; i < layers.size(); ++i)
    {
        std::vector<double> v{}, b{}, e{};
        std::vector<std::vector<double>> w{};
        // initializing the nodes in the layers
        for (size_t j = 0; j < layers.at(i); ++j)
        {
            v.push_back(0);
            b.push_back(random_number());
            e.push_back(1);
            std::vector<double> inner_w{};
            if (i != 0) // the input layer has no incoming weights
                for (size_t k = 0; k < layers.at(i - 1); ++k) // one weight per node in the previous layer
                    inner_w.push_back(random_number()); // weight from node k of the previous layer to node j
            w.push_back(inner_w);
        }
        values.push_back(v);
        bias.push_back(b);
        errors.push_back(e);
        weights.push_back(w);
        ++net_size;
    }
    std::cout << "initialize network success" << std::endl;
}
void Neural_Network::train(size_t epoch_count)
{
    const size_t count = epoch_count;
    while (epoch_count > 0)
    {
        std::cout << "\nEpoch " << 1 + (count - epoch_count) << std::endl;
        for (size_t i = 0; i < 90; ++i)
        {
            std::vector<double> expected{ 0, 0, 0 };
            if (i < 30)
                expected[0] = 1;
            else if (i < 60)
                expected[1] = 1;
            else if (i < 90)
                expected[2] = 1;
            for (size_t j = 0; j < values[0].size(); ++j) // Initialize input layer values
                values.at(0).at(j) = training_set.at(i).at(j); // value[0] is the input layer, j is the node
            forward_prop(expected);
            back_prop(expected);
            update_network(0.05);
        }
        display();
        --epoch_count;
    }
}
void Neural_Network::forward_prop(std::vector<double>& expected)
{
    for (size_t i = 1; i < net_size - 1; ++i) // looping through every layer except the first and last
        for (size_t j = 0; j < values.at(i).size(); ++j) // looping through every node in the current hidden layer
            values.at(i).at(j) = sigmoid(dot_val(values.at(i - 1), weights.at(i).at(j)) + bias.at(i).at(j)); // node j of layer i = sigmoid(dot product + bias)
    for (size_t i = 0; i < values.at(net_size - 1).size(); ++i) // looping through the output layer
        values.at(net_size - 1).at(i) = sigmoid(dot_val(values.at(net_size - 2), weights.at(net_size - 1).at(i)) + bias.at(net_size - 1).at(i));
}
void Neural_Network::back_prop(std::vector<double> expected) // work backwards from the output layer
{
    std::vector<double> output_errors{};
    for (size_t i = 0; i < errors.at(net_size - 1).size(); ++i) // looping through the output layer
    {
        output_errors.push_back(expected.at(i) - values.at(net_size - 1).at(i));
        errors.at(net_size - 1).at(i) = output_errors.at(i) * transfer_derivitive(values.at(net_size - 1).at(i));
    } // output layer finished
    for (size_t i = net_size - 2; i > 0; i--) // looping through the non output layers backwards
    {
        std::vector<double> layer_errors{};
        for (size_t j = 0; j < errors.at(i).size(); ++j) // looping through the current layer's nodes
        {
            double error = 0;
            for (size_t k = 0; k < weights.at(i + 1).size(); ++k) // looping through the current set of weights
                error += errors.at(i).at(j) * weights.at(i + 1).at(k).at(j);
            layer_errors.push_back(error);
        }
        for (size_t j = 0; j < layer_errors.size(); ++j)
            errors.at(i).at(j) = layer_errors.at(j) * transfer_derivitive(values.at(i).at(j));
    }
}
void Neural_Network::update_network(double l_rate)
{
    for (size_t i = 1; i < net_size; ++i)
    {
        for (size_t j = 0; j < weights.at(i).size(); ++j)
        {
            for (size_t k = 0; k < weights.at(i).at(j).size(); ++k)
                weights.at(i).at(j).at(k) += l_rate * errors.at(i).at(j) * values.at(i - 1).at(j);
            bias.at(i).at(j) += l_rate * errors.at(i).at(j);
        }
    }
}
int main()
{
std::vector<std::array<double, 4>> data = {
{5.1, 3.5, 1.4, 0.2},
{4.9, 3, 1.4, 0.2},
{4.7, 3.2, 1.3, 0.2},
{4.6, 3.1, 1.5, 0.2},
{5, 3.6, 1.4, 0.2},
{5.4, 3.9, 1.7, 0.4},
{4.6, 3.4, 1.4, 0.3},
{5, 3.4, 1.5, 0.2},
{4.4, 2.9, 1.4, 0.2},
{4.9, 3.1, 1.5, 0.1},
{5.4, 3.7, 1.5, 0.2},
{4.8, 3.4, 1.6, 0.2},
{4.8, 3, 1.4, 0.1},
{4.3, 3, 1.1, 0.1},
{5.8, 4, 1.2, 0.2},
{5.7, 4.4, 1.5, 0.4},
{5.4, 3.9, 1.3, 0.4},
{5.1, 3.5, 1.4, 0.3},
{5.7, 3.8, 1.7, 0.3},
{5.1, 3.8, 1.5, 0.3},
{5.4, 3.4, 1.7, 0.2},
{5.1, 3.7, 1.5, 0.4},
{4.6, 3.6, 1, 0.2},
{5.1, 3.3, 1.7, 0.5},
{4.8, 3.4, 1.9, 0.2},
{5, 3, 1.6, 0.2},
{5, 3.4, 1.6, 0.4},
{5.2, 3.5, 1.5, 0.2},
{5.2, 3.4, 1.4, 0.2},
{4.7, 3.2, 1.6, 0.2},
{7, 3.2, 4.7, 1.4},
{6.4, 3.2, 4.5, 1.5},
{6.9, 3.1, 4.9, 1.5},
{5.5, 2.3, 4, 1.3},
{6.5, 2.8, 4.6, 1.5},
{5.7, 2.8, 4.5, 1.3},
{6.3, 3.3, 4.7, 1.6},
{4.9, 2.4, 3.3, 1},
{6.6, 2.9, 4.6, 1.3},
{5.2, 2.7, 3.9, 1.4},
{5, 2, 3.5, 1},
{5.9, 3, 4.2, 1.5},
{6, 2.2, 4, 1},
{6.1, 2.9, 4.7, 1.4},
{5.6, 2.9, 3.6, 1.3},
{6.7, 3.1, 4.4, 1.4},
{5.6, 3, 4.5, 1.5},
{5.8, 2.7, 4.1, 1},
{6.2, 2.2, 4.5, 1.5},
{5.6, 2.5, 3.9, 1.1},
{5.9, 3.2, 4.8, 1.8},
{6.1, 2.8, 4, 1.3},
{6.3, 2.5, 4.9, 1.5},
{6.1, 2.8, 4.7, 1.2},
{6.4, 2.9, 4.3, 1.3},
{6.6, 3, 4.4, 1.4},
{6.8, 2.8, 4.8, 1.4},
{6.7, 3, 5, 1.7},
{6, 2.9, 4.5, 1.5},
{5.7, 2.6, 3.5, 1},
{6.3, 3.3, 6, 2.5},
{5.8, 2.7, 5.1, 1.9},
{7.1, 3, 5.9, 2.1},
{6.3, 2.9, 5.6, 1.8},
{6.5, 3, 5.8, 2.2},
{7.6, 3, 6.6, 2.1},
{4.9, 2.5, 4.5, 1.7},
{7.3, 2.9, 6.3, 1.8},
{6.7, 2.5, 5.8, 1.8},
{7.2, 3.6, 6.1, 2.5},
{6.5, 3.2, 5.1, 2},
{6.4, 2.7, 5.3, 1.9},
{6.8, 3, 5.5, 2.1},
{5.7, 2.5, 5, 2},
{5.8, 2.8, 5.1, 2.4},
{6.4, 3.2, 5.3, 2.3},
{6.5, 3, 5.5, 1.8},
{7.7, 3.8, 6.7, 2.2},
{7.7, 2.6, 6.9, 2.3},
{6, 2.2, 5, 1.5},
{6.9, 3.2, 5.7, 2.3},
{5.6, 2.8, 4.9, 2},
{7.7, 2.8, 6.7, 2},
{6.3, 2.7, 4.9, 1.8},
{6.7, 3.3, 5.7, 2.1},
{7.2, 3.2, 6, 1.8},
{6.2, 2.8, 4.8, 1.8},
{6.1, 3, 4.9, 1.8},
{6.4, 2.8, 5.6, 2.1},
{7.2, 3, 5.8, 1.6}
};
Neural_Network network{ data };
network.train(1);
return 0;
}
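For comparison, this is the textbook delta rule as I understand it, written as standalone functions for a single layer. It is a sketch and not copied from my program: hidden_deltas, update_layer and sigmoid_derivative are hypothetical names, and I have not verified that my back_prop() and update_network() use the same indexing. It assumes sigmoid activations and the weight layout described above (one vector of weights per node, indexed by previous-layer node).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Derivative of the sigmoid, expressed through the sigmoid's output value.
double sigmoid_derivative(double out) { return out * (1.0 - out); }

// Hypothetical helper: deltas of a hidden layer, given the deltas of the next layer.
// next_weights[k][j] is the weight from node j of this layer to node k of the next layer.
std::vector<double> hidden_deltas(const std::vector<double>& layer_values,
                                  const std::vector<double>& next_deltas,
                                  const std::vector<std::vector<double>>& next_weights)
{
    assert(next_deltas.size() == next_weights.size());
    std::vector<double> deltas(layer_values.size(), 0.0);
    for (std::size_t j = 0; j < layer_values.size(); ++j)
    {
        double error = 0.0;
        for (std::size_t k = 0; k < next_weights.size(); ++k)
            error += next_deltas[k] * next_weights[k][j]; // delta of next-layer node k
        deltas[j] = error * sigmoid_derivative(layer_values[j]);
    }
    return deltas;
}

// Hypothetical helper: weights[j][k] += rate * delta of node j * input from node k.
void update_layer(std::vector<std::vector<double>>& weights,
                  std::vector<double>& bias,
                  const std::vector<double>& deltas,
                  const std::vector<double>& inputs,
                  double rate)
{
    assert(weights.size() == deltas.size() && bias.size() == deltas.size());
    for (std::size_t j = 0; j < weights.size(); ++j)
    {
        assert(weights[j].size() == inputs.size());
        for (std::size_t k = 0; k < weights[j].size(); ++k)
            weights[j][k] += rate * deltas[j] * inputs[k]; // input k feeds weight k
        bias[j] += rate * deltas[j];
    }
}
```

In this formulation the hidden-layer error sums over the deltas of the next layer, and each weight is scaled by the input it multiplies in the forward pass.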
Edit: the program now uses .at() instead of [] for accessing std::vector elements.
I hope I made everything clear; if not, let me know.
Note: I originally asked this question on Stack Overflow and was told to move it to codereview.stackexchange. There I was told to move it back to Stack Overflow and reframe the question with more detail. Please don't tell me to move this question a third time. If there is something wrong with the way I am asking, please give me a chance to change it or add information so I can get some help. Thank you.