Questions tagged [neural-network]

Network structure inspired by simplified models of biological neurons (brain cells). Neural networks are trained to "learn" by supervised and unsupervised techniques, and can be used for optimization, function approximation, pattern classification, and combinations thereof.

Neural networks have many practical applications within the software realm.

An application of neural networks for supervised learning is training a network for optical character recognition or handwriting recognition. The network is trained on exemplars of characters and, given enough data that form a representative sample of the population, it can generalize to a wider spectrum of cases not encountered during training. Training a neural network in a supervised manner involves a learning algorithm that finds the weights of the neurons which minimize the network's error at the task. Gradient descent is a learning algorithm commonly used to adjust the weights of a neural network. It is often paired with the backpropagation technique, which measures the contribution of each weight to the error signal and determines the gradients that guide the learning algorithm in adjusting each weight.
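
As a concrete illustration of the paragraph above (a minimal sketch, not taken from any particular implementation), the following Python snippet trains a single sigmoid neuron by gradient descent, with each weight's gradient derived from the error signal:

```python
# Minimal sketch: one sigmoid neuron trained by gradient descent to learn OR.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])           # OR truth table
w, b = np.zeros(2), 0.0
lr = 0.5                                      # learning rate

for _ in range(2000):
    z = X @ w + b                             # pre-activation
    p = 1.0 / (1.0 + np.exp(-z))              # sigmoid activation
    err = p - y                               # dLoss/dz for cross-entropy loss
    w -= lr * (X.T @ err) / len(y)            # gradient step on the weights
    b -= lr * err.mean()                      # gradient step on the bias

print(np.round(p))                            # -> [0. 1. 1. 1.]
```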

For an example of a backpropagation network in action, see the source of GNU Backgammon.

A frequently used network topology in unsupervised learning is the Self-Organizing Map, often attributed to Kohonen. These networks can be used for clustering data and, more generally, for producing a lower-dimensional representation of a higher-dimensional space.

See this Code Project article for an application of the Self-Organizing Map to clustering images and finding all of the unique faces.
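
A minimal sketch of the idea (hypothetical parameter choices, not code from the article): a one-dimensional SOM whose nodes self-organize to cover two-dimensional data:

```python
# Tiny 1-D self-organizing map trained on 2-D points.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 2))                  # 200 points in the unit square
nodes = rng.random((10, 2))                  # 10 map nodes on a 1-D grid

for t in range(1000):
    x = data[rng.integers(len(data))]        # pick a random sample
    bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))  # best-matching unit
    lr = 0.5 * (1 - t / 1000)                # decaying learning rate
    sigma = 3.0 * (1 - t / 1000) + 0.5       # decaying neighborhood width
    d = np.abs(np.arange(10) - bmu)          # grid distance to the BMU
    h = np.exp(-(d ** 2) / (2 * sigma ** 2)) # neighborhood function
    nodes += lr * h[:, None] * (x - nodes)   # pull nodes toward the sample
```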

Introductory Video

Neural Networks Demystified (Jupyter Notebooks)

Resources / Recommendations

Neural Networks - Michael Nielsen

18,076 questions
877 votes · 19 answers

What is the role of the bias in neural networks?

I'm aware of gradient descent and the back-propagation algorithm. What I don't get is: when is using a bias important, and how do you use it? For example, when mapping the AND function, when I use two inputs and one output, it does not give the…
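
A hedged illustration of why the bias matters for exactly this AND case (a sketch, not the accepted answer): without a bias, a single neuron's decision boundary must pass through the origin, so no choice of weights makes it fire only on (1, 1); a bias moves the threshold.

```python
import numpy as np

def neuron(x, w, b=0.0):
    # Fires when the weighted sum plus bias crosses zero.
    return int(np.dot(w, x) + b > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
# With a bias of -1.5 the neuron fires only when both inputs are 1:
print([neuron(x, np.array([1.0, 1.0]), b=-1.5) for x in inputs])  # [0, 0, 0, 1]
```
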
446 votes · 14 answers

Epoch vs Iteration when training neural networks

What is the difference between epoch and iteration when training a multi-layer perceptron?
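
A short worked example of the usual distinction (assuming the common definitions: an epoch is one full pass over the training set, an iteration is one batch update):

```python
# With 1,000 training samples and a batch size of 100, one epoch
# (a full pass over the data) takes 10 iterations (weight updates).
samples, batch_size = 1000, 100
iterations_per_epoch = samples // batch_size      # -> 10
epochs = 5
total_iterations = epochs * iterations_per_epoch  # -> 50
print(iterations_per_epoch, total_iterations)
```
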
388 votes · 6 answers

What are advantages of Artificial Neural Networks over Support Vector Machines?

ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is…
Channel72
328 votes · 1 answer

Extremely small or NaN values appear in training neural network

I'm trying to implement a neural network architecture in Haskell, and use it on MNIST. I'm using the hmatrix package for linear algebra. My training framework is built using the pipes package. My code compiles and doesn't crash. But the problem is,…
Charles Langlois
321 votes · 2 answers

Keras input explanation: input_shape, units, batch_size, dim, etc

For any Keras layer (Layer class), can someone explain the difference between input_shape, units, dim, etc.? For example, the docs say units specifies the output shape of a layer. In the image of the neural net below, hidden layer1…
scarecrow
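
A hedged Keras sketch of how these arguments relate (hypothetical layer sizes): input_shape describes a single sample with the batch dimension left out, while units is the number of neurons, and hence the size of the layer's output:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                 # one sample is a 784-vector; batch dim omitted
    keras.layers.Dense(units=32, activation="relu"),     # output shape: (None, 32)
    keras.layers.Dense(units=10, activation="softmax"),  # output shape: (None, 10)
])
model.summary()                                # None is the (variable) batch_size
```
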
310 votes · 11 answers

What is the meaning of the word logits in TensorFlow?

In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That much I understand. But why is it called logits? Isn't that a mathematical function? loss_function =…
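
A hedged sketch of the usual reading (the term, borrowed from statistics, here just means the raw unnormalized scores the final layer produces before softmax):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])      # raw final-layer outputs, any real values
labels = tf.constant([[1.0, 0.0, 0.0]])      # one-hot target
probs = tf.nn.softmax(logits)                # probabilities, sums to 1
# The *_with_logits losses apply softmax internally for numerical stability:
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(probs.numpy(), loss.numpy())
```
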
232 votes · 3 answers

How to interpret loss and accuracy for a machine learning model

When I train my neural network with Theano or TensorFlow, it reports a variable called "loss" per epoch. How should I interpret this variable? Is higher loss better or worse, and what does it mean for the final performance (accuracy) of my…
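
A small hedged example of the relationship (assuming binary cross-entropy as the loss): loss is the quantity training minimizes, so lower is better, while accuracy is the fraction of correct predictions:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1])              # binary labels
y_prob = np.array([0.9, 0.2, 0.6, 0.4])      # predicted P(class 1)

# Binary cross-entropy: penalizes confident wrong predictions heavily.
loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
accuracy = np.mean((y_prob > 0.5) == y_true) # fraction of correct predictions
print(loss, accuracy)                        # lower loss, higher accuracy = better
```
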
191 votes · 12 answers

Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?

I'm trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy, with categorical cross-entropy I get ~50% accuracy. I don't understand why this is. It's a multiclass problem, doesn't that mean that I have…
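
A hedged Keras sketch (hypothetical sizes) of the pairing that fits a single-label multiclass problem; binary_crossentropy with a sigmoid scores each class independently, and Keras then reports a per-class accuracy that can look misleadingly high:

```python
from tensorflow import keras

num_classes = 10                             # hypothetical topic count
model = keras.Sequential([
    keras.Input(shape=(300,)),               # hypothetical feature size
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),  # one label per sample
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",   # expects one-hot labels
              metrics=["accuracy"])
```
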
188 votes · 8 answers

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't see where I'm supposed to call it. Below is my code…
pr338
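
A hedged sketch (hypothetical layer sizes): BatchNormalization is an ordinary layer, added at each point where you want its effect rather than called once globally:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(64),
    keras.layers.BatchNormalization(),       # normalize this layer's outputs
    keras.layers.Activation("relu"),
    keras.layers.Dense(10, activation="softmax"),
])
```
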
186 votes · 10 answers

Why use softmax as opposed to standard normalization?

In the output layer of a neural network, it is typical to use the softmax function, softmax(z)_i = exp(z_i) / Σ_j exp(z_j), to approximate a probability distribution. This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are…
Tom
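
A hedged comparison in NumPy: softmax copes with negative scores and yields a valid distribution, while simply dividing by the sum does not; subtracting the max keeps the exponentials numerically stable:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                  # shift by max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])
print(softmax(z))                            # valid distribution, all entries positive
print(z / z.sum())                           # naive normalization: a negative "probability"
```
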
166 votes · 10 answers

Why do we have to normalize the input for an artificial neural network?

Why do we have to normalize the input for a neural network? I understand that sometimes, when for example the input values are non-numerical, a certain transformation must be performed, but what about when we have numerical input? Why must the numbers be in a…
karla
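
A hedged sketch of the standard remedy (standardization to zero mean and unit variance), which puts features on comparable scales so gradient descent is not dominated by the largest one:

```python
import numpy as np

X = np.array([[100.0, 0.001],
              [200.0, 0.002],
              [300.0, 0.003]])               # wildly different feature scales

mean, std = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mean) / std                    # per-feature standardization
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # ~0 and ~1 per feature
```
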
166 votes · 8 answers

What is the difference between train, validation and test set in neural networks?

I'm using this library to implement a learning agent. I have generated the training cases, but I don't know for sure what the validation and test sets are. The teacher says: 70% should be training cases, 10% will be test cases, and the remaining 20% should…
Daniel
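
A hedged sketch of the split the teacher describes (70% train, 20% validation, 10% test): train on the first part, tune hyperparameters on validation, and report the final score on the held-out test set:

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(1000)                  # shuffle 1,000 example indices
train, val, test = np.split(idx, [700, 900]) # 70% / 20% / 10%
print(len(train), len(val), len(test))       # 700 200 100
```
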
163 votes · 2 answers

Why do we need to call zero_grad() in PyTorch?

The method zero_grad() needs to be called during training. But the documentation is not very helpful: "zero_grad(self): Sets gradients of all model parameters to zero." Why do we need to call this method?
user1424739
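
A hedged PyTorch sketch of why the call is needed: backward() accumulates gradients into each parameter's .grad, so a typical loop zeroes them before every backward pass:

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

for _ in range(10):
    opt.zero_grad()                          # clear gradients left over from the last step
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                          # adds new gradients into each .grad
    opt.step()                               # update the parameters
```
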
159 votes · 9 answers

Ordering of batch normalization and dropout?

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow. When using batch normalization and dropout in…
golmschenk
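
A hedged Keras sketch of one commonly recommended ordering (an assumption for illustration, not the accepted answer verbatim): layer, then batch normalization, then activation, then dropout:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(128,)),
    keras.layers.Dense(64),
    keras.layers.BatchNormalization(),       # normalize pre-activations
    keras.layers.Activation("relu"),
    keras.layers.Dropout(0.5),               # dropout after the activation
    keras.layers.Dense(10, activation="softmax"),
])
```
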
155 votes · 13 answers

Why must a nonlinear activation function be used in a backpropagation neural network?

I've been reading some things on neural networks and I understand the general principle of a single-layer neural network. I understand the need for additional layers, but why are nonlinear activation functions used? This question is followed by this…
corazza
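
A hedged NumPy demonstration of the core reason: composing purely linear layers collapses to a single linear map, so without a nonlinearity extra layers add no expressive power:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.random((3, 4)), rng.random((4, 2))
x = rng.random(3)

two_linear_layers = (x @ W1) @ W2            # "deep" network with no activation
one_linear_layer = x @ (W1 @ W2)             # an equivalent single layer
print(np.allclose(two_linear_layers, one_linear_layer))  # True
```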