Questions tagged [neural-network]

Network structure inspired by simplified models of biological neurons (brain cells). Neural networks are trained to "learn" by supervised and unsupervised techniques, and can be used for optimization, function approximation, pattern classification, and combinations thereof.

Neural networks have many practical applications within the software realm.

An application of neural networks for supervised learning is training a network for optical character recognition or handwriting recognition. The network is trained on exemplars of characters and, given enough data that form a representative sample of the population, it can generalize to a wider spectrum of cases not encountered during training. Training a neural network in a supervised manner involves a learning algorithm that finds the weights of the neurons which minimize the network's error at the task. Gradient descent is a learning algorithm commonly used to adjust the weights of a neural network. It is often paired with the backpropagation technique, which measures the contribution of each weight to the error signal and determines the gradients that guide the learning algorithm in adjusting each weight.
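
As a concrete illustration of the paragraph above (a minimal sketch, not taken from any particular implementation), the following Python snippet trains a single sigmoid neuron by gradient descent, with each weight's gradient derived from the error signal:

```python
# Minimal sketch: one sigmoid neuron trained by gradient descent to learn OR.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])           # OR truth table
w, b = np.zeros(2), 0.0
lr = 0.5                                      # learning rate

for _ in range(2000):
    z = X @ w + b                             # pre-activation
    p = 1.0 / (1.0 + np.exp(-z))              # sigmoid activation
    err = p - y                               # dLoss/dz for cross-entropy loss
    w -= lr * (X.T @ err) / len(y)            # gradient step on the weights
    b -= lr * err.mean()                      # gradient step on the bias

print(np.round(p))                            # -> [0. 1. 1. 1.]
```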

For an example of a backpropagation network in action, see the source of GNU Backgammon.

A frequently used network topology in unsupervised learning is the Self-Organizing Map, often attributed to Kohonen. These networks can be used for clustering data and, more generally, for producing a lower-dimensional representation of a higher-dimensional space.

See this Code Project article for an application of the Self-Organizing Map to clustering images and finding all of the unique faces.
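
A minimal sketch of the idea (hypothetical parameter choices, not code from the article): a one-dimensional SOM whose nodes self-organize to cover two-dimensional data:

```python
# Tiny 1-D self-organizing map trained on 2-D points.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 2))                  # 200 points in the unit square
nodes = rng.random((10, 2))                  # 10 map nodes on a 1-D grid

for t in range(1000):
    x = data[rng.integers(len(data))]        # pick a random sample
    bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))  # best-matching unit
    lr = 0.5 * (1 - t / 1000)                # decaying learning rate
    sigma = 3.0 * (1 - t / 1000) + 0.5       # decaying neighborhood width
    d = np.abs(np.arange(10) - bmu)          # grid distance to the BMU
    h = np.exp(-(d ** 2) / (2 * sigma ** 2)) # neighborhood function
    nodes += lr * h[:, None] * (x - nodes)   # pull nodes toward the sample
```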

Introductory Video

Neural Networks Demystified (Jupyter Notebooks)

Resources / Recommendations

Neural Networks - Michael Nielsen

18,076 questions
877 votes · 19 answers

What is the role of the bias in neural networks?

I'm aware of gradient descent and the back-propagation algorithm. What I don't get is: when is using a bias important, and how do you use it? For example, when mapping the AND function, when I use two inputs and one output, it does not give the…
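
A hedged illustration of why the bias matters for exactly this AND case (a sketch, not the accepted answer): without a bias, a single neuron's decision boundary must pass through the origin, so no choice of weights makes it fire only on (1, 1); a bias moves the threshold.

```python
import numpy as np

def neuron(x, w, b=0.0):
    # Fires when the weighted sum plus bias crosses zero.
    return int(np.dot(w, x) + b > 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
# With a bias of -1.5 the neuron fires only when both inputs are 1:
print([neuron(x, np.array([1.0, 1.0]), b=-1.5) for x in inputs])  # [0, 0, 0, 1]
```
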
446 votes · 14 answers

Epoch vs Iteration when training neural networks

What is the difference between epoch and iteration when training a multi-layer perceptron?
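
A short worked example of the usual distinction (assuming the common definitions: an epoch is one full pass over the training set, an iteration is one batch update):

```python
# With 1,000 training samples and a batch size of 100, one epoch
# (a full pass over the data) takes 10 iterations (weight updates).
samples, batch_size = 1000, 100
iterations_per_epoch = samples // batch_size      # -> 10
epochs = 5
total_iterations = epochs * iterations_per_epoch  # -> 50
print(iterations_per_epoch, total_iterations)
```
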
388 votes · 6 answers

What are advantages of Artificial Neural Networks over Support Vector Machines?

ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is…
Channel72
328 votes · 1 answer

Extremely small or NaN values appear in training neural network

I'm trying to implement a neural network architecture in Haskell, and use it on MNIST. I'm using the hmatrix package for linear algebra. My training framework is built using the pipes package. My code compiles and doesn't crash. But the problem is,…
Charles Langlois
321 votes · 2 answers

Keras input explanation: input_shape, units, batch_size, dim, etc

For any Keras layer (Layer class), can someone explain the difference between input_shape, units, dim, etc.? For example, the docs say units specifies the output shape of a layer. In the image of the neural net below, hidden layer1…
scarecrow
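
A hedged Keras sketch of how these arguments relate (hypothetical layer sizes): input_shape describes a single sample with the batch dimension left out, while units is the number of neurons, and hence the size of the layer's output:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),                 # one sample is a 784-vector; batch dim omitted
    keras.layers.Dense(units=32, activation="relu"),     # output shape: (None, 32)
    keras.layers.Dense(units=10, activation="softmax"),  # output shape: (None, 10)
])
model.summary()                                # None is the (variable) batch_size
```
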
310 votes · 11 answers

What is the meaning of the word logits in TensorFlow?

In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That much I understand. But why is it called logits? Isn't that a mathematical function? loss_function =…
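
A hedged sketch of the usual reading (the term, borrowed from statistics, here just means the raw unnormalized scores the final layer produces before softmax):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])      # raw final-layer outputs, any real values
labels = tf.constant([[1.0, 0.0, 0.0]])      # one-hot target
probs = tf.nn.softmax(logits)                # probabilities, sums to 1
# The *_with_logits losses apply softmax internally for numerical stability:
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(probs.numpy(), loss.numpy())
```
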
232 votes · 3 answers

How to interpret loss and accuracy for a machine learning model

When I train my neural network with Theano or TensorFlow, it reports a variable called "loss" per epoch. How should I interpret this variable? Is higher loss better or worse, and what does it mean for the final performance (accuracy) of my…
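
A small hedged example of the relationship (assuming binary cross-entropy as the loss): loss is the quantity training minimizes, so lower is better, while accuracy is the fraction of correct predictions:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1])              # binary labels
y_prob = np.array([0.9, 0.2, 0.6, 0.4])      # predicted P(class 1)

# Binary cross-entropy: penalizes confident wrong predictions heavily.
loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
accuracy = np.mean((y_prob > 0.5) == y_true) # fraction of correct predictions
print(loss, accuracy)                        # lower loss, higher accuracy = better
```
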
191 votes · 12 answers

Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?

I'm trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy, with categorical cross-entropy I get ~50% accuracy. I don't understand why this is. It's a multiclass problem, doesn't that mean that I have…
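
A hedged Keras sketch (hypothetical sizes) of the pairing that fits a single-label multiclass problem; binary_crossentropy with a sigmoid scores each class independently, and Keras then reports a per-class accuracy that can look misleadingly high:

```python
from tensorflow import keras

num_classes = 10                             # hypothetical topic count
model = keras.Sequential([
    keras.Input(shape=(300,)),               # hypothetical feature size
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),  # one label per sample
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",   # expects one-hot labels
              metrics=["accuracy"])
```
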
188 votes · 8 answers

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't see where I'm supposed to call it. Below is my code…
pr338
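
A hedged sketch (hypothetical layer sizes): BatchNormalization is an ordinary layer, added at each point where you want its effect rather than called once globally:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(64,)),
    keras.layers.Dense(64),
    keras.layers.BatchNormalization(),       # normalize this layer's outputs
    keras.layers.Activation("relu"),
    keras.layers.Dense(10, activation="softmax"),
])
```
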
186 votes · 10 answers

Why use softmax as opposed to standard normalization?

In the output layer of a neural network, it is typical to use the softmax function, softmax(z)_i = exp(z_i) / Σ_j exp(z_j), to approximate a probability distribution. This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are…
Tom
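
A hedged comparison in NumPy: softmax copes with negative scores and yields a valid distribution, while simply dividing by the sum does not; subtracting the max keeps the exponentials numerically stable:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                  # shift by max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])
print(softmax(z))                            # valid distribution, all entries positive
print(z / z.sum())                           # naive normalization: a negative "probability"
```
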
166 votes · 10 answers

Why do we have to normalize the input for an artificial neural network?

Why do we have to normalize the input for a neural network? I understand that sometimes, when for example the input values are non-numerical, a certain transformation must be performed, but what about when we have numerical input? Why must the numbers be in a…
karla
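
A hedged sketch of the standard remedy (standardization to zero mean and unit variance), which puts features on comparable scales so gradient descent is not dominated by the largest one:

```python
import numpy as np

X = np.array([[100.0, 0.001],
              [200.0, 0.002],
              [300.0, 0.003]])               # wildly different feature scales

mean, std = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mean) / std                    # per-feature standardization
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # ~0 and ~1 per feature
```
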
166 votes · 8 answers

What is the difference between train, validation and test set in neural networks?

I'm using this library to implement a learning agent. I have generated the training cases, but I don't know for sure what the validation and test sets are. The teacher says: 70% should be training cases, 10% will be test cases, and the remaining 20% should…
Daniel
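
A hedged sketch of the split the teacher describes (70% train, 20% validation, 10% test): train on the first part, tune hyperparameters on validation, and report the final score on the held-out test set:

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(1000)                  # shuffle 1,000 example indices
train, val, test = np.split(idx, [700, 900]) # 70% / 20% / 10%
print(len(train), len(val), len(test))       # 700 200 100
```
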
163 votes · 2 answers

Why do we need to call zero_grad() in PyTorch?

The method zero_grad() needs to be called during training. But the documentation is not very helpful: "zero_grad(self): Sets gradients of all model parameters to zero." Why do we need to call this method?
user1424739
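
A hedged PyTorch sketch of why the call is needed: backward() accumulates gradients into each parameter's .grad, so a typical loop zeroes them before every backward pass:

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

for _ in range(10):
    opt.zero_grad()                          # clear gradients left over from the last step
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                          # adds new gradients into each .grad
    opt.step()                               # update the parameters
```
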
159 votes · 9 answers

Ordering of batch normalization and dropout?

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow. When using batch normalization and dropout in…
golmschenk
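
A hedged Keras sketch of one commonly recommended ordering (an assumption for illustration, not the accepted answer verbatim): layer, then batch normalization, then activation, then dropout:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(128,)),
    keras.layers.Dense(64),
    keras.layers.BatchNormalization(),       # normalize pre-activations
    keras.layers.Activation("relu"),
    keras.layers.Dropout(0.5),               # dropout after the activation
    keras.layers.Dense(10, activation="softmax"),
])
```
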
155 votes · 13 answers

Why must a nonlinear activation function be used in a backpropagation neural network?

I've been reading some things on neural networks and I understand the general principle of a single-layer neural network. I understand the need for additional layers, but why are nonlinear activation functions used? This question is followed by this…
corazza
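
A hedged NumPy demonstration of the core reason: composing purely linear layers collapses to a single linear map, so without a nonlinearity extra layers add no expressive power:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.random((3, 4)), rng.random((4, 2))
x = rng.random(3)

two_linear_layers = (x @ W1) @ W2            # "deep" network with no activation
one_linear_layer = x @ (W1 @ W2)             # an equivalent single layer
print(np.allclose(two_linear_layers, one_linear_layer))  # True
```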