Questions tagged [neural-networks]

For questions about the mathematics of artificial neural networks: their underlying multilayered graph object or their use as a data structure in machine learning algorithms. Consider also using the tags (machine-learning) or (graph-theory).

Neural networks (also called artificial neural networks, or ANNs) are used in computer science and engineering, most prominently in machine learning and deep learning algorithms. Strictly speaking, the term neural network refers to the entire data structure, but because of their ubiquity it is often applied to the underlying weighted graph as well. In papers this graph is sometimes called a multilayer graph (a special case of a multipartite graph).

[Figure: diagram of an artificial neural network; image sourced from extremetech.com]

Essentially, a neural network is a "chain" of complete bipartite graphs. The first layer of nodes in the chain is the input layer, the last is the output layer, and all the other nodes are in hidden layers. The big idea is that a neural network is designed to simulate the way the human brain (a real neural network) recognizes patterns. Because of this, neural networks appear in many pattern-recognition algorithms, such as handwriting recognition (as when you deposit a check by taking a picture of it) or facial recognition (as when Facebook asks you to tag friends in a picture because it recognizes their faces).
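As a toy illustration of the "chain of complete bipartite graphs" view, the sketch below (all layer sizes are made up for illustration) builds one weight matrix per adjacent pair of layers; each pair of adjacent layers of sizes m and n contributes a complete bipartite graph K_{m,n}, i.e. an m-by-n block of edge weights.

```python
# Minimal sketch (assumed, illustrative layer sizes): a fully connected
# feed-forward network viewed as a chain of complete bipartite graphs.
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3, 2]   # input layer, two hidden layers, output layer

# One weight matrix per adjacent pair of layers: the edges of K_{m,n}.
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

# Total edge count of the chain = sum of products of adjacent layer sizes.
total_edges = sum(w.size for w in weights)
print(total_edges)   # 4*5 + 5*3 + 3*2 = 41
```

The point of the sketch is only the graph structure: every node in one layer is joined to every node in the next, and nothing else.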


The common example used when explaining applications of ANNs is handwriting recognition. Adam Harley of Ryerson University created a beautiful online visualization of this example. Suppose someone writes a digit in the range [0, 1, ..., 9] and your ANN has been trained to recognize which digit it is. First you codify the person's handwritten digit (make it into a black-and-white image and, for each pixel, use a zero if the pixel is black and a one if it's white, or something like that). These values feed into the input layer as the activations of those nodes. The initial input values are then propagated through the ANN, being "operated on" by the weights of the edges and the nodes in the hidden layers (lots of details glossed over here). When the values arrive in the output layer, that output is compared to the set of expected output values to decide what the handwritten digit was. Since there are ten possible digits, the output layer would typically have ten nodes, and we'd say, for example, that (0,0,0,1,0,0,0,0,0,0) is the expected output when the digit is three.
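The forward propagation described above can be sketched as follows. The layer sizes, the sigmoid activation, and the random weights are all illustrative assumptions, not a trained recognizer; the point is only how codified pixels flow from the input layer to a ten-node output layer.

```python
# Hedged sketch of a forward pass for digit recognition (untrained,
# made-up sizes): flattened pixel values in, ten output activations out.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # values "operated on" layer by layer
    return a

rng = np.random.default_rng(1)
sizes = [784, 30, 10]            # illustrative: 28x28 pixels -> 10 digits
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]

x = rng.integers(0, 2, size=784).astype(float)   # codified pixels (0 or 1)
output = forward(x, weights, biases)
predicted_digit = int(np.argmax(output))   # index of the largest activation
```

Comparing `output` against a one-hot vector like (0,0,0,1,0,0,0,0,0,0) is exactly the comparison to expected outputs described in the text.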

Of course, how accurate this ANN is depends completely on the weights of its edges. Before you have a working ANN, you have to train it. To do this you gather a large collection of input images (handwritten digits) whose correct outputs you already know. You feed each input through the ANN, look at the output, and compare it to the expected output for that input. Using the differences between the two, you back-propagate through the ANN and alter the weights toward values that would have given a more accurate result. Adjusting these weights over a large collection of sample inputs essentially teaches the network to recognize patterns accurately.
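The feed-forward/compare/adjust loop above can be sketched on a toy task. Here a single sigmoid neuron learns the OR function by plain gradient descent; the learning rate, iteration count, and random seed are arbitrary choices, and a one-neuron network needs no full back-propagation, so this shows only the weight-update step.

```python
# Toy training loop (assumed setup): one sigmoid neuron learns OR by
# gradient descent on the squared error, updating weights after each pass.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])   # expected outputs for OR

rng = np.random.default_rng(2)
w, b = rng.standard_normal(2), 0.0
lr = 1.0                              # illustrative learning rate

for _ in range(2000):
    out = sigmoid(X @ w + b)          # feed inputs through the network
    err = out - y                     # difference from expected output
    grad = err * out * (1 - out)      # gradient through the sigmoid
    w -= lr * (X.T @ grad)            # alter weights toward a better result
    b -= lr * grad.sum()

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds)
```

After training, thresholding the outputs at 0.5 reproduces the OR table, which is the "teaches the network" effect described above, in miniature.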

728 questions
27
votes
7 answers

What are the best books to study Neural Networks from a purely mathematical perspective?

I am looking for a book that goes through the mathematical aspects of neural networks, from simple forward passage of multilayer perceptron in matrix form or differentiation of activation functions, to back propagation in CNN or RNN (to mention some…
20
votes
2 answers

Why do deep neural networks work well?

The universal approximation theorem, as I understand it, states that for any continuous bounded function $f: X \rightarrow \mathbb{R}$ with compact domain $X$ and any threshold $\varepsilon$ there is a neural network $N: X \rightarrow \mathbb{R}$…
20
votes
4 answers

What areas of math can be tackled by artificial intelligence?

Artificial intelligence is nearing, with image/speech recognition, chess/go engines, etc. My question is: which areas of math that are interesting to mathematicians are likely to be the first that can be tackled by artificial intelligence? Is…
16
votes
2 answers

Tricky proof of a result from Michael Nielsen's book "Neural Networks and Deep Learning".

In his free online book, "Neural Networks and Deep Learning", Michael Nielsen asks the reader to prove the following result: if $C$ is a cost function which depends on $v_{1}, v_{2}, \ldots, v_{n}$, he states that we make a move in the $\Delta v$ direction to…
14
votes
0 answers

What is the Probability of Transmission Between Two Nodes in a Neural Network?

I have a network which is an Erdős–Rényi graph. It is a simple neural network with degree 0.7N where N is the number of nodes. Each weight between neurons is $\frac{1}{N}$, meaning that if node n has fired the probability that any connected node k…
14
votes
1 answer

How can I derive the back propagation formula in a more elegant way?

When you compute the gradient of the cost function of a neural network with respect to its weights, as I currently understand it, you can only do it by computing the partial derivative of the cost function with respect to each one of the weights…
13
votes
3 answers

Scaling factor and weights in Unscented Transform (UKF)

I'm trying to implement the UKF for parameter estimation as described by Eric A. Wan and Rudolph van der Merwe in Chapter 7 of the Kalman Filtering and Neural Networks book: Free PDF I am confused by the setting of $\lambda$ (used in the selection…
11
votes
1 answer

Category Theory & Artificial Intelligence (AI)

Category theory turns out to be useful in more and more areas. (see e.g. MSE - Category Theory & Biology) Question. Does anyone know of some connection of category theory to (convolutional) neural networks (CNNs) / deep learning (or to machine…
11
votes
1 answer

Is there a connection between topological mixing and squashing functions used in neural networks?

Sigmoid, ReLU, tanh, logistic -type "squashing" functions are popular in neural networks to introduce nonlinearity into the transformations of the input vector, allowing the network to fit complex input-output surfaces. Deeper layers (stacking…
10
votes
2 answers

Why do we use gradient descent in the backpropagation algorithm?

The common approach for training neural networks, as far as I know, is the backpropagation algorithm, which uses gradient descent to reduce the error. (i) Why should one use a fixed learning rate / simulated annealing over, let's say, Armijo…
kutschkem
10
votes
2 answers

What temperature of Softmax layer should I use during neural network training?

I've written a GRU (gated recurrent unit) implementation in C#, and it works fine. But my Softmax layer has no temperature parameter ($T=1$). I want to implement "softmax with temperature": $$ P_{i} =…
R. A.
10
votes
0 answers

How to calculate the Lie algebra of a neural network?

Define $F$ as the standard multi-layer feed-forward perceptron: \begin{equation} F(\mathbf{x}) = \Theta( W_1 \circ \Theta( W_2 \circ \cdots \circ W_L(\mathbf{x}))) \end{equation} where $\Theta$ is the sigmoid function and $W_\ell$ is the weight matrix for…
Yan King Yin
9
votes
3 answers

How many parameters does the neural network have?

We have a neural network with an input layer of $h_0$ nodes, hidden layers of $h_1, h_2, h_3, \ldots, h_{L-1}$ nodes respectively, and an output layer of $h_L$ nodes. How many parameters does the network have?
emily
9
votes
1 answer

Neural Networks and the Chain Rule

With neural networks, back-propagation is an implementation of the chain rule. However, the chain rule is only applicable for differentiable functions. With non-differentiable functions, there is no chain rule that works in general. And so, it…
NicNic8
8
votes
1 answer

Relation between information geometry and geometric deep learning

I'm currently working on information geometry (IG) and geometric deep learning (GDL). As I started without specific knowledge of both, their respective names led me to believe for a short and naive period that GDL was defined by the use of IG…