Questions tagged [machine-learning]

How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

2942 questions
191 votes, 1 answer

Derivative of Softmax loss function

I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function: \begin{equation} p_j = \frac{e^{o_j}}{\sum_k e^{o_k}} \end{equation} This is used in a loss function of the…
Moos Hueting
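The answers to this question typically derive the softmax Jacobian $\partial p_j/\partial o_k = p_j(\delta_{jk} - p_k)$. A minimal NumPy sketch of that identity, checked against a finite difference (variable names and test values are illustrative, not from the question):

```python
import numpy as np

def softmax(o):
    """Numerically stable softmax: p_j = exp(o_j) / sum_k exp(o_k)."""
    e = np.exp(o - o.max())
    return e / e.sum()

def softmax_jacobian(o):
    """Analytic Jacobian: dp_j/do_k = p_j * (delta_jk - p_k)."""
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

# Central finite-difference check, column by column.
o = np.array([1.0, 2.0, 0.5])
J = softmax_jacobian(o)
eps = 1e-6
J_num = np.zeros_like(J)
for k in range(o.size):
    d = np.zeros_like(o)
    d[k] = eps
    J_num[:, k] = (softmax(o + d) - softmax(o - d)) / (2 * eps)
```

The stabilization `o - o.max()` does not change the result, since softmax is invariant to adding a constant to all inputs.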
138 votes, 11 answers

What is the difference between regression and classification?

What is the difference between regression and classification, when we try to generate output for a training data set $x$?
Bober02
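The distinction the answers usually draw is in the target: regression predicts a real-valued quantity, classification predicts one of finitely many labels. A small sketch under that framing (the synthetic data and threshold model are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)

# Regression: the target is a real number (here y = 2x + 1 plus noise),
# and the model predicts a value on a continuous scale.
y_reg = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y_reg, rcond=None)[0]

# Classification: the target is one of finitely many labels (here 0/1),
# and the model predicts a label, not a quantity.
y_cls = (x > 5.0).astype(int)
predict_label = lambda x_new: int(x_new > 5.0)
```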
117 votes, 8 answers

Derivative of the cost function for logistic regression

I am going over the lectures on Machine Learning at Coursera. I am struggling with the following: how can the partial derivative of $$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$ where…
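The standard result asked about here is $\partial J/\partial \theta_j = \frac{1}{m}\sum_i (h_\theta(x^{i}) - y^{i})\, x_j^{i}$. A quick NumPy sketch verifying that analytic gradient against a numerical one (the random data is illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = y.size
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def grad(theta, X, y):
    """Analytic gradient: (1/m) * X^T (h - y)."""
    m = y.size
    return X.T @ (sigmoid(X @ theta) - y) / m

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite-difference check of each partial derivative.
eps = 1e-6
g_num = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
```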
59 votes, 3 answers

Why do we consider the log-likelihood instead of the likelihood for a Gaussian distribution?

I am reading about the Gaussian distribution in a machine learning book. It states: We shall determine values for the unknown parameters $\mu$ and $\sigma^2$ in the Gaussian by maximizing the likelihood function. In practice, it is more convenient…
Kaidul Islam
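The practical point the answers make can be seen directly in floating point: the likelihood is a product of many densities and underflows, while the log-likelihood is a sum and stays representable. A sketch with synthetic data (sample size and parameters are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

# Closed-form maximizers of the Gaussian log-likelihood:
mu_hat = x.mean()
var_hat = ((x - mu_hat) ** 2).mean()

# Why take the log? The likelihood is a product of n densities and
# underflows to 0.0 in double precision, while the log-likelihood is a
# sum, stays finite, and is easier to differentiate.
pdf = np.exp(-(x - mu_hat) ** 2 / (2 * var_hat)) / np.sqrt(2 * np.pi * var_hat)
likelihood = np.prod(pdf)
log_likelihood = np.log(pdf).sum()
```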
59 votes, 5 answers

Why divide by $2m$

I'm taking a machine learning course. The professor has a model for linear regression, where $h_\theta$ is the hypothesis (the proposed model; linear regression, in this case), $J(\theta_1)$ is the cost function, and $m$ is the number of elements in the…
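The usual answer is that the extra $\frac{1}{2}$ is purely for convenience: differentiating the square produces a factor of 2 that cancels it, leaving a clean gradient $\frac{1}{m}X^\top(X\theta - y)$ with no effect on the minimizer. A sketch verifying that cancellation numerically (random data for illustration):

```python
import numpy as np

def cost_half(theta, X, y):
    """J(theta) = 1/(2m) * sum (h_theta(x) - y)^2 -- the 1/2 is cosmetic."""
    m = y.size
    r = X @ theta - y
    return (r @ r) / (2 * m)

def grad_half(theta, X, y):
    """The 2 from differentiating the square cancels the 1/2,
    leaving (1/m) * X^T (X theta - y)."""
    m = y.size
    return X.T @ (X @ theta - y) / m

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = rng.normal(size=30)
theta = rng.normal(size=2)

# Central finite-difference check of the gradient.
eps = 1e-6
g_num = np.array([
    (cost_half(theta + eps * e, X, y) - cost_half(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
```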
39 votes, 2 answers

How are logistic loss and cross-entropy related?

I found that Kullback-Leibler divergence, log-loss, and cross-entropy are described as the same loss function. Is the logistic loss used in logistic regression equivalent to the cross-entropy function? If so, can anybody explain how they are related? Thanks
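The equivalence asked about is mechanical: encode the binary label $y$ as the one-hot distribution $(y, 1-y)$ and the prediction $p$ as $(p, 1-p)$, and the logistic loss becomes exactly the two-class cross-entropy. A minimal numeric sketch of that identity:

```python
import numpy as np

def logistic_loss(y, p):
    """Logistic (log) loss for a Bernoulli label y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def cross_entropy(q, p):
    """H(q, p) = -sum_k q_k log p_k over the two classes."""
    return -(q[0] * np.log(p[0]) + q[1] * np.log(p[1]))

# Writing the label as q = (y, 1-y) and the prediction as (p, 1-p)
# makes the two expressions identical term by term.
loss_pos = logistic_loss(1.0, 0.8)   # label y = 1
loss_neg = logistic_loss(0.0, 0.8)   # label y = 0
```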
34 votes, 3 answers

Mathematical preparation for postgraduate studies in Linguistics

I am an undergraduate student in Mathematics and I would like to continue my postgraduate studies in the harder, more mathematical aspects of Linguistics. What exactly would that include is unknown even to me, but possible areas of interest would…
29 votes, 2 answers

Mathematical introduction to machine learning

At first glance, this is once again a reference request for "How to start machine learning". However, my mathematical background is relatively strong and I am looking for an introduction to machine learning using mathematics and actually proving…
Quickbeam2k1
27 votes, 7 answers

What are the best books to study Neural Networks from a purely mathematical perspective?

I am looking for a book that goes through the mathematical aspects of neural networks, from simple forward passage of multilayer perceptron in matrix form or differentiation of activation functions, to back propagation in CNN or RNN (to mention some…
24 votes, 2 answers

Invert the softmax function

Is it possible to invert the softmax function in order to obtain the original values $x_i$? $$S_i=\frac{e^{x_i}}{\sum_j e^{x_j}} $$ In the case of 3 input variables this problem boils down to finding $a$, $b$, $c$ given $x$, $y$ and…
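The standard answer is "yes, but only up to an additive constant": since $\log S_i = x_i - \log\sum_j e^{x_j}$, taking $\log$ of the softmax output recovers the inputs shifted by the same constant in every coordinate. A quick sketch of that (example values are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def invert_softmax(s):
    """Recover logits from a softmax output -- only up to an additive
    constant c, since softmax(x + c) == softmax(x) for any scalar c."""
    return np.log(s)

x = np.array([0.2, 1.5, -0.7])
s = softmax(x)
x_rec = invert_softmax(s)

# x_rec differs from x by the same constant in every coordinate:
shift = x - x_rec
```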
24 votes, 1 answer

Derivative of the log of the softmax function

Could someone explain how this derivative was arrived at? According to me, the derivative of $\log(\text{softmax})$ is $$ \nabla\log(\text{softmax}) = \begin{cases} 1-\text{softmax}, & \text{if } i=j \\ -\text{softmax}, & \text{if } i \neq j \end{cases} $$…
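With indices made explicit, the result in question is $\partial \log p_j / \partial o_i = \delta_{ij} - p_i$ for $p = \text{softmax}(o)$. A short numerical check of that formula (the input vector and index are arbitrary test values):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def log_softmax_grad(o, j):
    """Gradient of log p_j w.r.t. o: component i is (1 if i == j else 0) - p_i."""
    p = softmax(o)
    g = -p
    g[j] += 1.0
    return g

o = np.array([0.3, -1.2, 2.0])
j = 1

# Central finite-difference check, one input coordinate at a time.
eps = 1e-6
g_num = np.array([
    (np.log(softmax(o + eps * e))[j] - np.log(softmax(o - eps * e))[j]) / (2 * eps)
    for e in np.eye(3)
])
```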
20 votes, 4 answers

Deriving the normal distance from the origin to the decision surface

While studying discriminant functions for linear classification, I encountered the following: …if $\textbf{x}$ is a point on the decision surface, then $y(\textbf{x}) = 0$, and so the normal distance from the origin to the decision surface is…
BitRiver
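For a linear discriminant $y(\mathbf{x}) = \mathbf{w}^\top\mathbf{x} + w_0$, the signed normal distance from the origin to the surface $y(\mathbf{x}) = 0$ is $-w_0/\lVert\mathbf{w}\rVert$ (this is Bishop's result; the specific $\mathbf{w}$, $w_0$ below are made-up test values). A sketch that checks it by projecting the origin onto the surface:

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the decision surface
w0 = -10.0                 # bias term

# Signed normal distance from the origin: -w0 / ||w||.
d = -w0 / np.linalg.norm(w)

# Sanity check: the orthogonal projection of the origin onto the
# surface {x : w.x + w0 = 0} lies on the surface, at distance |d|.
x_proj = -w0 * w / (w @ w)
```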
18 votes, 4 answers

Deriving the cost function using MLE: why use the log function?

I am learning machine learning from Andrew Ng's open-class notes and coursera.org. I am trying to understand how the cost function for the logistic regression is derived. I will start with the cost function for linear regression and then get to my…
cmelan
18 votes, 2 answers

What is divergence in image processing?

What is the difference between gradient and divergence? I understand that the gradient points in the direction of steepest ascent and that divergence measures source strength, but I couldn't relate this to the concept of divergence in image processing. What is…
Premnath D
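One concrete connection: in image processing the gradient turns a scalar image into a vector field, and the divergence maps a vector field back to a scalar one; composing them gives the Laplacian, which underlies diffusion-style filtering. A sketch on a synthetic "image" $f = x^2 + y^2$, whose Laplacian is 4 everywhere (grid size is an arbitrary choice):

```python
import numpy as np

y, x = np.mgrid[0:20, 0:20].astype(float)
f = x**2 + y**2                      # scalar field (think: image intensity)

# Gradient: scalar field -> vector field (points in direction of steepest ascent).
gy, gx = np.gradient(f)

# Divergence: vector field -> scalar field (net outflow per unit area).
divergence = np.gradient(gy, axis=0) + np.gradient(gx, axis=1)

# div(grad f) is the Laplacian; for f = x^2 + y^2 it equals 4 away
# from the grid boundary, where finite differences are one-sided.
```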
18 votes, 1 answer

Why does a radial basis function kernel imply an infinite dimension map?

I understand that each kernel implies a particular feature map. For instance, for $x,z \in \mathbb{R}^2$ the kernel $K(x,z)=(\textrm{dot}(x,z))^2$ implies a feature map $$\langle\phi(x), \phi(z)\rangle=\langle [x_1^2 , x_1 x_2 , x_2 x_1, x_2^2], [z_1^2 ,…
DuckMaestro
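The usual argument: the polynomial kernel has a finite explicit feature map, while the RBF kernel factors as $e^{-\lVert x\rVert^2/2}\,e^{-\lVert z\rVert^2/2}\,e^{x\cdot z}$, and expanding $e^{x\cdot z} = \sum_n (x\cdot z)^n/n!$ contributes features for every degree $n$, so no finite map suffices. A sketch checking both facts numerically (bandwidth fixed at 1 and test vectors chosen for illustration):

```python
import math
import numpy as np

def phi(v):
    """Explicit 4-dimensional feature map for K(x, z) = (x . z)^2, x, z in R^2."""
    return np.array([v[0]**2, v[0]*v[1], v[1]*v[0], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])

k_direct = (x @ z) ** 2       # kernel evaluated directly
k_feature = phi(x) @ phi(z)   # same value via the finite feature map

def rbf(x, z):
    d = x - z
    return np.exp(-(d @ d) / 2)

# Truncating the Taylor series of exp(x . z) approximates the RBF kernel
# arbitrarily well, with one block of features per polynomial degree.
truncated = (
    np.exp(-(x @ x) / 2) * np.exp(-(z @ z) / 2)
    * sum((x @ z) ** n / math.factorial(n) for n in range(20))
)
```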