Questions tagged [machine-learning]

How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

2942 questions
191 votes, 1 answer

Derivative of Softmax loss function

I am trying to wrap my head around back-propagation in a neural network with a Softmax classifier, which uses the Softmax function: \begin{equation} p_j = \frac{e^{o_j}}{\sum_k e^{o_k}} \end{equation} This is used in a loss function of the…
Moos Hueting
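The answers to this question typically derive the softmax Jacobian $\partial p_j/\partial o_k = p_j(\delta_{jk} - p_k)$. A minimal NumPy sketch of that identity, checked against a finite difference (variable names and test values are illustrative, not from the question):

```python
import numpy as np

def softmax(o):
    """Numerically stable softmax: p_j = exp(o_j) / sum_k exp(o_k)."""
    e = np.exp(o - o.max())
    return e / e.sum()

def softmax_jacobian(o):
    """Analytic Jacobian: dp_j/do_k = p_j * (delta_jk - p_k)."""
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

# Central finite-difference check, column by column.
o = np.array([1.0, 2.0, 0.5])
J = softmax_jacobian(o)
eps = 1e-6
J_num = np.zeros_like(J)
for k in range(o.size):
    d = np.zeros_like(o)
    d[k] = eps
    J_num[:, k] = (softmax(o + d) - softmax(o - d)) / (2 * eps)
```

The stabilization `o - o.max()` does not change the result, since softmax is invariant to adding a constant to all inputs.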
138 votes, 11 answers

What is the difference between regression and classification?

What is the difference between regression and classification, when we try to generate output for a training data set $x$?
Bober02
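The distinction the answers usually draw is in the target: regression predicts a real-valued quantity, classification predicts one of finitely many labels. A small sketch under that framing (the synthetic data and threshold model are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)

# Regression: the target is a real number (here y = 2x + 1 plus noise),
# and the model predicts a value on a continuous scale.
y_reg = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y_reg, rcond=None)[0]

# Classification: the target is one of finitely many labels (here 0/1),
# and the model predicts a label, not a quantity.
y_cls = (x > 5.0).astype(int)
predict_label = lambda x_new: int(x_new > 5.0)
```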
117 votes, 8 answers

Derivative of the cost function for logistic regression

I am going over the lectures on Machine Learning at Coursera. I am struggling with the following: how can the partial derivative of $$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))\right]$$ where…
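The standard result asked about here is $\partial J/\partial \theta_j = \frac{1}{m}\sum_i (h_\theta(x^{i}) - y^{i})\, x_j^{i}$. A quick NumPy sketch verifying that analytic gradient against a numerical one (the random data is illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = y.size
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def grad(theta, X, y):
    """Analytic gradient: (1/m) * X^T (h - y)."""
    m = y.size
    return X.T @ (sigmoid(X @ theta) - y) / m

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite-difference check of each partial derivative.
eps = 1e-6
g_num = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
```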
59 votes, 3 answers

Why do we consider the log-likelihood instead of the likelihood for a Gaussian distribution?

I am reading about the Gaussian distribution in a machine learning book. It states: We shall determine values for the unknown parameters $\mu$ and $\sigma^2$ in the Gaussian by maximizing the likelihood function. In practice, it is more convenient…
Kaidul Islam
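The practical point the answers make can be seen directly in floating point: the likelihood is a product of many densities and underflows, while the log-likelihood is a sum and stays representable. A sketch with synthetic data (sample size and parameters are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

# Closed-form maximizers of the Gaussian log-likelihood:
mu_hat = x.mean()
var_hat = ((x - mu_hat) ** 2).mean()

# Why take the log? The likelihood is a product of n densities and
# underflows to 0.0 in double precision, while the log-likelihood is a
# sum, stays finite, and is easier to differentiate.
pdf = np.exp(-(x - mu_hat) ** 2 / (2 * var_hat)) / np.sqrt(2 * np.pi * var_hat)
likelihood = np.prod(pdf)
log_likelihood = np.log(pdf).sum()
```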
59 votes, 5 answers

Why divide by $2m$

I'm taking a machine learning course. The professor has a model for linear regression, where $h_\theta$ is the hypothesis (the proposed model; linear regression, in this case), $J(\theta_1)$ is the cost function, and $m$ is the number of elements in the…
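The usual answer is that the extra $\frac{1}{2}$ is purely for convenience: differentiating the square produces a factor of 2 that cancels it, leaving a clean gradient $\frac{1}{m}X^\top(X\theta - y)$ with no effect on the minimizer. A sketch verifying that cancellation numerically (random data for illustration):

```python
import numpy as np

def cost_half(theta, X, y):
    """J(theta) = 1/(2m) * sum (h_theta(x) - y)^2 -- the 1/2 is cosmetic."""
    m = y.size
    r = X @ theta - y
    return (r @ r) / (2 * m)

def grad_half(theta, X, y):
    """The 2 from differentiating the square cancels the 1/2,
    leaving (1/m) * X^T (X theta - y)."""
    m = y.size
    return X.T @ (X @ theta - y) / m

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = rng.normal(size=30)
theta = rng.normal(size=2)

# Central finite-difference check of the gradient.
eps = 1e-6
g_num = np.array([
    (cost_half(theta + eps * e, X, y) - cost_half(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(2)
])
```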
39 votes, 2 answers

How are logistic loss and cross-entropy related?

I found that Kullback-Leibler divergence, log-loss, and cross-entropy are described as the same loss function. Is the logistic loss used in logistic regression equivalent to the cross-entropy function? If so, can anybody explain how they are related? Thanks
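The equivalence asked about is mechanical: encode the binary label $y$ as the one-hot distribution $(y, 1-y)$ and the prediction $p$ as $(p, 1-p)$, and the logistic loss becomes exactly the two-class cross-entropy. A minimal numeric sketch of that identity:

```python
import numpy as np

def logistic_loss(y, p):
    """Logistic (log) loss for a Bernoulli label y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def cross_entropy(q, p):
    """H(q, p) = -sum_k q_k log p_k over the two classes."""
    return -(q[0] * np.log(p[0]) + q[1] * np.log(p[1]))

# Writing the label as q = (y, 1-y) and the prediction as (p, 1-p)
# makes the two expressions identical term by term.
loss_pos = logistic_loss(1.0, 0.8)   # label y = 1
loss_neg = logistic_loss(0.0, 0.8)   # label y = 0
```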
34 votes, 3 answers

Mathematical preparation for postgraduate studies in Linguistics

I am an undergraduate student in Mathematics and I would like to continue my postgraduate studies in the harder, more mathematical aspects of Linguistics. What exactly would that include is unknown even to me, but possible areas of interest would…
29 votes, 2 answers

Mathematical introduction to machine learning

At first glance, this is once again a reference request for "How to start machine learning". However, my mathematical background is relatively strong and I am looking for an introduction to machine learning using mathematics and actually proving…
Quickbeam2k1
27 votes, 7 answers

What are the best books to study Neural Networks from a purely mathematical perspective?

I am looking for a book that goes through the mathematical aspects of neural networks, from simple forward passage of multilayer perceptron in matrix form or differentiation of activation functions, to back propagation in CNN or RNN (to mention some…
24 votes, 2 answers

Invert the softmax function

Is it possible to invert the softmax function in order to obtain the original values $x_i$? $$S_i=\frac{e^{x_i}}{\sum_j e^{x_j}} $$ In the case of 3 input variables this problem boils down to finding $a$, $b$, $c$ given $x$, $y$ and…
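The standard answer is "yes, but only up to an additive constant": since $\log S_i = x_i - \log\sum_j e^{x_j}$, taking $\log$ of the softmax output recovers the inputs shifted by the same constant in every coordinate. A quick sketch of that (example values are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def invert_softmax(s):
    """Recover logits from a softmax output -- only up to an additive
    constant c, since softmax(x + c) == softmax(x) for any scalar c."""
    return np.log(s)

x = np.array([0.2, 1.5, -0.7])
s = softmax(x)
x_rec = invert_softmax(s)

# x_rec differs from x by the same constant in every coordinate:
shift = x - x_rec
```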
24 votes, 1 answer

Derivative of the log of the softmax function

Could someone explain how this derivative was arrived at? According to me, the derivative of $\log(\text{softmax})$ is $$ \nabla\log(\text{softmax}) = \begin{cases} 1-\text{softmax}, & \text{if } i=j \\ -\text{softmax}, & \text{if } i \neq j \end{cases} $$…
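With indices made explicit, the result in question is $\partial \log p_j / \partial o_i = \delta_{ij} - p_i$ for $p = \text{softmax}(o)$. A short numerical check of that formula (the input vector and index are arbitrary test values):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def log_softmax_grad(o, j):
    """Gradient of log p_j w.r.t. o: component i is (1 if i == j else 0) - p_i."""
    p = softmax(o)
    g = -p
    g[j] += 1.0
    return g

o = np.array([0.3, -1.2, 2.0])
j = 1

# Central finite-difference check, one input coordinate at a time.
eps = 1e-6
g_num = np.array([
    (np.log(softmax(o + eps * e))[j] - np.log(softmax(o - eps * e))[j]) / (2 * eps)
    for e in np.eye(3)
])
```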
20 votes, 4 answers

Deriving the normal distance from the origin to the decision surface

While studying discriminant functions for linear classification, I encountered the following: …if $\textbf{x}$ is a point on the decision surface, then $y(\textbf{x}) = 0$, and so the normal distance from the origin to the decision surface is…
BitRiver
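For a linear discriminant $y(\mathbf{x}) = \mathbf{w}^\top\mathbf{x} + w_0$, the signed normal distance from the origin to the surface $y(\mathbf{x}) = 0$ is $-w_0/\lVert\mathbf{w}\rVert$ (this is Bishop's result; the specific $\mathbf{w}$, $w_0$ below are made-up test values). A sketch that checks it by projecting the origin onto the surface:

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the decision surface
w0 = -10.0                 # bias term

# Signed normal distance from the origin: -w0 / ||w||.
d = -w0 / np.linalg.norm(w)

# Sanity check: the orthogonal projection of the origin onto the
# surface {x : w.x + w0 = 0} lies on the surface, at distance |d|.
x_proj = -w0 * w / (w @ w)
```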
18 votes, 4 answers

Deriving the cost function using MLE: why use the log function?

I am learning machine learning from Andrew Ng's open-class notes and coursera.org. I am trying to understand how the cost function for the logistic regression is derived. I will start with the cost function for linear regression and then get to my…
cmelan
18 votes, 2 answers

What is divergence in image processing?

What is the difference between gradient and divergence? I understand that the gradient points in the direction of steepest ascent and that divergence measures source strength, but I couldn't relate this to the concept of divergence in image processing. What is…
Premnath D
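One concrete connection: in image processing the gradient turns a scalar image into a vector field, and the divergence maps a vector field back to a scalar one; composing them gives the Laplacian, which underlies diffusion-style filtering. A sketch on a synthetic "image" $f = x^2 + y^2$, whose Laplacian is 4 everywhere (grid size is an arbitrary choice):

```python
import numpy as np

y, x = np.mgrid[0:20, 0:20].astype(float)
f = x**2 + y**2                      # scalar field (think: image intensity)

# Gradient: scalar field -> vector field (points in direction of steepest ascent).
gy, gx = np.gradient(f)

# Divergence: vector field -> scalar field (net outflow per unit area).
divergence = np.gradient(gy, axis=0) + np.gradient(gx, axis=1)

# div(grad f) is the Laplacian; for f = x^2 + y^2 it equals 4 away
# from the grid boundary, where finite differences are one-sided.
```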
18 votes, 1 answer

Why does a radial basis function kernel imply an infinite dimension map?

I understand that each kernel implies a particular feature map. For instance, for $x,z \in \mathbb{R}^2$ the kernel $K(x,z)=(\textrm{dot}(x,z))^2$ implies a feature map $$\langle\phi(x), \phi(z)\rangle=\langle [x_1^2 , x_1 x_2 , x_2 x_1, x_2^2], [z_1^2 ,…
DuckMaestro
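The usual argument: the polynomial kernel has a finite explicit feature map, while the RBF kernel factors as $e^{-\lVert x\rVert^2/2}\,e^{-\lVert z\rVert^2/2}\,e^{x\cdot z}$, and expanding $e^{x\cdot z} = \sum_n (x\cdot z)^n/n!$ contributes features for every degree $n$, so no finite map suffices. A sketch checking both facts numerically (bandwidth fixed at 1 and test vectors chosen for illustration):

```python
import math
import numpy as np

def phi(v):
    """Explicit 4-dimensional feature map for K(x, z) = (x . z)^2, x, z in R^2."""
    return np.array([v[0]**2, v[0]*v[1], v[1]*v[0], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])

k_direct = (x @ z) ** 2       # kernel evaluated directly
k_feature = phi(x) @ phi(z)   # same value via the finite feature map

def rbf(x, z):
    d = x - z
    return np.exp(-(d @ d) / 2)

# Truncating the Taylor series of exp(x . z) approximates the RBF kernel
# arbitrarily well, with one block of features per polynomial degree.
truncated = (
    np.exp(-(x @ x) / 2) * np.exp(-(z @ z) / 2)
    * sum((x @ z) ** n / math.factorial(n) for n in range(20))
)
```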