I was reading the link given in the thread's last comment. I understood the initial part. I understand that, in the case of the hill, if we take any point on the hill, the gradient of the original function will always point towards the peak of the mountain. But I am a bit confused about the gradient of the constraint. Why would the gradient of the constraint, at the constraint's topmost point, point in the direction of the hill's peak? I thought it would be 0, since we have reached the topmost point of the constraint. Any explanation?
1 Answer
You are moving on a level set of the constraint, $g(x) = 0$, and since the gradient of $g$ is always perpendicular to its level sets, $\nabla g$ is always perpendicular to the curve $g(x)=0$.
When you are at the constrained maximum of $f$, $\nabla f$ is also perpendicular to the curve $g(x)=0$: if it had any component along the curve, you could move along the constraint in that direction and increase $f$ further. Therefore $\nabla f$ and $\nabla g$ are parallel, i.e. $\nabla f = \lambda \nabla g$, and since $\nabla f$ points uphill, so does $\nabla g$ (when $\lambda > 0$). But that is a consequence of the fact that the two gradients are parallel, and is only true at the top of the constrained hill.
EDIT: No, the gradient is never zero. (In fact, the method of Lagrange multipliers requires zero to be a regular value of $g$, i.e., $\nabla g \neq 0$ whenever $g=0$.)
Here's a concrete example:
$$\max\, x \quad \textrm{s.t.}\quad x^2+y^2=1$$
In other words, you are trying to maximize $x$ over the unit circle. This has the obvious solution $x=1, y=0$.
Now the constraint function is $g(x,y) = x^2+y^2-1.$ The gradient is $$\nabla g = (2x, 2y).$$ At the top of the constrained hill, $(1,0)$, this gradient points further uphill, in the direction $(2,0)$, but is nowhere near zero. In fact it is easy to check that when $g=0$, the gradient of $g$ is never zero: $$\|\nabla g\| = \sqrt{4x^2+4y^2} = 2\sqrt{g+1} = 2.$$
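For anyone who wants to check the numbers, here is a minimal Python sketch of the example above. The function names (`grad_f`, `grad_g`, `norm`) and the choice of eight sample points are my own illustration and not part of the original answer. It checks that $\|\nabla g\| = 2$ at sample points on the circle, and that at $(1,0)$ the two gradients are parallel with $\nabla f = \lambda \nabla g$.

```python
import math

def grad_f(x, y):
    # Gradient of the objective f(x, y) = x
    return (1.0, 0.0)

def grad_g(x, y):
    # Gradient of the constraint function g(x, y) = x^2 + y^2 - 1
    return (2.0 * x, 2.0 * y)

def norm(v):
    return math.hypot(v[0], v[1])

# Norm of grad g at eight sample points on the circle g = 0
norms = []
for k in range(8):
    t = 2.0 * math.pi * k / 8
    norms.append(norm(grad_g(math.cos(t), math.sin(t))))

# At the constrained maximum (1, 0), compare the two gradients
gf = grad_f(1.0, 0.0)
gg = grad_g(1.0, 0.0)
cross = gf[0] * gg[1] - gf[1] * gg[0]  # zero exactly when the vectors are parallel
lam = gf[0] / gg[0]                    # multiplier in grad f = lam * grad g

print(norms)
print(cross, lam)
```

The 2D cross product is used as the parallelism test because it vanishes exactly when one vector is a scalar multiple of the other.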

Let me repeat my question: why is the gradient not zero at the constraint's maximum? Don't we get slope = 0 at the maximum of a function? If possible, please provide actual functions for the hill and the constraint in the question, and then we can compare gradients. That will help me understand. Thanks – user2543622 Nov 01 '13 at 02:45

In the case of the constraint, I think that the gradient component with respect to $x_1$ is 0, as we are perpendicular to the $x_1$ axis, and the component with respect to $x_2$ will change as we travel along the constraint. At the top of the constraint it will be zero – user2543622 Nov 01 '13 at 02:57

It is making sense now, but it is still a bit unclear. I agree that the gradient will never be zero. Earlier I was considering the constraint as a 2D curve, and that's why I felt that the gradient would be 0 at the top. But now I understand that in 3D the gradient won't be zero. My confusion is: at the topmost point of the constraint, our gradient will point right along the Z direction, am I correct? While on the hill, at the f=a2 curve, the gradient will point to the tip of the mountain. That gradient will not be exactly parallel to the Z axis, is that correct? Then the two gradients are not parallel? – user2543622 Nov 01 '13 at 17:42

In my example, both $f$ and $g$ are functions over the $xy$ plane, and so the gradients of both of these are in the $xy$ plane: $\nabla f$ points in the direction that $f$ increases fastest, but doesn't itself have any $z$ component. – user7530 Nov 01 '13 at 17:44

I am not talking about your example. I am talking about the example given in the link. I understand that your example is in the xy plane only – user2543622 Nov 01 '13 at 18:12

There is no difference between my example and the link's. The example in the link is a "cartoon": it shows the gradient as being tangent to the hill, but this is not mathematically the case. The gradient lies entirely in the $x_1, x_2$ plane. – user7530 Nov 01 '13 at 18:20

More specifically, $\nabla f$ has only two components: $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$. This is a vector in $\mathbb{R}^2$. – user7530 Nov 01 '13 at 18:21

AHA, thanks friend, it is making sense now. In your example, can I say this: as mentioned in the attached link, let's say I start walking from the point $(0,1)$ towards the circle's top, and when I reach the point $(1/\sqrt{2}, 1/\sqrt{2})$ I compare the gradient of the constraint there with the gradient of the objective. The constraint's gradient at that point is $(\sqrt{2},\sqrt{2})$, while the gradient of the objective is $(1,0)$; they are not parallel, so we need to go further up along the circle? At the point $(1,0)$, the gradients of the constraint and the objective point in the same direction. – user2543622 Nov 01 '13 at 21:13

Yep, that's it. – user7530 Nov 01 '13 at 21:38
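
The walk described in the comments can be sketched numerically. This is only an illustration, assuming the walk starts at $(0,1)$ and heads along the unit circle toward the constrained maximum $(1,0)$; the helper `tangential_slope` and the dictionary keys are hypothetical names of my own. It computes the component of $\nabla f = (1,0)$ along the circle's tangent: while that component is positive, moving on increases $f$; where it vanishes, $\nabla f$ is perpendicular to the constraint and hence parallel to $\nabla g$.

```python
import math

def tangential_slope(t):
    """Directional derivative of f(x, y) = x along the unit circle at the
    point (cos t, sin t), in the direction of decreasing t (toward (1, 0))."""
    grad_f = (1.0, 0.0)
    # Unit tangent to the circle; perpendicular to grad g = (2 cos t, 2 sin t)
    tangent = (math.sin(t), -math.cos(t))
    return grad_f[0] * tangent[0] + grad_f[1] * tangent[1]

slopes = {
    "start (0, 1)": tangential_slope(math.pi / 2),
    "midway (1/sqrt(2), 1/sqrt(2))": tangential_slope(math.pi / 4),
    "maximum (1, 0)": tangential_slope(0.0),
}

for point, slope in slopes.items():
    print(point, slope)
```

The slope is positive at the start and midway points and zero at $(1,0)$, which matches the stopping rule in the comment: keep walking until the gradients line up.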

I solved your example using Lagrange multipliers. At the solution $(1,0)$, as we discussed earlier, the gradient of the constraint and the gradient of the objective point in the same direction. I also got $\lambda=0.5$; what does that $\lambda$ signify? Any good link to understand "magnitude of gradient"? – user2543622 Nov 01 '13 at 22:52