I am trying to learn the Calculus of Variations, and I am beginning to see constrained optimization problems in the domain of functionals, solved using Lagrange multipliers. Things seem to work as in finite-dimensional calculus, but I need a semi-formal explanation, which I have failed to find anywhere. (This looks like a rare area of interest.)

In finite-dimensional calculus, when we want to optimize a function $f:\mathbb{R}^n \to \mathbb{R}$ subject to the constraint $g(x)=0$, the idea is that $g(x)=0$ is an $(n-1)$-dimensional surface and $\nabla g(x)$ is normal to the surface's tangent hyperplane. To move along the surface, we must move in directions $u$ with $\nabla g(x)^T u=0$. At a constrained extremum, it must then hold that $\nabla f(x)^T u=0$ whenever $\nabla g(x)^T u=0$, which forces $\nabla f(x) = \lambda \nabla g(x)$.
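As a sanity check of this finite-dimensional picture, here is a small numerical sketch with a hypothetical example of my own choosing (not from the question itself): maximize $f(x,y)=x+y$ on the unit circle $g(x,y)=x^2+y^2-1=0$. At the maximizer the two gradients come out parallel, with a concrete multiplier $\lambda$:

```python
import numpy as np

# Hypothetical example: maximize f(x, y) = x + y subject to
# g(x, y) = x^2 + y^2 - 1 = 0.  The maximizer is p* = (1/sqrt(2), 1/sqrt(2)).
def grad_f(p):
    return np.array([1.0, 1.0])   # gradient of f(x, y) = x + y

def grad_g(p):
    return 2.0 * p                # gradient of g(x, y) = x^2 + y^2 - 1

p_star = np.array([1.0, 1.0]) / np.sqrt(2.0)

gf, gg = grad_f(p_star), grad_g(p_star)
# In 2D, parallel vectors have vanishing "cross product" gf_x*gg_y - gf_y*gg_x.
cross = gf[0] * gg[1] - gf[1] * gg[0]
lam = gf[0] / gg[0]               # the multiplier in grad f = lam * grad g

print(cross)  # 0: the gradients are parallel at the constrained maximum
print(lam)    # 1/sqrt(2)
```

The point of the check is only that the parallelism condition $\nabla f = \lambda \nabla g$ is something one can verify at a known extremum, independent of how it was derived.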

1. Is this valid in a function space such as $C[a,b]$ as well? Can it be shown that, for a well-behaved constraint functional $G[y(x)]=0$ in this space, the gradient $\nabla G[y(x)]$ is perpendicular to the function-space surface $G[y(x)]=0$? We have the inner product $\langle f,g \rangle = \int_{a}^{b}f(x)g(x)\,dx$ on this space, which gives us angles and orthogonality between functions, so I intuitively believe this is true, but I wonder what the actual explanation is.
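At least the discretized version of this perpendicularity can be checked numerically. The sketch below is my own illustrative setup, not a proof: it treats $\mathbb{R}^n$ with the inner product $\langle u,v\rangle = \sum_i u_i v_i\,\Delta x$ as a stand-in for $C[0,1]$, takes the constraint functional $G[y]=\int_0^1 y^2\,dx - 1$ (whose $L^2$ gradient is $2y$), and verifies that moving in a direction orthogonal to that gradient leaves $G$ unchanged to first order, while moving along the gradient does not:

```python
import numpy as np

# Discretized stand-in for C[0,1] with the L^2 inner product.
n = 1000
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

def inner(u, v):
    return float(np.sum(u * v) * dx)

# Constraint functional G[y] = <y, y> - 1; its L^2 gradient is 2y.
def G(y):
    return inner(y, y) - 1.0

# A point exactly on the discrete constraint surface G[y] = 0.
y = x / np.sqrt(inner(x, x))
grad = 2.0 * y

# A "tangent" direction: h made orthogonal to grad in the inner product.
h = np.sin(2.0 * np.pi * x)
h -= (inner(h, grad) / inner(grad, grad)) * grad

eps = 1e-4
d_tangent = (G(y + eps * h) - G(y)) / eps     # ~0: stays on the surface
d_normal = (G(y + eps * grad) - G(y)) / eps   # ~<grad, 2y> = 4: leaves it

print(d_tangent, d_normal)
```

So in the discretization, the gradient of $G$ really is the normal direction to the level set $G[y]=0$, exactly as in the finite-dimensional picture; the function-space statement is the limit of this as $\Delta x \to 0$.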

2. Assuming 1. is true, can it be said that at a constrained extremum we must have $\nabla F[y(x)] = \lambda \nabla G[y(x)]$, as in the finite-dimensional case?

3. Does one build a "Lagrangian functional" $L[y(x),\lambda]=F[y(x)] + \lambda G[y(x)]$ as in the finite-dimensional case, so that the problem turns into the unconstrained optimization of the strange object $L[y(x),\lambda]$, which depends both on a function and on a scalar variable? How does one then maximize such a thing? Its domain is something like $C[a,b] \times \mathbb{R}$. How are derivatives defined there; for example, can we define a Gateaux variation in a direction $(h(x),\mu)$, $$\lim_{\epsilon \to 0} \dfrac{L[y(x) + \epsilon h(x),\lambda + \epsilon \mu] - L[y(x),\lambda]}{\epsilon},$$ which leads to unconstrained optimization?
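To make the question concrete, here is a hypothetical instance of such a Lagrangian (my own choice of $F$ and $G$): minimize $F[y]=\int_0^\pi y'(x)^2\,dx$ subject to $G[y]=\int_0^\pi y(x)^2\,dx - 1 = 0$ and $y(0)=y(\pi)=0$. Setting the variation of $L = F + \lambda G$ to zero gives $-y'' = \lambda y$ (after a sign convention for $\lambda$), i.e. an eigenvalue problem whose lowest mode is $y=\sqrt{2/\pi}\,\sin x$ with multiplier $\lambda = 1$. A discretized check, where $\nabla F = \lambda \nabla G$ literally becomes $Ay=\lambda y$:

```python
import numpy as np

# Minimize F[y] = ∫ y'^2 dx subject to G[y] = ∫ y^2 dx - 1 = 0 on [0, pi],
# with y(0) = y(pi) = 0.  The stationarity condition of the Lagrangian,
# grad F = lambda * grad G, discretizes to the eigenvalue problem A y = lambda y.
n = 500
x = np.linspace(0.0, np.pi, n + 2)[1:-1]   # interior grid points
dx = x[1] - x[0]

# Discrete -d^2/dx^2 with Dirichlet boundary conditions.
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2

eigvals, eigvecs = np.linalg.eigh(A)
lam = eigvals[0]                  # the multiplier of the lowest mode, ~1
y = eigvecs[:, 0] / np.sqrt(dx)   # rescale so that sum(y^2) * dx = 1
y *= np.sign(y[n // 2])           # fix the arbitrary sign of the eigenvector

print(lam)                                                  # ~1
print(np.max(np.abs(y - np.sqrt(2.0 / np.pi) * np.sin(x)))) # small
```

So at least in this discretization, the constrained extremals of $F$ are exactly the points where $\nabla F$ and $\nabla G$ are parallel, and the multiplier $\lambda$ appears as the eigenvalue; what I am asking for is the rigorous function-space version of this.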

I would appreciate some explanation of these cases.