What you call "constraints" is in reality the definition of the compact set $Q$ over which the function $f$ has to be maximized. In order to find this maximum one has to set up a list of candidate points $P_k\in Q$, using the "stratification" of $Q$ into open manifolds of dimensions $2$, $1$ and $0$. The reason for that is the following: Setting a derivative (or $\nabla f$) to $0$ will only find stationary points in the interior of some domain of "full dimension".

In your example the set $Q$ is stratified as follows: It consists of a $2$-dimensional open interior $Q^\circ$, of four $1$-dimensional boundary segments $E_i$ (without endpoints!) and of four vertices $V_i$.

The possible candidates in $Q^\circ$ are brought to the fore by solving the equation $\nabla f(x,y)=0$, i.e. the system $f_x(x,y)=0$, $f_y(x,y)=0$, and retaining the solutions that lie in $Q^\circ$.

For the possible candidates on an edge $E_i$ there are two methods: Either you are able to produce a parametric representation $t\mapsto\bigl(x(t),y(t)\bigr)$ of $E_i$ (which in your example you certainly can) and then compute the "conditionally stationary points" of $f$ on $E_i$ by looking at the pullback
$\phi(t):=f\bigl(x(t),y(t)\bigr)$. This means you solve the equation $\phi'(t)=0$, obtain some $t_k$ and add the points $\bigl(x(t_k),y(t_k)\bigr)$ to the list, if they belong to $Q$. If your edge $E_i$ is given by a constraint $G(x,y)=0$ which you cannot solve for $x$ or $y$ then you have to resort to Lagrange multipliers. (I omit the details.)

Finally you have to add the vertices $V_i$ to your candidate list.

Assume you now have a finite candidate list $L=\{P_1, P_2,\ldots, P_r\}\subset Q$. Then
$$\max_{(x,y)\in Q} f(x,y)\ =\ \max\{f(P_1),f(P_2),\ldots, f(P_r)\}\ .$$

I'm sorry that things are so cumbersome, even for such a simple domain as a square. But it's easy to cook up an example where by forgetting one vertex or a conditionally stationary point on an edge you miss the maximum of $f$ on $Q$.