At first I deleted this answer when I saw Fabian's, because he obviously knew what he was talking about and I didn't :-). But I'm undeleting it because I realized that my method might be computationally more efficient for $m\ll n$, since you don't need to calculate the entire SVD of $M$, which takes $O(n^3)$ time; you only need multiplications that take $O(mn^2)$ and inversions that take $O(m^3)$. You only need to calculate $M^TM$ once, and that doesn't take $O(n^3)$ since $M$ is sparse. If $M$ is sufficiently sparse, you can also treat $M^TM$ as sparse; then the time for the multiplications becomes linear in $n$.

P.S.: The Wikipedia article on the SVD has a section "Truncated SVD" that says that the part of the SVD corresponding to the largest singular values can be calculated much more efficiently; so presumably the same savings apply there, too. I'm leaving the answer undeleted now, but there's probably nothing of practical value to be gained from it if you know how to efficiently calculate the truncated SVD. (The Wikipedia article doesn't indicate how to do that, but there are lots of Google hits for "truncated SVD".)
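For what it's worth, here's one way the truncated SVD can be computed in practice, using SciPy's `svds`, which finds only the $m$ largest singular triplets without ever forming the full SVD (the matrix sizes and density below are made up for illustration):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

n, m = 200, 5                                     # m << n
M = sparse_random(n, n, density=0.05, random_state=0, format="csr")

# Only the m largest singular triplets are computed.
U, s, Vt = svds(M, k=m)

# Best rank-m approximation in the least-squares sense (Eckart-Young).
A = U * s                                         # n x m
B = Vt                                            # m x n
err = np.linalg.norm(A @ B - M.toarray())
```

The resulting `err` should agree with what you'd get by truncating a full SVD, at a fraction of the cost when $m\ll n$.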

If you mean "approximates $M$ as best as possible in a least-squares sense", this might lead to a feasible algorithm:

If you hold one of $A$ and $B$ fixed, finding the other is a linear least-squares problem:

$$f(A) = \sum_{ik}\left(\sum_ja_{ij}b_{jk}-m_{ik}\right)^2\to\min$$
$$\frac{\partial f(A)}{\partial a_{ln}}=\sum_k \left(\sum_ja_{lj}b_{jk}-m_{lk}\right)b_{nk}=0$$
$$A=MB^T(BB^T)^{-1}\;,$$

and analogously

$$B=(A^TA)^{-1}A^TM\;.$$

So you might be able to iterate alternately towards the solution, but I have no idea how good the convergence would be. You can also substitute one of these into the other to get, e.g.,

$$
\begin{eqnarray}
B
&=&
\left((BB^T)^{-1}BM^TMB^T(BB^T)^{-1}\right)^{-1}(BB^T)^{-1}BM^TM
\\
&=&
BB^T(BM^TMB^T)^{-1}BM^TM\;,
\end{eqnarray}
$$

which you can either regard as an iteration prescription (which cuts the number of matrix inversions in half) or try to solve directly for the fixed point (good luck).
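The combined prescription, with $M^TM$ computed once up front, looks like this in numpy (again a sketch with invented names and test data; I haven't analyzed when it converges):

```python
import numpy as np

def iterate_B(M, m, iters=50, seed=0):
    """Iterate B <- B B^T (B M^T M B^T)^{-1} B M^T M."""
    rng = np.random.default_rng(seed)
    G = M.T @ M                          # computed once; cheap if M is sparse
    B = rng.standard_normal((m, G.shape[0]))
    for _ in range(iters):
        BG = B @ G                       # m x n
        B = (B @ B.T) @ np.linalg.solve(BG @ B.T, BG)
    return B

rng = np.random.default_rng(2)
n = 60
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n)) \
    + 0.01 * rng.standard_normal((n, n))
B = iterate_B(M, 2)
A = np.linalg.solve(B @ B.T, B @ M.T).T  # recover A from the final B
err = np.linalg.norm(A @ B - M)
```

Each step of this is just one full round of the alternating iteration expressed in terms of $B$ alone, so it should converge exactly when that does.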

By the way, $A$ and $B$ are only determined up to an invertible matrix, since $(AS^{-1})(SB)=AB$. You can see how that plays out in the above equations: If you start off with $SB$, you get $AS^{-1}$ and vice versa, and $S$ cancels out in the fixed-point equation for $B$.
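Both halves of that claim are easy to check numerically (toy sizes and matrices of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2))
B = rng.standard_normal((2, 6))
S = np.array([[2.0, 1.0], [0.5, 3.0]])   # some invertible 2x2 matrix
Sinv = np.linalg.inv(S)

# The product A B is unchanged by the mixing:
unchanged = np.allclose((A @ Sinv) @ (S @ B), A @ B)

# One least-squares update A = M B^T (B B^T)^{-1}, started from S B
# instead of B, yields the update for B times S^{-1}:
M = rng.standard_normal((6, 6))
update = lambda Bx: M @ Bx.T @ np.linalg.inv(Bx @ Bx.T)
swapped = np.allclose(update(S @ B), update(B) @ Sinv)
```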

Here's another way of thinking about this that might help: the columns of the approximation $AB$ are in the column space of $A$. So you're looking for $m$ column vectors to make up $A$ such that the sum of the squared distances of the columns of $M$ from the space spanned by these vectors is minimal. (In this view, optimizing $B$ determines the coefficients of the linear combinations of the column vectors of $A$ such that, for each column vector of $M$, the nearest vector in the column space is used.) So what you're really looking for isn't the matrix $A$ but its column space, and this should be as close as possible to the column vectors of $M$. The matrix $S$ above just mixes the column vectors of $A$ without changing the column space they span.
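To make the "nearest vector in the column space" part concrete: for a fixed full-column-rank $A$, the least-squares $B$ makes $AB$ exactly the orthogonal projection of $M$'s columns onto the column space of $A$. A small numpy check (sizes made up):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((50, 40))
A = rng.standard_normal((50, 3))          # any full-column-rank A

B = np.linalg.solve(A.T @ A, A.T @ M)     # optimal B for this fixed A
P = A @ np.linalg.solve(A.T @ A, A.T)     # orthogonal projector onto col(A)

is_projection = np.allclose(A @ B, P @ M)
```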