I cannot see how Willie Wong's example of the Bernstein-Robinson result supports his conclusion. It seems to me to do the opposite, and I am not alone here. Halmos himself admits in his autobiography: "The Bernstein-Robinson proof uses non-standard models of higher order predicate languages, and when Abby [Robinson] sent me his preprint I really had to sweat to pinpoint and translate its mathematical insight." Halmos did sweat because, as all of his comments and actions regarding NSA indicate, he was against it for philosophical or personal reasons, and so he was eager to downplay this result precisely because it looked like support for using NSA. NSA, at least in Robinson's approach, is nonconstructive because it relies on the existence of nonprincipal ultrafilters (likewise, the compactness theorem relies on a weak form of the axiom of choice).
Also, the fact that a formal proof of some formula exists (which is precisely what it means to be a theorem) is only trivially relevant to the question of whether a theory might help you find a proof. Besides, who other than automated theorem-provers actually thinks in terms of formal proofs? In my experience, the concepts and tools of a theory, the objects that it lets you talk about, and the ideas that it lets you express are what make a theory useful for proving things.
One thing that the OP might find attractive about NSA is that saying "x is infinitely close to y" is perfectly fine and meaningful -- and it probably means what you already think it means: two numbers are infinitely close iff their difference is infinitely small, i.e., an infinitesimal. You also get things like halos (the set of all numbers infinitely close to a given number) and shadows (the unique standard number infinitely close to a given limited, i.e., non-infinite, number), which can be fun and intuitive concepts to think with.
For example, here is how the limit of a (hyperreal) sequence is defined. First, sequences are no longer indexed by the natural numbers $\mathbb{N}$. Rather, sequences are indexed by the hypernaturals $^*\mathbb{N}$, which include numbers larger than any standard natural. Such numbers are called infinite (or unlimited). (Warning: this is not the same concept as "infinity" in "as x goes to infinity"; infinite naturals are smaller than (positive) infinity, when it makes sense to compare them.) Now, a hyperreal L is the limit of a sequence $\langle s_n \rangle$ (indexed by $^*\mathbb{N}$!) iff L is infinitely close to $s_n$ for all infinite n.
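As a small illustration (my own choice of example, assuming the usual extension of the standard sequence $s_n = 1/n$ to all of $^*\mathbb{N}$), take any infinite hypernatural $N$. Since $N$ exceeds every standard natural $m$, we get
$$\begin{align}
0 < s_N = \frac{1}{N} &< \frac{1}{m} \quad \text{for every standard } m \geq 1.
\end{align}$$
Every positive standard real exceeds some $1/m$, so $s_N$ is smaller than every positive standard real, i.e., it is a positive infinitesimal. Hence $s_N$ is infinitely close to 0 for every infinite $N$, and the limit of the sequence is 0, as you would expect.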
For another example, consider proofs using "sloppy" reasoning where you end up with some infinitesimal term and so just ignore it or drop it from an equation (provoking derisive comments about "ghosts of departed entities"). In NSA, rather than ignoring the term, you can actually say that it's infinitesimal and end up with a result that is infinitely close to the result of your sloppy alternative. E.g., let (the hyperfunction) $f(x) = x^2$ and consider the (I presume familiar) formula for the derivative, where we will let h be a nonzero infinitesimal:
$$\begin{align}
\frac{f(x+h) - f(x)}{h} &= \frac{(x+h)^2 - x^2}{h} \\
&= \frac{x^2 + 2xh + h^2 - x^2}{h} \\
&= 2x + h \\
&\simeq 2x
\end{align}$$
The symbol $\simeq$ denotes the relation "infinitely close". This derivation works because, when h is an infinitesimal, a + h is infinitely close to a for any hyperreal a. Under sensible restrictions on f and x, this derivation shows that $2x$ is the standard derivative of $x^2$, as every schoolgirl knows.
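The same recipe can be tried on other functions. Here is a sketch for $f(x) = x^3$ (my example, not part of the argument above), assuming x is limited, i.e., not infinite, and h is again a nonzero infinitesimal:
$$\begin{align}
\frac{(x+h)^3 - x^3}{h} &= \frac{x^3 + 3x^2h + 3xh^2 + h^3 - x^3}{h} \\
&= 3x^2 + 3xh + h^2 \\
&\simeq 3x^2
\end{align}$$
The last step is where a restriction on x gets used: $3xh + h^2$ is infinitesimal because a limited number times an infinitesimal is infinitesimal. If x were infinite, $3xh$ need not be infinitesimal (take $x = 1/h$, which gives $3xh = 3$).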
A cost-benefit analysis for learning NSA should probably include (i) for a benefit, how interesting or valuable you find the nonstandard concepts and (ii) for a cost, how much work you'll have to do to learn it. The latter will depend on what text or approach you choose. If you are willing to take some things for granted and just use the resulting tools, you can get away with bypassing a good chunk of the model-theoretic machinery (compactness, ultrafilters, elementary extensions, transfer, formal languages). If you understand the ultrapower construction, which constructs the hyperreals as equivalence classes of infinite sequences of real numbers (similar to the construction of the reals from the rationals using Cauchy sequences), then the resulting system behaves as you would expect -- operations are defined componentwise, and relations hold when they hold on "almost all" components. This part is relatively easy. Alternatively, you can get away with not understanding the construction very well if you are willing to accept the definitions of the relations and operations on the hyperreals as axioms.
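To make "componentwise" concrete, here is a sketch of the ultrapower definitions. The notation is my own assumption: $\mathcal{U}$ is a fixed nonprincipal ultrafilter on $\mathbb{N}$, and $[a_n]$ is the equivalence class of the real sequence $\langle a_n \rangle$.
$$\begin{align}
[a_n] = [b_n] &\iff \{ n : a_n = b_n \} \in \mathcal{U} \\
[a_n] + [b_n] &= [a_n + b_n] \\
[a_n] \cdot [b_n] &= [a_n b_n] \\
[a_n] < [b_n] &\iff \{ n : a_n < b_n \} \in \mathcal{U}
\end{align}$$
For instance, $\varepsilon = [1/n]$ (with n starting at 1) is a positive infinitesimal: for any standard real $r > 0$, the set $\{ n : 0 < 1/n < r \}$ is cofinite and so belongs to $\mathcal{U}$, giving $0 < \varepsilon < r$. Likewise $\omega = [n]$ is infinite, and $\varepsilon \cdot \omega = [1] = 1$.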
If you want to look into NSA, I would recommend either (a) Goldblatt's Lectures on the Hyperreals if you don't have a strong background or interest in mathematical logic or (b) Hurd and Loeb's Introduction to Nonstandard Real Analysis otherwise. The latter is out of print and sadly about $100 if you want to buy it, but check libraries. It's very thoughtful and well-written. Also, if you are excited about the model-theoretic aspects, look them up in Chang and Keisler's Model Theory book as you go along. Hodges' model theory book is also very good but doesn't cover this material as extensively.
Cheers, Rachel