Java: 32-bit fp implementation of Math.sqrt()

Question

The standard Math.sqrt() method seems pretty fast in Java already, but it has the inherent drawback that it always involves 64-bit operations, which does nothing but reduce speed when dealing with 32-bit float values. Is it possible to do better with a custom method that takes a float as a parameter, performs 32-bit operations only, and returns a float as a result?

I saw:

Fast sqrt in Java at the expense of accuracy

and it did little more than reinforce the notion that Math.sqrt() is generally hard-to-beat. I also saw:

http://www.codeproject.com/Articles/69941/Best-Square-Root-Method-Algorithm-Function-Precisi

which showed me a bunch of interesting C++/ASM hacks that I am simply too ignorant to port directly to Java. Though sqrt14 might be interesting as a part of a JNI call . . .

I also looked at Apache Commons FastMath, but it looks like that library defaults to the standard Math.sqrt() so no help there. And then there's Yeppp!:

http://www.yeppp.info/

but I haven't bothered with that yet.

I'm not quite sure you'll get the speed benefit you hope you'd get — Marcus Müller, Jun 11 '15 at 09:10
"64-bit operations ... does nothing but reduce speed when dealing with 32-bit float values" is a fallacy. In general, floating-point operations are always carried out in the precision of the FPU, and the overhead comes in widening and narrowing the `float` operands to `double` to suit the FPU. — user207421, Jun 11 '15 at 09:41
@EJP it's true for `sqrtsd` vs `sqrtss`, but of course from a Java perspective you can't control that. As for the old-style FPU which works as you describe, it's essentially obsolete (and severely crippled in some Intel Atoms) — harold, Jun 11 '15 at 10:41

score 6 · Accepted Answer · answered Jun 11 '15 at 10:34

You don't need to do anything to speed up sqrt for 32-bit values. The HotSpot JVM does it for you automatically.

The JIT compiler is smart enough to recognize the f2d -> Math.sqrt() -> d2f pattern and replace it with the faster sqrtss CPU instruction instead of sqrtsd. The source.

The benchmark:

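import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class Sqrt {
    // Non-final state fields, so the inputs are read at run time on each call.
    double d = Math.random();
    float f = (float) d;

    // Baseline: double-precision square root.
    @Benchmark
    public double sqrtD() {
        return Math.sqrt(d);
    }

    // float -> double -> Math.sqrt() -> float: the pattern the JIT narrows to sqrtss.
    @Benchmark
    public float sqrtF() {
        return (float) Math.sqrt(f);
    }
}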

And the results:

Benchmark    Mode  Cnt       Score      Error   Units
Sqrt.sqrtD  thrpt    5  145501,072 ± 2211,666  ops/ms
Sqrt.sqrtF  thrpt    5  223657,110 ± 2268,735  ops/ms
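
For the float-in, float-out method the question asks for, a minimal sketch might look like the following. The class and method names (FloatSqrt, sqrt) are made up for illustration; the body is just the f2d -> Math.sqrt() -> d2f pattern described above, so no custom algorithm is involved. Rounding the double result to float should give the same value as a correctly rounded single-precision square root, which is what makes the substitution safe for the JIT.

```java
// Hypothetical helper class; the name is an assumption, not part of any library.
final class FloatSqrt {
    private FloatSqrt() {}

    // Takes a float and returns a float. The implicit float -> double widening,
    // the Math.sqrt() call, and the explicit (float) cast form the pattern
    // that HotSpot is reported to compile down to a single sqrtss.
    static float sqrt(float x) {
        return (float) Math.sqrt(x);
    }

    public static void main(String[] args) {
        System.out.println(sqrt(16.0f)); // prints 4.0
        System.out.println(sqrt(0.25f)); // prints 0.5
    }
}
```

Keeping the method this small leaves the pattern visible to the compiler once the call is inlined; whether that happens on a given JVM is something to confirm with a benchmark like the one above.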

Interesting! Learn something new every day. — user3765373, Jun 13 '15 at 04:57

score 0 · Answer 2 · answered Jun 11 '15 at 09:13

Since you seem to know JNI:

just write a minimal wrapper for double sqrt(double) and float sqrtf(float) from the C standard library's math.h and compare performance.

Hint: you won't feel a difference unless you do a lot of square rooting, and then the performance advantage of using SIMD instructions to do multiple sqrts at once will most probably dominate the effects. You will need to get a memory-aligned array of the floating-point values from Java, which can be quite hard if you're using the Java standard libraries.
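
Before going the JNI route, it may be worth measuring a plain Java loop over a float[] as a baseline. The sketch below is only that: the class and method names (BatchSqrt, sqrtInPlace) are hypothetical, and whether the JIT turns the loop into SIMD instructions depends on the JVM version and the CPU. That part is an assumption to check with a benchmark, not something shown here.

```java
// Hypothetical batch helper; names are assumptions for illustration only.
final class BatchSqrt {
    private BatchSqrt() {}

    // Replaces every element with its square root, using the same
    // float -> double -> Math.sqrt() -> float pattern per element.
    // A simple counted loop like this is the shape a JIT is most
    // likely to vectorize, but that is not guaranteed.
    static void sqrtInPlace(float[] values) {
        for (int i = 0; i < values.length; i++) {
            values[i] = (float) Math.sqrt(values[i]);
        }
    }

    public static void main(String[] args) {
        float[] data = {4.0f, 9.0f, 0.25f};
        sqrtInPlace(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [2.0, 3.0, 0.5]
    }
}
```

If this loop is already fast enough, the JNI wrapper and the array-alignment problem can be skipped entirely.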
