Hardware implementation of square root?

Question

I'm trying to find a little bit more information for efficient square root algorithms which are most likely implemented on FPGA. A lot of algorithms are found already but which one are for example from Intel or AMD? By efficient I mean they are either really fast or they don't need much memory.

EDIT: I should probably mention that the question is generally a floating point number and since most of the hardware implements the IEEE 754 standard where the number is represented as: 1 sign bit, 8 bits biased exponent and 23 bits mantissa.

Thanks!

http://stackoverflow.com/questions/1528727/why-is-sse-scalar-sqrtx-slower-than-rsqrtx-x has detailed information. — Vladislav Zorov, Jan 15 '12 at 17:07
Why not implement [this](http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_.28base_2.29)? You only do shifts and adds and no extra memory is needed for things like look-up tables. Looks like a good candidate for an FPGA. — Alexey Frunze, Jan 15 '12 at 18:36
Thanks for the comment @Alex. I'll try to find some more resources, because I still don't how can I implement that in VHDL. One more question, doesn't that find just the integer part of sqrt? — Dimitar Petrov, Jan 15 '12 at 19:49
Do you want to solve one square root as fast as possible, or solve a continuous stream of square roots as fast as possible? — President James K. Polk, Jan 16 '12 at 02:08
@DimitarPetrov: right, that particular piece of code calculates the integer square root of an integer. But you can reuse it for floating-point values too because sqrt(mantissa\*2^exponent)=sqrt(mantissa)\*2^(exponent/2) and you can always represent your number as an integer mantissa times some even power of 2. You should have included the details about floating-point square root and VHDL in the question. Actually, the VHDL probably deserves a separate question. — Alexey Frunze, Jan 16 '12 at 05:15
@GregS: Doesn't it mean that solving one as fast as possible will solve a continious stream also fast as possible? I was asking for one square root. Alex: The VHDL part is not so important right now, so that's why I didn't add that to the question. I'll try to implement it later. — Dimitar Petrov, Jan 16 '12 at 08:48
@DimitarPetrov: No. For example one may take 8 ticks per square root, while the other may take 16 ticks before the first answer comes out of the pipeline, but then each successive answer comes out one tick later. — President James K. Polk, Jan 16 '12 at 13:16
Thanks for the naswer @GregS. So any recommendations for algorithm calculating fast single square root? — Dimitar Petrov, Jan 16 '12 at 13:38
@DimitarPetrov: No, I'm useless in that regard, though I think Paul S has the correct answer. — President James K. Polk, Jan 16 '12 at 13:39
@Alex: "sqrt(mantissa*2^exponent)=sqrt(mantissa)*2^(exponent/2)" is true, but sqrt(mantissa) might be outside of the range 0.25->0.5. If that's the case you'll need to re-normalise it, and have to adjust the exponent +/- 1. This is just a final stage on the output though. — Paul S, Jan 16 '12 at 17:11
@PaulS: Of course one will need to make sure the result is properly normalized if possible. — Alexey Frunze, Jan 16 '12 at 17:41

score 6 · Accepted Answer · answered Jan 16 '12 at 02:09

6

Not a full solution, but a couple of pointers.

I assume you're working in floating point, so point 1 is remember that floating point is stored as a mantissa and exponent. The exponent of the square root will be approximately half the exponent of the original number thanks to logarithms.

Then the mantissa can be approximated with a look-up table, and then you can use a couple of newton-raphson rounds to give some accuracy to the result from the LUT.

I haven't implemented anything like this for about 8 years, but I think this is how I did it and was able to get a result in 3 or 4 cycles.

answered Jan 16 '12 at 02:09

Paul S

7,415
2
22
35

Thanks Paul! Can you point me to particular algorithm? Yep, I'm working in floating point and just editted my question... or could you expand a little bit your explanation because I didn't quite understand everyhing :) – Dimitar Petrov Jan 16 '12 at 14:40
Unfortunately it's been so long that I can't remember the precise details, and it was for a previous employer so I can't look it up either. If you have specific questions I'll do my best to answer them. – Paul S Jan 16 '12 at 16:00
Thanks @Paul. I've marked your question as best answer, since that gave me some ideas and point me (hopefully) in the right direction. Thanks – Dimitar Petrov Jan 16 '12 at 22:01

score 3 · Answer 2 · answered Jan 15 '12 at 17:07

3

This is a great one for fast inverse-quare root.
Have a look at it here. Notice it's pretty much about the initial guess, rather amazing document :)

answered Jan 15 '12 at 17:07

ScarletAmaranth

4,835
2
20
32

Thanks a lot! I've seen that already and looks pretty impresive, however the "magic number" scared me a little bit in the beginning. I will take a look again :) – Dimitar Petrov Jan 15 '12 at 17:41
1

I don't feel this answer is really relevant to the question. This algorithm only computes a rough approximation of the inverse of the square root, and definitely is not was is implemented on FPGA's – Sven Marnach Jan 15 '12 at 18:09
Thanks Sven. Does most implementations on FPGAs just use some variant on the Newton-Raphson method? Like some inversion to get rid from the division, which itself is a expensive operation? – Dimitar Petrov Jan 15 '12 at 18:13
@DimitarPetrov The magic number is only required for the inverse square root. The square root can be achieved using a right shift and a few rounds of Newton-Raphson. – S.S. Anne Jun 26 '20 at 15:31

Hardware implementation of square root?

2 Answers2