
I'm doing some statistics calculations. I need them to be fast, so I rewrote most of it to use SSE. I'm pretty much new to it, so I was wondering what the right approach here is:

To my knowledge, there is no log2 or ln function in SSE, at least not up to 4.1, which is the latest version supported by the hardware I use.

Is it better to:

  1. extract 4 floats and do FPU calculations on them to determine the entropy - I won't need to load any of those values back into SSE registers, just sum them up into another float
  2. find a function for SSE that does log2
Aziz Shaikh
  • What kind of range and accuracy do you need for your log2 ? – Paul R Jan 17 '12 at 23:14
  • Same accuracy I get from the FPU would be desirable –  Jan 18 '12 at 07:06
  • There seem to be a few SSE log2 implementations around, e.g. http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html – Paul R Jan 18 '12 at 07:45
  • Neat, thanks! I'll try that and benchmark it. Extracting the floats to an array and then doing 4 consecutive log2's on that via FPU was disappointingly slow. Instruments said it's wasting 95% of its time there. –  Jan 18 '12 at 07:50
  • There is also the Intel Approximate Maths Library - it's old (2000) but it's SSE2 and it should still work reasonably well: http://www.intel.com/design/pentiumiii/devtools/AMaths.zip – Paul R Jan 18 '12 at 07:54
  • Woah... I tried the implementation from the blog you linked, the one I can approximate as closely as I'd like. It's FAST. Cut processing time down to about 10%. Thanks a LOT! –  Jan 18 '12 at 09:31
  • OK - I'll put those two links in an answer for future reference. – Paul R Jan 18 '12 at 09:35
  • Here is another link: [http://gruntthepeon.free.fr/ssemath](http://gruntthepeon.free.fr/ssemath/). It implements only the log function with SSE, but with one more instruction you'll get log2 – Bentoy13 Apr 04 '14 at 09:55

2 Answers

There seem to be a few SSE log2 implementations around, e.g. [this one](http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html).

There is also the [Intel Approximate Maths Library](http://www.intel.com/design/pentiumiii/devtools/AMaths.zip), which has a log2 function among others - it's old (2000) but it's SSE2 and it should still work reasonably well.


See also:
Paul R
  • Due to the method used on the blog, the function is now memory bound, instead of CPU bound. I unrolled the loop a little to make use of some _mm_prefetch love, and it still is memory bound. Thanks for that awesome pointer! –  Jan 18 '12 at 15:14
  • Glad it worked for you. You probably already know this, but if you're hitting a memory bandwidth bottleneck then try to combine other operations with your log2 so that you make more use of data while it's in cache. – Paul R Jan 18 '12 at 15:16
  • If you are updating your answer, you might want to mention libmvec, which is shipped with recent glibc. – Marc Glisse Nov 09 '16 at 08:36
There is no SSE instruction that implements a logarithm function. The closest thing x86 offers is the scalar x87 instruction FYL2X, which computes y * log2(x) for one value at a time, so it does not help with vectorized code. If you're thinking about using a logarithm function like log or log10 from the C standard library, it's worth taking a look at the implementation that is used in an open-source library like libc. You can then roll your own logarithm approximation that operates across all elements in an SSE register.

Such a function is often implemented using a polynomial approximation, such as a truncated Taylor series, that meets some accuracy specification over a limited range of input arguments. You can then take advantage of logarithm properties to map a generic input argument into the range your polynomial accepts: for example, split x into its exponent and mantissa, so that log2(x) = exponent + log2(mantissa) and only the mantissa needs the polynomial. In addition, you can parameterize the base of the logarithm by taking advantage of the property:

log_y(x) = log_a(x) / log_a(y)

Where a is the base of the logarithm routine that you created.
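As a rough illustration of that recipe, here is a minimal SSE2 sketch; it is not the code from either linked library. The names log2_ps_sketch and log2_4f are made up for this example. It assumes every input is a positive, normal float (zero, negatives, denormals, infinities and NaN are not handled). The exponent bits supply the integer part, the mantissa is folded into roughly [0.71, 1.41], and ln(m) is computed from the truncated Taylor series of 2*atanh(s) with s = (m-1)/(m+1). The final multiply by 1/ln(2) is the base-change property above with a = e and y = 2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Hypothetical helper: approximate log2 of four positive, normal floats.
   Assumption: no zero, negative, denormal, inf or NaN inputs. */
static __m128 log2_ps_sketch(__m128 x)
{
    const __m128i exp_mask  = _mm_set1_epi32(0x7F800000);
    const __m128i mant_mask = _mm_set1_epi32(0x007FFFFF);
    const __m128  one       = _mm_set1_ps(1.0f);

    __m128i bits = _mm_castps_si128(x);

    /* e = unbiased exponent, converted to float */
    __m128i ei = _mm_sub_epi32(
        _mm_srli_epi32(_mm_and_si128(bits, exp_mask), 23),
        _mm_set1_epi32(127));
    __m128 e = _mm_cvtepi32_ps(ei);

    /* m = mantissa with the exponent forced to 0, so m is in [1, 2) */
    __m128 m = _mm_or_ps(
        _mm_castsi128_ps(_mm_and_si128(bits, mant_mask)), one);

    /* Where m > sqrt(2): halve m and add 1 to e, keeping |m - 1| small */
    __m128 big = _mm_cmpgt_ps(m, _mm_set1_ps(1.41421356f));
    m = _mm_sub_ps(m, _mm_and_ps(big, _mm_mul_ps(m, _mm_set1_ps(0.5f))));
    e = _mm_add_ps(e, _mm_and_ps(big, one));

    /* ln(m) = 2*atanh(s) = 2s * (1 + s^2/3 + s^4/5 + s^6/7 + s^8/9 + ...) */
    __m128 s  = _mm_div_ps(_mm_sub_ps(m, one), _mm_add_ps(m, one));
    __m128 s2 = _mm_mul_ps(s, s);
    __m128 p  = _mm_set1_ps(1.0f / 9.0f);           /* Horner evaluation */
    p = _mm_add_ps(_mm_mul_ps(p, s2), _mm_set1_ps(1.0f / 7.0f));
    p = _mm_add_ps(_mm_mul_ps(p, s2), _mm_set1_ps(1.0f / 5.0f));
    p = _mm_add_ps(_mm_mul_ps(p, s2), _mm_set1_ps(1.0f / 3.0f));
    p = _mm_add_ps(_mm_mul_ps(p, s2), one);
    __m128 ln_m = _mm_mul_ps(_mm_add_ps(s, s), p);

    /* Base change: log2(m) = ln(m) / ln(2); then add the exponent */
    return _mm_add_ps(e, _mm_mul_ps(ln_m, _mm_set1_ps(1.44269504f)));
}

/* Hypothetical convenience wrapper for unaligned arrays of four floats. */
static void log2_4f(const float in[4], float out[4])
{
    _mm_storeu_ps(out, log2_ps_sketch(_mm_loadu_ps(in)));
}
```

Dropping terms from the series trades accuracy for speed. For an entropy sum, the result of log2_ps_sketch can stay in an SSE register and be multiplied and accumulated there, so the values never have to be extracted one by one.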

Jason R