Maximum SIMD integer multiplications on Ivy Bridge using SSE/AVX?

Question

Would somebody be able to advise me how I can work out the maximum number of 32-bit unsigned integer multiplications I would be able to do concurrently on an Ivy Bridge CPU using SIMD via SSE/AVX?

I understand that AVX has 256-bit registers, but its arithmetic on them is floating-point only (256-bit integer instructions arrived with AVX2). Therefore I am not overly sure whether it would be better to use floating-point registers for integer multiplication (if that's even possible)?

In addition, I am unsure whether only the number of registers matters, or whether I also need to look at the execution ports of the CPU. It looks like port 0 and port 5 can handle SSE integer ALU operations?

You could use double instead. All 32-bit integers are represented exactly by double. – phuclv, Apr 20 '14 at 17:04
@LưuVĩnhPhúc: While that's true, it also takes more space and only 4 fit in an AVX register, completely defeating the purpose. You can already fit four 32-bit integers in an SSE register. – Ben Voigt, Apr 20 '14 at 17:05
Guys, I found this example of what I need, but it's for signed integers. Does anyone know how to change it for unsigned? http://stackoverflow.com/questions/10500766/sse-multiplication-of-4-32-bit-integers – user997112, Apr 20 '14 at 17:28
Signed and unsigned multiplication are the same thing by the way, there's only a difference if you want the upper half as well. – harold, Apr 20 '14 at 18:45
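
To illustrate the last comment, here is a minimal scalar sketch in C. The function names are made up for this illustration, and the cast from `uint32_t` to `int32_t` assumes the usual two's-complement wrap-around that mainstream compilers provide. It shows that the low 32 bits of a product do not depend on signedness, while the high 32 bits do.

```c
#include <stdint.h>

/* Low 32 bits of the product, inputs treated as unsigned.
   Unsigned arithmetic wraps modulo 2^32 by definition. */
uint32_t mul_lo_unsigned(uint32_t a, uint32_t b) {
    return a * b;
}

/* Low 32 bits of the product, inputs reinterpreted as signed.
   Assumption: uint32_t -> int32_t conversion wraps (two's complement). */
uint32_t mul_lo_signed(uint32_t a, uint32_t b) {
    int64_t wide = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)wide;
}

/* High 32 bits of the full 64-bit unsigned product. */
uint32_t mul_hi_unsigned(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the full 64-bit signed product. */
uint32_t mul_hi_signed(uint32_t a, uint32_t b) {
    int64_t wide = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)wide >> 32);
}
```

For example, with `a = 0xFFFFFFFF` and `b = 2` both low-half functions return `0xFFFFFFFE`, but the high halves differ (`1` unsigned versus `0xFFFFFFFF` signed). So an instruction that only keeps the low 32 bits of each product can be used for unsigned data as is.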

score 1 · Answer 1 · answered Apr 20 '14 at 17:00

You can issue one pmulld per clock, which is 4 multiplications per clock.

Therefore I am not overly sure whether it would be better to use floating-point registers for integer multiplication (if that's even possible)?

Nothing like that is possible. You can put 8 integers in a ymm register of course, but then you're stuck. The instruction you'd need to do something useful with them, the 256-bit vpmulld, is in AVX2, which Ivy Bridge does not support.
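
As a rough sketch of what the 128-bit route looks like in C, the intrinsic `_mm_mullo_epi32` (SSE4.1) maps to pmulld and keeps the low 32 bits of each of the four products. The function name `mul4_u32`, the use of unaligned loads, and the scalar fallback for non-x86 builds are choices made for this illustration, not something from the answer above.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <smmintrin.h>  /* SSE4.1 intrinsics, including _mm_mullo_epi32 */

/* The target attribute lets GCC/Clang emit pmulld here
   without compiling the whole file with -msse4.1. */
__attribute__((target("sse4.1")))
void mul4_u32(const uint32_t *a, const uint32_t *b, uint32_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* load 4 x 32-bit */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i prod = _mm_mullo_epi32(va, vb);            /* low 32 bits of each product */
    _mm_storeu_si128((__m128i *)out, prod);
}
#else
/* Scalar fallback so the sketch still builds on other targets. */
void mul4_u32(const uint32_t *a, const uint32_t *b, uint32_t *out) {
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];  /* wraps modulo 2^32 */
}
#endif
```

Calling `mul4_u32(a, b, out)` with three arrays of four `uint32_t` values writes the four wrapped products to `out`. Because only the low half is kept, the same call works for signed and unsigned inputs.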

harold

There's an Intel intrinsic called _mm_mul_ps which takes 2x 128-bit array-of-4-ints, multiplies them together and uses MULPS. What is the difference with MULPS and PMULLD? – user997112 Apr 20 '14 at 17:03
@user997112 `mulps` multiplies floats, `pmulld` is for integers. – harold Apr 20 '14 at 17:04
I found this: http://stackoverflow.com/questions/10500766/sse-multiplication-of-4-32-bit-integers but they are looking at multiplying signed integers and I would like unsigned. Could you advise how the code would change? – user997112 Apr 20 '14 at 17:24
@user997112 it's based around using `pmuludq` to emulate `pmulld`, but the `ymm` version of that is also `AVX2`, and so is everything else that might be useful. This way of doing it wouldn't even be useful if the 256-bit `vpmuludq` *was* in AVX: it needs two multiplications, which undoes the improvement from the doubled width. – harold Apr 20 '14 at 17:50
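
For reference, the `pmuludq`-based emulation mentioned in this comment can be sketched with SSE2 intrinsics roughly as follows. This follows the general idea from the linked question; the function name `mul4_u32_sse2` and the scalar fallback are assumptions made for this illustration. Note the two `_mm_mul_epu32` calls (one for the even lanes, one for the odd lanes), which is the cost the comment refers to.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_mul_epu32 (pmuludq) */

void mul4_u32_sse2(const uint32_t *a, const uint32_t *b, uint32_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* pmuludq multiplies lanes 0 and 2, giving two 64-bit products. */
    __m128i even = _mm_mul_epu32(va, vb);
    /* Shift by 4 bytes so lanes 1 and 3 move into positions 0 and 2. */
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(va, 4),
                                _mm_srli_si128(vb, 4));

    /* Gather the low 32 bits of each 64-bit product into lanes 0 and 1. */
    __m128i even_lo = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    __m128i odd_lo  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));

    /* Interleave back into the original order: p0, p1, p2, p3. */
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(even_lo, odd_lo));
}
#else
/* Scalar fallback so the sketch still builds where SSE2 is unavailable. */
void mul4_u32_sse2(const uint32_t *a, const uint32_t *b, uint32_t *out) {
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];  /* wraps modulo 2^32 */
}
#endif
```

On a CPU with SSE4.1, such as Ivy Bridge, a single pmulld replaces all of this, so the emulation is mainly of interest for older SSE2-only targets.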

score 0 · Answer 2 · edited May 23 '17 at 10:32

As you can see here:

There is no current solution to improve multiplication of long integers with SSE or AVX.

answered Apr 20 '14 at 16:54

AntiClimacus
