How much faster is tensorflow-gpu with AVX and AVX2 compared with a build without them?

I tried to find an answer using Google, but with no success. It's hard to recompile tensorflow-gpu for Windows, so I want to know if it is worth it.

Dmitry

1 Answer

If your computation is one giant matmul on CPU, you will get a 3x speed-up on Xeon V3 (see benchmark here). But it's also possible to see no speed-up, presumably because not enough time is spent in high-arithmetic-intensity ops executed on the CPU.

Here's a table from the "High Performance Models" guide for training resnet50 on CPU with different optimizations. It looks like you can get about a 2.5x speed-up with the best settings (AVX2 versus SSE3).

| Optimization | Data Format | Images/Sec (step time) | Intra threads | Inter threads |
| ------------ | ----------- | ---------------------- | ------------- | ------------- |
| AVX2         | NHWC        | 6.8 (147ms)  | 4             | 0             |
| MKL          | NCHW        | 6.6 (151ms)  | 4             | 1             |
| MKL          | NHWC        | 5.95 (168ms) | 4             | 1             |
| AVX          | NHWC        | 4.7 (211ms)  | 4             | 0             |
| SSE3         | NHWC        | 2.7 (370ms)  | 4             | 0             |
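
To make the comparison explicit, here is a minimal sketch that computes each configuration's speed-up relative to the SSE3 baseline. The throughput numbers are copied from the table above; the `speedups` helper and the variable names are my own, purely for illustration.

```python
# Throughput in images/sec, copied from the table above.
# Keys are (optimization, data format).
throughput = {
    ("AVX2", "NHWC"): 6.8,
    ("MKL", "NCHW"): 6.6,
    ("MKL", "NHWC"): 5.95,
    ("AVX", "NHWC"): 4.7,
    ("SSE3", "NHWC"): 2.7,
}


def speedups(results, baseline):
    """Return each configuration's throughput divided by the baseline's."""
    base = results[baseline]
    return {config: rate / base for config, rate in results.items()}


# Hypothetical choice: treat the SSE3 build as the "no AVX" baseline.
relative = speedups(throughput, baseline=("SSE3", "NHWC"))

# Print the configurations from fastest to slowest.
for (opt, fmt), factor in sorted(relative.items(), key=lambda kv: -kv[1]):
    print(f"{opt:5s} {fmt}: {factor:.2f}x")
```

With these numbers AVX2 comes out at roughly 2.5x and AVX at roughly 1.7x over SSE3. Keep in mind these are CPU-only training figures, so they say nothing about a workload that runs mostly on the GPU.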

If you are able to compile an optimized version for Windows, it would help to mention it in this issue: https://github.com/yaroslavvb/tensorflow-community-wheels/issues/13 . It seems there's some demand for such a build.

Yaroslav Bulatov
  • Can you please describe the steps you took before getting the error? I am stuck on the following: https://stackoverflow.com/a/46140317/865475 – Dmitry Sep 11 '17 at 03:49
  • sorry, I have no windows background. BTW, added table with timings for actual network – Yaroslav Bulatov Sep 11 '17 at 18:30
  • Note there's an issue here, I guess other people have trouble building this -- https://github.com/tensorflow/tensorflow/issues/12978 – Yaroslav Bulatov Sep 11 '17 at 21:21
  • cmake also fails, with the error: `C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 1. [C:\tensorflow\tensorflow\contrib\cmake\build\cub.vcxproj]` (revision from the last successful nightly build) – Dmitry Sep 11 '17 at 21:24
  • Windows 10 also steals GPU memory. So, must die. I'm going to install Ubuntu. – Dmitry Sep 11 '17 at 21:26
  • much sadness :( – Yaroslav Bulatov Sep 11 '17 at 21:28
  • The master revision fails with another error using cmake: `C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 1. [C:\tensorflow-cmake\tensorflow\contrib\cmake\build\gemmlowp.vcxproj]` – Dmitry Sep 11 '17 at 21:52
  • Cmake issue report: https://github.com/tensorflow/tensorflow/issues/12977 – Dmitry Sep 11 '17 at 21:57
  • I checked tensorflow both with and without AVX and AVX2 support. The performance difference is 0% (zero percent). By the way, my CPU is never loaded up to 100%. Everything runs on the GPU. – Dmitry Sep 15 '17 at 03:26
  • The question was about tensorflow-gpu, but this answer seems to quote benchmarks conducted in CPU-only training mode. – ivan866 Aug 12 '20 at 10:03