
I have a simple Python script which trains an agent with (stochastic) Policy Gradients on Pong. It is not specifically written to be parallelized (the multiprocessing library is not imported), but it uses numpy and gym, which can make use of multiple cores.

When I run it on my laptop (Intel i7-4600M @ 2.90GHz with an integrated GPU), it uses 100% of all four logical processors. However, the same script executed on a more powerful desktop PC (Intel i7-4790 @ 3.60GHz with 8 logical cores and a dedicated, CUDA-capable GPU) runs slower, using just one CPU.

Both PCs have the same version of Python (2.7.6). Does anyone have an idea why this happens?
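A minimal, self-contained way to check whether numpy's linear algebra uses multiple cores on a given machine (independent of the Pong script) is to time a large matrix product while watching `htop`; this is a sketch for diagnosis, not part of the original script:

```python
import time
import numpy as np

# A large matrix product is dispatched to the BLAS library numpy was built
# against. A multithreaded BLAS (OpenBLAS, MKL, ATLAS) will show >100% CPU
# in htop; a reference/unoptimized BLAS will stay pinned to one core.
n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.time()
c = a.dot(b)
elapsed = time.time() - start
print("dot of two %dx%d matrices took %.2f s" % (n, n, elapsed))
```

If this toy benchmark shows the same one-core-vs-four-core discrepancy between the two machines, the difference lies in the numpy/BLAS installation rather than in the script itself.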

giubacchio
  • I highly doubt that it uses all your processors on your laptop. Python simply can't. – Martijn Pieters Jun 01 '17 at 07:58
  • I've duped you to the canonical answer on Python and multithreading. Without more details as to what you are actually doing, we can't tell you anything more. – Martijn Pieters Jun 01 '17 at 07:59
  • What does the script do? Specialized libraries such as `numpy` use multiprocessing if necessary, but will use the GPU if available, thus you will not see the CPU being used. – Liran Funaro Jun 01 '17 at 08:01
  • If you want I can show you the htop screen, but I thought you would trust me :) I have read some threads about this, but none of them covers different behavior on different CPUs. The code I'm running is this: [https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5#file-pg-pong-py] – giubacchio Jun 01 '17 at 08:09
  • After a brief overview of the code, I can see you use `numpy` and `gym`, which both use multiprocessing and the GPU. This will explain the behavior, as I mentioned in my previous comment. – Liran Funaro Jun 01 '17 at 08:15
  • You are right, but the notebook has just the integrated GPU, while only the desktop supports CUDA. Even so, the script runs faster on the notebook! – giubacchio Jun 01 '17 at 08:16
  • @MartijnPieters I don't think this is a duplicate since he used external libraries which are capable of multiprocessing. – Liran Funaro Jun 01 '17 at 08:17
  • @giubacchio It's reasonable that code which is parallelized on the GPU but frequently jumps back to the CPU will run slower than code running purely on the CPU, due to the communication overhead. – Liran Funaro Jun 01 '17 at 08:18
  • @LiranFunaro: how do you know what is used? There is no such detail **in the question**. – Martijn Pieters Jun 01 '17 at 08:19
  • @LiranFunaro: and my answer does cover external libraries releasing the GIL: *This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores.* – Martijn Pieters Jun 01 '17 at 08:19
  • @giubacchio Why don't you add the source code to your question to make it a valid question? – Liran Funaro Jun 01 '17 at 08:19
  • @MartijnPieters I know that your answer includes that, but it does not explain the matter at hand, namely "why does it run slower?", which is specific to the interleaving between the CPU and the GPU. – Liran Funaro Jun 01 '17 at 08:23
  • @LiranFunaro: then by all means edit the question and make it clear what the unique situation is here; as I stated in my initial comment, without details in the question we can't answer, and can only offer the general advice. – Martijn Pieters Jun 01 '17 at 08:28
  • Liran, I totally agree with you; still, I find it a bit odd that the "weak" notebook runs 3x faster than the "strong" desktop. P.S. I edited the question; if it is still incomplete I will add more details. – giubacchio Jun 01 '17 at 08:29
  • I very much doubt this is related to the GPU being used or not. Most likely one PC has BLAS/LAPACK libraries built with multiprocessing support and the other has not. – kazemakase Jun 01 '17 at 08:32
  • @kazemakase The OP stated that the strong desktop has CUDA, but the weak notebook doesn't. I doubt that `numpy` and `gym` are not using it. – Liran Funaro Jun 01 '17 at 08:35
  • @MartijnPieters I think the duplicate marking is no longer useful. – giubacchio Jun 01 '17 at 08:45
  • Please put the code **here**, not in an external link. External links have a different lifetime from Stack Overflow questions, and can go away at any time making this question as incomplete as it was before. – Martijn Pieters Jun 01 '17 at 08:47
  • I don't agree: by that reasoning it would not make sense to permit external links at all. Everything needed to understand the question is now reported in it. However, I'm not an expert on Stack Overflow, and if this is the policy I will change the question, even though in my opinion that would make it too heavy. – giubacchio Jun 01 '17 at 08:50
  • As far as I know, [parallelization in numpy comes from the underlying numeric library](https://stackoverflow.com/a/27809987/5067311). In other words, the desktop machine has BLAS etc configured differently, most likely, possibly with a `OMP_NUM_THREADS` envvar set to 1. – Andras Deak Jun 01 '17 at 09:05
  • 2
    You can also compare the libraries available to numpy and their settings by calling `numpy.show_config()` on both machines. – Andras Deak Jun 01 '17 at 09:13
  • @MartijnPieters The issue of including the full code has been discussed on Meta: https://meta.stackoverflow.com/questions/302078/collapsible-code-markup. I suggest leaving it out of this post. – Liran Funaro Jun 01 '17 at 09:15
  • 2
    @LiranFunaro the obvious solution is to create a [MCVE]. Come up with a small runnable script that reproduces your issue. In this case, write one that keeps multiplying huge arrays with `numpy.dot`, and see if that produces the CPU use discrepancy. If it does, you have a verifiable piece of code. – Andras Deak Jun 01 '17 at 09:17
  • As you guessed, the problem seems to be related to the BLAS/LAPACK libraries. `numpy.show_config()` reports that BLAS/LAPACK libraries are not available. I'm trying to fix this, thanks! P.S. The question is still marked as a duplicate, and I think this is not useful to the _stackoverflow_ community, since the redirect takes you to a totally different topic. @MartijnPieters it is not nice to use the duplicate marking as a threat: if you don't think that the question is well made, mark it as inappropriate or remove it. – giubacchio Jun 01 '17 at 09:44
  • 1
    giubacchio, this has nothing to do with threats. Questions that inherently can't be answered will be closed. @Martijn closed your question with a dupe that seemed relevant based on available information. We _could_ redupe your question to another one, but ultimately this question is unlikely to help future readers, so instead of playing around with duplicate targets we should delete it once your problem is solved. If you have a question whose formulatin and answer can benefit others in the future, don't hesitate to [edit] and the question will be reopened. – Andras Deak Jun 01 '17 at 09:59
  • What I meant is that I was asked to edit the question in order to remove the duplicate marking, even though that edit wasn't related at all to changing the content of the question. That does not make much sense. Anyway, thank you for the explanation and for your help! – giubacchio Jun 01 '17 at 10:23
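The diagnostic steps suggested in the comments above can be sketched as a short script to run on both machines. The environment-variable names apply to OpenMP/OpenBLAS and MKL builds respectively; which one matters depends on which BLAS numpy was compiled against:

```python
import os
import numpy as np

# Print the BLAS/LAPACK libraries numpy was built against. If no optimized
# BLAS (OpenBLAS, MKL, ATLAS) is listed, matrix operations fall back to a
# single-threaded reference implementation, pinning the work to one core.
np.show_config()

# Multithreaded BLAS builds cap their thread count via environment
# variables; these must be set before the interpreter starts to take effect.
# A value of "1" on the desktop would also explain single-core behavior.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    print("%s = %s" % (var, os.environ.get(var, "<unset>")))
```

Comparing the output of this script on the laptop and the desktop should reveal whether the desktop's numpy lacks an optimized BLAS or merely has its thread count capped.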

0 Answers