How to profile my code?

Question

I want to know how to profile my code.

I have gone through the docs, but since they give no examples I could not get anything out of them.

My code is large and takes a long time to run, so I want to profile it and speed it up. Most of the code is not organized into functions; there are a few here and there, but not throughout. There is no main function either. I'm looking for an example or some sample code that shows how to profile.

I tried psyco, i.e. I just added two lines at the top of my code:

import psyco
psyco.full()

Is this right? It did not show any improvement. Please suggest any other ways of speeding up the code.

"but could not succeed" Means nothing. Provide specific, concrete code you used and errors you got. Please be **specific**. — S.Lott, Jun 15 '10 at 13:30
psyco doesn't always mean improved performance. If all your application does is wait for IO, faster execution won't make a difference. You need to profile first and optimize later. — Mattias Nilsson, Jun 18 '10 at 07:38

score 66 · Accepted Answer · edited May 23 '17 at 11:45

The standard answer to this question is to use cProfile.

You'll find, though, that unless your code is separated out into functions or methods, cProfile won't give you particularly rich information.
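To illustrate why that split matters, here is a minimal sketch. The names `load_numbers`, `slow_sum`, `main`, and `profile_main` are hypothetical stand-ins, not anything from the question. Once the work lives in functions, each one gets its own row in the cProfile report, whereas a script made only of top-level statements is mostly lumped into a single `<module>` entry.

```python
import cProfile
import io
import pstats


def load_numbers(n):
    # Hypothetical stand-in for the "read input" part of a script.
    return list(range(n))


def slow_sum(numbers):
    # Hypothetical stand-in for the slow part: a deliberately naive loop.
    total = 0
    for x in numbers:
        total += x * x
    return total


def main():
    numbers = load_numbers(200_000)
    return slow_sum(numbers)


def profile_main():
    # runcall() profiles a single function call and returns its result.
    profiler = cProfile.Profile()
    result = profiler.runcall(main)

    # Render the most expensive entries, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_main()
    print(report)
```

In the printed report, `slow_sum` and `load_numbers` appear as separate lines, so you can see which one dominates.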

Instead, you might like to try what another poster here calls Monte Carlo Profiling. To quote from another answer:

If you're in a hurry and you can manually interrupt your program under the debugger while it's being subjectively slow, there's a simple way to find performance problems.

Just halt it several times, and each time look at the call stack. If there is some code that is wasting some percentage of the time, 20% or 50% or whatever, that is the probability that you will catch it in the act on each sample. So that is roughly the percentage of samples on which you will see it. There is no educated guesswork required. If you do have a guess as to what the problem is, this will prove or disprove it.

You may have multiple performance problems of different sizes. If you clean out any one of them, the remaining ones will take a larger percentage, and be easier to spot, on subsequent passes.

Caveat: programmers tend to be skeptical of this technique unless they've used it themselves. They will say that profilers give you this information, but that is only true if they sample the entire call stack. Call graphs don't give you the same information, because 1) they don't summarize at the instruction level, and 2) they give confusing summaries in the presence of recursion. They will also say it only works on toy programs, when actually it works on any program, and it seems to work better on bigger programs, because they tend to have more problems to find [emphasis added].

It's not orthodox, but I've used it very successfully in a project where profiling using cProfile was not giving me useful output.

The best thing about it is that this is dead easy to do in Python. Simply run your Python script in the interpreter, press [Control-C], note the traceback and repeat a number of times.
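If pressing Control-C by hand is inconvenient, the same idea can be roughly automated. The sketch below is not the method quoted above, only an approximation of it: a background thread records the call stack of the thread that started it at a fixed interval, and you then read the raw samples yourself. It assumes CPython, since it relies on `sys._current_frames()`; the names `sample_stacks`, `run_with_sampling`, and `busy_work` are hypothetical.

```python
import sys
import threading
import time
import traceback


def sample_stacks(target_thread_id, samples, interval, stop_event):
    """Record the target thread's call stack every `interval` seconds."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            # Outermost call first, innermost call last.
            samples.append([
                (entry.name, entry.filename, entry.lineno)
                for entry in traceback.extract_stack(frame)
            ])
        stop_event.wait(interval)


def busy_work(duration):
    # Hypothetical workload: spin for `duration` seconds.
    deadline = time.perf_counter() + duration
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


def run_with_sampling(func, *args, interval=0.01):
    """Run func(*args) in the current thread while sampling its stack."""
    samples = []
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), samples, interval, stop_event),
        daemon=True,
    )
    sampler.start()
    try:
        result = func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return result, samples


if __name__ == "__main__":
    _, samples = run_with_sampling(busy_work, 0.5)
    # Look at a few raw samples, as you would with manual interrupts.
    for stack in samples[:3]:
        print(" -> ".join(f"{name}:{lineno}" for name, _, lineno in stack))
```

Only the thread that calls `run_with_sampling` is sampled, so treat this as a starting point rather than a general-purpose profiler.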

answered Jun 18 '10 at 08:11

fmark


I find it weird to say that this is "not orthodox"; I mean, this is seriously how profiling tools such as gprof actually work. From the gprof manual: "Profiling also involves watching your program as it runs, and keeping a histogram of where the program counter happens to be every now and then." and "The run-time figures that gprof gives you are based on a sampling process, so they are subject to statistical inaccuracy.". – Jay Freeman -saurik- Oct 21 '11 at 12:48
We use this same technique to profile embedded C applications. – David Poole Oct 21 '11 at 15:18
[joke] "My code seems to spend all its time in the KeyboardInterrupt handler." – David Poole Oct 21 '11 at 15:19
That is also the technology that plop https://github.com/bdarnell/plop profiler is using. – Vivian De Smedt May 08 '15 at 03:26
One downside of this method is that the GIL (global interpreter lock) means that this can fail horribly if you are using multiple threads. Basically, signals are only processed by the main thread which prevents ^C from having useful output. (or working for that matter) – Oscar Smith Jun 29 '16 at 14:16
@JayFreeman-saurik- and others: There is a huge difference between this method and what stack-sampling profilers do, not on the input but on the *output*. The difference on the output is that every profiler makes numerical summaries of various kinds. They do not let the user *examine the actual samples*, complete with call sites. If the program is doing something unnecessary, and the sample is in it, it is clear in examining the sample. How many? If you see the problem twice, you know you've found it, and the bigger it is, the fewer samples you need. (gprof is not a stack-sampler.) – Mike Dunlavey Sep 30 '16 at 12:28
@OscarSmith: [*This user*](http://stackoverflow.com/a/317160/23771) doesn't think so. – Mike Dunlavey Sep 30 '16 at 12:32
@VivianDeSmedt: See my comment to Jay above. (I forgot to mention that examining samples finds any speedup a profiler can find, plus ones that they don't, because the reason a program is doing something unnecessary often cannot be seen without examining all the information in a sample.) – Mike Dunlavey Sep 30 '16 at 13:05

campos.ddc · Answer 2 · 2016-09-29T19:39:21.860

Edit:

This answer has been implemented in https://github.com/campos-ddc/cprofile_graph

Profiling with cProfile

Here's a post I wrote some time ago on profiling with cProfile with some graphical aid.

cProfile is one of the most used Python profilers out there, and although it is very powerful, its standard text output is somewhat lackluster. Here I'll show you how to use cProfile on your application in an easier way.

There are two common ways to use cProfile: as a command at the prompt to profile a given module, or inside your code to profile specific snippets.

Profiling a module

To use cProfile to profile an entire module, simply use the following command in your prompt:

python -m cProfile -o output_filename.pstats path/to/script arg1 arg2

This will run your module with the given arguments (they are optional) and dump the output in output_filename.pstats.

There are lots of ways to read the data on that output file, but for the purpose of this post, let's not worry about those and focus on getting that graphical visualization.
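For readers who do want a quick text summary, one of those ways is the standard library's `pstats` module. This is a minimal sketch, not part of the original post: `work` and `summarize` are hypothetical names, and the stats file is created in-process only so the example is self-contained (`python -m cProfile -o ...` writes the same file format).

```python
import cProfile
import os
import pstats
import tempfile


def work():
    # Hypothetical workload so the example has something to measure.
    return sorted(str(i) for i in range(50_000))


def summarize(stats_path, limit=10):
    """Print the `limit` most expensive entries, sorted by cumulative time."""
    stats = pstats.Stats(stats_path)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stats


if __name__ == "__main__":
    # Write a stats file to a temporary directory, then read it back.
    path = os.path.join(tempfile.mkdtemp(), "output_filename.pstats")
    profiler = cProfile.Profile()
    profiler.runcall(work)
    profiler.dump_stats(path)
    summarize(path)
```

Sorting by `"cumulative"` puts the functions that account for the most total time, including their callees, at the top; `"tottime"` is another valid sort key if you want time spent in each function itself.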

Profiling from inside

Sometimes you don't want to profile an entire module, just a few lines of it.

To do so, you are gonna have to add some code to your module.

First of all:

import cProfile

And then, you can replace any segment of code with the following:

cProfile.runctx('Your code here', globals(), locals(), 'output_file')

For example, here is a test before and after profiling:

import unittest

class Test(unittest.TestCase):

    def testSomething(self):
        self.DoSomethingIDontCareAbout()

        param = 'whatever'
        self.RunFunctionIThinkIsSlow(param)

        self.AssertSomeStuff() # This is after all, a test

After:

import unittest
import cProfile

class Test(unittest.TestCase):

    def testSomething(self):
        self.DoSomethingIDontCareAbout()

        param = 'whatever'
        cProfile.runctx(
            'self.RunFunctionIThinkIsSlow(param)',
            globals(),
            locals(),
            'myProfilingFile.pstats'
        )

        self.AssertSomeStuff() # This is after all, a test
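If passing code around as a string feels awkward, an alternative is to switch a `cProfile.Profile` object on and off around the block. This is a sketch under assumptions rather than the approach from the post above: `run_function_i_think_is_slow` and `profiled_section` are hypothetical names standing in for the slow call and its surrounding code.

```python
import cProfile


def run_function_i_think_is_slow(param):
    # Hypothetical stand-in for the slow call in the test above.
    return sum(len(param) * i for i in range(100_000))


def profiled_section(param, output_file):
    profiler = cProfile.Profile()
    profiler.enable()  # start collecting statistics
    try:
        result = run_function_i_think_is_slow(param)
    finally:
        profiler.disable()  # stop collecting even if the call raises
    # Write the same kind of stats file that runctx() produces.
    profiler.dump_stats(output_file)
    return result


if __name__ == "__main__":
    profiled_section("whatever", "myProfilingFile.pstats")
```

Because the profiled code is ordinary Python rather than a string, editors and linters can still check it, and local variables do not have to be passed in through `locals()`.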

Converting a pstats file to a graph

To convert your profiling file to a graph, you will need a couple of things:

gprof2dot: This module will convert your output into a dot file, a standard file format for graph descriptions.
GraphViz: It turns your dot file into an image.

After you have downloaded gprof2dot and installed GraphViz, run this command in your prompt:

python gprof2dot.py -f pstats myProfilingFile.pstats | dot -Tpng -o image_output.png

You might have to use a complete path for gprof2dot and/or dot, or you could add them to your PATH env variable.

After all of this, you should have an image that looks kinda like this:

[Image: results example]

Hotter colors (red, orange, yellow) indicate functions that take up more of the total runtime than colder colors (green, blue).
On each node, you can see what percentage of the total runtime that function used and how many times it was called.
Arrows between nodes show which function called which, and each arrow's caption gives the percentage of the runtime that came through that call.

Note: percentages won't always add up to 100%, especially on code sections that reference C++ code, which won't be profiled. cProfile also won't be able to determine what's called from inside an "eval" statement, so you might see some jumps in your graph.

Did you originally post this on a blog? If so, do you have a link? — joshua.r.smith, Apr 22 '16 at 21:54
@joshua.r.smith it was an internal company blog so I guess you can just use the permalink from stackoverflow :) — campos.ddc, Apr 23 '16 at 09:49
A call graph like that is a beautiful example of pretty pixels that don't find speedups. All they do is give you a general idea of how busy some routines are, and most routines are not very busy. It is trivially easy for speedups to [*hide in all that*](http://stackoverflow.com/a/25870103/23771). — Mike Dunlavey, Sep 29 '16 at 20:03
Sure, there can be stuff hiding in there for reasons explained in your post. My personal experience shows that these kinds of graphs are still very useful and will quickly point you to somewhere you can do 80% of the optimizations. If that fails, feel free to run your code 100 times, randomly interrupting, and reading the call stack :) — campos.ddc, Sep 29 '16 at 21:23
@campos.ddc: I've never had to take more than 20 samples. 10 is often enough to find something. I could have one problem taking 10%, and that would require 20 samples on average to see it twice. But empirically there is always a bigger one lurking, that a profiler might not find, but examining samples certainly does. After the bigger one is removed (suppose 33%) now the 10% problem is 15%, so it only takes 13 samples (on average) to see it twice. — Mike Dunlavey, Sep 30 '16 at 12:57
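The arithmetic in the last comment can be checked with a short sketch. It assumes each sample independently lands in the problem code with probability equal to that code's share of the run time, in which case the expected number of samples needed to see it twice is 2 divided by that share. The function names are hypothetical.

```python
def expected_samples_to_see_twice(fraction):
    """Expected number of independent samples until code that occupies
    `fraction` of the run time has been observed twice (2 / fraction)."""
    return 2 / fraction


def fraction_after_removal(fraction, removed):
    """Share of the remaining run time once `removed` of the total is gone."""
    return fraction / (1 - removed)


if __name__ == "__main__":
    before = expected_samples_to_see_twice(0.10)
    new_fraction = fraction_after_removal(0.10, 0.33)
    after = expected_samples_to_see_twice(new_fraction)
    print(f"10% problem: {before:.0f} samples on average")
    print(f"after removing 33%: {new_fraction:.0%} of run time, "
          f"{after:.1f} samples on average")
```

This reproduces the figures quoted in the comment: 20 samples for a 10% problem, and roughly 13 once a 33% problem has been removed and the 10% problem has grown to about 15% of what remains.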

score 6 · Answer 3 · answered Jun 18 '10 at 07:23

Use cProfile. You can use it from the command line and pass in your module as a parameter, so you don't need a main method.

Johannes Charra


can u just give an example of cmd to be used.. do i need to import anything..is cprofile part of python 2.6 – kaki Jun 18 '10 at 07:26
The link posted points to something called "Instant users manual", which is actually a good starting point. It also contains examples. – Mattias Nilsson Jun 18 '10 at 07:35
I linked the docs, you can find the example command `python -m cProfile myscript.py` there. – Johannes Charra Jun 18 '10 at 07:36
