Why is float() faster than int()?

Question

While experimenting with some code and running some microbenchmarks, I found that calling the float function on a string containing an integer is a factor of 2 faster than calling int on the same string.

$ python -m timeit "int('1')"
1000000 loops, best of 3: 0.548 usec per loop

$ python -m timeit "float('1')"
1000000 loops, best of 3: 0.273 usec per loop

It gets even stranger when testing int(float('1')), whose runtime is shorter than that of the bare int('1').

$ python -m timeit "int(float('1'))"
1000000 loops, best of 3: 0.457 usec per loop
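Shell quoting is easy to get wrong with these one-liners (see the comments on this question about nested quotes), so here is a minimal sketch that runs the same comparison from inside Python with the standard-library timeit module. The helper name best_usec and the loop counts are illustrative choices, not part of the original post; the helper only mimics the "best of N" figure that python -m timeit reports.

```python
import timeit

def best_usec(stmt, number=200000, repeat=3):
    """Best per-loop time in microseconds, similar to what `python -m timeit` reports."""
    # timeit.repeat returns one total time (for `number` executions) per repetition.
    totals = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(totals) / number * 1e6

if __name__ == "__main__":
    for stmt in ("int('1')", "int('1', 10)", "float('1')", "int(float('1'))"):
        print("%-18s %.3f usec per loop" % (stmt, best_usec(stmt)))
```

Absolute numbers depend on the machine and the interpreter version, so only the relative ordering is worth comparing.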

I tested the code under Windows 7 running CPython 2.7.6 and Linux Mint 16 with CPython 2.7.6.

I should add that only Python 2 is affected; Python 3 shows a much smaller (unremarkable) difference between the runtimes.

I know that information from such microbenchmarks is easy to misuse, but I'm curious why there is such a difference in the functions' runtime.

I tried to find the implementations of int and float, but I cannot find them in the sources.

Can't replicate difference `jakob@devbox:~$ python -m timeit "int("1")" 10000000 loops, best of 3: 0.104 usec per loop` `jakob@devbox:~$ python -m timeit "float("1")" 10000000 loops, best of 3: 0.106 usec per loop` — Jakob Bowyer, Jan 20 '14 at 11:27
@JakobBowyer: I can, in Python 2.7. In Python 3.3, the times are much closer together. — Martijn Pieters, Jan 20 '14 at 11:27
@MartijnPieters I'm using Python 2.7.3. Each time it's showing that int is faster than or nearly identical to float. — Jakob Bowyer, Jan 20 '14 at 11:28
@JakobBowyer: Interesting. This is 2.7.5 on OS X. Perhaps there was a regression. — Martijn Pieters, Jan 20 '14 at 11:29
@JakobBowyer: 2.7.1 on Mac actually has a *larger* difference still. — Martijn Pieters, Jan 20 '14 at 11:30
2.7.4 (32bit) on Windows gives nearly the same time for int and float (int slightly faster). Same result for a Linux system running 2.7.3 and 3.3.0. — Matthias, Jan 20 '14 at 11:31
@wim because it's a perfectly good opportunity to learn something new? Maybe the knowledge we gain will be too narrow to be applicable to anything else, but that will never stop curious people — loopbackbee, Jan 20 '14 at 11:55
@JakobBowyer: That's most likely because you messed up the syntax. You nested double quotes without any escaping. I'm not sure what that does, but a quick test seems to indicate that's why you're not seeing it. — user2357112 supports Monica, Jan 20 '14 at 11:56
I suspect finding the real answer to this question will require profiling the C implementation of the functions involved. A quick look at the source didn't reveal anything that would obviously be the culprit, but then again, I didn't really expect to see anything screaming "time-waster". — user2357112 supports Monica, Jan 20 '14 at 11:59
As for the source code: find it in the `Objects/` subdirectory of the source tree. Current 2.7 branch mercurial links: [`floatobject.c`](http://hg.python.org/cpython/file/2.7/Objects/floatobject.c) and [`intobject.c`](http://hg.python.org/cpython/file/2.7/Objects/intobject.c). Look for the `float_new` and `int_new` functions. — Martijn Pieters, Jan 20 '14 at 12:00
@JakobBowyer: I can reproduce your timings if I use your exact command line **which is flawed**. You are not escaping the quotes, so you are parsing a *literal `int` value of `1`*, not a string! — Martijn Pieters, Jan 20 '14 at 12:03
A float is in many ways a simpler object than a python int, so the question in the title is not interesting at all (apples and oranges). Now `int(float(my_string))` being faster than `int` is a curiosity, but I expect it will be due to some boring reason such as the float validators coming from whatever optimised code running very close to the hardware level that's been around for decades. — wim, Jan 20 '14 at 12:24
For the record, I can reproduce your findings on 2.7.5, but on 3.3.2 I have the bare int faster by almost a factor of 2. — wim, Jan 20 '14 at 12:26
`int(float("1"))` is essentially `float("1")` followed by `int(1.0)`. At 0.10 usec and 0.08 usec respectively, those two operations together are still faster than a straight `int("1")` at 0.26 usec. `float(int("1"))` is of course slower still (0.37 usec). — Adaephon, Jan 20 '14 at 13:14
`int(1.0)` uses a dedicated 'as integer' hook on the `float` object to do the conversion. For a float with no decimals that operation is straightforward. — Martijn Pieters, Jan 20 '14 at 13:17

score 16 · Answer 1 · edited Jan 20 '14 at 13:29


int() has to handle lots of bases.

`*`, `0*`, `0x*`, `0b*`, `0o*`, and the result can be a long, so it takes time to determine the base, among other things.

If the base is set explicitly, it saves a lot of time:

python -m timeit "int('1',10)"       
1000000 loops, best of 3: 0.252 usec per loop

python -m timeit "int('1')"   
1000000 loops, best of 3: 0.594 usec per loop

As @Martijn Pieters mentions, the code is in Objects/intobject.c (int_new) and Objects/floatobject.c (float_new).

answered Jan 20 '14 at 12:47 by michaeltang, edited by senshin

This doesn't explain why `int('1')` in Python 3 is as fast as `float('1')`, however. – Martijn Pieters Jan 20 '14 at 12:51
The base determination is just scanning the first character; if it is *not* `0` then `base` is set to 10. That hardly makes it use double the time. Python 3 has the *exact same test*, so this cannot be the cause. – Martijn Pieters Jan 20 '14 at 12:55
It seems that all integers are long integers in Python 3.3 (there is no `intobject.c`). http://docs.python.org/3.3/c-api/long.html vs. http://docs.python.org/2.7/c-api/long.html and http://docs.python.org/2.7/c-api/int.html – xbello Jan 20 '14 at 14:03
I checked with a profiler. Figuring out what base to use is not what's taking the time. @MartijnPieters is correct, `PyInt_FromString` + `PyOS_strtoul` only takes 5% of the total time for the `int` version. Is it possible that the massive extra interpreter overhead is coming from figuring out whether to call the one-arg or two-arg version of `int`? +1 for discovering that setting an explicit base speeds it up, but -1 for the incorrect guess that it's actually scanning the string for the base that takes time. I can repro that effect. – Peter Cordes Mar 21 '18 at 15:53
@PeterCordes: there is no one or two argument version of `int()`. All arguments are passed to the function as an array and the C code parses out what values it needs using a helper function. – Martijn Pieters Mar 21 '18 at 16:32
@MartijnPieters: Then is the C code for `int()` maybe checking whether it's an array of length 2 vs. 1? Any thoughts from the profile results in my answer, showing the "self" time broken down by function name for float, `int("1")` and `int("1",10)`? – Peter Cordes Mar 21 '18 at 16:37
@PeterCordes: you'd have to find another command that doesn't alter behaviour between number of arguments passed, and time that. Argument parsing is highly optimised and [available as a general API](https://docs.python.org/3/c-api/arg.html), so I highly doubt that that is the culprit here. – Martijn Pieters Mar 21 '18 at 16:45
@PeterCordes: I've looked into this; it's a simple question of a `getattr()` test for `__trunc__`. – Martijn Pieters Mar 21 '18 at 18:46
Just to be explicit: this answer is entirely wrong. `int("1")` never looks for a base, it will always use base 10. You need to explicitly set the second argument to `0` for Python to even look for a base character in the input string, **and this case is faster than the base case of `int('1')`**. And there's no support for parsing an int literal with an `l` at the end, there's no 'long' parsing (`str(1L)` produces `'1'`, not `'1L'`, no need for a reverse there). – Martijn Pieters Mar 21 '18 at 19:20
On top of that, floats have a hexadecimal form too. Surprisingly, though, float() does not accept those: `printf "%a\n" 1.23 ==> 0x9.d70a3d70a3d70a4p-3` but `float("0x9.d70a3d70a3d70a4p-3") ==> ValueError` – Goswin von Brederlow Jul 09 '18 at 10:08

Martijn Pieters · Answer 2 · 2018-03-21T23:56:04.223

int() has to account for more possible types to convert from than float() has to. When you pass a single object to int() and it is not already an integer, then various things are tested for:

1. If it is an integer already, use it directly.
2. If the object implements the __int__ method, call it and use the result.
3. If the object is a C-derived subclass of int, reach in and convert the C integer value in the structure to an int() object.
4. If the object implements the __trunc__ method, call it and use the result.
5. If the object is a string, convert it to an integer with the base set to 10.
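As a rough illustration, the order of those checks can be modelled in pure Python. This is a hypothetical sketch (int_branch and OnlyTrunc are made-up names); it only reports which branch the C code would take, it does not reproduce the C implementation, and it is written to run under Python 3, where str has neither __int__ nor __trunc__.

```python
def int_branch(obj):
    """Hypothetical model: which check in int(obj), with no base, would handle obj?"""
    if type(obj) is int:                  # 1. already an exact int
        return "1: exact int"
    if hasattr(type(obj), "__int__"):     # 2. the type provides __int__ (a cheap slot check in C)
        return "2: __int__"
    if isinstance(obj, int):              # 3. int subclass without __int__
        return "3: int subclass"
    if hasattr(obj, "__trunc__"):         # 4. attribute lookup, the expensive step in Python 2
        return "4: __trunc__"
    if isinstance(obj, str):              # 5. finally, parse the string with base 10
        return "5: parse string, base 10"
    return "not modelled"

class OnlyTrunc(object):
    """Hypothetical type that offers __trunc__ but not __int__."""
    def __trunc__(self):
        return 7
```

With this model, int_branch('1') falls all the way through to step 5, while int_branch(1.0) stops at step 2.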

None of these tests are executed when you pass in a base argument; the code then jumps straight to converting a string to an int with the selected base. That’s because no other types are accepted when a base is given.
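One quick way to see that a base argument restricts the accepted input: with a base, non-string input such as a float or an int is rejected with a TypeError instead of going through the conversion hooks. The helper below is a hypothetical sketch written for Python 3 (which, unlike this answer's Python 2 focus, also accepts bytes-like objects when a base is given).

```python
def int_with_base(value, base=10):
    """Call int(value, base), returning a TypeError description instead of raising it."""
    try:
        return int(value, base)
    except TypeError as exc:
        # e.g. int(1.5, 10): non-string input is refused once a base is given
        return "TypeError: %s" % exc
```

For example, int_with_base('ff', 16) parses normally, while int_with_base(1.5) reports a TypeError even though int(1.5) on its own works.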

As a result, when you pass in a base, suddenly creating an integer from a string is a lot faster:

$ bin/python -m timeit "int('1')"
1000000 loops, best of 3: 0.469 usec per loop
$ bin/python -m timeit "int('1', 10)"
1000000 loops, best of 3: 0.277 usec per loop
$ bin/python -m timeit "float('1')"
1000000 loops, best of 3: 0.206 usec per loop

When you pass a string to float(), the first test made is to see if the argument is a string object (and not a subclass), in which case it is parsed immediately. There’s no need to test other types.

So the int('1') call makes a few more tests than int('1', 10) or float('1'). Of those tests, tests 1, 2, and 3 are quite fast; they are just pointer checks. But the fourth test uses the C equivalent of getattr(obj, '__trunc__'), which is relatively expensive: it has to check the instance and the full MRO of the string type, there is no cache, and in the end it raises an AttributeError(), formatting an error message that no one will ever see. All of that work is useless here.
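To get a feel for the difference, here is a minimal Python 3 sketch contrasting a full instance-level getattr() that ends in an AttributeError (roughly what the Python 2 __trunc__ test amounts to for a str) with a lookup on the type only (closer in spirit to what Python 3 does). Both helper names are hypothetical, and pure-Python timings only approximate what the C code does, so treat any numbers as indicative.

```python
import timeit

def trunc_via_instance(obj):
    # Full attribute lookup on the instance; for a str it fails, so an
    # AttributeError (with its formatted message) is created and caught here.
    try:
        return getattr(obj, "__trunc__")
    except AttributeError:
        return None

def trunc_via_type(obj):
    # Look the special method up on type(obj) only, with a default instead of an exception.
    return getattr(type(obj), "__trunc__", None)

if __name__ == "__main__":
    n = 200000
    for func in (trunc_via_instance, trunc_via_type):
        total = timeit.timeit(lambda: func("1"), number=n)
        print("%-20s %.3f usec per call" % (func.__name__, total / n * 1e6))
```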

In Python 3, that getattr() call has been replaced with code that is a lot faster. That's because in Python 3 there is no need to account for old-style classes, so the attribute can be looked up directly on the type of the instance (the class, the result of type(instance)), and class attribute lookups across the MRO are cached at this point. No exceptions need to be created.

float objects implement the __int__ method, which is why int(float('1')) is faster: you never hit the __trunc__ attribute test at step 4, as step 2 produces the result instead.

If you wanted to look at the C code, for Python 2, look at the int_new() method first. After parsing the arguments, the code essentially does this:

if (base == -909)  // no base argument given, the default is -909
    return PyNumber_Int(x);  // parse an integer from x, an arbitrary type. 
if (PyString_Check(x)) {
    // do some error handling; there is a base, so parse the string with the base
    return PyInt_FromString(string, NULL, base);
}

The no-base case calls the PyNumber_Int() function, which does this:

if (PyInt_CheckExact(o)) {
    // 1. it's an integer already
    // ...
}
m = o->ob_type->tp_as_number;
if (m && m->nb_int) { /* This should include subclasses of int */
    // 2. it has an __int__ method, return the result
    // ...
}
if (PyInt_Check(o)) { /* An int subclass without nb_int */
    // 3. it's an int subclass, extract the value
    // ...
}
trunc_func = PyObject_GetAttr(o, trunc_name);
if (trunc_func) {
    // 4. it has a __trunc__ method, call it and process the result
    // ...
}
if (PyString_Check(o))
    // 5. it's a string, let's parse!
    return int_from_string(PyString_AS_STRING(o),
                           PyString_GET_SIZE(o));

where int_from_string() is essentially a wrapper around PyInt_FromString(string, &end, 10) that also checks that the whole string was consumed, so the string is parsed with base 10.

In Python 3, intobject was removed, leaving only longobject, renamed to int() on the Python side. In the same vein, unicode has replaced str. So now we look at long_new(), and testing for a string is done with PyUnicode_Check() instead of PyString_Check():

if (obase == NULL)
    return PyNumber_Long(x);

// bounds checks on the obase argument, storing a conversion in base

if (PyUnicode_Check(x))
    return PyLong_FromUnicodeObject(x, (int)base);

So again when no base is set, we need to look at PyNumber_Long(), which executes:

if (PyLong_CheckExact(o)) {
    // 1. it's an integer already
    // ...
}
m = o->ob_type->tp_as_number;
if (m && m->nb_int) { /* This should include subclasses of int */
    // 2. it has an __int__ method
    // ...
}
trunc_func = _PyObject_LookupSpecial(o, &PyId___trunc__);
if (trunc_func) {
    // 3. it has a __trunc__ method
    // ...
}
if (PyUnicode_Check(o))
    // 4. it's a string
    return PyLong_FromUnicodeObject(o, 10);

Note the _PyObject_LookupSpecial() call: this is the special method lookup implementation. It eventually uses _PyType_Lookup(), which uses a cache; since there is no str.__trunc__ method, that cache will forever return a null after the first MRO scan. This method also never raises an exception; it just returns either the requested method or a null.
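One observable consequence of looking special methods up on the type rather than the instance (which is what makes a per-type cache possible) is that an instance attribute named __trunc__ is ignored. In CPython 3, math.trunc() goes through the same special-method lookup, so it makes a convenient probe; the Number class below is a hypothetical example.

```python
import math

class Number(object):
    """Hypothetical class whose type defines __trunc__."""
    def __trunc__(self):
        return 1

n = Number()
# Shadow the method on the instance only; special-method lookup skips the instance dict.
n.__trunc__ = lambda: 2

print(math.trunc(n))   # 1: uses type(n).__trunc__
print(n.__trunc__())   # 2: ordinary attribute lookup finds the instance attribute
```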

The way float() handles strings is unchanged between Python 2 and 3, so you only need to look at the Python 2 float_new() function, which for strings is pretty straightforward:

// test for subclass and retrieve the single x argument
/* If it's a string, but not a string subclass, use
   PyFloat_FromString. */
if (PyString_CheckExact(x))
    return PyFloat_FromString(x, NULL);
return PyNumber_Float(x);

So for string objects, we jump straight to parsing, otherwise use PyNumber_Float() to look for actual float objects, or things with a __float__ method, or for string subclasses.
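Because only an exact string takes the direct parsing path, a string subclass goes through PyNumber_Float() and can change the result with its own __float__. A minimal Python 3 sketch (there the exact check is on str, i.e. unicode, rather than the Python 2 byte string; the LoudStr class is hypothetical):

```python
class LoudStr(str):
    """Hypothetical str subclass that overrides float conversion."""
    def __float__(self):
        return 42.0

print(float("1"))           # 1.0: exact str, parsed directly
print(float(LoudStr("1")))  # 42.0: subclass, so PyNumber_Float() finds __float__
```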

This does reveal a possible optimisation: if int() were to test for PyString_CheckExact() before all those other type tests, it would be just as fast as float() when it comes to strings. PyString_CheckExact() rules out a string subclass that has an __int__ or __trunc__ method, so it is a good first test.
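The proposed reordering can be sketched at the Python level with a hypothetical wrapper: exact str objects go straight to int(x, 10), which skips the type tests, and everything else, including str subclasses, is left to plain int(x). This is only an illustration of the idea in Python 3, not a patch to the C code; fast_int and CustomStr are made-up names.

```python
def fast_int(x):
    """Hypothetical sketch of the suggested fast path: test for an exact str first."""
    if type(x) is str:        # like PyString_CheckExact(): subclasses are excluded
        return int(x, 10)     # with a base, int() goes straight to string parsing
    return int(x)             # everything else keeps the full sequence of checks

class CustomStr(str):
    """Hypothetical str subclass with its own __int__, which must still be honoured."""
    def __int__(self):
        return 99
```

Excluding subclasses is what keeps the shortcut safe: fast_int(CustomStr('5')) still returns 99, exactly as int() would.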

To address other answers blaming this on base parsing (that is, looking for a 0b, 0o, 0 or 0x prefix, case-insensitively): the default int() call with a single string argument does not look for a base; the base is hardcoded to 10. It is an error to pass in a string with a prefix in that case:

>>> int('0x1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0x1'

Base prefix parsing is only done if you explicitly set the second argument to 0:

>>> int('0x1', 0)
1
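A few more cases along the same lines, shown with a hypothetical helper (try_int) and Python 3 behaviour. Base 0 infers the base from the prefix, and an explicit base of 16 also tolerates a matching 0x prefix.

```python
def try_int(text, *base):
    """Return int(text, *base), or the string "ValueError" if parsing fails."""
    try:
        return int(text, *base)
    except ValueError:
        return "ValueError"

print(try_int("0x1"))       # ValueError: default base 10 rejects the prefix
print(try_int("0x1", 0))    # 1: base 0, prefix selects hexadecimal
print(try_int("0b101", 0))  # 5: base 0, binary
print(try_int("0o17", 0))   # 15: base 0, octal
print(try_int("0x1", 16))   # 1: explicit base 16 also accepts a 0x prefix
```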

Because no testing is done for __trunc__, the base=0 prefix-parsing case is just as fast as setting the base explicitly to any other supported value:

$ python2.7 -m timeit "int('1')"
1000000 loops, best of 3: 0.472 usec per loop
$ python2.7 -m timeit "int('1', 10)"
1000000 loops, best of 3: 0.268 usec per loop
$ python2.7 -m timeit "int('1', 0)"
1000000 loops, best of 3: 0.271 usec per loop
$ python2.7 -m timeit "int('0x1', 0)"
1000000 loops, best of 3: 0.261 usec per loop

Peter Cordes · Answer 3 · 2018-03-21T15:58:39.487

This is not a full answer, just some data and observations.

Profiling results from x86-64 Arch Linux, Python 2.7.14, on a 3.9GHz Skylake i7-6700k running Linux 4.15.8-1-ARCH. float: 0.0854 usec per loop. int: 0.196 usec per loop. (So about a factor of 2)

float

$ perf record python2.7 -m timeit 'float("1")'
10000000 loops, best of 3: 0.0854 usec per loop

Samples: 14K of event 'cycles:uppp', Event count (approx.): 13685905532
Overhead  Command    Shared Object        Symbol
  29.73%  python2.7  libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
   8.54%  python2.7  libpython2.7.so.1.0  [.] _Py_dg_strtod
   8.30%  python2.7  libpython2.7.so.1.0  [.] vgetargskeywords
   5.81%  python2.7  libpython2.7.so.1.0  [.] lookdict_string.lto_priv.1492
   4.79%  python2.7  libpython2.7.so.1.0  [.] PyFloat_FromString
   4.67%  python2.7  libpython2.7.so.1.0  [.] tupledealloc.lto_priv.335
   4.16%  python2.7  libpython2.7.so.1.0  [.] float_new.lto_priv.219
   3.93%  python2.7  libpython2.7.so.1.0  [.] _PyOS_ascii_strtod
   3.54%  python2.7  libc-2.26.so         [.] __strchr_avx2
   3.34%  python2.7  libpython2.7.so.1.0  [.] PyOS_string_to_double
   3.21%  python2.7  libpython2.7.so.1.0  [.] PyTuple_New
   3.05%  python2.7  libpython2.7.so.1.0  [.] type_call.lto_priv.51
   2.69%  python2.7  libpython2.7.so.1.0  [.] PyObject_Call
   2.15%  python2.7  libpython2.7.so.1.0  [.] PyArg_ParseTupleAndKeywords
   1.88%  python2.7  itertools.so         [.] _init
   1.78%  python2.7  libpython2.7.so.1.0  [.] _Py_set_387controlword
   1.19%  python2.7  libpython2.7.so.1.0  [.] _Py_get_387controlword
   1.10%  python2.7  libpython2.7.so.1.0  [.] vgetargskeywords.cold.59
   1.07%  python2.7  libpython2.7.so.1.0  [.] PyType_IsSubtype
   1.07%  python2.7  libc-2.26.so         [.] __memset_avx2_unaligned_erms
   ...

IDK why the heck Python is messing around with the x87 control word, but yes, the tiny _Py_get_387controlword function really runs fnstcw WORD PTR [rsp+0x6] and then reloads it into eax as an integer return value with movzx, but probably spends more of its time writing and checking the stack canary from -fstack-protector-strong.

It's weird because _Py_dg_strtod uses SSE2 (cvtsi2sd xmm1,rsi) for FP math, not x87. (The hot part with this input is mostly integer, but there are mulsd and divsd in there.) x86-64 code normally only uses x87 for long double (80-bit float). dg_strtod stands for David Gay's string to double. Interesting blog post about how it works under the hood.

Note that this function only takes 9% of the total run time. The rest is basically interpreter overhead, compared to a C loop that called strtod in a loop and threw away the result.

int

$ perf record python2.7 -m timeit 'int("1")'
10000000 loops, best of 3: 0.196 usec per loop

$ perf report -Mintel
Samples: 32K of event 'cycles:uppp', Event count (approx.): 31257616633
Overhead  Command    Shared Object        Symbol
  29.00%  python2.7  libpython2.7.so.1.0  [.] PyString_FromFormatV
  13.11%  python2.7  libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
   5.49%  python2.7  libc-2.26.so         [.] __strlen_avx2
   3.87%  python2.7  libpython2.7.so.1.0  [.] vgetargskeywords
   3.68%  python2.7  libpython2.7.so.1.0  [.] PyNumber_Int
   3.10%  python2.7  libpython2.7.so.1.0  [.] PyInt_FromString
   2.75%  python2.7  libpython2.7.so.1.0  [.] PyErr_Restore
   2.68%  python2.7  libc-2.26.so         [.] __strchr_avx2
   2.41%  python2.7  libpython2.7.so.1.0  [.] tupledealloc.lto_priv.335
   2.10%  python2.7  libpython2.7.so.1.0  [.] PyObject_Call
   2.00%  python2.7  libpython2.7.so.1.0  [.] PyOS_strtoul
   1.93%  python2.7  libpython2.7.so.1.0  [.] lookdict_string.lto_priv.1492
   1.87%  python2.7  libpython2.7.so.1.0  [.] _PyObject_GenericGetAttrWithDict
   1.73%  python2.7  libpython2.7.so.1.0  [.] PyString_FromStringAndSize
   1.71%  python2.7  libc-2.26.so         [.] __memmove_avx_unaligned_erms
   1.67%  python2.7  libpython2.7.so.1.0  [.] PyTuple_New
   1.63%  python2.7  libpython2.7.so.1.0  [.] PyObject_Malloc
   1.48%  python2.7  libpython2.7.so.1.0  [.] int_new.lto_priv.68
   1.45%  python2.7  libpython2.7.so.1.0  [.] PyErr_Format
   1.45%  python2.7  libpython2.7.so.1.0  [.] PyObject_Realloc
   1.37%  python2.7  libpython2.7.so.1.0  [.] type_call.lto_priv.51
   1.30%  python2.7  libpython2.7.so.1.0  [.] PyOS_strtol
   1.23%  python2.7  libpython2.7.so.1.0  [.] _PyString_Resize
   1.16%  python2.7  libc-2.26.so         [.] __ctype_b_loc
   1.11%  python2.7  libpython2.7.so.1.0  [.] _PyType_Lookup
   1.06%  python2.7  libpython2.7.so.1.0  [.] PyString_AsString
   1.04%  python2.7  libpython2.7.so.1.0  [.] PyArg_ParseTupleAndKeywords
   1.02%  python2.7  libpython2.7.so.1.0  [.] PyObject_Free
   0.93%  python2.7  libpython2.7.so.1.0  [.] PyInt_FromLong
   0.90%  python2.7  libpython2.7.so.1.0  [.] PyObject_GetAttr
   0.52%  python2.7  libc-2.26.so         [.] __memset_avx2_unaligned_erms
   0.52%  python2.7  libpython2.7.so.1.0  [.] vgetargskeywords.cold.59
   0.48%  python2.7  itertools.so         [.] _init
   ...

Notice that PyEval_EvalFrameEx takes 13% of the total time for int, vs. 30% of the total for float. That's about the same absolute time, and PyString_FromFormatV is taking twice as much time. Plus more functions taking more small chunks of time.

I haven't figured out what PyInt_FromString does, or what it's spending its time on. 7% of its cycle counts are charged to a movdqu xmm0, [rsi] instruction near the start, i.e. loading a 16-byte arg that was passed by reference (as the 2nd function arg). This may be getting more counts than it deserves if whatever stored that memory was slow to produce it. (See this Q&A for more about how cycle counts get charged to instructions on out-of-order execution Intel CPUs, where lots of different work is in flight every cycle.) Or maybe it's getting counts from a store-forwarding stall if that memory was written recently with separate narrower stores.

It's surprising that strlen is taking so much time. From looking at the instruction profile within it, it's getting short strings, but not exclusively 1-byte strings: it looks like a mix of len < 32 bytes and 32 <= len < 64 bytes. It might be interesting to set a breakpoint in gdb and see what args are common.

The float version has a strchr (maybe looking for a . decimal point?), but no strlen of anything. It's surprising that the int version has to redo a strlen inside the loop at all.

The actual PyOS_strtoul function takes 2% of the total time, run from PyInt_FromString (3% of the total time). These are "self" times, not including their children, so allocating memory and deciding on the number base is taking more time than parsing the single digit.

An equivalent loop in C would run ~50x faster (or maybe 20x if we're generous), calling strtoul on a constant string and discarding the result.

int with explicit base

For some reason this is as fast as float.

$ perf record python2.7 -m timeit 'int("1",10)'
10000000 loops, best of 3: 0.0894 usec per loop

$ perf report -Mintel
Samples: 14K of event 'cycles:uppp', Event count (approx.): 14289699408
Overhead  Command    Shared Object        Symbol
  30.84%  python2.7  libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
  12.56%  python2.7  libpython2.7.so.1.0  [.] vgetargskeywords
   6.70%  python2.7  libpython2.7.so.1.0  [.] PyInt_FromString
   5.19%  python2.7  libpython2.7.so.1.0  [.] tupledealloc.lto_priv.335
   5.17%  python2.7  libpython2.7.so.1.0  [.] int_new.lto_priv.68
   4.12%  python2.7  libpython2.7.so.1.0  [.] lookdict_string.lto_priv.1492
   4.08%  python2.7  libpython2.7.so.1.0  [.] PyOS_strtoul
   3.78%  python2.7  libc-2.26.so         [.] __strchr_avx2
   3.29%  python2.7  libpython2.7.so.1.0  [.] type_call.lto_priv.51
   3.26%  python2.7  libpython2.7.so.1.0  [.] PyTuple_New
   3.09%  python2.7  libpython2.7.so.1.0  [.] PyOS_strtol
   3.06%  python2.7  libpython2.7.so.1.0  [.] PyObject_Call
   2.49%  python2.7  libpython2.7.so.1.0  [.] PyArg_ParseTupleAndKeywords
   2.01%  python2.7  libpython2.7.so.1.0  [.] PyType_IsSubtype
   1.65%  python2.7  libc-2.26.so         [.] __strlen_avx2
   1.52%  python2.7  libpython2.7.so.1.0  [.] object_init.lto_priv.86
   1.19%  python2.7  libpython2.7.so.1.0  [.] vgetargskeywords.cold.59
   1.03%  python2.7  libpython2.7.so.1.0  [.] PyInt_AsLong
   1.00%  python2.7  libpython2.7.so.1.0  [.] PyString_Size
   0.99%  python2.7  libpython2.7.so.1.0  [.] PyObject_GC_UnTrack
   0.87%  python2.7  libc-2.26.so         [.] __ctype_b_loc
   0.85%  python2.7  libc-2.26.so         [.] __memset_avx2_unaligned_erms
   0.47%  python2.7  itertools.so         [.] _init

The profile by function looks pretty similar to the float version, too.

`PyString_FromFormatV` is used indirectly by `PyErr_Format`, called to format the attribute error raised for the `getattr(str, '__trunc__')` call. Formatting a string triggers a few `strlen` calls. — Martijn Pieters, Mar 21 '18 at 19:40
