int() has to account for more possible types to convert from than float() does. When you pass a single object to int(), the following tests are made, in order:
- if it is an integer already, use it directly
- if the object implements the __int__ method, call it and use the result
- if the object is a C-derived subclass of int, reach in and convert the C integer value in the structure to an int() object
- if the object implements the __trunc__ method, call it and use the result
- if the object is a string, convert it to an integer with the base set to 10
None of these tests are executed when you pass in a base argument; the code then jumps straight to converting a string to an int with the selected base. That's because no other types are accepted when a base is given.
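The order of those tests can be sketched in plain Python. This is only a rough model of the description above, not CPython's actual code: model_int and the _NO_BASE sentinel are hypothetical names, and the int-subclass test is left out because in Python 3 every int subclass already has __int__.

```python
_NO_BASE = object()  # hypothetical sentinel standing in for "no base argument given"


def model_int(x, base=_NO_BASE):
    """Return (branch, value): which test handled x and the resulting integer."""
    if base is not _NO_BASE:
        # A base was given: only strings are accepted, none of the other tests run.
        if not isinstance(x, str):
            raise TypeError("can't convert non-string with explicit base")
        return ("string with base", int(x, base))
    if type(x) is int:
        # Already an exact integer, use it directly.
        return ("already an int", x)
    if hasattr(type(x), "__int__"):
        # The type implements __int__, call it and use the result.
        return ("__int__", type(x).__int__(x))
    if hasattr(type(x), "__trunc__"):
        # The type implements __trunc__, call it and use the result.
        return ("__trunc__", int(type(x).__trunc__(x)))
    if isinstance(x, str):
        # Only now is a plain string parsed, with the base fixed at 10.
        return ("string, base 10", int(x, 10))
    raise TypeError("unsupported type: %r" % type(x).__name__)


print(model_int('1'))      # the string is handled by the last branch
print(model_int('1', 10))  # the string is handled by the first branch
```

The point of the sketch is only the position of the string branch: last when no base is given, first when one is.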
As a result, when you pass in a base, suddenly creating an integer from a string is a lot faster:
$ bin/python -m timeit "int('1')"
1000000 loops, best of 3: 0.469 usec per loop
$ bin/python -m timeit "int('1', 10)"
1000000 loops, best of 3: 0.277 usec per loop
$ bin/python -m timeit "float('1')"
1000000 loops, best of 3: 0.206 usec per loop
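If you would rather reproduce these measurements from a script than from the shell, something like the following should do. It is a minimal sketch using the standard timeit module; the loop count of 100000 is an arbitrary choice, and the absolute numbers will differ per machine and Python version.

```python
import timeit

statements = ["int('1')", "int('1', 10)", "float('1')"]
loops = 100000  # arbitrary; the command-line tool picks this number automatically

results = {}
for stmt in statements:
    # Best of 3 runs, converted to microseconds per single call.
    best = min(timeit.repeat(stmt, number=loops, repeat=3))
    results[stmt] = best / loops * 1e6
    print("%-14s %.3f usec per loop" % (stmt, results[stmt]))
```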
When you pass a string to float(), the first test made is to see if the argument is a string object (and not a subclass), at which point it is parsed straight away. There's no need to test other types.
So the int('1') call makes a few more tests than int('1', 10) or float('1'). Of those tests, the first three are quite fast; they are just pointer checks. But the fourth test uses the C equivalent of getattr(obj, '__trunc__'), which is relatively expensive. It has to check the instance and then the full MRO of the string's type, there is no cache, and in the end it raises an AttributeError(), formatting an error message that no one will ever see. All of that work is useless here.
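You can see the exception that gets built and thrown away by doing the same lookup by hand. This is just an illustration in Python of what the failed lookup produces, not the C code path itself; discarded_message is a name picked for this example.

```python
try:
    # The Python-level equivalent of the failed __trunc__ lookup on a string.
    getattr('1', '__trunc__')
except AttributeError as exc:
    # An exception object with a fully formatted message was created for this.
    discarded_message = str(exc)
else:
    discarded_message = None

print(discarded_message)
```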
In Python 3, that getattr() call has been replaced with code that is a lot faster. That's because in Python 3 there is no need to account for old-style classes, so the attribute can be looked up directly on the type of the instance (the class, the result of type(instance)), and class attribute lookups across the MRO are cached at this point. No exceptions need to be created.
float() objects implement the __int__ method, which is why int(float('1')) is faster; you never hit the __trunc__ attribute test (the fourth test) because the second test already produced the result.
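A quick way to check which of the two hooks each type offers is to ask the types directly. The implements dictionary below is just a convenience for this example.

```python
# Map (type name, method name) to whether that type provides the method.
implements = {
    (cls.__name__, name): hasattr(cls, name)
    for cls in (float, str)
    for name in ('__int__', '__trunc__')
}

for key in sorted(implements):
    print(key, implements[key])
```

float provides __int__, so the search stops early; str provides neither method, so a string has to fall through every test before it is parsed.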
If you want to look at the C code for Python 2, look at the int_new() function first. After parsing the arguments, the code essentially does this:
if (base == -909)  // no base argument given, the default is -909
    return PyNumber_Int(x);  // parse an integer from x, an arbitrary type.
if (PyString_Check(x)) {
    // do some error handling; there is a base, so parse the string with the base
    return PyInt_FromString(string, NULL, base);
}
The no-base case calls the PyNumber_Int() function, which does this:
if (PyInt_CheckExact(o)) {
    // 1. it's an integer already
    // ...
}
m = o->ob_type->tp_as_number;
if (m && m->nb_int) { /* This should include subclasses of int */
    // 2. it has an __int__ method, return the result
    // ...
}
if (PyInt_Check(o)) { /* An int subclass without nb_int */
    // 3. it's an int subclass, extract the value
    // ...
}
trunc_func = PyObject_GetAttr(o, trunc_name);
if (trunc_func) {
    // 4. it has a __trunc__ method, call it and process the result
    // ...
}
if (PyString_Check(o))
    // 5. it's a string, let's parse!
    return int_from_string(PyString_AS_STRING(o),
                           PyString_GET_SIZE(o));
where int_from_string() is essentially a wrapper for PyInt_FromString(string, length, 10), so parsing the string with base 10.
In Python 3, intobject was removed, leaving only longobject, renamed to int() on the Python side. In the same vein, unicode has replaced str. So now we look at long_new(), and testing for a string is done with PyUnicode_Check() instead of PyString_Check():
if (obase == NULL)
    return PyNumber_Long(x);
// bounds checks on the obase argument, storing a conversion in base
if (PyUnicode_Check(x))
    return PyLong_FromUnicodeObject(x, (int)base);
So again, when no base is set, we need to look at PyNumber_Long(), which executes:
if (PyLong_CheckExact(o)) {
    // 1. it's an integer already
    // ...
}
m = o->ob_type->tp_as_number;
if (m && m->nb_int) { /* This should include subclasses of int */
    // 2. it has an __int__ method
    // ...
}
trunc_func = _PyObject_LookupSpecial(o, &PyId___trunc__);
if (trunc_func) {
    // 3. it has a __trunc__ method
    // ...
}
if (PyUnicode_Check(o))
    // 4. it's a string
    return PyLong_FromUnicodeObject(o, 10);
Note the _PyObject_LookupSpecial() call; this is the special method lookup implementation. It eventually uses _PyType_Lookup(), which uses a cache; since there is no str.__trunc__ method, that cache will forever return a null after the first MRO scan. This function also never raises an exception, it just returns either the requested method or a null.
The way float() handles strings is unchanged between Python 2 and 3, so you only need to look at the Python 2 float_new() function, which for strings is pretty straightforward:
// test for subclass and retrieve the single x argument
/* If it's a string, but not a string subclass, use
   PyFloat_FromString. */
if (PyString_CheckExact(x))
    return PyFloat_FromString(x, NULL);
return PyNumber_Float(x);
So for string objects, we jump straight to parsing; otherwise we use PyNumber_Float() to look for actual float objects, or things with a __float__ method, or for string subclasses.
This does reveal a possible optimisation: if int() were to first test for PyString_CheckExact() before all those other type tests, it would be just as fast as float() when it comes to strings. PyString_CheckExact() rules out a string subclass that has an __int__ or __trunc__ method, so it is a good first test.
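To see why the test has to be an exact type check, consider a string subclass that defines __int__. The Answer class and the is_exact_str() helper below are made up for this example; is_exact_str() is only the Python-level analogue of an exact check, not something int() exposes.

```python
class Answer(str):
    """Hypothetical str subclass that overrides integer conversion."""

    def __int__(self):
        return 42


def is_exact_str(x):
    # Python-level analogue of an exact type check: subclasses do not pass.
    return type(x) is str


plain = '7'
special = Answer('7')

print(is_exact_str(plain), int(plain))
print(is_exact_str(special), int(special))
```

A fast path guarded by isinstance() would parse the text of special and return 7, silently ignoring its __int__ method; the exact check sends it through the normal tests instead.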
To address other answers blaming this on base parsing (so looking for a 0b, 0o, 0 or 0x prefix, case insensitively): the default int() call with a single string argument does not look for a base prefix; the base is hardcoded to 10. It is an error to pass in a string with a prefix in that case:
>>> int('0x1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0x1'
Base prefix parsing is only done if you explicitly set the second argument to 0:
>>> int('0x1', 0)
1
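The same behaviour in script form, for a few more prefixes. parse_default() is a small helper written for this example; it returns None where int() raises ValueError. The bare 0 octal prefix is left out because it is only accepted by Python 2.

```python
def parse_default(text):
    """Call int() without a base; return None if the literal is rejected."""
    try:
        return int(text)
    except ValueError:
        return None


samples = ['0x1', '0b101', '0o17', '42']

# Without a base, base 10 is assumed and any prefix is an error.
without_base = [parse_default(s) for s in samples]
# With base 0, the prefix selects the base.
with_base_zero = [int(s, 0) for s in samples]

print(without_base)    # [None, None, None, 42]
print(with_base_zero)  # [1, 5, 15, 42]
```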
Because no testing is done for __trunc__, the base=0 prefix parsing case is just as fast as setting base explicitly to any other supported value:
$ python2.7 -m timeit "int('1')"
1000000 loops, best of 3: 0.472 usec per loop
$ python2.7 -m timeit "int('1', 10)"
1000000 loops, best of 3: 0.268 usec per loop
$ python2.7 -m timeit "int('1', 0)"
1000000 loops, best of 3: 0.271 usec per loop
$ python2.7 -m timeit "int('0x1', 0)"
1000000 loops, best of 3: 0.261 usec per loop