-1

I was reading an interesting post on Short-Circuiting in Python and wondered if this was true for the in operator. My simple testing would conclude that it does not:

%%timeit -n 1000
0 in list(range(10))
1000 loops, best of 3: 639 ns per loop

%%timeit -n 1000
0 in list(range(1000))
1000 loops, best of 3: 23.7 µs per loop
# larger the list, the longer it takes. however, i do notice that a higher 
# value does take longer.

%%timeit -n 1000
999 in list(range(1000))
1000 loops, best of 3: 45.1 µs per loop

Is there a detailed explanation of why 999 takes longer than 0. Is the in operator like a loop?

Also, is there a way to tell the in operator to "stop the loop" once the value is found (or is this the already defaulted behavior that I'm not seeing)?

Lastly- Is there another operator/function that I am skipping over that does what I'm talking about in regards to "short-circuiting" in?

MattR
  • 4,014
  • 5
  • 30
  • 57
  • `Is the in operator like a loop?` Yes, when you call `in` with a list, a C loop is invoked, I believe. – cs95 Jan 04 '18 at 22:24
  • Also, `in` is a conditional operator, not a logical operator, so "short circuiting" does not apply here. – cs95 Jan 04 '18 at 22:25
  • @cᴏʟᴅsᴘᴇᴇᴅ, ah! okay. I can't find that in the docs when I Google it. do you have any citations on this? – MattR Jan 04 '18 at 22:26
  • 3
    `in list` will iterate through the list until it finds the element. If that's what you mean, it is short-circuited: it doesn't carry on looking through the list once the element has been found. – khelwood Jan 04 '18 at 22:29
  • 1
    I think [this answer](https://stackoverflow.com/a/30081318/289011) might be interesting too (it is slightly different, to what you're asking since you're converting to `list`, but it is a very interesting explanation on how `range` works) – BorrajaX Jan 04 '18 at 22:32
  • @khelwood, Thank you! – MattR Jan 04 '18 at 22:32
  • @BorrajaX No, because OP calls `list` on it. – cs95 Jan 04 '18 at 22:33
  • I made an edit before your comment @cᴏʟᴅsᴘᴇᴇᴅ Also, it is still interesting when it comes to range? Besides, isn't this what comments are for? – BorrajaX Jan 04 '18 at 22:34

3 Answers3

4

Short circuiting does occur. The in operator calls the __contains__ method, which in turn is implemented differently per class (in your case list). Searching for 999 takes around double the time as searching for 0, since half of the work is creating the list, and the other half is iterating through it, which is short circuited in the case of 0.

Hetzroni
  • 1,859
  • 1
  • 11
  • 27
  • ah! so the double in time would be due to `list(range(x))` being set up? – MattR Jan 04 '18 at 22:35
  • Yes. This is the case. – Hetzroni Jan 04 '18 at 22:36
  • `which in turn is implemented differently per class` - could you elaborate on this? I feel like there is something important packed inside this statement! – MattR Jan 04 '18 at 22:40
  • When creating a new class, Python supports operator overloading. Statements such as `a = A() + 5` can work by implementing `A.__add__(self, other)`. Similarly, when Python runs into `x in y`, it checks if `y`s class implemented the `__contains__ method` and if so, calls `y.__contains__(x)`. If, on the other hand, the method wasn't implemented, a `TypeError` exception will be thrown. – Hetzroni Jan 04 '18 at 22:44
  • @StefanPochmann, your comment is correct, my explanation was a simplified version. If `__contains__` isn't impleneted, then it'll try iterating through the object, looking for the requested value. This means it's enough to implement `__iter__`, or even just `__getitem__`. See here for details: https://docs.python.org/3/reference/datamodel.html#object.__contains__. – Hetzroni Jan 04 '18 at 22:53
2

The implementation of in for list objects is found in list_contains. It performs a scan of the list and does exit early if the last comparison has found the element, there's no point in continuing there.

The loop involved is:

for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
    cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                       Py_EQ);

If cmp is 1 (the value returned from PyObject_RichCompareBool for a match), the for loop condition (cmp == 0 && i < Py_SIZE(a)) becomes false and terminates.

For list objects, which are built-in, what is called for in is a C function (for CPython). For other implementations of Python, this can be a different language using different language constructs.

For user-defined classes in Python, what is called is defined in the Membership test operations of the Reference Manual, take a look there for a run-down of what gets called.


You could also come to this conclusion by timing:

l = [*range(1000)]    
%timeit 1 in l
85.8 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 999 in l
22 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The furthest the element the more you need to scan. If it didn't short-circuit, all in operations would result in similar timings.

Dimitris Fasarakis Hilliard
  • 119,766
  • 27
  • 228
  • 224
  • I believe I created a Red Herring when i wrapped `list()` around `range()`. with `l` already defined, this clears it up! Does `in` call different "functions" for different data types based on the latter half of the `in` operator? (please excuse my bad nomenclature if I've used the wrong terms) – MattR Jan 04 '18 at 22:46
  • @MattR yes, it does. You can take a look at [Membership test operations](https://docs.python.org/3.7/reference/expressions.html#membership-test-operations) in the Reference Manual for the full run-down of what is called (which is too much for a comment to include). – Dimitris Fasarakis Hilliard Jan 04 '18 at 22:50
  • I really appreciate the links to the documentation. I'll read up some more. – MattR Jan 04 '18 at 22:56
  • You're welcome @MattR. Thankfully Python's docs are quite good and readable. It's worth reading through them. – Dimitris Fasarakis Hilliard Jan 04 '18 at 22:59
  • It will also be educating to time the operations for the "range" object alone, without converting it to a list. – jsbueno Jan 05 '18 at 12:58
0

Here's another look with a hashed object, set:

from time import time

qlist = list(range(1000))
qset = set(qlist)

start = time()
for i in range(1000):
    0 in qlist
print time() - start

start = time()
for i in range(1000):
    999 in qlist
print time() - start

start = time()
for i in range(1000):
    0 in qset
print time() - start

start = time()
for i in range(1000):
    999 in qset
print time() - start

Output:

0.000172853469849    0 in list
0.0399038791656    999 in list
0.000147104263306i   0 in set
0.000195980072021  999 in set

As others have said, the list implementation must do a sequential search. Set inclusion uses a hashed value, and is on par with finding the item in the first element checked.

Prune
  • 72,213
  • 14
  • 48
  • 72
  • `in` is faster in a set than a list, but what does it show regarding short-circuiting? – Hetzroni Jan 04 '18 at 22:39
  • I appreciate the optimization piece. – MattR Jan 04 '18 at 22:39
  • 1
    @Loquacious: this merely highlights the effect of early loop exit with a list; set **in** doesn't short-circuit at all, because the reference is direct. – Prune Jan 04 '18 at 22:53