5

Please look at the following code and explain why the str.count('') method and the len(str) function give two different outputs.

a=''
print(len(a))
print(a.count(''))

Output:

0
1
Dimitris Fasarakis Hilliard
liberal

3 Answers

16

str.count() counts non-overlapping occurrences of the substring:

Return the number of non-overlapping occurrences of substring sub.

There is exactly one such place where the substring '' occurs in the string '': right at the start. So the count should return 1.

Generally speaking, the empty string will match at all positions in a given string, including right at the start and end, so the count should always be the length plus 1:

>>> (' ' * 100).count('')
101

That's because an empty string is considered to exist at every boundary in a string: before the first character, between each pair of adjacent characters, and after the last one. For a string of length 2, there are 3 empty strings: one at the start, one between the two characters, and one at the end.
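As a quick check of the length-plus-one rule, here is a minimal sketch that lists every index at which the empty string matches. The helper name `empty_match_positions` is made up for this illustration and is not part of Python.

```python
def empty_match_positions(s):
    """Return every index i in 0..len(s) at which '' matches in s."""
    # str.startswith(prefix, start) tests the match at a given offset;
    # the empty prefix matches at every offset from 0 to len(s) inclusive.
    return [i for i in range(len(s) + 1) if s.startswith('', i)]


for s in ['', 'a', 'ab', 'test']:
    positions = empty_match_positions(s)
    # The number of match positions agrees with str.count('').
    assert s.count('') == len(positions) == len(s) + 1
    print(repr(s), len(s), s.count(''), positions)
```

For the string from the question, the only position is index 0, which is why the count is 1 while the length is 0.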

So yes, the results are different and they are entirely correct.

Martijn Pieters
  • 3
    I'm not sure if "There is exactly one such place where the substring '' occurs in the string '': right at the start." is the right way to put it. CPython just seems to special case the `len(substr) == 0` and [return `len(str)+1`](https://github.com/python/cpython/blob/master/Objects/stringlib/count.h#L16) – Dimitris Fasarakis Hilliard Oct 22 '16 at 13:08
  • 2
    @JimFasarakis-Hilliard: I'm not sure if you have fully read my answer, but I'm saying exactly what the code there does. – Martijn Pieters Oct 22 '16 at 13:10
  • 3
    It's rather unclear to me what `count()` should return when passed the empty string. It's a matter of convention how to count the occurrences, and I think it would be reasonable to just throw a `ValueError` for this case. "Non-overlapping" means that the intersection of two occurrences is the empty string, so if you have two occurrences of the empty string at index 0, they are actually non-overlapping. – Sven Marnach Oct 22 '16 at 13:11
  • 1
    Yes yes I see that you did point out the `len + 1` part; I'm just wondering if there's merit for pointing out that no counting *actually* takes place when `substr` is empty (but, it's an implementation detail, so, maybe not). – Dimitris Fasarakis Hilliard Oct 22 '16 at 13:14
  • 1
    @JimFasarakis-Hilliard: Just because the actual implementation doesn't *need* to count doesn't mean that that's how the number is determined. – Martijn Pieters Oct 22 '16 at 13:15
  • 2
    @MartijnPieters Actually, I think there is more to it. I don't think it's possible to implement this without making the empty string a special case one way or the other. You'd usually find the first place where a string matches, increase your counter, and then continue searching where the match ends. For the empty string, this would give an infinite loop, consistent with my argument above. – Sven Marnach Oct 22 '16 at 13:19
  • 2
    @SvenMarnach: sure, but I'm not talking about the implementation. I'm talking about the principle. – Martijn Pieters Oct 22 '16 at 13:22
  • 3
    @MartijnPieters And I'm just underlining my point that there is no "natural" answer to the question how often the empty string appears as non-overlapping substring of another string. _Infinitely often_ is just as valid an answer as _string length plus one_. – Sven Marnach Oct 22 '16 at 13:30
  • @SvenMarnach yes you could make an argument for either outcome. But I know which one I'd rather have to code. – Mark Ransom Aug 10 '20 at 21:47
  • @MarkRansom The alternative would be throwing an exception when counting the number of occurrences of the empty string, which I think would be reasonable. My main point here is that the choice made by the implementors of `count()` is arbitrary, not that it is necessarily wrong. – Sven Marnach Aug 11 '20 at 07:01
3

.count('') counts the number of locations of zero-length strings. You could also think of this as the number of possible cursor positions.

"test".count('')

 t e s t
^ ^ ^ ^ ^

Instead of counting the number of characters (like len(str)), you're counting the number of anti-characters.
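The cursor picture can be generated for any string, including very short ones. This is only an illustrative sketch; `cursor_diagram` is a hypothetical helper written for this answer.

```python
def cursor_diagram(s):
    """Return (characters line, carets line) for the cursor-position picture."""
    # Space the characters out so that a caret fits before, between and after them.
    chars = ' ' + ' '.join(s)
    # One caret per possible cursor position: len(s) + 1 of them.
    carets = ' '.join('^' * (len(s) + 1))
    return chars, carets


for s in ['test', 'a', '']:
    chars, carets = cursor_diagram(s)
    print(chars)
    print(carets)
    # The number of carets equals the number of empty-string matches.
    assert carets.count('^') == s.count('')
```

With the empty string there are no characters to draw, but there is still one place to put the cursor, so one caret remains.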

Ian MacDonald
  • That explanation is nice and intuitive for strings that have at least 2 characters. Below, it becomes dark. – Guimoute May 06 '20 at 23:52
1

Documentation:

Return the number of non-overlapping occurrences of subsequence sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

If we have a look at the implementation, we find that it calls the function stringlib_count (source), which simply returns the length of the string plus one when the length of sub is zero:

if (sub_len == 0)
    return (str_len < maxcount) ? str_len + 1 : maxcount;

(source)

Note: maxcount is set to the largest positive value of Py_ssize_t (PY_SSIZE_T_MAX).


Of course, that is just a short circuit. If we skip that check, the code goes on to call FASTSEARCH.

How is FASTSEARCH implemented? It runs a loop, checking at every position whether the string matches the sub at that position.

Since it is looking for an empty string, it will say that it matches in every position (at every position, it finds no characters that differ, up to the length of the sub).

Remember that it is looking in the inclusive range from start to end, meaning that it will look at every position in the string, that is:

  • The start (before the first character)
  • Between each character pair (after each character, before the next one)
  • The end (after the last character)

That is one position per character (before each character) plus one (the end). Or if you prefer, it is one position per character (after each character) plus one (the start). In either case, it will return the length of the string plus one. The developers short circuited it to avoid doing the loop.
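The position-by-position reasoning above can be sketched in Python. This is not CPython's FASTSEARCH, only a simplified stand-in with a made-up name (`naive_count`). It also assumes that the scan advances by at least one position after a match, because an empty sub would otherwise never move forward.

```python
def naive_count(s, sub):
    """Count non-overlapping occurrences of sub in s by checking each position."""
    count = 0
    i = 0
    last_start = len(s) - len(sub)  # last index at which sub could still fit
    while i <= last_start:
        # With an empty sub there are no characters to compare, so all() is True.
        if all(s[i + j] == sub[j] for j in range(len(sub))):
            count += 1
            # Skip past the match; step at least 1 so '' cannot loop forever (assumption).
            i += max(len(sub), 1)
        else:
            i += 1
    return count


for s, sub in [('', ''), ('test', ''), ('aaaa', 'aa'), ('abcabc', 'bc')]:
    # The sketch agrees with the built-in on these inputs.
    assert naive_count(s, sub) == s.count(sub)
    print(repr(s), repr(sub), naive_count(s, sub))
```

For an empty sub the loop visits positions 0 through len(s) and matches at each one, which gives the length plus one without any special case.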

Theraot