Why doesn't generating primes in Python using sets in this way work?

Question

I have a standard algorithm that I use to generate prime numbers (probably not the most efficient but it works as well as I need it to) and I thought I'd optimise it a little by using sets instead of lists. However when I use a set I get numbers such as 143 and 989 which aren't prime and I'm not sure why this is. Below is my standard code that I use to generate primes with ''' marking where I would use a set instead of a list.

import math


prime = [2, 3]                        # prime = set([1, 2])    


def prime_checker(a):
    b = int(math.sqrt(a)) + 1
    for j in prime:
        if a % j == 0:
            return False
        if j > b:
            break   

for i in range (5, 100000, 2):
    if prime_checker(i) != False:
        prime.append(i)               # prime.add(i)

For some reason this always seems to work if I only look for primes less than 100, but once I extend this bound I start getting non-primes.

in case you are interested: here is an answer for an efficient sieve of eratosthenes implementation in python: http://stackoverflow.com/a/31122596/4954037 . yes, there are other more sophisticated approaches than eratosthenes. — hiro protagonist, Dec 30 '16 at 09:52

niemmi · Answer 1 · 2016-12-30T08:00:54.933

3

When you're using set you're iterating over the primes in random order. That means that even first prime you process could be greater than b which causes the loop to terminate and number to added to result.

If you remove the early break you'll get the same result:

def prime_checker(a):
    #b = int(math.sqrt(a)) + 1
    for j in prime:
        if a % j == 0:
            return False
        #if j > b:
        #    break

edited Dec 30 '16 at 08:00

answered Dec 30 '16 at 07:46

niemmi

16,078
7
28
35

You should remove line 2 from the code as `b` does not get used without the second `if`. I would edit it but its too short. – SudoKid Dec 30 '16 at 07:55

Blckknght · Answer 2 · 2016-12-30T08:07:30.027

Sets are unordered, but (due to implementation details in the standard Python interpreter CPython), sets that only include sequential (or nearly sequential) small integers will often have their values in order, especially if that's the order they were added to the set.

However, as the contents of a set become more sparse (as a set of primes will), the chances of larger values occurring between the smaller items in iteration order increases. Your algorithm relies on the order of the primes in your prime sequence, since you stop looping after you see the first prime larger than sqrt(a). So as the set grows, it's more and more likely that your prime check will not work correctly.

It's not clear to me why you'd want to use a set anyway. You're not doing any membership testing against the prime list, so a list works just as well as a set. If you need a set for later, you may want to generate it after you've finished building the list (or use a different algorithm).

score 0 · Answer 3 · answered Dec 30 '16 at 07:50

0

To be a prime a number needs to be only dividable by itself and one - so adding a check for this will solve. You get a square root from 143 -> b = 12 + 1 = 13, but it's also divided by 11 and this is not checked, you terminate before verifying it.

answered Dec 30 '16 at 07:50

dmitryro

3,202
2
17
25

1

The `prime_checker` function relies on the `primes` list containing all the primes smaller than `sqrt(a)`. – Blckknght Dec 30 '16 at 08:04

score 0 · Answer 4 · answered Dec 30 '16 at 08:17

As you only add to the prime list after 2,3 are already there and run over 5-1000000, you need to

def prime_checker(a):
    for j in range(a): 
        if a % j == 0:
            return False
        #if j > b:
        #break   
    # we've iterated and verified
    return True

Otherwise you never check the numbers that are in prime, as they are not actually added.

score 0 · Answer 5 · edited May 23 '17 at 12:00

As others have said, Python sets generally don't maintain the order that the items were added in, so your loop condition may attempt to check divisibility by a prime that's too high for the current a. You can use a set here, but you need to use a range (or some other way of producing trial factors in order) and then you can test if those trial factors are in the set of primes.

Here's a modified version of your code that does this. It prints the contents of the primes set so you can see that they aren't ordered.

Note that I'm using a ** 0.5 to calculate the square root, rather than importing the sqrt function from the math library. Not only does this save doing an import, operators tend to be faster than function calls.

primes = {2, 3}

def prime_checker(a):
    if a in primes:
        return True
    for j in range(2, int(a ** 0.5) + 1):
        if j in primes and a % j == 0:
            return False
    return True

hi = 200
for i in range (5, hi, 2):
    if prime_checker(i):
        primes.add(i)

print(len(primes))
print(primes)
print(*sorted(primes))

output

46
{2, 3, 131, 5, 7, 137, 11, 139, 13, 17, 19, 149, 23, 151, 29, 157, 31, 163, 37, 167, 41, 43, 173, 47, 179, 53, 181, 59, 61, 191, 193, 67, 197, 71, 199, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127}
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199

The and operator short-circuits, which means that in

if j in primes and a % j == 0:

if the j in primes test fails then the a % j == 0 test won't be performed.

BTW, your prime_checker function returns False if a number fails the primality test, but it returns the default of None if the test suceeds; that's a bit weird. :) Both False and None are treated as false-ish, i.e. if v is False or None then if a: will fail.

The Python PEP-0008 style guide recommends that you do not explicitly test against True or False:

Don't compare boolean values to True or False using ==.

Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:

i.e., instead of

if some_condition == False:

you should do

if not some_condition:

Of course, that won't work in your code because of the False / None issue I mentioned above.

FWIW, there's some great prime-finding code at Fastest way to list all primes below N. That question is fairly old, so most of the answers are for Python 2, but they're easy enough to convert to Python 3.

why is `if greeting is True:` bad (no, worse!) according to pep8? — hiro protagonist, Dec 30 '16 at 09:49
@hiroprotagonist There's some discussion of that at http://stackoverflow.com/questions/4050335/strange-pep8-recommandation-on-comparing-boolean-values-to-true-or-false and the questions it links to. Briefly, the `is` test should **only** be used when you explicitly need to test _identity_ of objects, not for testing object's values. — PM 2Ring, Dec 30 '16 at 10:08
ok, thanks. (i always found this to be the most readable and unmistakable version and consistend with `is None`; i'll try to avoid it in the future). — hiro protagonist, Dec 30 '16 at 10:14
@hiroprotagonist No worries. Sure, `if greeting is True:` is consistent with `if greeting is None:` and doing identity testing can be faster than value testing. However, `if greeting is True:` breaks duck-typing, i.e., the test fails if `greeting` is some other truthy value, and you generally don't want that. — PM 2Ring, Dec 30 '16 at 10:21

score 0 · Answer 6 · answered Dec 30 '16 at 09:43

0

Another way you can check if a number is prime or not:

import math

def is_prime(n):
    if n % 2 == 0 and n > 2: 
        return False
    return all(n % i for i in range(3, int(math.sqrt(n)) + 1, 2))

answered Dec 30 '16 at 09:43

Mohammad Yusuf

13,560
7
38
68

Why doesn't generating primes in Python using sets in this way work?

6 Answers6