10

My current algorithm to check the primality of numbers in python is way to slow for numbers between 10 million and 1 billion. I want it to be improved knowing that I will never get numbers bigger than 1 billion.

The context is that I can't get an implementation that is quick enough for solving problem 60 of project Euler: I'm getting the answer to the problem in 75 seconds where I need it in 60 seconds. http://projecteuler.net/index.php?section=problems&id=60

I have very few memory at my disposal so I can't store all the prime numbers below 1 billion.

I'm currently using the standard trial division tuned with 6k±1. Is there anything better than this? Do I already need to get the Rabin-Miller method for numbers that are this large.

primes_under_100 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
def isprime(n):
    if n <= 100:
        return n in primes_under_100
    if n % 2 == 0 or n % 3 == 0:
        return False

    for f in range(5, int(n ** .5), 6):
        if n % f == 0 or n % (f + 2) == 0:
            return False
    return True

How can I improve this algorithm?

Precision: I'm new to python and would like to work with python 3+ only.


Final code

For those who are interested, using MAK's ideas, I generated the following code which is about 1/3 quicker, allowing me to get the result of the euler problem in less than 60 seconds!

from bisect import bisect_left
# sqrt(1000000000) = 31622
__primes = sieve(31622)
def is_prime(n):
    # if prime is already in the list, just pick it
    if n <= 31622:
        i = bisect_left(__primes, n)
        return i != len(__primes) and __primes[i] == n
    # Divide by each known prime
    limit = int(n ** .5)
    for p in __primes:
        if p > limit: return True
        if n % p == 0: return False
    # fall back on trial division if n > 1 billion
    for f in range(31627, limit, 6): # 31627 is the next prime
        if n % f == 0 or n % (f + 4) == 0:
            return False
    return True
Zero Piraeus
  • 47,176
  • 24
  • 135
  • 148
Olivier Grégoire
  • 28,397
  • 21
  • 84
  • 121
  • I knew it under the name Python 3 or Python 3.1, but it looks like Py3k references these versions. – Olivier Grégoire Dec 28 '10 at 10:18
  • shouldn't it be `f` and `f+4`...Could you confirm? why the `4`? – st0le Dec 29 '10 at 07:24
  • Well more clearly I use a base 7 + 6k instead of 5 + 6k so I need to use +0, +4, +6, +10, etc. instead of +0, +2, +6, +8. The advantage is that I don't test for 31625. – Olivier Grégoire Dec 29 '10 at 13:54
  • 3
    WARNING: The purely python example (his first snippet) does not work for all primes. The line `for f in range(5, int(n ** .5), 6):` should be `for f in range(5, int(n ** .5) + 1, 6):`; as it exits (too early) before it can show that the number is divisible by the square root of itself. – deceleratedcaviar Aug 17 '12 at 10:13
  • And? The second example works, is efficient the question has been proven to be useful. There's no reason to downvote on that basis only. I explicitly asked to "improve the algorithm" stating that I was new at Python programming. This means that I had issue with this one. Downvoting on that basis only is against what SO promotes (2 downvotes in 1 day since you posted your comment after 2 months of inactivity on this question, there _is_ a relation). This question was clearly proper for SO at the time I wrote it. Anyways, I'll fix the algorithm in the first snippet. Example numbers are welcome. – Olivier Grégoire Aug 19 '12 at 00:43
  • 1
    @ogregoire: I don't know whether Daniel actually downvoted you, but I did find his warning useful, I almost used the first snippet because I just needed a quick and dirty isprime function and the second didn't work for me out of the box on Python 2.x ;) – Joseph Garvin Jan 12 '13 at 16:41
  • Get yourself gmpy2, and use gmpy2.is_prime(n) – xylon97 Mar 28 '13 at 09:35
  • related: [Fastest way to list all primes below N](http://stackoverflow.com/q/2068372/4279) – jfs Sep 21 '15 at 18:09

5 Answers5

12

For numbers as large as 10^9, one approach can be to generate all primes up to sqrt(10^9) and then simply check the divisibility of the input number against the numbers in that list. If a number isn't divisible by any other prime less than or equal to its square root, it must itself be a prime (it must have at least one factor <=sqrt and another >= sqrt to not be prime). Notice how you do not need to test divisibility for all numbers, just up to the square root (which is around 32,000 - quite manageable I think). You can generate the list of primes using a sieve.

You could also go for a probabilistic prime test. But they can be harder to understand, and for this problem simply using a generated list of primes should suffice.

MAK
  • 24,585
  • 9
  • 50
  • 82
5

For solving Project Euler problems I did what you suggest in your question: Implement the Miller Rabin test (in C# but I suspect it will be fast in Python too). The algorithm is not that difficult. For numbers below 4,759,123,141 it is enough to check that a number is a strong pseudo prime to the bases 2, 7, 61. Combine that with trial division by small primes.

I do not know how many of the problems you have solved so far, but having a fast primality test at your disposal will be of great value for a lot of the problems.

Tomasz Gandor
  • 5,982
  • 2
  • 45
  • 46
Peter van der Heijden
  • 10,577
  • 1
  • 37
  • 55
  • Okay, in that case, what do you call small primes? What is the limit I should set? – Olivier Grégoire Dec 28 '10 at 10:16
  • @Frór: You would have to experiment to find an optimal value, but I would start by trying all primes below 100 or so. IIRC it might even be that I ended up skipping trial division for all values except the bases (in this case 2, 7, 61). – Peter van der Heijden Dec 28 '10 at 10:19
  • 1
    [Python: Proved correct up to large N](https://rosettacode.org/wiki/Miller%E2%80%93Rabin_primality_test#Python:_Proved_correct_up_to_large_N) – P i Sep 03 '17 at 20:22
  • Well, 4,759,123,141 is very little... Can be checked by even dividing by odd numbers up to sqrt in no time. But thanks for the link @Pi - I still don't understand why there is no `np.miller_rabin` function (or `scipy` if this is too scientific). – Tomasz Gandor Nov 27 '18 at 18:10
1

You can first divide your n only by your primes_under_100.

Also, precompute more primes.

Also, you're actually store your range() result in memory - use irange() instead and use this memory for running Sieve of Eratosthenes algorithm.

crazylammer
  • 1,082
  • 8
  • 7
1

Well, I have a follow-up to my comment under (very good) Peter Van Der Heijden's answer about there being nothing good for really large primes (numbers in general) in the "popular" Python libraries. Turns out I was wrong - there is one in sympy (great library for symbolic algebra, among others):

https://docs.sympy.org/latest/modules/ntheory.html#sympy.ntheory.primetest.isprime

Of course, it may yield false positives above 10**16, but this is already much better than anything else I could get doing nothing (except maybe pip install sympy ;) )

Tomasz Gandor
  • 5,982
  • 2
  • 45
  • 46
  • SymPy 1.1 (July 2017) switched to BPSW, so there are no false positives for any 64-bit inputs. In some cases it will use deterministic Miller-Rabin but they have also been verified up to 2^64. For 64-bit inputs, the only way to be "better" than this is optimizing pre-tests to make it go faster. For larger inputs there isn't a compelling benefit for most purposes to do more (a long discussion). – DanaJ Nov 28 '18 at 18:10
  • 1
    Thanks for the update! I first read old sources of sympy 0.x, and then linked to the newest docs. It doesn't change my point that sympy is great, it's just better than I thought ;) – Tomasz Gandor Nov 28 '18 at 23:07
  • Indeed, for most purposes I think this is the right answer. The OP is doing Project Euler so this is probably not right for the earlier problems, but maybe nice for later ones and of course any other practical use. – DanaJ Nov 30 '18 at 06:08
-2
def isprime(num):
if (num==3)or(num==2):
    return(True)
elif (num%2 == 0)or(num%5 == 0):
    return (False)
elif ((((num+1)%6 ==0) or ((num-1)%6 ==0)) and (num>1)):
    return (True)
else:
    return (False)

I think this code is the fastest ..