For small numbers, say 32-bit unsigned integers, trial division by every candidate divisor up to the square root is a perfectly decent approach. Some optimizations avoid trying all divisors, but they yield only marginal improvements: the complexity remains $O(\sqrt n)$.
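To make the baseline concrete, here is a minimal sketch of trial division with one common divisor-skipping optimization (testing only candidates of the form $6k \pm 1$). The function name `is_prime_trial` is just an illustrative choice. It still performs on the order of $\sqrt n$ divisions, only with a smaller constant.

```python
from math import isqrt

def is_prime_trial(n: int) -> bool:
    """Trial division up to sqrt(n), skipping multiples of 2 and 3."""
    if n < 2:
        return False
    if n < 4:
        return True              # 2 and 3 are prime
    if n % 2 == 0 or n % 3 == 0:
        return False
    limit = isqrt(n)             # integer square root, no float rounding
    f = 5
    while f <= limit:
        # Every remaining candidate divisor has the form 6k - 1 or 6k + 1.
        if n % f == 0 or n % (f + 2) == 0:
            return False
        f += 6
    return True
```

For a 32-bit input the loop runs at most about 11,000 times, whereas for a 64-bit prime it would need roughly 700 million iterations.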

On the other hand, much faster primality tests are available, but they are fairly sophisticated and only show their advantage on much longer numbers.

Is there an intermediate solution, i.e. a relatively simple algorithm that is of practical use for, say, 64-bit unsigned integers, with a target running time under 1 ms?

I am not after micro-optimizations of the exhaustive division method. I am after a better working principle, of reasonable complexity and deterministic.

**Update:**

Using a Python version of the Miller-Rabin test from Rosetta Code, the running time for the prime $2^{64}-59=18446744073709551557$ is $0.7$ ms. (This single measurement is not conclusive, though, because nothing guarantees that this input is a worst case.)

http://rosettacode.org/wiki/Miller%E2%80%93Rabin_primality_test#Python:_Proved_correct_up_to_large_N

And I guess that this code can be improved for speed.
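For reference, a minimal sketch of what such a deterministic Miller-Rabin test can look like for 64-bit inputs. This is not the Rosetta Code listing. It assumes the published result that testing against the first twelve primes as bases (2 through 37) is sufficient for all $n < 2^{64}$. The name `is_prime_u64` is an illustrative choice.

```python
# Fixed bases; assumed sufficient for every n < 2**64 (published bound).
BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime_u64(n: int) -> bool:
    """Deterministic Miller-Rabin for 0 <= n < 2**64."""
    if n < 2:
        return False
    # Quick exit on small factors; also handles n equal to one of the bases.
    for p in BASES:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in BASES:
        x = pow(a, d, n)          # modular exponentiation
        if x == 1 or x == n - 1:
            continue              # n passes for this base
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break             # n passes for this base
        else:
            return False          # a is a witness: n is composite
    return True
```

Each base costs one modular exponentiation of about 64 squarings, so the work per input is bounded by a few hundred modular multiplications whatever the value of $n$. That bound is what makes the worst case easy to reason about.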