This is mostly elaborating on user2566092's good answer. First, note that in practice we're looking at $n < 10^{22}$ or so, and probably $n < 2^{64}$, as beyond that even the fast prime-counting methods start taking a very long time. Two things matter most for performance: a good estimate and a fast prime-count method. Having a fast method for the endpoint correction is of course also important, but less so than the other two. I don't believe doing more than one prime count should ever be necessary; if doing more turns out faster, I believe that indicates the estimate or the sieving needs improvement.

For the estimate, the second-order Cipolla (1902) estimate from Wikipedia is not a bad start, but it's substantially worse than some other methods. You can add a small third-order correction to the Cipolla formula, which cuts the error to roughly a quarter at $10^{15}$. Much better is inverting the Riemann R function, whose error is three orders of magnitude smaller at $10^{15}$: off by only 11 million vs. 37 billion. What works best for me is ${\rm Li}^{-1}(n) + {\rm Li}^{-1}(\sqrt{n})/4$, which nearly always underestimates (nice for sieving) and gets almost as close as inverse R, while being a little easier to calculate. Experiment with different estimates.
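A minimal sketch of the ${\rm Li}^{-1}(n) + {\rm Li}^{-1}(\sqrt{n})/4$ estimate, assuming the ordinary $\mathrm{li}$ (the constant offset $\mathrm{li}(2) \approx 1.045$ is negligible at these sizes) and double-precision floats, which is plenty for an estimate:

```python
import math

def li(x):
    """Logarithmic integral li(x) via the convergent series
    li(x) = gamma + ln(ln x) + sum_{k>=1} (ln x)^k / (k * k!)."""
    gamma = 0.57721566490153286
    lx = math.log(x)
    total = gamma + math.log(lx)
    term = 1.0
    for k in range(1, 200):
        term *= lx / k                 # term is now (ln x)^k / k!
        total += term / k
        if k > lx and term / k < 1e-14 * abs(total):
            break
    return total

def inverse_li(n):
    """Solve li(x) = n by Newton's method; note li'(x) = 1/ln x."""
    x = n * math.log(n)                # rough starting guess near p_n
    for _ in range(100):
        dx = (li(x) - n) * math.log(x)
        x -= dx
        if abs(dx) < 0.5:
            break
    return x

def nth_prime_estimate(n):
    """Estimate of the n-th prime: li^{-1}(n) + li^{-1}(sqrt(n))/4."""
    return int(inverse_li(n) + inverse_li(math.sqrt(n)) / 4)
```

For $n = 10^6$ (where $p_n = 15{,}485{,}863$) this lands within a few thousand of the true value, and it tends to come in low, which is what you want before a forward sieve.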

Once we've found an estimate, we run a fast prime count at that point, e.g. Lehmer's method, LMO, or extended LMO. The latter has open-source implementations able to compute $\pi(10^{14})$ in a couple of seconds and $\pi(10^{16})$ in half a minute (timings will vary by machine, of course), with very little memory use.
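To show the combinatorial idea these methods build on, here is the classical Legendre formula $\pi(x) = \phi(x, a) + a - 1$ with $a = \pi(\sqrt{x})$, where $\phi(x, a)$ counts integers in $[1, x]$ with no prime factor among the first $a$ primes. This is far simpler (and far slower) than LMO; it's only a sketch of the idea, not a competitive implementation:

```python
import math
from functools import lru_cache

def primes_up_to(n):
    """Plain sieve of Eratosthenes returning a list of primes <= n."""
    sieve = bytearray(b"\x01") * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(range(i*i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def prime_count(x):
    """pi(x) by Legendre's formula: pi(x) = phi(x, a) + a - 1."""
    if x < 2:
        return 0
    primes = primes_up_to(math.isqrt(x))
    a = len(primes)

    @lru_cache(maxsize=None)
    def phi(x, a):
        # phi(x, 0) = x: nothing sieved out yet.
        if a == 0 or x == 0:
            return x
        # Inclusion-exclusion on the a-th prime.
        return phi(x, a - 1) - phi(x // primes[a - 1], a - 1)

    return phi(x, a) + a - 1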

Once we've done the prime count, we just need to count primes forwards or backwards to make up for the error. If we used a good estimate, that correction is quite small relative to the point where we did the prime count. It should also be noted that there are fast deterministic primality tests for numbers under $2^{64}$: BPSW, a 7-base Miller-Rabin test, or a 3-base hashed Miller-Rabin test are all completely accurate for 64-bit inputs. As noted earlier, I use a tight but low estimate, so it rarely has to count backwards; when it does, repeated prev_prime calls work fairly well. For the forward search, which is the usual case, I do segmented sieves and count the resulting blocks of bits. There are a number of optimizations possible here.
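For reference, here is a deterministic Miller-Rabin test using one widely circulated 7-base witness set verified for all $n < 2^{64}$ (the bases below are one such published set; this is a sketch, not tuned code):

```python
def is_prime_64(n):
    """Deterministic Miller-Rabin, valid for all n < 2^64 with this
    well-known 7-base witness set."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d = n - 1
    s = (d & -d).bit_length() - 1
    d >>= s
    for a in (2, 325, 9375, 28178, 450775, 9780504, 1795265022):
        if a % n == 0:            # base is a multiple of n; skip it
            continue
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False          # a witnesses compositeness
    return True
```

The trial division up front also makes it cheap on the composites that dominate a forward scan.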

To give you an idea, it looks like nth_prime$(10^{12})$ takes 0.8sec, and nth_prime$(10^{15})$ about 1min 2sec on my computer, without using any tables. That's substantially faster than sieving primes up to ~3.7e16.

If you were going to generate the primes one at a time, a sieve would be the better choice. Even at large values, a traditional sieve can efficiently reduce a segment so that only a few candidates need to be run through a primality test. In my opinion the Sieve of Eratosthenes is superior to the Sieve of Atkin, even in the rare cases when the latter is properly implemented. Segmenting is important at large sizes. If you just want to get the job done rather than implement it yourself, look at primesieve. It's *really* fast, well maintained, and has nice interfaces for C and C++.
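A minimal segmented Sieve of Eratosthenes for counting primes in a range, to show the structure (the segment size here is illustrative; real implementations tune it to cache size and use wheel tricks):

```python
import math

def count_primes_in_range(lo, hi):
    """Count primes in [lo, hi) with a segmented sieve of Eratosthenes."""
    if hi <= 2:
        return 0
    lo = max(lo, 2)
    root = math.isqrt(hi - 1)

    # Base primes up to sqrt(hi) via a small plain sieve.
    base = bytearray(b"\x01") * (root + 1)
    base[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(root) + 1):
        if base[i]:
            base[i*i::i] = bytearray(len(range(i*i, root + 1, i)))
    base_primes = [i for i in range(2, root + 1) if base[i]]

    count = 0
    seg_size = 1 << 16                  # illustrative; tune to cache
    for start in range(lo, hi, seg_size):
        end = min(start + seg_size, hi)
        seg = bytearray(b"\x01") * (end - start)
        for p in base_primes:
            # First multiple of p in [start, end), but never p itself.
            first = max(p * p, (start + p - 1) // p * p)
            if first < end:
                seg[first - start::p] = bytearray(len(range(first, end, p)))
        count += sum(seg)
    return count
```

This is exactly the piece you'd use to sieve forward from the estimate and count bits until the error is made up.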

There is one other alternative: a hybrid table approach. This is used by the Nth Prime Page, and there was some discussion of putting it in SAGE years ago. Rather than using a prime-count function, we store large tables of prime counts or nth-prime positions, look up the nearest entry, then sieve the difference. The downside is that it doesn't scale well to large values (e.g. the Nth Prime Page is limited to $10^{12}$). I use a method like this for twin-prime counts, and even with only ~100 table values it can save a *lot* of time. If your values aren't too large and you don't want to implement all the work for a fast computed nth prime, this may be something to consider. You just need some tables (compiled in or loaded on demand) and fast segmented sieving.
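A toy version of the table idea, assuming a hypothetical `CHECKPOINTS` table of $(n, p_n)$ pairs (a real one would hold thousands of entries at much larger values) and a naive trial-division test standing in for a fast primality test or segment sieve:

```python
import math

# Hypothetical checkpoint table of (n, p_n) pairs.
CHECKPOINTS = [
    (1, 2),
    (100, 541),
    (1000, 7919),
    (10000, 104729),
    (100000, 1299709),
]

def _is_prime(m):
    """Trial division; a real implementation would use a fast
    deterministic test or a segment sieve here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    for d in range(3, math.isqrt(m) + 1, 2):
        if m % d == 0:
            return False
    return True

def nth_prime(n):
    """n-th prime: look up the nearest checkpoint at or below n,
    then scan forward, counting primes until we reach n."""
    k, p = max(c for c in CHECKPOINTS if c[0] <= n)
    while k < n:
        p += 1 if p == 2 else 2        # step over even numbers past 2
        while not _is_prime(p):
            p += 2
        k += 1
    return p
```

The point is that the scan distance is bounded by the checkpoint spacing, so even a modest table turns the problem into a short sieve-and-count.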