
I was playing around with generating Hamming numbers in Haskell, trying to improve on the obvious implementation (pardon the naming of the functions):

-- merge two sorted lists, dropping duplicates; no base cases are needed
-- because both inputs here are always infinite
mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq (x:xs) (y:ys) = case x `compare` y of
                               EQ -> x : mergeUniq xs ys
                               LT -> x : mergeUniq xs (y:ys)
                               GT -> y : mergeUniq (x:xs) ys

powers :: [Integer]
powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where
    expand factor = (factor *) <$> powers

I noticed that I can avoid the (slower) arbitrary precision Integer if I represent the numbers as the triple of the 2-, 3- and 5-exponents, like data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }, where the number is understood to be 2^k2 * 3^k3 * 5^k5. The comparison of two Powers then becomes

instance Ord Power where
  p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
    where
    divP p1 p2 = Power { k2 = k2 p1 - k2 p2, k3 = k3 p1 - k3 p2, k5 = k5 p1 - k5 p2 }
    gcdP = Power { k2 = min (k2 p1) (k2 p2), k3 = min (k3 p1) (k3 p2), k5 = min (k5 p1) (k5 p2) }
    toComp Power { .. } = fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5
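
For completeness, here is a rough sketch of the supporting declarations this instance assumes, along with the {-# LANGUAGE RecordWildCards #-} pragma needed for the Power { .. } pattern in toComp (the question itself only shows the Ord instance; mulP is an illustrative helper name):

data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }

-- two triples denote the same number exactly when all exponents coincide
instance Eq Power where
  p1 == p2 = k2 p1 == k2 p2 && k3 p1 == k3 p2 && k5 p1 == k5 p2

-- multiplying the underlying numbers just adds the exponents
mulP :: Power -> Power -> Power
mulP p1 p2 = Power { k2 = k2 p1 + k2 p2, k3 = k3 p1 + k3 p2, k5 = k5 p1 + k5 p2 }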

So, very roughly speaking, to compare p₁ = 2^i₁ * 3^j₁ * 5^k₁ and p₂ = 2^i₂ * 3^j₂ * 5^k₂ we compare the logarithms of p₁ and p₂, which presumably fit in a Double. But actually we do even better: we first compute their GCD (by taking the mins of the corresponding exponent pairs — only Int arithmetic so far!), divide p₁ and p₂ by the GCD (by subtracting the mins from the corresponding exponents — also only Int arithmetic), and compare the logarithms of the results.
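
For example, to compare p₁ = 2^5 * 3^2 * 5 = 1440 and p₂ = 2^3 * 3^4 = 648, the GCD of the two triples is 2^3 * 3^2, so after the subtraction we are left comparing 2^2 * 5 against 3^2, that is 2 * log 2 + log 5 ≈ 2.996 against 2 * log 3 ≈ 2.197, hence p₁ > p₂; Double only ever sees the two small quotient triples.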

But, given that we go through Doubles, there will be loss of precision eventually. And this is the ground for my questions:

  1. When will the finite precision of Doubles bite me? That is, how can I estimate the order of i, j, k for which the results of comparisons of 2^i * 3^j * 5^k with numbers with "similar" exponents become unreliable?
  2. How does the fact that we go through dividing by the GCD (which presumably lowers the exponents considerably for this task) modify the answer to the previous question?

I did an experiment, comparing the numbers produced this way with the numbers produced via arbitrary precision arithmetic, and all Hamming numbers up to the 1,000,000,000th match exactly (which took me about 15 minutes and 600 MB of RAM to verify). But that's obviously not a proof.
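
A rough sketch of what such a cross-check could look like, reusing the Power type and Ord instance above (powerToInteger and agree are illustrative names, not the code actually used in the experiment):

-- exact value of a triple, in arbitrary precision Integer
powerToInteger :: Power -> Integer
powerToInteger Power { .. } = 2 ^ k2 * 3 ^ k3 * 5 ^ k5

-- the log-based comparison agrees with exact arithmetic on a given pair
-- exactly when the two orderings coincide
agree :: Power -> Power -> Bool
agree p q = (p `compare` q) == (powerToInteger p `compare` powerToInteger q)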

Will Ness
0xd34df00d
  • Is your question 1 what is the smallest number x of the form 2^i•3^j•5^k such that there is another number y in that form, and x < y, such that converting log x and log y to the nearest `Double` values yields X and Y such that Y ≤ X, and hence x is not distinguishable from y by comparing logarithms in `Double`? And question 2 is similar except that each exponent of 2, 3, or 5 is non-zero in at most one of x or y? What base is used for the logarithm? (The effect of the base may be tiny, but it can have rounding errors that might affect where the first failure occurs.) – Eric Postpischil Mar 22 '20 at 19:55
  • What is the magnitude of the billionth Hamming number? – Eric Postpischil Mar 22 '20 at 20:00
  • Or, rather, we do not directly have the logarithms of x and y in `Double`, but we have them calculated using `Double` arithmetic from the logarithms of 2, 3, and 5 (each multiplied by the exponents and then summed)? Do you have the logarithms of 2, 3, and 5 as the nearest representable values in `Double` (some math libraries may have greater errors, although logarithms are easier to calculate than some of the transcendental functions)? – Eric Postpischil Mar 22 '20 at 20:18
  • @EricPostpischil a [literal answer](https://stackoverflow.com/a/37833043/849891) to your [rhetorical question](https://stackoverflow.com/questions/60803224/hamming-numbers-and-double-precision#comment107578388_60803224). :) – Will Ness Mar 22 '20 at 21:11
  • the answer is, if memory serves (but do check [the RosettaCode page](https://rosettacode.org/wiki/Hamming_numbers#Direct_calculation_through_triples_enumeration)), somewhere in the trillionths, or likely even higher. your GCD trick is nice but unfortunately there *will* be some triplets to compare that have no common factors, so in the end my guess is it won't matter. I mention this issue somewhere IIRC either here on SO in [some](https://stackoverflow.com/search?q=%5Bsmooth-numbers%5D+user%3A849891) [answer](https://stackoverflow.com/search?q=[hamming-numbers]+user%3A849891), or on Rosetta. – Will Ness Mar 22 '20 at 21:34
  • I see that this issue is discussed in depth in posts on Rosetta by another user, under the one I linked. check it out. :) – Will Ness Mar 22 '20 at 21:56
  • @EricPostpischil I think your formulation of q1 is exactly what I meant! Re q2, I think that's also correct. The logarithms are natural, and yes, we don't calculate the logs of x and y directly, but we rely on `log(2)`, `log(3)` and `log(5)`. Re nearest representable values, how do I verify? – 0xd34df00d Mar 22 '20 at 22:05
  • [this answer](https://stackoverflow.com/a/12041774/849891) directly answers your question. it mentions 14 significant digits are used in calculating the trillionth hamming number. – Will Ness Mar 22 '20 at 22:09
  • @WillNess thanks for lots of pointers! The last one indeed answers what the limit is, but I'm still not sure how you arrive at that estimate. So this is more of a question about reasoning about floating point really (something that I do really badly). And yeah, I haven't proved it but I've convinced myself there will be coprime numbers needing to be compared — thanks for making me think about that! – 0xd34df00d Mar 22 '20 at 22:15
  • IIRC I just ran the code, and printed out the band's log values' *differences* (or found the minimum one maybe). so no reasoning, just empirical observations. but we *could* think about proper estimation here too. we should be able to estimate the density of the hamming numbers at a given magnitude; and compare it to the [density of floating point double precision numbers](https://stackoverflow.com/questions/7006510/density-of-floating-point-number-magnitude-of-the-number) -- from the standard. what's it called, IEEE (which I know next to nothing about :) ) – Will Ness Mar 22 '20 at 22:22
  • ... that's the *average* density of the *logarithms* of the hamming numbers at a given magnitude, of course. but... who's to guarantee *some* pair of consecutive H. numbers won't be much closer than that?? this question might be more suited for math.SE actually. – Will Ness Mar 22 '20 at 23:16
  • @WillNess: Ah, geez, if you go around pointing to the answers to problems, that deprives us of the opportunity to work on them. Isn’t that what Stack Overflow is for? – Eric Postpischil Mar 22 '20 at 23:57
  • @EricPostpischil that answer is only partial, observational. the true proof is still needed. I'll give a 200 bounty to such an answer (sans Force Majeure of course, these days one never knows). please ping me if you post one. :) – Will Ness Mar 23 '20 at 07:37

2 Answers


Empirically, it's above about the 10-trillionth Hamming number, or even higher.

Using your nice GCD trick won't help us here, because some neighboring Hamming numbers are bound to have no common factors between them (for instance, 8 = 2^3 and 9 = 3^2 are consecutive Hamming numbers and are coprime).

update: trying it online on ideone and elsewhere, we get

4T  5.81s 22.2MB  -- 16 digits used.... still good
                  --  (as evidenced by the `True` below), but really pushing it.
((True,44531.6794,7.275957614183426e-11),(16348,16503,873),"2.3509E+13405")
-- isTruly  max        min logval           nth-Hamming       approx.
--  Sorted   logval      difference          as i,j,k          value
--            in band      in band                             in decimal
10T   11.13s 26.4MB
((True,60439.6639,7.275957614183426e-11),(18187,23771,1971),"1.4182E+18194")
13T   14.44s 30.4MB    ...still good
((True,65963.6432,5.820766091346741e-11),(28648,21308,1526),"1.0845E+19857")

---- same code on tio:
10T   16.77s
35T   38.84s 
((True,91766.4800,5.820766091346741e-11),(13824,2133,32112),"2.9045E+27624")
70T   59.57s
((True,115619.1575,5.820766091346741e-11),(13125,13687,34799),"6.8310E+34804")

---- on home machine:
100T: 368.13s
((True,130216.1408,5.820766091346741e-11),(88324,876,17444),"9.2111E+39198")

140T: 466.69s
((True,145671.6480,5.820766091346741e-11),(9918,24002,42082),"3.4322E+43851")

170T: 383.26s         ---FAULTY---
((False,155411.2501,0.0),(77201,27980,14584),"2.80508E+46783")
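
For the record, a rough sketch of the kind of band check these lines report; this is a reconstruction from the legend above, not the actual code behind these runs, and logval assumes base-2 logarithms, which matches the printed magnitudes:

type Triple = (Int, Int, Int)   -- exponents (i, j, k) of 2, 3 and 5

logval :: Triple -> Double
logval (i, j, k) = fromIntegral i
                 + fromIntegral j * logBase 2 3
                 + fromIntegral k * logBase 2 5

-- given a band of at least two triples already in their exact order, report
-- whether their Double log-values are still strictly increasing, the largest
-- log-value, and the smallest gap between neighbours (the margin left before
-- Double fails)
checkBand :: [Triple] -> (Bool, Double, Double)
checkBand band = ( and (zipWith (<) ls (tail ls))
                 , maximum ls
                 , minimum (zipWith (-) (tail ls) ls) )
  where
    ls = map logval band
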
Will Ness

I guess that you could use adaptive arbitrary precision to compute the log.

If you choose log base 2, then log2(2^i) is trivial. That eliminates one factor, and log2 has the advantage of being easier to compute than the natural logarithm (https://en.wikipedia.org/wiki/Binary_logarithm gives an algorithm, for example; there is also Shanks...).

For log2(3) and log2(5), you would develop just enough terms to distinguish the two operands. I don't know whether it would lead to more operations than directly exponentiating 3^j and 5^k in large integer arithmetic and counting the high bit... But those could be pre-tabulated up to the required number of digits.
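
A rough sketch of the large-integer alternative mentioned at the end, reusing the Power type from the question (comparePowerExact is an illustrative name; the actual proposal above is the adaptive-precision logarithm):

-- exact comparison with no floating point at all: p < q iff p/q < 1, and
-- p/q = 2^d2 * 3^d3 * 5^d5 with possibly negative d's, so compare the product
-- of the positive-exponent factors against the product of the negated
-- negative-exponent ones
comparePowerExact :: Power -> Power -> Ordering
comparePowerExact p q = lhs `compare` rhs
  where
    d2 = k2 p - k2 q
    d3 = k3 p - k3 q
    d5 = k5 p - k5 q
    lhs, rhs :: Integer
    lhs = 2 ^ max 0 d2    * 3 ^ max 0 d3    * 5 ^ max 0 d5
    rhs = 2 ^ max 0 (-d2) * 3 ^ max 0 (-d3) * 5 ^ max 0 (-d5)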

aka.nice