Integer distance

Question

By a single operation on two positive integers we mean multiplying one of the numbers by some prime number, or dividing it by one (provided it is divisible by that prime without a remainder). The distance between a and b, denoted d(a,b), is the minimal number of operations needed to transform a into b. For example, d(69,42)=3.

Keep in mind that our function d really does have the properties of a distance: for any positive ints a, b and c we get:

a) d(a,a)==0

b) d(a,b)==d(b,a)

c) the triangle inequality d(a,b)+d(b,c)>=d(a,c) holds.

You'll be given a sequence of positive ints a_1, a_2,...,a_n. For every a_i, output an a_j (j!=i) such that d(a_i, a_j) is as low as possible. For example, the sequence of length 6: {1,2,3,4,5,6} should output {2,1,1,2,1,2}.

This seems really hard to me. What I think would be useful is:

a) if a_i is prime, we are unable to make anything less than a_i (other than 1), so apart from that the only operation allowed is multiplication. Therefore, if we have 1 in our set, for every prime number d(this_number, 1) is the lowest.

b) also, for 1 d(1, any_prime_number) is the lowest.

c) for a non-prime number we check whether our set contains any of its factors, or any product of its factors.

That's all I can deduce, though. The worst part is I know it will take an eternity for such an algorithm to run and check all the possibilities... Could you please try to help me with it? How should this be done?

You might do better here: http://math.stackexchange.com/ – Hot Licks Nov 13 '11 at 13:12

Vlad · Accepted Answer · 2011-11-14T19:22:27.567

5

Indeed, you can represent any number N as 2^n1 * 3^n2 * 5^n3 * 7^n4 * ... (most of the n's are zeroes).

This way you set up a correspondence between a number N and an infinite sequence (n1, n2, n3, ...).

Note that your operation is just adding or subtracting 1 at exactly one of the appropriate sequence's places.

Let N and M be two numbers, and their sequences be (n1, n2, n3, ...) and (m1, m2, m3, ...). The distance between the two numbers is indeed nothing but |n1 - m1| + |n2 - m2| + ...

So, in order to find out the closest number, you need to calculate the sequences for all the input numbers (this is just decomposing them into primes). Having this decomposition, the calculation is straightforward.
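
As a rough illustration of the idea above, here is a minimal Python sketch. It assumes plain trial division is fast enough for the inputs; the names `factorize` and `distance` are made up for this example and are not taken from the answer.

```python
from collections import Counter

def factorize(n):
    """Return a Counter mapping each prime factor of n to its exponent."""
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out d as often as possible
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:               # whatever is left is a single prime factor
        exponents[n] += 1
    return exponents

def distance(a, b):
    """Sum of |n_i - m_i| over all primes that appear in a or b."""
    fa, fb = factorize(a), factorize(b)
    # a Counter returns 0 for a missing prime, which is the default exponent
    return sum(abs(fa[p] - fb[p]) for p in fa.keys() | fb.keys())

print(distance(69, 42))  # 69 = 3*23 and 42 = 2*3*7, so this prints 3
```

Only the primes that actually occur in one of the two numbers are visited, so the "infinite sequence" never has to be stored.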

Edit:
In fact, you don't need the exact position of each prime factor in the sequence: you only need to know the exponent of each prime divisor.

Edit:
This is a simple procedure for converting a number into the chain representation:

#include <map>

typedef std::map<unsigned int, unsigned int> ChainRepresentation;
// maps prime factor -> exponent, default exponent is of course 0

void convertToListRepresentation(int n, ChainRepresentation& r)
{
    // find a divisor
    int d = 2;

    while (n > 1)
    {
        for (; n % d; d++)
        {
            if (n/d < d) // n is prime
            {
                r[n]++;
                return;
            }
        }

        r[d]++;
        n /= d;
    }
}

Edit:
... and the code for distance:

#include <set>

unsigned int chainDistance(ChainRepresentation& c1, ChainRepresentation& c2)
{
    if (&c1 == &c2)
        return 0; // protect from modification done by [] during self-comparison

    int result = 0;

    std::set<unsigned int> visited;
    for (ChainRepresentation::const_iterator it = c1.begin(); it != c1.end(); ++it)
    {
        unsigned int factor = it->first;
        unsigned int exponent = it->second;
        unsigned int exponent2 = c2[factor];
        unsigned int expabsdiff = (exponent > exponent2) ?
                       exponent - exponent2 : exponent2 - exponent;
        result += expabsdiff;
        visited.insert(factor);
    }

    for (ChainRepresentation::const_iterator it = c2.begin(); it != c2.end(); ++it)
    {
        unsigned int factor = it->first;
        if (visited.find(factor) != visited.end())
            continue;
        unsigned int exponent2 = it->second;
        // unsigned int exponent = 0;
        result += exponent2;
    }

    return result;
}

edited Nov 14 '11 at 19:22

answered Nov 13 '11 at 13:12


Thank you! But how should I factorize them? Is there some easy to understand and implement factorization algorithm? – Lucas T Nov 13 '11 at 13:16
Well, for the big numbers the factorization can be complicated, but for the small numbers (say, less than 1000000) it's simple: (1) you try all the numbers between 2 and sqrt(N): if N is divisible by the current number, you found one prime divisor. (2) if you didn't find any divisors, your N is prime, and the factorization is just (0, 0, ..., 1, 0, ...). (3) if you found a divisor k, repeat the operation for N/k, finding the other divisors. – Vlad Nov 13 '11 at 13:19
Hmm... But how should I store those? I am given an upper bound of 100 000 numbers each being not greater than a million. Having a separate array for every one with a cell for every prime number (possible factor) being less than it will be huge. – Lucas T Nov 13 '11 at 13:22
Oh, you added that I just need to know their exponents. Indeed, it's true, but they have to be in specific cells so that the algorithm knows which should be subtracted from which, don't they? So we still need as many cells as there are primes lower than 1 000 000 for each of the 100 000 numbers. – Lucas T Nov 13 '11 at 13:28
@LucasT: You don't need to store the factorizations in an array. You can use an ordered list of prime->exponent pairs. This way the factorization lengths are bounded by the inverse of the factorial function, or 9 for N = 1000000 – hugomg Nov 13 '11 at 13:31
Sorry, but I'm not too fluent in STL which I believe an ordered list is a part of. Can't this be realized with some simpler data structure? – Lucas T Nov 13 '11 at 13:34
An array of length 9, containing pairs (structs) with the prime and exponent, ordered by prime factor. See, no STL – hugomg Nov 13 '11 at 13:37
So realizing this with a "normal" array (I mean, without some "structs" you mentioned) is impossible? I mean, possible but can be done only in such a way that it takes vast amounts of memory? – Lucas T Nov 13 '11 at 13:40
Let's take 42=2*3*7: we can encode it as [1, 1, 0, 1, ...] or as [(2,1), (3,1), (7,1)]. In the first list you need about 78000 slots to account for all primes less than a million. In the second case you only need to store at most 9 pairs. – hugomg Nov 13 '11 at 13:49
@Lucas: well, you can have _two_ arrays :) or just a `std::map` (maps prime number to its exponent) if you are ok with STL – Vlad Nov 13 '11 at 15:10
:) Yeah but doesn't having two arrays make a step back to having a cell for every prime? We would need array[2][first_number], array[2][second_number] etc. and then [3][first_number], [3][second_number] and so on, wouldn't we? – Lucas T Nov 13 '11 at 15:13
Could you please help? I really have no idea on how to implement this not to use huge amount of memory... – Lucas T Nov 13 '11 at 15:50
@LucasT: You can use the `map` if you don't want to use 2 arrays. But it's possible to use 2 arrays without needing a cell per prime number: Suppose 42 is represented as the pair `p1[] = [2, 3, 7]` and `m1[] = [1, 1, 1]`, and 69 is represented as the pair `p2[] = [3, 23]` and `m2[] = [1, 1]`. Then in order to figure out the distance between 42 and 69, you need to do a *list merge* on `p1[]` and `p2[]`. This takes only linear time since they're both already sorted in increasing order. – j_random_hacker Nov 14 '11 at 13:53
@Lucas: anyway, you'd be better off using `std::map`. It's much simpler than merging arrays manually. I'll include some code in the answer. – Vlad Nov 14 '11 at 18:47
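
The list merge described in the comments above could look roughly like this in Python. It is only a sketch: it assumes each factorization is already available as a list of (prime, exponent) pairs sorted by prime, and `merge_distance` is a hypothetical name chosen for this example.

```python
def merge_distance(fa, fb):
    """Distance between two factorizations given as lists of
    (prime, exponent) pairs sorted by increasing prime."""
    i = j = total = 0
    while i < len(fa) and j < len(fb):
        (p, e), (q, f) = fa[i], fb[j]
        if p == q:        # prime occurs in both numbers
            total += abs(e - f)
            i += 1
            j += 1
        elif p < q:       # prime occurs only in the first number
            total += e
            i += 1
        else:             # prime occurs only in the second number
            total += f
            j += 1
    # whatever is left over occurs in just one of the two numbers
    total += sum(e for _, e in fa[i:]) + sum(f for _, f in fb[j:])
    return total

forty_two = [(2, 1), (3, 1), (7, 1)]
sixty_nine = [(3, 1), (23, 1)]
print(merge_distance(forty_two, sixty_nine))  # prints 3
```

Each pair is looked at once, so the merge is linear in the combined length of the two lists.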

score 2 · Answer 2 · edited May 23 '17 at 10:31

For the given limits (100_000 numbers, none greater than a million), the most straightforward algorithm works, at a cost of about 1e10 calls to distance():

For each number in the sequence print its closest neighbor (as defined by minimal distance):

solution = []
for i, ai in enumerate(numbers):
    all_except_i = (aj for j, aj in enumerate(numbers) if j != i)
    solution.append(min(all_except_i, key=lambda x: distance(x, ai)))
print(', '.join(map(str, solution)))

Where distance() can be calculated as (see @Vlad's explanation):

from collections import Counter

def distance(a, b):
    """
    a = p1**n1 * p2**n2 * p3**n3 ...
    b = p1**m1 * p2**m2 * p3**m3 ...

    distance = |m1-n1| + |m2-n2| + |m3-n3| ...
    """
    diff = Counter(prime_factors(b))
    diff.subtract(prime_factors(a))
    return sum(abs(d) for d in diff.values())

Where prime_factors() returns prime factors of a number with corresponding multiplicities {p1: n1, p2: n2, ...}:

from itertools import islice

uniq_primes_factors = dict(islice(prime_factors_gen(), max(numbers)))

def prime_factors(n):
    return dict(multiplicities(n, uniq_primes_factors[n]))

Where the multiplicities() function, given n and its factors, returns them with their corresponding multiplicities (how many times a factor divides the number without a remainder):

def multiplicities(n, factors):
    assert n > 0
    for prime in factors:
        alpha = 0 # multiplicity of `prime` in `n`
        q, r = divmod(n, prime)
        while r == 0: # `prime` is a factor of `n`
            n = q
            alpha += 1
            q, r = divmod(n, prime)
        yield prime, alpha

prime_factors_gen() yields prime factors for each natural number. It uses the Sieve of Eratosthenes algorithm to find prime numbers. The implementation is based on the gen_primes() function by @Eli Bendersky:

from collections import defaultdict
from itertools import count

def prime_factors_gen():
    """Yield prime factors for each natural number."""
    D = defaultdict(list) # nonprime -> prime factors of `nonprime`
    D[1] = [] # `1` has no prime factors
    for q in count(1): # Sieve of Eratosthenes algorithm
        if q not in D: # `q` is a prime number
            D[q + q] = [q]
            yield q, [q]
        else: # q is a composite
            for p in D[q]: # `p` is a factor of `q`: `q == m*p`
                # therefore `p` is a factor of `p + q == p + m*p` too
                D[p + q].append(p)
            yield q, D[q]
            del D[q]

See full example in Python.

Output

2, 1, 1, 2, 1, 2

Thank you very much, but would it be a big problem for you to show this in C++, for example? It's the very first time I've ever seen Python, honestly... — Lucas T, Nov 13 '11 at 15:17

score 1 · Answer 3 · answered Nov 13 '11 at 13:15

Without bounds on how large your numbers can be and how many numbers can be on the input, we can't really deduce it will take "an eternity" to complete. I am tempted to suggest the most "obvious" solution I can think of.

Given the factorization of the numbers it is very easy to find their distance:
```
60 = (2^2)*(3^1)*(5^1)*(7^0)
42 = (2^1)*(3^1)*(5^0)*(7^1)
distance = 3
```
Calculating this factorization using the naive trial division should take at most O(sqrt(N)) time per number, where N is the number being factorized.
Given the factorizations, you only have O(n^2) combinations to worry about, where n is the number of input numbers. If you store all the factorizations so that you only compute them once, this step shouldn't take that long unless you have a really large number of inputs.
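
A minimal Python sketch of this approach, under the assumptions that the input holds at least two numbers and that ties may be broken by taking the first closest element; the names `factorize` and `closest` are invented for this example.

```python
from collections import Counter

def factorize(n):
    """Trial division: Counter of prime -> exponent for n >= 1."""
    exponents, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:
        exponents[n] += 1
    return exponents

def closest(numbers):
    """For each number, return the first other element at minimal distance."""
    factored = [factorize(a) for a in numbers]  # each number factorized once

    def dist(fa, fb):
        return sum(abs(fa[p] - fb[p]) for p in fa.keys() | fb.keys())

    result = []
    for i, fi in enumerate(factored):
        others = (j for j in range(len(numbers)) if j != i)
        best = min(others, key=lambda j: dist(fi, factored[j]))
        result.append(numbers[best])
    return result

print(closest([1, 2, 3, 4, 5, 6]))  # prints [2, 1, 1, 2, 1, 2]
```

The pairwise loop is still quadratic; caching only removes the repeated factorization work.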

You do wonder if there is a faster algorithm though. Perhaps it is possible to do some greatest common divisor trick to avoid computing large factorizations and perhaps we can use some graph algorithms to find the distances in a smarter way.

Thank you. I am given an upper bound of 100 000 numbers each being not greater than a million. — Lucas T, Nov 13 '11 at 13:23
Hmm, this means factorizing them should be a piece of cake but we need to be smarter on the find-shortest distance part. Let me think a bit. — hugomg, Nov 13 '11 at 13:29

score 0 · Answer 4 · answered Nov 13 '11 at 13:18

Haven't really thought this through, but it seems to me that to get from prime A to prime B you multiply A * B and then divide by A.

If you thus break the initial non-prime A & B into their prime factors, factor out the common prime factors, and then use the technique in the first paragraph to convert the unique primes, you should be following a minimal path to get from A to B.
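
One possible reading of this suggestion, which may not be exactly what was intended, is to cancel the common part with a greatest common divisor and count the prime factors that remain on each side. The sketch below assumes that interpretation; `count_prime_factors` and `distance` are hypothetical names.

```python
from math import gcd

def count_prime_factors(n):
    """Number of prime factors of n, counted with multiplicity."""
    total, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            total += 1
            n //= d
        d += 1
    return total + (1 if n > 1 else 0)  # a leftover n > 1 is one more prime

def distance(a, b):
    g = gcd(a, b)  # the shared prime factors need no operations
    # divide a down to g, then multiply g up to b
    return count_prime_factors(a // g) + count_prime_factors(b // g)

print(distance(69, 42))  # gcd is 3; 23 and 14 = 2*7 remain, so this prints 3
```

This gives the same value as summing the absolute exponent differences, because the gcd takes the smaller exponent of every prime.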

Hot Licks


Don't actually get it: A * B / A = B – BlackBear Nov 13 '11 at 15:18
