3

Possible Duplicate:
Mapping two integers to one, in a unique and deterministic way

I'm trying to create a unique identifier for a pair of integers (in Ruby):

f(i1,i2) = f(i2, i1) = some_unique_value

So i1+i2, i1*i2 and i1^i2 are not unique, and neither is (i1 > i2) ? "#{i1}#{i2}" : "#{i2}#{i1}".
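To make the collisions concrete, here is a small sketch. The method names are made up for illustration, and `^` is read as Ruby's bitwise XOR operator, which is an assumption about what the expression above means.

```ruby
# Each candidate from the question, written as a small method.
def by_sum(i1, i2)
  i1 + i2
end

def by_product(i1, i2)
  i1 * i2
end

def by_xor(i1, i2)
  i1 ^ i2 # ^ is bitwise XOR in Ruby, not exponentiation
end

def by_concat(i1, i2)
  i1 > i2 ? "#{i1}#{i2}" : "#{i2}#{i1}"
end

# Two different pairs collide under every candidate:
p by_sum(1, 4) == by_sum(2, 3)            # true (both 5)
p by_product(1, 6) == by_product(2, 3)    # true (both 6)
p by_xor(1, 2) == by_xor(4, 7)            # true (both 3)
p by_concat(21, 10) == by_concat(211, 0)  # true (both "2110")
```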

I think the following solution would work:

(i1 > i2) ? "#{i1}_#{i2}" : "#{i2}_#{i1}"

but:

  1. I have to save the result in a DB and index it, so I'd prefer it to be an integer, and as small as possible.
  2. Can Zlib.crc32(f(i1,i2)) guarantee uniqueness?

Thanks.

UPD:

Actually, I'm not sure the result MUST be an integer. Maybe I can convert it to a decimal: (i1>i2) ? i1.i2 : i2.i1?

S2201

5 Answers

6

What you're looking for is called a pairing function.

The following illustration from the German Wikipedia page shows how it works:

http://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Pairing-function.svg/350px-Pairing-function.svg.png

Implemented in Ruby:

# Cantor pairing function: maps a pair of non-negative integers to one unique integer.
def cantor_pairing(n, m)
  (n + m) * (n + m + 1) / 2 + m
end

(0..5).map do |n|
  (0..5).map do |m|
    cantor_pairing(n, m)
  end
end
=> [[ 0,  2,  5,  9, 14, 20],
    [ 1,  4,  8, 13, 19, 26],
    [ 3,  7, 12, 18, 25, 33],
    [ 6, 11, 17, 24, 32, 41],
    [10, 16, 23, 31, 40, 50],
    [15, 22, 30, 39, 49, 60]]

Note that you will need to store the result of this pairing in a datatype with at least as many bits as both your input numbers put together. (If both input numbers are 32-bit, there are 2^64 possible combinations, so no scheme can store them all in fewer than 64 bits. The Cantor function as written can need 65 bits for the largest unsigned 32-bit inputs.)
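The question also asks for f(i1, i2) = f(i2, i1), which `cantor_pairing` on its own does not give, since it treats (n, m) and (m, n) as different pairs. One possible adaptation, sketched here under the assumption that both inputs are non-negative, is to sort the inputs before pairing. `unordered_cantor_pairing` is a hypothetical helper name, not part of any library.

```ruby
# Cantor pairing as defined above.
def cantor_pairing(n, m)
  (n + m) * (n + m + 1) / 2 + m
end

# Hypothetical helper: sort the inputs first so that argument order
# does not matter. Assumes both inputs are non-negative integers.
def unordered_cantor_pairing(i1, i2)
  lo, hi = [i1, i2].minmax
  cantor_pairing(lo, hi)
end

p cantor_pairing(3, 5)            # 41
p cantor_pairing(5, 3)            # 39
p unordered_cantor_pairing(3, 5)  # 41
p unordered_cantor_pairing(5, 3)  # 41
```

The price is that the original order can no longer be recovered from the result, which is fine if the pair is meant to be unordered anyway.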

Lars Haugseth
2

No, Zlib.crc32(f(i1,i2)) is not unique for all integer values of i1 and i2.

If i1 and i2 are themselves 32-bit numbers, there are far more combinations of them than can be represented in the 32-bit value that CRC32 returns.
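The counting argument can be sketched in a few lines. The constant names are made up for illustration, and the "5_3" key simply follows the string format from the question.

```ruby
require "zlib"

# Number of distinct unordered pairs of unsigned 32-bit integers
# (pairs with i1 == i2 included): n * (n + 1) / 2 with n = 2**32.
N = 2**32
UNORDERED_PAIRS = N * (N + 1) / 2
CRC_VALUES = 2**32 # CRC32 produces a 32-bit value

puts UNORDERED_PAIRS > CRC_VALUES # true, so some pairs must share a CRC
puts UNORDERED_PAIRS / CRC_VALUES # average number of pairs per CRC value

# Zlib.crc32 takes a string and returns an integer in 0..2**32 - 1.
crc = Zlib.crc32("5_3")
puts crc.between?(0, 2**32 - 1)   # true
```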

maerics
2

CRC32 is not unique, and wouldn't be good to use as a key. Assuming you know the maximum value of your integers i1 and i2:

unique_id = (max_i2+1)*i1 + i2

If your integers can be negative, or will never be below a certain positive integer, you'll need the max and min values:

(max_i2-min_i2+1) * (i1-min_i1) + (i2-min_i2)

This gives the smallest possible range of IDs for ordered pairs within those bounds: every value from 0 up to the number of combinations minus one is used exactly once.
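As a minimal sketch of the formula above: `range_pairing` is a hypothetical helper, and the bounds in the example (i1 in -2..2, i2 in 10..12) are assumptions chosen only to show that the IDs come out dense.

```ruby
# Hypothetical helper implementing the min/max formula above.
# The bounds are assumptions that the caller must know in advance.
def range_pairing(i1, i2, min_i1:, min_i2:, max_i2:)
  width = max_i2 - min_i2 + 1 # number of possible i2 values
  width * (i1 - min_i1) + (i2 - min_i2)
end

# Assumed example bounds: i1 in -2..2, i2 in 10..12.
ids = (-2..2).flat_map do |i1|
  (10..12).map do |i2|
    range_pairing(i1, i2, min_i1: -2, min_i2: 10, max_i2: 12)
  end
end

p ids # the 15 pairs map onto 0..14 with no gaps and no repeats
```

This treats (i1, i2) and (i2, i1) as different pairs. If the pair is unordered, the two inputs would have to be sorted first and share the same bounds.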

marcus erronius
  • oh, wait, I *did* make a small mistake; updating the code – marcus erronius Dec 13 '12 at 21:10
  • The max value in my case is MySQL's BIGINT max value. How can I store the resulting key in the DB if it exceeds the max BIGINT value? – S2201 Dec 13 '12 at 21:10
  • Will you ever have numbers at the top of the bigint range? If so, you're just going to have to go with a string concatenation method like "#{i1}_#{i2}", because there's no way of optimizing around that. But if there's some sort of smaller maximum range, then there's always a way. – marcus erronius Dec 13 '12 at 21:15
  • that is, will your values ever really get as big as 9223372036854775807? 9 quintillion? If they only get into the millions or low billions this method will work. – marcus erronius Dec 13 '12 at 21:23
  • I don't think that I'll reach the max of BIGINT, but then I don't know my max value. So maybe a string with some delimiter is the optimal solution... – S2201 Dec 13 '12 at 21:23
1

Well, no 4-byte hash will be unique when its input is an arbitrary binary string of more than 4 bytes. Your strings are from a highly restricted symbol set, so collisions will be fewer, but "no, not unique".

There are two ways to use a smaller integer than the possible range of values for both of your integers:

  1. Have a system that works despite occasional collisions
  2. Check for collisions and use some sort of rehash

The obvious way to solve your problem with a 1:1 mapping requires that you know the maximum value of one of the integers. Just multiply the other integer by that maximum plus one and add the bounded one, or determine a power-of-two ceiling, shift one value accordingly, then OR in the other. Either way, every bit is reserved for one or the other of the integers. This may or may not meet your "as small as possible" requirement.
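The shift-and-OR variant might look roughly like this. Both helper names are hypothetical, and `max_value` is an assumed known upper bound for the second integer. Both inputs are assumed to be non-negative.

```ruby
# Hypothetical helper: pack two non-negative integers into one by shifting.
# max_value is an assumed known upper bound for i2.
def shift_pairing(i1, i2, max_value)
  raise ArgumentError, "i2 out of range" unless (0..max_value).cover?(i2)

  bits = max_value.bit_length # smallest bit width that can hold max_value
  (i1 << bits) | i2
end

# Hypothetical inverse: split the packed value back into the two integers.
def shift_unpairing(id, max_value)
  bits = max_value.bit_length
  [id >> bits, id & ((1 << bits) - 1)]
end

id = shift_pairing(5, 3, 1000) # 1000 needs 10 bits
p id                           # 5123
p shift_unpairing(id, 1000)    # [5, 3]
```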

Your ###_### string is unique per pair; if you could just store that as a string you win.
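A sketch of that string key, with the pair ordered first so that both argument orders give the same result. `pair_key` is a hypothetical helper name.

```ruby
# Hypothetical helper: order the pair, then join with a delimiter.
def pair_key(i1, i2)
  lo, hi = [i1, i2].minmax
  "#{hi}_#{lo}"
end

p pair_key(3, 5)                        # "5_3"
p pair_key(5, 3)                        # "5_3"
p pair_key(21, 10) == pair_key(211, 0)  # false: "21_10" vs "211_0"
```

The delimiter is what makes this unambiguous: without it, the pairs (21, 10) and (211, 0) would both become "2110".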

DigitalRoss
0

Here's a better, more space-efficient solution. My answer on it is here.
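The link itself is not reproduced here. Assuming it refers to Szudzik's pairing function, which is the one named `szudzik_fn` in the comments below, a minimal Ruby sketch for non-negative integers could look like this. The method names are illustrative, and the sorting wrapper is an addition to get f(i1, i2) = f(i2, i1).

```ruby
# Szudzik's pairing function for non-negative integers.
def szudzik_pairing(a, b)
  a >= b ? a * a + a + b : a + b * b
end

# Hypothetical wrapper: sort first so that argument order does not matter.
def unordered_szudzik_pairing(i1, i2)
  lo, hi = [i1, i2].minmax
  szudzik_pairing(lo, hi)
end

table = (0..3).map { |a| (0..3).map { |b| szudzik_pairing(a, b) } }
p table
# [[0, 1, 4, 9], [2, 3, 5, 10], [6, 7, 8, 11], [12, 13, 14, 15]]
```

For inputs in 0..3 the results fill 0..15 exactly, which is the tight packing mentioned in the comments.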

nawfal
  • cantor_pair(123123123123123,321321321321321) = 98765432098765778111444778111 < szudzik_fn(123123123123123,321321321321321) = 103247391535679740596452308164... so it doesn't look more efficient. – S2201 Dec 14 '12 at 09:37
  • @Savash interesting. Apparently when the gulf between `a` and `b` widens, the Cantor function wins. But that's a specific case. I do not think one method can be better for every single point. But for a range of values, say 0 to 10000, Szudzik wins. So it's better to use Szudzik's function in general. Also note that the Szudzik function packs the results into a tight space, whereas the Cantor function spreads them over a large area. – nawfal Dec 14 '12 at 13:25