2

I have URLs that look like this:

http://domain.com/object/23/

I would prefer the 23 to not be sequential and fairly random. I've seen may other posts on Stack Overflow asking the same thing but my requirements are a bit different than what I have seen.

Many of my the people using the site are competitors and it would be easy for them to poke around with some numbers to get competitive information. I'm not doing this for security and I understand that security through obscurity is a waste of time. I'm just looking for a quick way to keep people from poking around.

I'm doing this with python/SQLAlchemy with a Postgres database. I've looked at UUIDs primary key's but they seem like a large performance hit as I have alot of joins happening. I could also do UUID's in an additional column then do all joins based off of the sequential integral primary key.

Most tables that need this would have less than 1000 records. But 1 table would have a few million records. Without that table I would just use uuid and be done with it. But since I do I don't really think uuid is a great choice.

The real question is what are my other options then.

  1. Use sequential numeric primary key, but encrypt/decrpyt them on the fly when outside of the database with some light weight algorithm

  2. Separate column and use a sha1 hash (or other hash) on the primary_key + secret_key that is generated on the rows creation. I could then just find the row via this hash and the do all the joins on the normal pk.

Performance is the most important thing here, while still maintaining some level of randomness with low chance of collision. What are the best options for the encryption/decryption for #1 or what is the best hash algo for #2. Is there a way more obvious than either of these 2? With a few million rows is uuid not going to slow me down too much and thats the solution?

Kuberchaun
  • 26,103
  • 6
  • 45
  • 56
  • 1
    "*they seem like a large performance hit*". Did you test that assumption? How big was the performance hit? – a_horse_with_no_name Mar 30 '11 at 06:59
  • What makes you think generating a UUID would be a performance hit? The cost to generate is basically the cost of generating a random 128-bit number. The cost of comparing integers of this size is neglible. – matt b Apr 03 '11 at 03:31
  • The performance hit comes with the randomness of UUIDs: http://stackoverflow.com/a/1151733/23368 – Daniel Rikowski Jul 05 '12 at 07:48

3 Answers3

2

For generating unique and random-looking identifiers from a serial, using ciphers might be a good idea. Since their output is bijective (there is a one-to-one mapping between input and output values) -- you will not have any collisions, unlike hashes. Which means your identifiers don't have to be as long as hashes.

Most cryptographic ciphers work on 64-bit or larger blocks, but the PostgreSQL wiki has an example PL/pgSQL procedure for a "non-cryptographic" cipher function that works on (32-bit) int type. Disclaimer: I have not tried using this function myself.

To use it for your primary keys, run the CREATE FUNCTION call from the wiki page, and then on your empty tables do:

ALTER TABLE foo ALTER COLUMN foo_id SET DEFAULT pseudo_encrypt(nextval('foo_foo_id_seq')::int);

And voila!

pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> select * from foo;
  foo_id   
------------
 1241588087
 1500453386
 1755259484
(4 rows)
intgr
  • 17,870
  • 4
  • 56
  • 66
1

Hashing/encryption seem like overkill for what you are trying to achieve. How about a separate column storing a random number between 10000 and 99999 (or 100000, 999999 etc) then links like this:

http://domain.com/object/23-74938/

where the '74938' is the random bit

1

I'd go with something like your option 2: use HMAC with SHA1 for example, to produce a hash value that can be verified for each request without even going near the database. I produce tamper-proof "tickets" like this quite often. e.g.

use Digest::HMAC_SHA1 qw(hmac_sha1_hex);
$id = shift;
print "$id.", substr(hmac_sha1_hex($id, "s3kr1t"), 0, 8), "\n"

And to verify:

use Digest::HMAC_SHA1 qw(hmac_sha1_hex);
$tkt = shift;
($id,$sum) = ($tkt =~ /(.+)\.([0-9a-f]+)/)
        or die "Invalid tkt (bad format): $tkt\n";
$sum eq substr(hmac_sha1_hex($id, "s3kr1t"), 0, 8)
        or die "Invalid tkt (MAC mismatch): $tkt\n";
print $id, "\n"

This produces a "ticket" of "23.3b30e326" for id 23.

araqnid
  • 108,587
  • 20
  • 147
  • 127