522

How safe is it to use UUID to uniquely identify something (I'm using it for files uploaded to the server)? As I understand it, it is based on random numbers. However, it seems to me that given enough time, it would eventually repeat itself, just by pure chance. Is there a better system or a pattern of some type to alleviate this issue?

Jason
  • 14
    For a large enough value of "enough time" :) –  Jul 20 '09 at 18:13
  • 112
    "How unique is UUID?" Universally unique, I believe. ;) – Miles Jul 20 '09 at 18:25
  • 32
    And unless you plan on developing on Venus, a GUID should suffice. – skaffman Jul 20 '09 at 22:21
  • 8
    "unique" means *never collide*. If it has any potential to collide, *it's not unique*. Therefore by definition, UUID is not unique, and safe only if you're prepared for potential collisions regardless of chance of collisions. Otherwise, your program is simply incorrect. You can say UUID as "almost unique" but it doesn't mean it's "unique". – eonil Jul 25 '19 at 04:21
  • 2
    UUIDs are unique "for practical purposes" - the fact that there is an infinitesimally small chance of a duplicate value being generated doesn't make programs relying on this incorrect except in the very rare situation where the volume of IDs being generated starts to make that possibility statistically significant. – Nathan Griffiths Nov 12 '19 at 03:33

12 Answers

521

Very safe:

the annual risk of a given person being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.

Caveat:

However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.

Source: The Random UUID probability of duplicates section of the Wikipedia article on Universally unique identifiers (link leads to a revision from December 2016 before editing reworked the section).

Also see the current section on the same subject on the same Universally unique identifier article, Collisions.
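
The figures quoted above can be roughly reproduced with the usual birthday-bound approximation, p ≈ 1 − exp(−n(n−1) / (2 · 2^122)). What follows is only a sketch: the function name `collisionProbability` and the default of 122 random bits (Version 4) are assumptions made for illustration, and it presumes an ideal random source.

```javascript
// Approximate probability of at least one collision among n random IDs
// with `bits` random bits each (birthday bound). Assumes ideal randomness.
function collisionProbability(n, bits = 122) {
  const space = Math.pow(2, bits);               // number of possible IDs
  const exponent = (n * (n - 1)) / (2 * space);
  return -Math.expm1(-exponent);                 // 1 - e^(-exponent), stable for tiny values
}

// One billion UUIDs per second for 100 years.
const secondsPerYear = 60 * 60 * 24 * 365.25;
const n = 1e9 * secondsPerYear * 100;
console.log(collisionProbability(n));            // roughly 0.6
console.log(collisionProbability(1e12));         // a trillion IDs: vanishingly small
```

With these assumptions the 100-year scenario comes out somewhat above one half, and the 50% mark is reached at roughly 2.7 × 10^18 generated IDs, so "about 50%" is the right order of magnitude.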

Martijn Pieters
  • 31
    I like this part from Wikipedia: However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. So what is the real chance of duplicate noting this sentence. We can not create real random numbers on computer, can we? – mans Aug 29 '14 at 10:46
  • 6
    Actually, a lot of work has gone into finding ways to introduce as much entropy ("real randomness", I guess you'd call it) as possible into random number APIs. See http://en.wikipedia.org/wiki/Entropy_%28computing%29 – broofa Dec 06 '14 at 13:48
  • 7
    That's actually a higher probability of collision than I'd imagined. Birthday paradox at work, I guess. – Cameron Jan 10 '17 at 17:51
  • How do I generate a UUID in Node.js using "sufficient entropy"? – linus_hologram Feb 24 '20 at 20:17
  • @linus_hologram - It's not really a JS problem. The source of entropy is typically configured at the operating system level. – Stephen C Apr 21 '20 at 00:06
  • You can reduce the chance of having _any_ collisions by starting with a _very_ random UUID, and then using consecutive UUIDs (and if others use the same strategy). Birthday paradox works less against you. However the _expected_ number of collisions is unchanged. And if you have one collision, you have many. – gnasher729 Aug 06 '20 at 22:00
  • @MartijnPieters, could someone edit to add this useful information: are we speaking about 128-bit or 256-bit UUID? It changes everything in the maths here. – Basj Nov 20 '20 at 08:24
  • @Basj: Read the linked article: *Out of a total of 128 bits, Version 4 UUIDs have 6 reserved bits (4 for the version and 2 other reserved bits), so randomly generated UUIDs have 122 random bits.* – Martijn Pieters Nov 20 '20 at 08:43
171

If by "given enough time" you mean 100 years and you're creating them at a rate of a billion a second, then yes, you have a 50% chance of having a collision after 100 years.

rein
  • 206
    But only after using up 256 exabytes of storage for those IDs. – Bob Aman Sep 10 '09 at 07:04
  • 22
    Funny thing is, you could generate 2 in a row that were identical, of course at mind-boggling levels of coincidence, luck and divine intervention, yet despite the unfathomable odds, it's still possible! :D Yes, it won't happen. just saying for the amusement of thinking about that moment when you created a duplicate! Screenshot video! – scalabl3 Oct 20 '15 at 19:11
  • 5
    Is the uniqueness purely because of randomness? Or there are other factors? (e.g. time stamp, ip, etc) – Weishi Z Jun 05 '16 at 15:54
  • 2
    If something depends on random numbers not being the same, then those random numbers aren't random. If they are truly random then there is no way of knowing what the chance is of getting a duplicate. – The Tahaan Oct 25 '16 at 11:48
  • 16
    @TheTahaan That's not what random means. It doesn't mean "totally unpredictable" -- usually they follow some kind of distribution. If you flip 10 coins, the chance of getting 2 heads, followed by 3 tails, followed by 5 heads, is pretty low (2^-10, about 0.001). It's truly random, but we absolutely _can_ know the _chance_ of getting a particular outcome. We just can't say in advance whether it _will_ happen. – Richard Rast Dec 07 '16 at 15:07
  • You're using a lousy implementation. No guarantees exist for bad implementations. :-P – Bob Aman Aug 08 '17 at 22:27
  • 5
    Just to explain what this implementation did wrong, they're using a version 1 UUID, which relies on a combination of timestamp and mac address for its uniqueness. However if you generate UUIDs fast enough, the timestamp won't have incremented yet. In this scenario, your UUID generation algorithm is supposed to track the last timestamp used and increment it by 1. They clearly failed to take that step. However all version 1 UUIDs correctly generated by the same machine in a short period will exhibit obvious similarities, but ought to still be unique. – Bob Aman Aug 08 '17 at 22:32
  • @BobAman I know it's almost 10 years ago but shouldn't that be 500+ exabytes? given that a UUID = 16 bytes – Memet Olsen Oct 24 '18 at 09:21
  • 1
    @MemetOlsen Haven't rechecked my numbers, but people often overestimate. I believe I did use birthday paradox formula to come up w/ that number. – Bob Aman Nov 08 '18 at 21:59
120

There is more than one type of UUID, so "how safe" depends on which type (which the UUID specifications call "version") you are using.

  • Version 1 is the time-based plus MAC address UUID. The 128 bits contain 48 bits for the network card's MAC address (which is uniquely assigned by the manufacturer) and a 60-bit clock with a resolution of 100 nanoseconds. That clock wraps in 3603 A.D. so these UUIDs are safe at least until then (unless you need more than 10 million new UUIDs per second or someone clones your network card). I say "at least" because the clock starts at 15 October 1582, so you have about 400 years after the clock wraps before there is even a small possibility of duplications.

  • Version 4 is the random number UUID. There are six fixed bits and the rest of the UUID is 122 bits of randomness. See Wikipedia or other analyses that describe how very unlikely a duplicate is.

  • Version 3 uses MD5 and Version 5 uses SHA-1 to create those 122 bits, instead of a random or pseudo-random number generator. So in terms of safety it is like Version 4, a statistical issue (as long as you make sure that what the digest algorithm is processing is always unique).

  • Version 2 is similar to Version 1, but with a smaller clock so it is going to wrap around much sooner. But since Version 2 UUIDs are for DCE, you shouldn't be using these.
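
To find out which version a given UUID is, the version can be read straight from its text form, and for Version 1 the embedded timestamp can be decoded as well. This is a sketch that assumes the canonical 8-4-4-4-12 hex layout; the helper names `uuidVersion` and `v1Timestamp` are made up for illustration.

```javascript
// The version is the first hex digit of the third group: xxxxxxxx-xxxx-Vxxx-...
function uuidVersion(uuid) {
  return parseInt(uuid.charAt(14), 16);
}

// Offset between the UUID epoch (15 October 1582) and the Unix epoch,
// measured in 100-nanosecond ticks.
const UUID_EPOCH_OFFSET = 0x01b21dd213814000n;

// Decode the 60-bit timestamp of a Version 1 UUID into a JavaScript Date.
function v1Timestamp(uuid) {
  const [timeLow, timeMid, timeHiAndVersion] = uuid.split('-');
  const timeHi = timeHiAndVersion.slice(1);        // drop the version digit
  const ticks = BigInt('0x' + timeHi + timeMid + timeLow);
  const unixMs = (ticks - UUID_EPOCH_OFFSET) / 10000n;
  return new Date(Number(unixMs));
}

// The DNS namespace UUID from RFC 4122 happens to be a Version 1 UUID.
const sample = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
console.log(uuidVersion(sample));                  // 1
console.log(v1Timestamp(sample).toISOString());
```

The second function also illustrates the predictability concern: anyone holding a Version 1 UUID can recover when it was generated.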

So for all practical problems they are safe. If you are uncomfortable with leaving it up to probabilities (e.g. you are the type of person worried about the earth getting destroyed by a large asteroid in your lifetime), just make sure you use a Version 1 UUID and it is guaranteed to be unique (in your lifetime, unless you plan to live past 3603 A.D.).

So why doesn't everyone simply use Version 1 UUIDs? That is because Version 1 UUIDs reveal the MAC address of the machine they were generated on and they can be predictable -- two things which might have security implications for the application using those UUIDs.

Hoylen
  • 1
    Defaulting to a version 1 UUID has serious issues when they're generated by the same server for many people. The version 4 UUID is my default since you can quickly write something to generate one in any language or platform (including javascript). – Justin Bozonier Apr 24 '13 at 14:58
  • 1
    @Hoylen Well explained! but is this much exaggeration required? – Dinoop paloli Sep 16 '14 at 11:32
  • 2
    *Theoretically*, it is uniquely assigned by the manufacturer. – OrangeDog Nov 06 '15 at 11:04
  • 8
    One need not generate 10 million version 1 UUIDs in a second to encounter a duplicate; one must merely generate a batch of 16,384 UUIDs within the span of a single "tick" in order to overflow the sequence number. I have seen this happen with an implementation that relied, naively, on a clock source that (1) had μs-level granularity, and (2) was not guaranteed to be monotonic (system clocks are not). Be careful whose UUID generation code you use, and be **especially wary** with time-based UUID generators. They are difficult to get right, so **subject them to load tests** before using them. – Mike Strobel Mar 20 '17 at 18:08
21

The answer to this may depend largely on the UUID version.

Many UUID generators use a version 4 random number. However, many of these use a Pseudo Random Number Generator (PRNG) to generate them.

If a poorly seeded PRNG with a small period is used to generate the UUID I would say it's not very safe at all. Some random number generators also have poor variance, i.e. they favour certain numbers more often than others. This isn't going to work well.

Therefore, it's only as safe as the algorithms used to generate it.

On the flip side, if you know the answer to these questions then I think a version 4 uuid should be very safe to use. In fact I'm using it to identify blocks on a network block file system and so far have not had a clash.

In my case, the PRNG I'm using is a Mersenne Twister and I'm being careful with the way it's seeded, which is from multiple sources including /dev/urandom. Mersenne Twister has a period of 2^19937 − 1. It's going to be a very very long time before I see a repeat UUID.

So pick a good library or generate it yourself and make sure you use a decent PRNG algorithm.
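
One way to take the choice and seeding of the PRNG out of your own hands is to draw the random bits from the operating system's cryptographic generator. The sketch below assumes Node.js and its built-in `crypto` module; it is not the Mersenne Twister setup described above, and the function name `uuidv4` is only illustrative.

```javascript
const { randomBytes } = require('crypto');

// Build a version 4 UUID from 16 bytes of OS-provided randomness.
function uuidv4() {
  const bytes = randomBytes(16);
  bytes[6] = (bytes[6] & 0x0f) | 0x40;   // set the 4 version bits to 0100
  bytes[8] = (bytes[8] & 0x3f) | 0x80;   // set the 2 variant bits to 10
  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

console.log(uuidv4());
```

Recent Node.js versions and browsers also expose `crypto.randomUUID()`, which does the same job if it is available in your runtime.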

hookenz
13

Quoting from Wikipedia:

Thus, anyone can create a UUID and use it to identify something with reasonable confidence that the identifier will never be unintentionally used by anyone for anything else

It goes on to explain in pretty good detail how safe it actually is. So to answer your question: Yes, it's safe enough.

Dave Vogt
12

I concur with the other answers. UUIDs are safe enough for nearly all practical purposes¹, and certainly for yours.

But suppose (hypothetically) that they aren't.

Is there a better system or a pattern of some type to alleviate this issue?

Here are a couple of approaches:

  1. Use a bigger UUID. For instance, instead of 128 random bits, use 256 or 512 or ... Each bit you add to a type-4 style UUID will reduce the probability of a collision by a half, assuming that you have a reliable source of entropy².

  2. Build a centralized or distributed service that generates UUIDs and records each and every one it has ever issued. Each time it generates a new one, it checks that the UUID has never been issued before. Such a service would be technically straightforward to implement (I think) if we assumed that the people running the service were absolutely trustworthy, incorruptible, etcetera. Unfortunately, they aren't ... especially when there is the possibility of governments' security organizations interfering. So, this approach is probably impractical, and may be³ impossible in the real world.
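
For a closed system, the second approach can be sketched in a few lines. The class below is a hypothetical in-memory registry (the name `IdRegistry` and its `issue` method are invented for this example); a real service would need durable, shared storage and would still face the trust and scaling problems described above.

```javascript
// Issues IDs from a supplied generator and refuses to hand out a repeat.
class IdRegistry {
  constructor(generate) {
    this.generate = generate;   // function returning a candidate ID
    this.issued = new Set();    // every ID ever handed out
  }

  issue() {
    let id = this.generate();
    while (this.issued.has(id)) {
      id = this.generate();     // collision detected: draw again
    }
    this.issued.add(id);
    return id;
  }
}

// Demonstration with a deliberately tiny ID space so collisions do occur.
const registry = new IdRegistry(() => Math.floor(Math.random() * 16).toString(16));
const ids = [];
for (let i = 0; i < 16; i++) {
  ids.push(registry.issue());
}
console.log(ids.length, new Set(ids).size);   // 16 16
```

The generator is passed in, so the same check works whether the candidates are UUIDs or something shorter.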


1 - If uniqueness of UUIDs determined whether nuclear missiles got launched at your country's capital city, a lot of your fellow citizens would not be convinced by "the probability is extremely low". Hence my "nearly all" qualification.

2 - And here's a philosophical question for you. Is anything ever truly random? How would we know if it wasn't? Is the universe as we know it a simulation? Is there a God who might conceivably "tweak" the laws of physics to alter an outcome?

3 - If anyone knows of any research papers on this problem, please comment.

Stephen C
  • 3
    I just want to point out that the method number 2 basically defeats the main purpose of using UUID and you might as well just use a classic numbered ID at that point. – Petr Vnenk May 19 '20 at 10:33
  • I disagree. The flaw in sequential numbered IDs is that they are too easy to guess. You should be able to implement method 2 in a way that makes the UUIDs difficult to guess. – Stephen C May 19 '20 at 10:42
  • 1
    But even for what you're saying you can basically use any other random string/number and just check for duplicates, you don't have any reason to use UUID instead of say 6-characters long random string. – Petr Vnenk Jul 03 '20 at 13:01
  • Well, yes and no. It depends on the context in which the ids are required to be unique. If they are only required to be unique in a closed system, then it is feasible to use short random strings and store them all in a database (or something) to check for duplicates. But that doesn't give you guaranteed *universal* uniqueness. And if the number of unique ids generated over the lifetime of the system is large enough you will run into scaling problems, assuming that the unique ids are required to be unique over time ... not just at a point in time. – Stephen C Jul 03 '20 at 14:47
9

UUID schemes generally use not only a pseudo-random element, but also the current system time, and some sort of often-unique hardware ID if available, such as a network MAC address.

The whole point of using UUID is that you trust it to do a better job of providing a unique ID than you yourself would be able to do. This is the same rationale behind using a 3rd party cryptography library rather than rolling your own. Doing it yourself may be more fun, but it's typically less responsible to do so.

Parappa
8

Been doing it for years. Never run into a problem.

I usually set up my DB's to have one table that contains all the keys and the modified dates and such. Haven't run into a problem of duplicate keys ever.

The only drawback that it has is when you are writing some queries to find some information quickly you are doing a lot of copying and pasting of the keys. You don't have the short easy to remember ids anymore.

Posthuma
8

Here's a testing snippet for you to test its uniqueness, inspired by @scalabl3's comment

Funny thing is, you could generate 2 in a row that were identical, of course at mind-boggling levels of coincidence, luck and divine intervention, yet despite the unfathomable odds, it's still possible! :D Yes, it won't happen. just saying for the amusement of thinking about that moment when you created a duplicate! Screenshot video! – scalabl3 Oct 20 '15 at 19:11

If you feel lucky, check the checkbox; it then only checks the currently generated IDs against each other. If you want a history check, leave it unchecked. Please note that you might run out of RAM at some point if you leave it unchecked. I tried to make it CPU friendly so you can abort quickly when needed: just hit the run snippet button again or leave the page.

Math.log2 = Math.log2 || function(n){ return Math.log(n) / Math.log(2); }
  Math.trueRandom = (function() {
  var crypt = window.crypto || window.msCrypto;

  if (crypt && crypt.getRandomValues) {
      // if we have a crypto library, use it
      var random = function(min, max) {
          var rval = 0;
          var range = max - min;
          if (range < 2) {
              return min;
          }

          var bits_needed = Math.ceil(Math.log2(range));
          if (bits_needed > 53) {
            throw new Error("We cannot generate numbers larger than 53 bits.");
          }
          var bytes_needed = Math.ceil(bits_needed / 8);
          var mask = Math.pow(2, bits_needed) - 1;
          // e.g. range 7776 -> 13 bits needed -> mask = 2^13 - 1 = 8191 (0x1FFF)

          // Create byte array and fill with N random numbers
          var byteArray = new Uint8Array(bytes_needed);
          crypt.getRandomValues(byteArray);

          var p = (bytes_needed - 1) * 8;
          for(var i = 0; i < bytes_needed; i++ ) {
              rval += byteArray[i] * Math.pow(2, p);
              p -= 8;
          }

          // Use & to apply the mask and reduce the number of recursive lookups
          rval = rval & mask;

          if (rval >= range) {
              // Integer out of acceptable range
              return random(min, max);
          }
          // Return an integer that falls within the range
          return min + rval;
      }
      return function() {
          var r = random(0, 1000000000) / 1000000000;
          return r;
      };
  } else {
      // From http://baagoe.com/en/RandomMusings/javascript/
      // Johannes Baagøe <baagoe@baagoe.com>, 2010
      function Mash() {
          var n = 0xefc8249d;

          var mash = function(data) {
              data = data.toString();
              for (var i = 0; i < data.length; i++) {
                  n += data.charCodeAt(i);
                  var h = 0.02519603282416938 * n;
                  n = h >>> 0;
                  h -= n;
                  h *= n;
                  n = h >>> 0;
                  h -= n;
                  n += h * 0x100000000; // 2^32
              }
              return (n >>> 0) * 2.3283064365386963e-10; // 2^-32
          };

          mash.version = 'Mash 0.9';
          return mash;
      }

      // From http://baagoe.com/en/RandomMusings/javascript/
      function Alea() {
          return (function(args) {
              // Johannes Baagøe <baagoe@baagoe.com>, 2010
              var s0 = 0;
              var s1 = 0;
              var s2 = 0;
              var c = 1;

              if (args.length == 0) {
                  args = [+new Date()];
              }
              var mash = Mash();
              s0 = mash(' ');
              s1 = mash(' ');
              s2 = mash(' ');

              for (var i = 0; i < args.length; i++) {
                  s0 -= mash(args[i]);
                  if (s0 < 0) {
                      s0 += 1;
                  }
                  s1 -= mash(args[i]);
                  if (s1 < 0) {
                      s1 += 1;
                  }
                  s2 -= mash(args[i]);
                  if (s2 < 0) {
                      s2 += 1;
                  }
              }
              mash = null;

              var random = function() {
                  var t = 2091639 * s0 + c * 2.3283064365386963e-10; // 2^-32
                  s0 = s1;
                  s1 = s2;
                  return s2 = t - (c = t | 0);
              };
              random.uint32 = function() {
                  return random() * 0x100000000; // 2^32
              };
              random.fract53 = function() {
                  return random() +
                      (random() * 0x200000 | 0) * 1.1102230246251565e-16; // 2^-53
              };
              random.version = 'Alea 0.9';
              random.args = args;
              return random;

          }(Array.prototype.slice.call(arguments)));
      };
      return Alea();
  }
}());

Math.guid = function() {
    return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c)    {
      var r = Math.trueRandom() * 16 | 0,
          v = c == 'x' ? r : (r & 0x3 | 0x8);
      return v.toString(16);
  });
};
function logit(item1, item2) {
    console.log("Do "+item1+" and "+item2+" equal? "+(item1 == item2 ? "OMG! take a screenshot and you'll be epic on the world of cryptography, buy a lottery ticket now!":"No they do not. shame. no fame")+ ", runs: "+window.numberofRuns);
}
numberofRuns = 0;
function test() {
   window.numberofRuns++;
   var x = Math.guid();
   var y = Math.guid();
   var test = x == y || historyTest(x,y);

   logit(x,y);
   return test;

}
historyArr = [];
historyCount = 0;
function historyTest(item1, item2) {
    if(window.luckyDog) {
       return false;
    }
    // Compare the new pair against every previously generated id.
    for(var i = window.historyCount - 1; i > -1; i--) {
        logit(item1,window.historyArr[i]);
        if(item1 == window.historyArr[i]) {
            return true;
        }
        logit(item2,window.historyArr[i]);
        if(item2 == window.historyArr[i]) {
            return true;
        }
    }
    window.historyArr.push(item1);
    window.historyArr.push(item2);
    window.historyCount+=2;
    return false;
}
luckyDog = false;
document.body.onload = function() {
document.getElementById('runit').onclick  = function() {
window.luckyDog = document.getElementById('lucky').checked;
var val = document.getElementById('input').value
if(val.trim() == '0') {
    var intervaltimer = window.setInterval(function() {
         var test = window.test();
         if(test) {
            window.clearInterval(intervaltimer);
         }
    },0);
}
else {
   var num = parseInt(val);
   if(num > 0) {
        var intervaltimer = window.setInterval(function() {
         var test = window.test();
         num--;
         if(num < 0 || test) {
    
         window.clearInterval(intervaltimer);
         }
    },0);
   }
}
};
};
Please input how often the calculation should run. Set to 0 for forever. Check the checkbox if you feel lucky.<BR/>
<input type="text" value="0" id="input"><input type="checkbox" id="lucky"><button id="runit">Run</button><BR/>
Tschallacka
5

For UUID4 I make it that there are approximately as many IDs as there are grains of sand in a cube-shaped box with sides 369,000km long. That's a box with sides ~2 1/2 times longer than Jupiter's diameter.

Working so someone can tell me if I've messed up units:

  • volume of grain of sand 0.00947mm^3 (Guardian)
  • UUID4 has 122 random bits -> 5.3e36 possible values (wikipedia)
  • volume of that many grains of sand = 5.0191e34 mm^3 or 5.0191e+25m^3
  • side length of cubic box with that volume = 3.69E8m or 369,000km
  • diameter of Jupiter: 139,820km (google)
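
The working above can be checked mechanically. This sketch simply redoes the arithmetic with the same inputs (the grain volume and Jupiter's diameter are the figures quoted in the list, not independently verified); the variable names are illustrative.

```javascript
const grainVolumeMm3 = 0.00947;                 // volume of one grain of sand, mm^3
const idCount = Math.pow(2, 122);               // possible version 4 UUIDs
const totalVolumeM3 = (idCount * grainVolumeMm3) / 1e9;   // 1 m^3 = 1e9 mm^3
const sideKm = Math.cbrt(totalVolumeM3) / 1000; // side length of the cube, km
const jupiterDiameterKm = 139820;

console.log(idCount.toExponential(2));          // 5.32e+36
console.log(Math.round(sideKm));                // close to 369,000
console.log((sideKm / jupiterDiameterKm).toFixed(2));
```

Like the original working, this assumes the grains pack with no gaps.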
lost
  • 1
    Actually I guess this assumes 100% packing so maybe I should add a factor for that! – lost Jan 09 '20 at 12:58
  • This is actually very helpful and has made me realize it's probably okay and there are other things to worry about. lmao – wongz Mar 17 '21 at 03:14
3

I don't know if this matters to you, but keep in mind that GUIDs are globally unique, but substrings of GUIDs aren't.
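
To put a rough number on this: truncating an ID shrinks the space it is drawn from, and the birthday bound then bites much sooner. The sketch below assumes the characters you keep are uniformly random hex digits (true of most of a Version 4 UUID, not of Version 1); `idsForCollisionChance` is a hypothetical helper name.

```javascript
// Approximate number of random IDs after which the chance of at least
// one collision reaches p, given `bits` random bits per ID (birthday bound).
function idsForCollisionChance(bits, p = 0.5) {
  return Math.sqrt(2 * Math.pow(2, bits) * -Math.log(1 - p));
}

// First 8 hex characters = 32 bits; a full version 4 UUID = 122 random bits.
console.log(Math.round(idsForCollisionChance(32)));        // roughly 77,000
console.log(idsForCollisionChance(122).toExponential(2));
```

Under those assumptions, an 8-character prefix has even odds of a duplicate after tens of thousands of IDs, many orders of magnitude sooner than the full identifier.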

Grant Wagner
  • 1
    Keep in mind that the reference linked here talks about Version 1 UUIDs (which take information about the generating computer etc. into the id). Most other Answers talk about Version 4 (which are totally random). The above linked Wikipedia article http://en.wikipedia.org/wiki/Universally_unique_identifier explains the different kinds of UUIDs. – kratenko Apr 04 '14 at 15:18
0

I should mention I bought two external Seagate drives on Amazon, and they had the same device UUID, but differing PARTUUID. Presumably the cloning software they used to format the drives just copied the UUID as well.

Obviously UUID collisions are much more likely to happen due to a flawed cloning or copying process than from random coincidence. Bear that in mind when calculating UUID risks.