
Why does std::hash return the same result for different strings?

I used MSVC 2010 SP1 and was surprised to see this result:

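#include <tchar.h>      // _tmain, _TCHAR (MSVC-specific)
#include <functional>   // std::hash
#include <iostream>
#include <sstream>
#include <string>

using std::cout;
using std::endl;

int _tmain(int argc, _TCHAR* argv[])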
  {
  std::string sUniqId ("IndexBuf");
  std::stringstream sStream;

  sStream << 10;
  std::string sUniqId10 (sUniqId);
  sUniqId10.append (sStream.str());
  size_t uHashStr = std::hash<std::string>()(sUniqId10);

  sStream.str("");
  sStream << 11;
  std::string sUniqId11 (sUniqId);
  sUniqId11.append(sStream.str());
  size_t uHashStr1 = std::hash<std::string>()(sUniqId11);

  sStream.str("");
  sStream << 12;
  std::string sUniqId12 (sUniqId);
  sUniqId12.append(sStream.str());
  size_t uHashStr2 = std::hash<std::string>()(sUniqId12);

  cout <<"str:  " << sUniqId10.c_str() << "\t" << "Hash1: " << uHashStr  << endl; 
  cout <<"str2: " << sUniqId11.c_str() << "\t" << "Hash2: " << uHashStr1 << endl;
  cout <<"str3: " << sUniqId12.c_str() << "\t" << "Hash3: " << uHashStr2 << endl;

  return 0;
  }

Output:

str:  IndexBuf10        Hash1: 1286096800
str2: IndexBuf11        Hash2: 1286096800
str3: IndexBuf12        Hash3: 1286096800

Does anybody know why this occurs?

P.S. This example works correctly with MSVC 2013 Update 1.

Marc Mutz - mmutz
angevad
  • 4
    Duplicate? http://stackoverflow.com/q/7968674/420683 – dyp Mar 18 '14 at 17:54
  • 1
    @dyp No, they fixed that, the VS2013 implementation loops over the entire string. angevad: I cannot reproduce your results using VS2013 Update 1. Both the 32 and 64 bit compilers produce different hashes for the 3 strings. – Praetorian Mar 18 '14 at 18:00
  • @Praetorian OP: "This example **work correctly** for msvc2013 update1" – dyp Mar 18 '14 at 18:03
  • @dyp :) I should learn to pay more attention to the verbiage in questions. Looks like you've found the correct dupe then. – Praetorian Mar 18 '14 at 18:05
  • 1
    The point of a hash is to be a many-to-one mapping (an infinite number of inputs but a finite number of outputs). Collisions (different inputs producing the same output) are inevitable - especially with a 32-bit hash that has only ~4 billion possible outputs. – nobody Mar 18 '14 at 18:06
  • @AndrewMedico: that's true, but it's still poor QoI for the last character of the string to be ignored in generating the hash. Perhaps the question should be, "why do *these particular* strings have the same hash?" rather than in effect being, "what is the Pigeonhole Principle?". – Steve Jessop Mar 18 '14 at 18:07
  • Note to answerers: Even though the hash is not required to be injective, that QoI is really poor. I guess few would expect different short strings to produce the same hash. – dyp Mar 18 '14 at 18:08
  • Here is a perfectly correct implementation of `std::hash` which fulfills the requirements: `namespace std { template<class T> struct hash { constexpr size_t operator()(T const&) const { return 1; } }; }` Judge for yourself whether what's technically allowable is practically correct (in a sense of being useful), too. :-) – Damon Mar 18 '14 at 18:25
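
The comments above suggest the collision comes from a hash that does not read every character. A minimal sketch of that effect follows. `strided_hash` and `full_hash` are hypothetical names, and the stride rule (`1 + size / 10`) is an assumption made for illustration, not code taken from the MSVC headers. For a 10-character string the stride is 2, so index 9 (the last character) is never visited.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sampling hash: visits only every `stride`-th character.
// The stride rule is an assumption for illustration, not the MSVC source.
std::size_t strided_hash(const std::string& s) {
    std::size_t h = 2166136261u;                   // 32-bit FNV offset basis
    const std::size_t stride = 1 + s.size() / 10;  // 2 for a 10-char string
    for (std::size_t i = 0; i < s.size(); i += stride) {
        h = (16777619u * h) ^ static_cast<unsigned char>(s[i]);
    }
    return h;
}

// FNV-1a style hash that reads every byte of the string.
std::size_t full_hash(const std::string& s) {
    std::size_t h = 2166136261u;
    for (std::size_t i = 0; i < s.size(); ++i) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 16777619u;                            // 32-bit FNV prime
    }
    return h;
}
```

With these definitions, `strided_hash` gives the same value for "IndexBuf10", "IndexBuf11" and "IndexBuf12" because the three strings differ only at index 9, while `full_hash` distinguishes them.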

2 Answers


Hashes are not required to be unique. For example, many algorithms first hash to select a "bucket" which is a linked-list of the actual items. Most likely the hash algorithm changed between versions.
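
To illustrate why equal hashes do not break a hash container, here is a small sketch using `std::unordered_map` with a deliberately bad hasher. `ConstantHash` and `make_colliding_map` are hypothetical names introduced for this example. Every key hashes to the same value, yet lookups still work because the container compares keys for equality within a bucket.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately poor hasher (hypothetical): every key gets the same hash,
// so all entries collide.
struct ConstantHash {
    std::size_t operator()(const std::string&) const { return 1; }
};

// Builds a map keyed by the three strings from the question.
std::unordered_map<std::string, int, ConstantHash> make_colliding_map() {
    std::unordered_map<std::string, int, ConstantHash> m;
    m["IndexBuf10"] = 10;
    m["IndexBuf11"] = 11;
    m["IndexBuf12"] = 12;
    return m;
}
```

The map still holds three distinct entries and `at("IndexBuf11")` still finds the right value. The cost of the collisions is speed, not correctness: lookups degrade toward a linear scan.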

jschroedl

Hash functions don't have to be bijective (a one-to-one correspondence where each element in the domain is uniquely mapped to an element in the codomain). They should be surjective (every element in the codomain has a corresponding element in the domain), but they need not be injective, as you seem to be implying.

Chris Dargis
  • They don't need to be surjective either. There's nothing particularly wrong with a hash function that for some reason never outputs `SIZE_MAX`, it's just very slightly wasteful of the available hash space. – Steve Jessop Mar 18 '14 at 18:10
  • @SteveJessop: I can't imagine a use for a hash function that behaves that way, but I suppose you are right. I've updated my answer to say `should` be surjective. – Chris Dargis Mar 18 '14 at 18:15
  • Well, you might want a hash function that avoids some magic number used to indicate empty slots, but if that's what you need you can always fix up the result of the hash yourself. So it's not that anyone needs a hash with that property, it's that hashes with that property are not thereby faulty. There isn't even anything particularly wrong with a 64 bit hash whose MSB is always 0, although of course that's far more wasteful. Really it's a 63 bit hash wearing a deceptively tall hat ;-) – Steve Jessop Mar 18 '14 at 18:28
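
The "fix up the result of the hash yourself" idea from the last comment could look roughly like this. It is only a sketch: `kEmptySlot`, `avoid_sentinel` and `hash_never_empty` are hypothetical names, and choosing the maximum `size_t` value as the reserved marker is an assumption made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <string>

// Hypothetical magic number that a table might reserve for "empty slot".
const std::size_t kEmptySlot = std::numeric_limits<std::size_t>::max();

// Remaps the reserved value to its neighbour; all other hashes pass through.
std::size_t avoid_sentinel(std::size_t h) {
    return h == kEmptySlot ? h - 1 : h;
}

// Wraps std::hash so the result is never the reserved value.
std::size_t hash_never_empty(const std::string& s) {
    return avoid_sentinel(std::hash<std::string>()(s));
}
```

The wrapped hash is no longer surjective, since it never returns `kEmptySlot`, and it is still a perfectly usable hash. The only cost is that two raw hash values now share one output.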