11

I'm using an std::unordered_map<key,value> in my implementation. i will be using any of the STL containers as the key. I was wondering if it is possible to create a generic hash function for any container being used.

This question in SO offers generic print function for all STL containers. While you can have that, why cant you have something like a Hash function that defines everything ? And yeah, a big concern is also that it needs to fast and efficient.

I was considering doing a simple hash function that converts the values of the key to a size_t and do a simple function like this.

Can this be done ?

PS : Please don't use boost libraries. Thanks.

Community
  • 1
  • 1
0x0
  • 2,467
  • 5
  • 31
  • 65
  • What will be the content of the containers used as keys? – Luc Touraille Aug 01 '11 at 13:57
  • When would you consider two keys to be equal? What if they had a different order of elements? What if the elements would not be comparable? How would you efficiently compare elements that by definition are only supposed to be less-then-comparable at best? – Fozi Aug 01 '11 at 14:03
  • 1
    That is a good question. That was supposed to be my next question. Thanks for pointing it out. Now I assume that all my elements are ordered. If they are not ordered except for key type `std::set` it is a problem. Since it is an integer can we make a "smart" hash function that mimicks `std::set`? For example, <1,2,3> should yield the same hash as <3,2,1>. – 0x0 Aug 01 '11 at 14:37

1 Answers1

16

We can get an answer by mimicking Boost and combining hashes.

Warning: Combining hashes, i.e. computing a hash of many things from many hashes of the things, is not a good idea generally, since the resulting hash function is not "good" in the statistical sense. A proper hash of many things should be build from the entire raw data of all the constituents, not from intermediate hashes. But there currently isn't a good standard way of doing this.

Anyway:

First off, we need the hash_combine function. For reasons beyond my understanding it's not been included in the standard library, but it's the centrepiece for everything else:

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

Using this, we can hash everything that's made up from hashable elements, in particular pairs and tuples (exercise for the reader).

However, we can also use this to hash containers by hashing their elements. This is precisely what Boost's "range hash" does, but it's straight-forward to make that yourself by using the combine function.

Once you're done writing your range hasher, just specialize std::hash and you're good to go:

namespace std
{
  template <typename T, class Comp, class Alloc>
  struct hash<std::set<T, Comp, Alloc>>
  {
    inline std::size_t operator()(const std::set<T, Comp, Alloc> & s) const
    {
      return my_range_hash(s.begin(), s.end());
    }
  };

  /* ... ditto for other containers */
}

If you want to mimic the pretty printer, you could even do something more extreme and specialize std::hash for all containers, but I'd probably be more careful with that and make an explicit hash object for containers:

template <typename C> struct ContainerHasher
{
  typedef typename C::value_type value_type;
  inline size_t operator()(const C & c) const
  {
    size_t seed = 0;
    for (typename C::const_iterator it = c.begin(), end = c.end(); it != end; ++it)
    {
      hash_combine<value_type>(seed, *it);
    }
    return seed;
  }
};

Usage:

std::unordered_map<std::set<int>, std::string, ContainerHasher<std::set<int>>> x;
Kerrek SB
  • 428,875
  • 83
  • 813
  • 1,025
  • This is excellent. I've a doubt in your second piece of code. You used `return my_range_hash(s.begin(), s.end());`. Where have you defined the `my_range_hash` function ? Sorry if this is a dumb question ? – 0x0 Aug 01 '11 at 14:32
  • Another thing about the second piece of code is that you cannot add a specialization to namespace std for **all** types, you have to have some user defined type in here. The code is good, but too general. – Bo Persson Aug 01 '11 at 14:44
  • 1
    @Sunil: I've left `my_range_hash` for you to implement -- it'll look just like the `operator()` in my `ContainerHasher`! – Kerrek SB Aug 01 '11 at 15:01
  • @Bo: What's "too general"? Do you mean that it's not possible to define a partial specialization that will capture *all* containers that match the `is_container` typetrait, or do you mean that `std::set` is too general? I'm actually worrying now that the [pretty printer](https://raw.github.com/louisdx/cxx-prettyprint/master/prettyprint.hpp) is illegally adding overloads to `std`, perhaps that should have been done with ADL, too... – Kerrek SB Aug 01 '11 at 15:03
  • 1
    @Kerrek - You can only add specializations for user defined types to namespace std. If you have `typename T` it matches not only your types, but all types already in namespace std and also all **my** types, which I might not want. – Bo Persson Aug 01 '11 at 15:36
  • The (boost independent) c++11 code to use tuples of hashable elements in unordered_map and unordered_set can be found here: [generic-hash-for-tuples-in-unordered-map-unordered-set](http://stackoverflow.com/questions/7110301/generic-hash-for-tuples-in-unordered-map-unordered-set) – Leo Goodstadt Jan 29 '14 at 18:04
  • Given the question's "Generic Hash function for all STL-containers", it's worthy of prominent mention that this code won't work when you want containers with the same elements in another order to be deemed equal (because `hash_combine()` for `a` then `b` `!=` `hash_combine` for `b` then `a`), e.g. for `unordered_map`, `multimap`, and when an application doesn't happen to consistently order content in containers like `std::vector` and `std::list` (where `a` -> `b` may or may not be considered equal to `b` -> `a` depending on the app's needs). – Tony Delroy Jul 24 '14 at 07:01
  • @TonyD: Sure. What matters is that the hash value is compatible with equality. For sequence containers, the sequence order is relevant, so this is fine, but for unordered containers you'll have to come up with something else. A "range hash" is not the same as an "unordered collection hash", indeed. – Kerrek SB Jul 24 '14 at 09:48
  • @TonyD: Actually what's much worse is that hash combining is not a good idea in general, since the statistical properties of the resulting hash value are poor and don't meet the requirements of a "good hash function". – Kerrek SB Jul 24 '14 at 09:49
  • @KerrekSB: "compatible with equality" is a very reasonable choice of for `vector` et al but may not be what's wanted and would be better documented. Re `hash_combine` - many "hash(string)" functions begin by effectively hashing a fixed number of characters (1, 2, 4...?) followed by repetitive combining with the next N or a hash thereof... it's hard to choose an *optimal* combine algorithm if you're not also aware of the hashing logic producing the values it's combining, but the better the latter the easier combine is. VC++ `std::hash`-style's so bad it's easy to surpass anyway. – Tony Delroy Jul 24 '14 at 10:15