0

I have a Hashtable<String, Integer> ht.

How can I efficiently find the median of the values (the Integers) in this hashtable?

WindChaser
  • What have you tried? Is the question how to find a median of a bunch of numbers, or how to get the values of the map? – yshavit Apr 18 '15 at 01:38
  • What I can think of is to iterate over all the elements and record them, then sort them, and finally calculate the median. But I believe there are better ways. – WindChaser Apr 18 '15 at 01:44
  • So is this really about [finding the median of an unsorted list](http://stackoverflow.com/questions/10662013/finding-the-median-of-an-unsorted-array)? – yshavit Apr 18 '15 at 01:45
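
For reference, the baseline described in the comments (copy the values out, sort them, take the middle) could look roughly like the sketch below. The class name `MedianBySort` is made up for illustration, and the sketch assumes the median of an even number of values is the mean of the two middle ones.

```java
import java.util.Arrays;
import java.util.Hashtable;

final class MedianBySort {
    // Returns the median of the table's values. For an even count this
    // sketch assumes the median is the mean of the two middle values.
    static double median(Hashtable<String, Integer> ht) {
        if (ht.isEmpty()) {
            throw new IllegalArgumentException("an empty table has no median");
        }
        // Copy the values out of the table; the String keys play no role.
        int[] values = new int[ht.size()];
        int i = 0;
        for (int v : ht.values()) {
            values[i++] = v;
        }
        Arrays.sort(values); // O(N log N)
        int mid = values.length / 2;
        if (values.length % 2 == 1) {
            return values[mid];
        }
        // Widen to double before adding so the sum cannot overflow an int.
        return ((double) values[mid - 1] + values[mid]) / 2.0;
    }
}
```

This costs O(N log N) because of the sort, which is usually fine unless the table is large or the median is needed very often.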

2 Answers

2

There is no meaningful ordering in a hash table: the whole point of a hash table is to scatter entries uniformly across buckets according to their key. Finding an element given its key is very fast, near constant time (i.e. O(1)), but inequality-based queries, say finding all the elements e such that key(e) < K for a given key value K, in general require a full table scan, which is O(N).

You can load all the values (and only the values) into an array and then use an O(N) selection algorithm to find the median. The keys are not needed for this, since the question asks for the median of the Integer values.

Note that O(N) is provably the best you can do to find the median of an unordered set, since every element has to be examined at least once. If you need to find the median often, then an ordered representation, e.g. one based on balanced trees, is the way to go. Red-black trees are normally used to implement such ordered maps. Key lookups will be O(log(N)), which is slower than O(1) but still pretty fast. Because the data is already ordered, finding the median is easy, provided the structure is ordered by the quantity you want the median of and lets you reach the middle element by rank; some implementations provide that as a built-in operation.
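
As a rough illustration of the ordered-representation idea, here is a sketch that keeps the values (not the keys) in a `java.util.TreeMap`, which is a red-black tree. `TreeMap` does not expose rank queries, so the sketch walks the sorted value counts up to the middle; that is O(number of distinct values) per query rather than O(log N), but it avoids re-sorting. The class `SortedValueCounts` and its methods are hypothetical names, and the even-count median is assumed to be the mean of the two middle values.

```java
import java.util.Map;
import java.util.TreeMap;

final class SortedValueCounts {
    // Maps each value to how many times it occurs, kept in ascending order.
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();
    private int size = 0;

    void add(int value) {
        counts.merge(value, 1, Integer::sum);
        size++;
    }

    void remove(int value) {
        Integer c = counts.get(value);
        if (c == null) {
            return; // value not present; nothing to do
        }
        if (c == 1) {
            counts.remove(value);
        } else {
            counts.put(value, c - 1);
        }
        size--;
    }

    double median() {
        if (size == 0) {
            throw new IllegalStateException("no values");
        }
        int lowerRank = (size - 1) / 2; // 0-based rank of the lower middle
        int upperRank = size / 2;       // 0-based rank of the upper middle
        int seen = 0;
        int lower = 0;
        boolean haveLower = false;
        // Walk values in ascending order until both middle ranks are covered.
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            seen += e.getValue();
            if (!haveLower && seen > lowerRank) {
                lower = e.getKey();
                haveLower = true;
            }
            if (seen > upperRank) {
                return ((double) lower + e.getKey()) / 2.0;
            }
        }
        throw new AssertionError("unreachable: counts and size out of sync");
    }
}
```

To use something like this next to the hashtable, every `put` or `remove` on the table would also have to update the counts (removing the old value when a key is overwritten), so it only pays off if the median is queried often.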

The fast median-finding algorithm I know is based on the same pivoting strategy used in Quicksort (it is usually called Quickselect). Here is another one I just found:

http://www.cs.cornell.edu/courses/cs2110/2009su/Lectures/examples/MedianFinding.pdf
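
A minimal sketch of the pivoting approach (Quickselect), applied to the values of the hashtable, might look like this. It is not taken from the linked notes; the class name `MedianByQuickselect` is made up, the pivot is chosen at random, and the even-count median is assumed to be the mean of the two middle values. It runs in expected linear time on typical inputs, though the worst case is quadratic.

```java
import java.util.Hashtable;
import java.util.concurrent.ThreadLocalRandom;

final class MedianByQuickselect {
    static double median(Hashtable<String, Integer> ht) {
        if (ht.isEmpty()) {
            throw new IllegalArgumentException("an empty table has no median");
        }
        // Copy the values into an array; the keys are not needed.
        int[] a = new int[ht.size()];
        int i = 0;
        for (int v : ht.values()) {
            a[i++] = v;
        }
        int n = a.length;
        int upper = select(a, n / 2);
        if (n % 2 == 1) {
            return upper;
        }
        // After select, every element left of index n/2 is <= a[n/2],
        // so the lower middle value is the maximum of that prefix.
        int lower = a[0];
        for (int j = 1; j < n / 2; j++) {
            lower = Math.max(lower, a[j]);
        }
        return ((double) lower + upper) / 2.0;
    }

    // Rearranges a so that a[k] holds the k-th smallest element (0-based)
    // and returns it. Only the side containing k is processed further.
    static int select(int[] a, int k) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) {
                return a[k];
            }
            if (p < k) {
                lo = p + 1;
            } else {
                hi = p - 1;
            }
        }
        return a[k];
    }

    // Lomuto partition around a random pivot; returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int r = ThreadLocalRandom.current().nextInt(lo, hi + 1);
        swap(a, r, hi);
        int pivot = a[hi];
        int store = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, j, store);
                store++;
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Unlike Quicksort, only one side of each partition is visited, which is where the saving over a full sort comes from.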

Liondance
  • Of course, sorting is another way to find the median, but that takes O(N log(N)) time, which is slower than O(N). It is a well-known result in computer science that you can find the median value of an unsorted array in O(N) steps. That is not obvious at first. – Liondance Apr 18 '15 at 02:00
0

You can use the Apache Commons Mathematics Library.

It has an API for the statistical tools you might need, such as the median, mean, standard deviation, etc.

Hope that helped.

Tarek