8

I would like to find and reuse (if possible) a map implementation which has the following attributes:

  1. While the number of entries is small, say < 32, underlying storage should be done in an array like this [key0, val0, key1, val1, ...] This storage scheme avoids many small Entry objects and provides for extremely fast look ups (even tho they are sequential scans!) on modern CPU's due to the CPU's cache not being invalidated and lack of pointer indirection into heap.

  2. The map should maintain insertion order for key/value pairs regardless of the number of entries similar to LinkedHashMap

We are working on an in-memory representations of huge (millions of nodes/edges) graphs in Scala and having such a Map would allow us to store Node/Edge attributes as well as Edges per node in a much more efficient way for 99%+ of Nodes and Edges which have few attributes or neighbors while preserving chronological insertion order for both attributes and edges.

If anyone knows of a Scala or Java map with such characteristics I would be much obliged.

Thanx

Charles
  • 48,924
  • 13
  • 96
  • 136
Alex Kravets
  • 524
  • 3
  • 12
  • 1
    For reference, I am noting that the OP did not find my solution satisfactory and requested that I remove it. In brief, the idea was to place everything in indexed arrays, Fortran-style, but then write nice wrappers around this structure so that it was pleasant to deal with. The advantage of this method is that it is incredibly fast (due to primarily just using primitives) and naturally preserves insertion order (because you just add 1 to your index when you need a new entry). Much graph work in Fortran and C has been done this way, but I agree that I have not identified the desired map. – Rex Kerr Oct 06 '10 at 15:06
  • Since you are already thinking about implementation, why don't you write your own? It can't be that hard to write a wrapper around an array or a LinkedHashMap. – starblue Oct 06 '10 at 19:20
  • 1
    You are using your collection for a special case. therefore you should not bother of such normal way of saving. it would be interesting to create your own datastrukture, to get a higher performance. you can optimize your strukture for your case, because it seems that you know very much of your graph. so you should think about trees, lists, whatever, to get the highest possible performance out of it. maybe you get a runtine performance of O(n*logn) or less....;) – Erhard Dinhobl Oct 07 '10 at 09:25
  • I will end up writing my own Adaptive Map implementation (in Scala of course). But I (perhaps naively) assumed that the pattern of having many maps with only few of them having many entries would occur often enough in practice for someone to have solved that problem by now :-). – Alex Kravets Oct 07 '10 at 21:35
  • "While the number of entries is small, say < 32, underlying storage should be done in an array like this [key0, val0, key1, val1, ...]" – Alexey Romanov Oct 10 '10 at 07:14

3 Answers3

1

While I'm not aware of any implementations that exactly fit your requirements, you may be interested in peeking at Flat3Map (source) in the Jakarta Commons library.

Unfortunately, the Jakarta libraries are rather outdated (e.g., no support for generics in the latest stable release, although it is promising to see that this is changing in trunk) and I usually prefer the Google Collections, but it might be worth your time to see how Apache implemented things.

Flat3Map doesn't preserve the order of the keys, unfortunately, but I do have a suggestion in regard to your original post. Instead of storing the keys and values in a single array like [key0, val0, key1, val1, ...], I recommend using parallel arrays; that is, one array with [key0, key1, ...] and another with [val0, val1, ...]. Normally I am not a proponent of parallel arrays, but at least this way you can have one array of type K, your key type, and another of type V, your value type. At the Java level, this has its own set of warts as you cannot use the syntax K[] keys = new K[32]; instead you'll need to use a bit of typecasting.

Community
  • 1
  • 1
ide
  • 17,118
  • 4
  • 56
  • 101
  • Now this *is* a type of answer that I was looking for. In my previous work, I found that "flat" maps (as apache ppl call them) become slower than standard hash maps only after 32 or even 64 entries, probably due to modern CPU's having very good on core caches and pointer indirection into the heap causing memory stalls. Ideally the switch from a "flat" to a standard map would happen based on a configurable threshold. I would upvote this answer but that will remove the question from the Unaswered queue :-) I want to keep the question prominent for a little while longer. Thanks for your answer. – Alex Kravets Oct 10 '10 at 12:30
1

Have you measured with profiler if LinkedHashMap is too slow for you? Maybe you don't need that new map - premature optimization is the root of all evil.. Anyway for processing millions or more pieces of data in a second, even best-optimized map can be too slow, because every method call decreases performance as well in that cases. Then all you can do is to rewrite your algorithms from Java collections to arrays (i.e. int -> object maps).

iirekm
  • 6,915
  • 3
  • 33
  • 42
  • The issue is not the speed or rather not just the speed, it's also the number of small Emtry objects allocated, retained and GC'ed. – Alex Kravets Oct 12 '10 at 20:22
  • But allocation time adds up to the slowness - the more objects you allocate the slower program is, so it all reduces to performance measuring by the profiler. – iirekm Oct 13 '10 at 07:38
  • Today where most computers have 4GB of memory memory usage optimizations rarely have sense. However when have, usually it's best to use the Flyweight pattern. An example can be found in TreeModel from Java Swing. instead of node.getAttribute(key) = node.attributeMap.get(key) use something like node.getAttribute(key) = graph.attributeModel.getAttribute(node) – iirekm Oct 13 '10 at 07:56
  • Allocation adds to the slowness - I agree, that's why I dont' want to allocate all those unnecessary Map.Entry objects that simply hold tree pointers :-) – Alex Kravets Oct 13 '10 at 21:19
  • +1 for the "premature optimization is the root of all evil". :D – Eric-Karl Oct 14 '10 at 01:13
0

Under java you can maintain a 2d array(spreadsheet). I wrote a program which basically defines a 2 d array with 3 coloumns of data, and 3 coloumns for looking up the data. the three coloumns are testID, SubtestID and Mode. This allows me to basically look up a value by testid and mode or any combination, or I can also reference by static placement. The table is loaded into memory at startup and referenced by the program. It is infinately expandable and new values can be added as needed.

If you are interested, I can post a code source example tonight.

Another idea may be to maintain a database in your program. Databases are designed to organize large amounts of data.

Adam Outler
  • 1,591
  • 4
  • 18
  • 22
  • This answer does not address my specific narrow question of having an Adaptive Map. We did consider other graph representations, but for lots of technical reasons I can't go into, we must maintain a "localized" design where graph Nodes, Edges, etc (all Atoms really) have to have their own attribute map objects. Again, I want to avoid a common pattern of having many tiny Map.Entry-like objects for small (< 32 entry maps) to save on memory and to maintain on-CPU cache locality (i.e. scanning through a small array is always faster in practice than following a chain of heap pointers). – Alex Kravets Oct 07 '10 at 21:38