15

I have a program where I need to make 100,000 to 1,000,000 random-access reads to a List-like object in as little time as possible (as in milliseconds) for a cellular automata-like program. I think the update algorithm I'm using is already optimized (keeps track of active cells efficiently, etc). The Lists do need to change size, but that performance is not as important. So I am wondering if the performance from using Arrays instead of ArrayLists is enough to make a difference when dealing with that many reads in such short spans of time. Currently, I'm using ArrayLists.

Edit: I forgot to mention: I'm just storing integers, so another factor is using the Integer wrapper class (in the case of ArrayLists) versus ints (in the case of arrays). Does anyone know if using ArrayList will actually require 3 pointer lookups (one for the ArrayList, one for the underlying array, and one for the Integer->int), whereas the array would only require 1 (array address + offset to the specific int)? Would HotSpot optimize the extra lookups away? How significant are those extra lookups?

Edit2: Also, I forgot to mention I need to do random access writes as well (writes, not insertions).

Bryan Head
  • It is notoriously difficult to devise meaningful micro-benchmarks in Java. The problems have been described in many blog posts and in the paper "Statistically Rigorous Java Performance Evaluation" -- if you haven't already, you might want to google around and read up on it if it really matters this much. – Chris Vest Jul 25 '09 at 20:38

12 Answers

11

Now that you've mentioned that your arrays are actually arrays of primitive types, consider using the collection-of-primitive-type classes in the Trove library.

@viking reports significant (ten-fold!) speedup using Trove in his application - see comments. The flip-side is that Trove collection types are not type compatible with Java's standard collection APIs. So Trove (or similar libraries) won't be the answer in all cases.

Stephen C
  • I'd just like to say that your answer sped a portion of my program (that we run hundreds of thousands of times per execution) up from 147 seconds to 14 seconds simply by substituting a Trove ArrayList for a Java ArrayList. Saved my day. – viking Feb 21 '13 at 21:24
10

Try both, but measure.

Most likely you could hack something together to make the inner loop use arrays without changing all that much code. My suspicion is that HotSpot will already inline the method calls and you will see no performance gain.
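
As a rough illustration of that idea, here is a minimal sketch that copies the list into an int[] once per update and runs the hot loop on the array. The class name, method names, and the update rule are all hypothetical, chosen only to show the shape of the change; the real automaton rule would replace `step`.

```java
import java.util.List;

class InnerLoopSnapshot {
    // Copy the boxed list into a primitive array once (one unboxing per element).
    static int[] toIntArray(List<Integer> cells) {
        int[] out = new int[cells.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = cells.get(i);
        }
        return out;
    }

    // Hypothetical update rule: each cell becomes the sum of itself and its
    // right-hand neighbour, wrapping around at the end of the array.
    static int[] step(int[] current) {
        int n = current.length;
        int[] next = new int[n];
        for (int i = 0; i < n; i++) {
            next[i] = current[i] + current[(i + 1) % n];
        }
        return next;
    }
}
```

Whether the one-off copy pays for itself depends on how many reads each update performs, which is why measuring both versions still matters.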

Also, try Java 6 update 14 and use -XX:+DoEscapeAnalysis

Kevin Peterson
3

ArrayLists are slower than arrays, but most people consider the difference to be minor. In your case it could matter, though, since you're dealing with hundreds of thousands of reads.

By the way, duplicate: Array or List in Java. Which is faster?

James Skidmore
  • Apologies; I checked to see if this question had been asked before and missed that. However, he's talking about storing thousands of Strings whereas I'm talking about a million or so ints. – Bryan Head Jul 25 '09 at 21:26
3

I would go with Kevin's advice.

Stay with the lists first and measure your performance. If your program is too slow, compare it to a version that uses an array. If that gives you a measurable performance boost, go with the arrays; if not, stay with the lists, because they will make your life much, much easier.

Janusz
  • Ya, I've been using ArrayLists, but a lot of people have been requesting speed improvements. – Bryan Head Jul 25 '09 at 21:30
  • Same thing :) get a profiler to measure the speed of your program and look for the real bottlenecks and then optimize them. Many people I know recommend the Netbeans Profiler for Java. – Janusz Jul 25 '09 at 22:10
3

There will be an overhead from using an ArrayList instead of an array, but it is very likely to be small. In fact, the useful bit of data in the ArrayList can be kept in registers, although you will probably use more of them than with a plain array (for the List size, for instance).

You mention in your edit that you are using wrapper objects. These do make a huge difference. If you are typically using the same values repeatedly, then a sensible cache policy may be useful (Integer.valueOf returns the same cached objects for values from -128 to 127). For primitives, primitive arrays usually win comfortably.
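
To make the caching remark concrete, here is a small sketch (the class and method names are made up for illustration). The Integer.valueOf documentation guarantees cached instances for values from -128 to 127 inclusive; outside that range the result depends on the JVM and its settings, so no output is claimed for the last line.

```java
class BoxingCacheDemo {
    // True when two separate valueOf calls hand back the very same object.
    static boolean sameObject(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameObject(-128)); // inside the guaranteed cache range
        System.out.println(sameObject(127));  // inside the guaranteed cache range
        System.out.println(sameObject(1000)); // outside it: JVM-dependent
    }
}
```

For cell states that mostly stay within that small range, boxing allocates nothing new; for arbitrary ints, each boxing may create an object.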

As a refinement, you might want to make sure the adjacent cells tend to be adjacent in the array (you can do better than rows of columns with a space filling curve).
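
One common space-filling curve is the Z-order (Morton) curve, which interleaves the bits of the x and y coordinates. The sketch below is one possible way to compute such an index, not necessarily the curve the answer has in mind; the class name is hypothetical and it assumes non-negative coordinates below 2^15 so the result stays a non-negative int.

```java
class MortonIndex {
    // Spread the low 16 bits of v so that they land on the even bit positions.
    static int spreadBits(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Interleave x (even bits) and y (odd bits) into a single array index.
    static int index(int x, int y) {
        return spreadBits(x) | (spreadBits(y) << 1);
    }
}
```

With this layout a cell would be read as `cells[MortonIndex.index(x, y)]`, and each aligned 2x2 block of cells occupies four consecutive array slots. The grid needs a power-of-two side length (or padding) for the indices to be dense.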

Tom Hawtin - tackline
2

One possibility would be to re-implement ArrayList (it's not that hard), but expose the backing array via a lock/release call cycle. This gets you convenience for your writes, but exposes the array for a large series of read/write operations that you know in advance won't impact the array size. If the list is locked, add/delete is not allowed - just get/set.

for example:

  SomeObj[] directArray = myArrayList.lockArray();
  try{
    // myArrayList.add(), delete() would throw an illegal state exception
    for (int i = 0; i < 50000; i++){
      directArray[i] += 1;
    }
  } finally {
    myArrayList.unlockArray();
  }

This approach continues to encapsulate the array growth/etc... behaviors of ArrayList.
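
A minimal sketch of what such a class might look like for ints follows. The class name, the initial capacity of 16, and the doubling growth policy are assumptions for illustration; only `lockArray`/`unlockArray` come from the example above, and deletion is omitted for brevity.

```java
import java.util.Arrays;

class LockableIntList {
    private int[] data = new int[16];
    private int size = 0;
    private boolean locked = false;

    // Appending may reallocate the backing array, so it is refused while locked.
    void add(int value) {
        if (locked) throw new IllegalStateException("list is locked");
        if (size == data.length) data = Arrays.copyOf(data, size * 2);
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        return data[index];
    }

    int size() {
        return size;
    }

    // Hands out the backing array; only indices [0, size()) hold valid data.
    int[] lockArray() {
        if (locked) throw new IllegalStateException("already locked");
        locked = true;
        return data;
    }

    void unlockArray() {
        locked = false;
    }
}
```

Note that the returned array may be longer than `size()`, so loops over it should be bounded by `size()` rather than by the array's length. The lock flag here is a plain boolean and is not thread-safe.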

Kevin Day
  • This is clever and not too hard. Especially because I'm using ints, the re-implementation could get a speed boost from using primitives instead of wrapper classes. Do most jvms optimize away the performance loss in using wrapper classes for primitives? – Bryan Head Jul 25 '09 at 21:32
  • AFAIK, no they don't. In fact I don't think that they could. The fact that you are talking about "int[]" versus "ArrayList" significantly changes the answers. – Stephen C Jul 26 '09 at 01:50
  • @stephen C - Exactly... arrays clearly win when dealing with primitives due to the object wrapper overhead required by ArrayList. – jsight Jul 26 '09 at 05:05
2

Java uses double indirection for its objects so that they can be moved about in memory while references to them remain valid. This means every reference lookup is equivalent to two pointer lookups. These extra lookups cannot be optimised away completely.

Perhaps even worse, your cache performance will be terrible. Accessing values in cache is going to be many times faster (perhaps 10x) than accessing values in main memory. If you have an int[], you know the values will be consecutive in memory and thus load into cache readily. However, for an Integer[], the individual Integer objects can be scattered randomly across memory and are much more likely to cause cache misses. Also, an Integer uses 24 bytes, which means Integers are much less likely to fit into your caches than 4-byte values.

If you update an Integer, this often results in a new object being created, which is many orders of magnitude more expensive than updating an int value.

Peter Lawrey
  • Rubbish. No reasonable Java implementation has used handles for many years (IIRC, very early versions of HotSpot did reintroduce handles, but that was around 1.2.2 - best part of a decade ago). – Tom Hawtin - tackline Jul 26 '09 at 00:26
  • Are all uses of the wrapper classes optimized away then? What is the performance penalty of using Integer instead of int? – Bryan Head Jul 26 '09 at 07:30
  • Hi @Tom, you may be right but I would be interested in how this is achieved. Do you know of any documents which explain how this is achieved without double indirection? – Peter Lawrey Jul 26 '09 at 11:30
  • Have a look at this presentation, http://www.azulsystems.com/events/javaone_2009/session/2009_J1_HardwareCrashCourse.pdf page 65, Perhaps I mis-interpreted what it means. – Peter Lawrey Jul 26 '09 at 11:38
2

If you're creating the list once, and doing thousands of reads from it, the overhead from ArrayList may well be slight enough to ignore. If you're creating thousands of lists, go with the standard array. Object creation in a loop quickly goes quadratic, simply because of all the overhead of instantiating the member variables, calling the constructors up the inheritance chain, etc.

Because of this -- and to answer your second question -- stick with standard ints rather than the Integer class. Profile both and you'll quickly (or, rather, slowly) see why.

rtperson
1

If you're not going to be doing a lot more than reads from this structure, then go ahead and use an array as that would be faster when read by index.

However, consider how you're going to get the data in there, and if sorting, inserting, deleting, etc, are a concern at all. If so, you may want to consider other collection based structures.

Sev
  • Adding to the end and deleting both need to happen, but optimizations like adding n elements at the same time, so that the array only needs to be copied once, are easy. Oh, I'm doing reads and writes btw. – Bryan Head Jul 25 '09 at 21:28
1

Primitives are much (much much) faster. Always. Even with JIT escape analysis, etc. Skip wrapping things in java.lang.Integer. Also, skip the array bounds check most ArrayList implementations do on get(int). Most JITs can recognize simple loop patterns and remove the loop, but there isn't much reason to muck with it if you're worried about performance.

You don't have to code primitive access yourself. I'd bet you could cut over to using IntArrayList from the COLT library in a few minutes of refactoring; see http://acs.lbl.gov/~hoschek/colt/ ("Colt provides a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java").

1

The options are:
1. To use an array
2. To use the ArrayList which internally uses an array

It is obvious that the ArrayList introduces some overhead (look into the ArrayList source code). For 99% of use cases this overhead can be easily ignored. However, if you implement time-sensitive algorithms and do tens of millions of reads from a list by index, then using bare arrays instead of lists should bring noticeable time savings. USE COMMON SENSE.

Please take a look here: http://robaustin.wikidot.com/how-does-the-performance-of-arraylist-compare-to-array I would personally tweak the test to avoid compiler optimizations, e.g. I would change "j = " into "j += " with the subsequent use of "j" after the loop.
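
The "j += " tweak can be sketched as below. This is not the linked test itself but a hypothetical stand-in with made-up names: each loop accumulates into `j` and returns it, and the caller prints the checksum, so the JIT cannot discard the reads as dead code. It is a naive timing loop, not a rigorous benchmark, so treat the numbers as a rough indication only.

```java
import java.util.ArrayList;
import java.util.List;

class ReadBenchmarkSketch {
    // Accumulate every element so the reads feed a value that is actually used.
    static long sumArray(int[] data, int passes) {
        long j = 0;
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < data.length; i++) {
                j += data[i];
            }
        }
        return j;
    }

    static long sumList(List<Integer> data, int passes) {
        long j = 0;
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < data.size(); i++) {
                j += data.get(i); // unboxes the Integer on each read
            }
        }
        return j;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] array = new int[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i);
        }
        // Repeat several rounds so later rounds run JIT-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumArray(array, 10);
            long t1 = System.nanoTime();
            long b = sumList(list, 10);
            long t2 = System.nanoTime();
            System.out.printf("array %d ms, list %d ms (checksums %d, %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```

The early rounds include interpreter and compilation time, so only the later rounds are worth comparing.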

oᴉɹǝɥɔ
0

An Array will be faster simply because at a minimum it skips a function call (i.e. get(i)).

If you have a static size, then Arrays are your friend.

Will Hartung