371

I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array or should I use some kind of List?

Since arrays keep all the data in a contiguous chunk of memory (unlike Lists), would the use of an array to store thousands of strings cause problems?

Jonas
euphoria83
  • 6
    "Since arrays keep all the data in a contiguous chunk of memory" do you have any sort of citation to back this up for Java? – matt b Apr 04 '09 at 06:01
  • 2
    No matt. I know this for C. I am guessing Java would use the same method. – euphoria83 Apr 04 '09 at 06:05
  • 1
    I doubt that it would keep them in a single chunk of memory. – Fortyrunner Apr 04 '09 at 06:15
  • 4
    Even if it's a single block of memory, it'd still only be around 1000 * 4 = 4kb worth, which is not a lot of memory. – CookieOfFortune Apr 04 '09 at 06:17
  • 1
    Is this a case of premature optimization? The correct answer is "It depends". It depends on what JVM you are running, it depends on how your JIT compiler is optimizing the code, it depends on which platform you're running on. Arrays may be faster in JET or Avian , but lists might be faster on HotSpot or IcedTea. In short, if this is the degree of control you need over performance, you need to use a much lower level language like assembly or C. Java is not suitable. – Mikkel Løkke Oct 24 '13 at 09:13
  • The question (in the comments about memory) can't be answered because List is an interface not an implementation. ArrayList will always keep the data in a contiguous block of memory (it stores an array, hence the name). Other implementations will do other things. – Vectorjohn May 13 '16 at 00:38
  • 3
    @mattb That's what 'array' means throughout CS. No citation necessary. The numerous references in the [JLS](https://docs.oracle.com/javase/specs/jls/se8/html/jls-10.html) and JVM Spec to array lengths are only comprehensible if arrays are contiguous. – user207421 Jun 09 '16 at 20:29
  • 1
    They are surely not stored in continuous chunk of memory. Java object variables are pointers to the actual objects (unless they are local and put on the stack by JVM), so the array of String is, actually, an array of _pointers_ to String objects which can be placed anywhere in the heap. – Pavlus Feb 02 '17 at 15:05
  • @Cee: please withdraw your bad edit. You don't reply or answer within a question – Hovercraft Full Of Eels Nov 11 '18 at 03:53

32 Answers

379

I suggest that you use a profiler to test which is faster.

My personal opinion is that you should use Lists.

I work on a large codebase and a previous group of developers used arrays everywhere. It made the code very inflexible. After changing large chunks of it to Lists we noticed no difference in speed.

Fortyrunner
  • 2
    @Fortyrunner - From your experience, are there any such choices in Java between abstraction and raw data forms that do make a significant difference in performance ? – euphoria83 Apr 04 '09 at 17:53
  • 4
    One of the issues with performance measurement is that you constantly have to retest against new versions of Java. I am working on a problem at the moment where someone used an int throughout for a key in a map (to save space/time). We now need to change all lines to a new object - it's painful. – Fortyrunner Apr 04 '09 at 19:30
  • 11
    So.. I now try and stay away from raw data. It rarely makes a noticeable difference. Hotspot is an amazing piece of technology and you should never try and second guess. Just try to write simple, maintainable code and Hotspot will do the rest. – Fortyrunner Apr 04 '09 at 19:31
  • 4
    Remember that profiler results are only valid for the Java platform you're running the profiler against. Which may be a different one than your customers. – Mikkel Løkke Oct 24 '13 at 09:14
  • A very good point. I'm lucky that I get to test on a platform identical to the one I ship on, so it's not so relevant for me. Lists are so much easier to deal with syntactically though. The things that slow my application down are calls to databases and external services; lists vs arrays makes a negligible difference. – Fortyrunner Oct 25 '13 at 22:42
  • @Fortyrunner that is not lucky it is just normality - if you don't test especially performance test on the same platform as production that is incompetence – mmmmmm Feb 01 '14 at 20:33
  • 4
    Effective Java recommends Lists for they help with API interoperability, and also more secure with type safety. – juanmf Dec 13 '14 at 15:04
  • I used the one that came with Netbeans (thats before VisualVM was bundled). I've used YourKit and it was very good. – Fortyrunner May 27 '15 at 21:52
  • Although... unless you are doing some seriously low latency stuff and a recent JVM.. I really doubt whether there is any real performance difference. – Fortyrunner May 27 '15 at 21:54
  • yes, List is more flexible than array, indeed. So we should choose List first everywhere, unless we need to change it to array for performance(and the change is very small in real world coding. – ZhaoGang Oct 13 '16 at 11:30
172

The Java way is that you should consider what data abstraction most suits your needs. Remember that in Java a List is an abstract, not a concrete data type. You should declare the strings as a List, and then initialize it using the ArrayList implementation.

List<String> strings = new ArrayList<String>();

This separation of Abstract Data Type and specific implementation is one of the key aspects of object oriented programming.

An ArrayList implements the List Abstract Data Type using an array as its underlying implementation. Access speed is virtually identical to an array, with the additional advantages of being able to add and remove elements (although this is an O(n) operation with an ArrayList) and of being able to change the underlying implementation later on if you decide to. For example, if you realize you need synchronized access, you can change the implementation to a Vector without rewriting all your code.

In fact, the ArrayList was specifically designed to replace the low-level array construct in most contexts. If Java was being designed today, it's entirely possible that arrays would have been left out altogether in favor of the ArrayList construct.

Since arrays keep all the data in a contiguous chunk of memory (unlike Lists), would the use of an array to store thousands of strings cause problems ?

In Java, all collections store only references to objects, not the objects themselves. Both arrays and ArrayList will store a few thousand references in a contiguous array, so they are essentially identical. You can consider that a contiguous block of a few thousand 32-bit references will always be readily available on modern hardware. This does not guarantee that you will not run out of memory altogether, of course, just that the contiguous block of memory requirement is not difficult to fulfil.
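
A minimal sketch of the interface/implementation separation described above. The class and method names (`ListInterfaceSketch`, `totalLength`, `fill`) are made up for illustration and are not from the answer. The calling code depends only on `List<String>`, so only the constructor call changes when the implementation is swapped.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Vector;

// Hypothetical example class; names are assumptions for illustration only.
final class ListInterfaceSketch {

    // Depends only on the List interface, not on a concrete implementation.
    static int totalLength(List<String> strings) {
        int total = 0;
        for (String s : strings) {
            total += s.length();
        }
        return total;
    }

    // Fills any List implementation with the same sample data.
    static List<String> fill(List<String> target) {
        target.add("alpha");
        target.add("beta");
        target.add("gamma");
        return target;
    }

    public static void main(String[] args) {
        // Only the constructor call differs between the three variants.
        List<String> arrayBacked = fill(new ArrayList<>());
        List<String> linked = fill(new LinkedList<>());
        List<String> synchronizedList = fill(new Vector<>());

        System.out.println(totalLength(arrayBacked));
        System.out.println(totalLength(linked));
        System.out.println(totalLength(synchronizedList));
    }
}
```

All three calls go through the same `totalLength` method, which is the practical benefit of declaring variables and parameters as `List` rather than as a concrete class.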

cygil
  • Adding may of course involve reallocating the backing array, so if performance is important and the size of the array is known in advance, one should consider using ArrayList#ensureCapacity. – JesperE Apr 04 '09 at 06:41
  • 6
    Don't you pay the cost of dynamic binding here? – Uri Apr 05 '09 at 17:20
  • 2
    I'd guess adding is not O(n) in ArrayList, there should be some amortization effect when adding more than once, e.g. capacity is doubled instead of increased by just 1. – zedoo Jul 25 '12 at 22:03
  • @zedoo I think they meant add and subtract in the middle. – MalcolmOcean Aug 21 '12 at 22:46
  • *" If Java was being designed today, it's entirely possible that arrays would have been left out altogether in favor of the ArrayList construct."* ... I seriously doubt that this would be true. If it were the *JVM* being rewritten today, then what you've said is certainly possible. But with the JVM we have, arrays are a foundational type in Java. – scottb Aug 08 '19 at 13:43
  • You don't need the type specifier on the right hand side (DRY): List<String> strings = new ArrayList<>(); – geowar Dec 17 '20 at 12:53
115

Although the answers proposing to use ArrayList do make sense in most scenarios, the actual question of relative performance has not really been answered.

There are a few things you can do with an array:

  • create it
  • set an item
  • get an item
  • clone/copy it

General conclusion

Although get and set operations are somewhat slower on an ArrayList (by about 1 and 3 nanoseconds per call respectively on my machine), there is very little overhead in using an ArrayList vs. an array for any non-intensive use. There are, however, a few things to keep in mind:

  • resizing operations on a list (when calling list.add(...)) are costly and one should try to set the initial capacity at an adequate level when possible (note that the same issue arises when using an array)
  • when dealing with primitives, arrays can be significantly faster as they will allow one to avoid many boxing/unboxing conversions
  • an application that only gets/sets values in an ArrayList (not very common!) could see a performance gain of more than 25% by switching to an array
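
As a sketch of the first point in the list above (setting the initial capacity), the following shows the two standard ways to presize an `ArrayList`. The class and method names are hypothetical, and the element count is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are assumptions for illustration only.
final class PresizeSketch {

    // The capacity passed to the constructor sizes the backing array up
    // front, so adding n elements does not trigger a reallocation.
    static List<String> buildPresized(int n) {
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add("item-" + i);
        }
        return list;
    }

    // Same result, but the capacity is raised on an already created list.
    // ensureCapacity is declared on ArrayList, not on the List interface.
    static List<String> buildWithEnsureCapacity(int n) {
        ArrayList<String> list = new ArrayList<>();
        list.ensureCapacity(n);
        for (int i = 0; i < n; i++) {
            list.add("item-" + i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(buildPresized(5000).size());
        System.out.println(buildWithEnsureCapacity(5000).equals(buildPresized(5000)));
    }
}
```

Capacity is only a hint about the backing array; `size()` still reports the number of elements actually added.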

Detailed results

Here are the results I measured for those three operations using the jmh benchmarking library (times in nanoseconds) with JDK 7 on a standard x86 desktop machine. Note that the ArrayLists are never resized in the tests, to make sure the results are comparable. Benchmark code available here.

Array/ArrayList Creation

I ran 4 tests, executing the following statements:

  • createArray1: Integer[] array = new Integer[1];
  • createList1: List<Integer> list = new ArrayList<> (1);
  • createArray10000: Integer[] array = new Integer[10000];
  • createList10000: List<Integer> list = new ArrayList<> (10000);

Results (in nanoseconds per call, 95% confidence):

a.p.g.a.ArrayVsList.CreateArray1         [10.933, 11.097]
a.p.g.a.ArrayVsList.CreateList1          [10.799, 11.046]
a.p.g.a.ArrayVsList.CreateArray10000    [394.899, 404.034]
a.p.g.a.ArrayVsList.CreateList10000     [396.706, 401.266]

Conclusion: no noticeable difference.

get operations

I ran 2 tests, executing the following statements:

  • getList: return list.get(0);
  • getArray: return array[0];

Results (in nanoseconds per call, 95% confidence):

a.p.g.a.ArrayVsList.getArray   [2.958, 2.984]
a.p.g.a.ArrayVsList.getList    [3.841, 3.874]

Conclusion: getting from an array is about 25% faster than getting from an ArrayList, although the difference is only on the order of one nanosecond.

set operations

I ran 2 tests, executing the following statements:

  • setList: list.set(0, value);
  • setArray: array[0] = value;

Results (in nanoseconds per call):

a.p.g.a.ArrayVsList.setArray   [4.201, 4.236]
a.p.g.a.ArrayVsList.setList    [6.783, 6.877]

Conclusion: set operations on arrays are about 40% faster than on lists, but, as for get, each set operation takes a few nanoseconds - so for the difference to reach 1 second, one would need to set items in the list/array hundreds of millions of times!

clone/copy

ArrayList's copy constructor delegates to Arrays.copyOf so performance is identical to array copy (copying an array via clone, Arrays.copyOf or System.arraycopy makes no material difference performance-wise).
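
For reference, a small sketch of the copy mechanisms mentioned above. The class and method names are made up for illustration. All four produce shallow copies: the new container holds the same String references as the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; names are assumptions for illustration only.
final class CopySketch {

    static String[] viaClone(String[] src) {
        return src.clone();
    }

    static String[] viaCopyOf(String[] src) {
        return Arrays.copyOf(src, src.length);
    }

    static String[] viaArraycopy(String[] src) {
        String[] dst = new String[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // ArrayList's copy constructor copies the elements of the source list.
    static List<String> viaCopyConstructor(List<String> src) {
        return new ArrayList<>(src);
    }

    public static void main(String[] args) {
        String[] src = {"a", "b", "c"};
        System.out.println(Arrays.equals(src, viaClone(src)));
        System.out.println(Arrays.equals(src, viaCopyOf(src)));
        System.out.println(Arrays.equals(src, viaArraycopy(src)));
        System.out.println(viaCopyConstructor(Arrays.asList(src)));
    }
}
```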

assylias
  • 1
    Nice analysis. However, with respect to your comment "when dealing with primitives, arrays can be significantly faster as they will allow one to avoid many boxing/unboxing conversions", you *can* have your cake and eat it too, with a primitive-array-backed List implementation; e.g.: https://github.com/scijava/scijava-common/blob/master/src/main/java/org/scijava/util/IntArray.java#L41. I'm actually quite surprised such a thing hasn't made it into core Java. – ctrueden Sep 27 '13 at 17:12
  • 2
    @ctrueden yes the comment applied to the standard JDK ArrayList. trove4j is a well known library that supports primitive lists. Java 8 brings some improvements with several primitive-specialised Streams. – assylias Sep 27 '13 at 17:15
  • 1
    I don't know how jmh benchmarks work but do they take into account the JIT compiling that can happen? The performance of a Java application can vary over time as the JVM compiles your code. – Hoffmann Oct 07 '14 at 14:24
  • 1
    @Hoffmann Yes - it includes a warmup phase which is excluded from the measurement. – assylias Oct 07 '14 at 14:58
106

You should prefer generic types over arrays. As mentioned by others, arrays are inflexible and do not have the expressive power of generic types. (They do however support runtime typechecking, but that mixes badly with generic types.)

But, as always when optimizing, you should follow these steps:

  • Don't optimize until you have a nice, clean, and working version of your code. Changing to generic types could very well be motivated at this step already.
  • When you have a version that is nice and clean, decide if it is fast enough.
  • If it isn't fast enough, measure its performance. This step is important for two reasons. If you don't measure you won't (1) know the impact of any optimizations you make and (2) know where to optimize.
  • Optimize the hottest part of your code.
  • Measure again. This is just as important as measuring before. If the optimization didn't improve things, revert it. Remember, the code without the optimization was clean, nice, and working.
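
The remark at the top of this answer, that arrays are type-checked at run time and that this mixes badly with generics, can be illustrated with a short sketch. The class and method names are hypothetical. Arrays are covariant, so a bad store compiles and fails when executed; generic lists are invariant, so the equivalent mistake is rejected by the compiler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are assumptions for illustration only.
final class CovarianceSketch {

    // A String[] can be viewed as an Object[]; the element type is only
    // checked at run time, when a value is stored.
    static boolean storeIntoStringArrayFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42); // compiles, fails when executed
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // A List<String> cannot be assigned to a List<Object>, so the same
    // mistake never reaches run time.
    static List<String> typedList() {
        List<String> strings = new ArrayList<>();
        // List<Object> objects = strings; // does not compile
        strings.add("ok");
        return strings;
    }

    public static void main(String[] args) {
        System.out.println(storeIntoStringArrayFails());
        System.out.println(typedList());
    }
}
```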
Kelmikra
JesperE
25

I'm guessing the original poster is coming from a C++/STL background which is causing some confusion. In C++ std::list is a doubly linked list.

In Java [java.util.]List is an implementation-free interface (pure abstract class in C++ terms). List can be a doubly linked list - java.util.LinkedList is provided. However, 99 times out of 100 when you want to make a new List, you want to use java.util.ArrayList instead, which is the rough equivalent of C++ std::vector. There are other standard implementations, such as those returned by java.util.Collections.emptyList() and java.util.Arrays.asList().

From a performance standpoint there is a very small hit from having to go through an interface and an extra object; however, runtime inlining means this rarely has any significance. Also remember that a String is typically an object plus an array, so for each entry you probably have two other objects. In C++ std::vector<std::string>, although copying by value without a pointer as such, the character arrays will form an object for string (and these will not usually be shared).

If this particular code is really performance-sensitive, you could create a single char[] array (or even byte[]) for all the characters of all the strings, and then an array of offsets. IIRC, this is how javac is implemented.
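
A rough sketch of the "one big char[] plus an array of offsets" layout suggested above. The class `PackedStrings` and its methods are invented for illustration; this is one possible reading of the idea, not how any particular tool implements it.

```java
// Hypothetical example class; names are assumptions for illustration only.
final class PackedStrings {
    private final char[] chars;  // all characters of all strings, back to back
    private final int[] offsets; // offsets[i] = start of string i; last entry = chars.length

    PackedStrings(String[] strings) {
        int total = 0;
        for (String s : strings) {
            total += s.length();
        }
        chars = new char[total];
        offsets = new int[strings.length + 1];
        int pos = 0;
        for (int i = 0; i < strings.length; i++) {
            offsets[i] = pos;
            strings[i].getChars(0, strings[i].length(), chars, pos);
            pos += strings[i].length();
        }
        offsets[strings.length] = pos;
    }

    int size() {
        return offsets.length - 1;
    }

    // Length of string i, computed without creating any object.
    int length(int i) {
        return offsets[i + 1] - offsets[i];
    }

    // Materializes string i; this allocates, so use it sparingly.
    String get(int i) {
        return new String(chars, offsets[i], length(i));
    }

    public static void main(String[] args) {
        PackedStrings packed = new PackedStrings(new String[] {"abc", "", "xyz!"});
        for (int i = 0; i < packed.size(); i++) {
            System.out.println(packed.length(i) + " " + packed.get(i));
        }
    }
}
```

The trade-off is that two arrays replace thousands of String objects, at the cost of a fixed data set and an allocation whenever a real `String` is needed.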

Tom Hawtin - tackline
  • 2
    Thanx for the answer. But no, I am not confusing the C++ list with Java's interface List. I asked the question in such a way because I wanted to compare the performance of List implementations like ArrayList and Vector with raw arrays. – euphoria83 Apr 04 '09 at 18:24
  • Both ArrayList and Vector "keep all the data in a contiguous chunk of memory". – Tom Hawtin - tackline Apr 04 '09 at 18:56
13

I agree that in most cases you should choose the flexibility and elegance of ArrayLists over arrays - and in most cases the impact to program performance will be negligible.

However, if you're doing constant, heavy iteration with little structural change (no adds and removes) for, say, software graphics rendering or a custom virtual machine, my sequential access benchmarking tests show that ArrayLists are 1.5x slower than arrays on my system (Java 1.6 on my one year-old iMac).

Some code:

import java.util.*;

public class ArrayVsArrayList {
    static public void main( String[] args ) {

        String[] array = new String[300];
        ArrayList<String> list = new ArrayList<String>(300);

        for (int i=0; i<300; ++i) {
            if (Math.random() > 0.5) {
                array[i] = "abc";
            } else {
                array[i] = "xyz";
            }

            list.add( array[i] );
        }

        int iterations = 100000000;
        long start_ms;
        int sum;

        start_ms = System.currentTimeMillis();
        sum = 0;

        for (int i=0; i<iterations; ++i) {
          for (int j=0; j<300; ++j) sum += array[j].length();
        }

        System.out.println( (System.currentTimeMillis() - start_ms) + " ms (array)" );
        // Prints ~13,500 ms on my system

        start_ms = System.currentTimeMillis();
        sum = 0;

        for (int i=0; i<iterations; ++i) {
          for (int j=0; j<300; ++j) sum += list.get(j).length();
        }

        System.out.println( (System.currentTimeMillis() - start_ms) + " ms (ArrayList)" );
        // Prints ~20,800 ms on my system - about 1.5x slower than direct array access
    }
}
Wrench
AbePralle
  • I found this an interesting answer, but I'd wonder if it's even worse if the ArrayList is not initialized with an initial size in memory. Generally the benefit of using ArrayList over a native array in a sense is that you won't know and you don't have to worry. ArrayLists are by default created with initial length 10 and then are resized. I think the resizing is expensive. I haven't tried benchmarking it obviously. – Zak Patterson Nov 12 '12 at 21:46
  • 5
    This micro benchmark has flaws (no warm up, operations not in a separate method so the arraylist part is never optimised by the JIT etc.) – assylias Nov 25 '12 at 18:21
  • I agree with assylias. The results of this benchmark should not be trusted. – Stephen C Apr 28 '13 at 11:36
  • @StephenC I have added a proper micro benchmark (that shows that get operations are comparable). – assylias May 15 '13 at 12:46
13

Well firstly it's worth clarifying do you mean "list" in the classical comp sci data structures sense (ie a linked list) or do you mean java.util.List? If you mean a java.util.List, it's an interface. If you want to use an array just use the ArrayList implementation and you'll get array-like behaviour and semantics. Problem solved.

If you mean an array vs a linked list, it's a slightly different argument for which we go back to Big O (here is a plain English explanation if this is an unfamiliar term).

Array:

  • Random Access: O(1);
  • Insert: O(n);
  • Delete: O(n).

Linked List:

  • Random Access: O(n);
  • Insert: O(1);
  • Delete: O(1).

So you choose whichever one best suits how you resize your array. If you resize, insert and delete a lot then maybe a linked list is a better choice. Same goes for if random access is rare. You mention serial access. If you're mainly doing serial access with very little modification then it probably doesn't matter which you choose.

Linked lists have a slightly higher overhead since, like you say, you're dealing with potentially non-contiguous blocks of memory and (effectively) pointers to the next element. That's probably not an important factor unless you're dealing with millions of entries however.
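
To make the insert costs above concrete, here is a sketch that inserts elements during a traversal through `ListIterator`. The class and method names are hypothetical. The code is identical for both implementations: on a `LinkedList` each `add` is O(1) because the iterator is already positioned at the node, while on an `ArrayList` each `add` shifts the tail of the backing array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical example class; names are assumptions for illustration only.
final class InsertSketch {

    // Inserts marker immediately before every element equal to target.
    static void insertBefore(List<String> list, String target, String marker) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.previous();  // step back so the cursor sits before target
                it.add(marker); // inserted at the cursor, i.e. before target
                it.next();      // move past target again
            }
        }
    }

    public static void main(String[] args) {
        List<String> arrayBacked = new ArrayList<>(Arrays.asList("a", "b", "c", "b"));
        List<String> linked = new LinkedList<>(Arrays.asList("a", "b", "c", "b"));
        insertBefore(arrayBacked, "b", "x");
        insertBefore(linked, "b", "x");
        System.out.println(arrayBacked);
        System.out.println(linked);
    }
}
```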

cletus
11

I wrote a little benchmark to compare ArrayLists with Arrays. On my old-ish laptop, the time to traverse through a 5000-element arraylist, 1000 times, was about 10 milliseconds slower than the equivalent array code.

So, if you're doing nothing but iterating the list, and you're doing it a lot, then maybe it's worth the optimisation. Otherwise I'd use the List, because it'll make it easier when you do need to optimise the code.

n.b. I did notice that using for (String s : stringsList) was about 50% slower than using an old-style for-loop to access the list. Go figure... Here are the two functions I timed; the array and list were filled with 5000 random (different) strings.

private static void readArray(String[] strings) {
    long totalchars = 0;
    for (int j = 0; j < ITERATIONS; j++) {
        totalchars = 0;
        for (int i = 0; i < strings.length; i++) {
            totalchars += strings[i].length();

        }
    }
}

private static void readArrayList(List<String> stringsList) {
    long totalchars = 0;
    for (int j = 0; j < ITERATIONS; j++) {
        totalchars = 0;
        for (int i = 0; i < stringsList.size(); i++) {
            totalchars += stringsList.get(i).length();
        }
    }
}
Chris May
  • @ Chris May : Great work ! What are the actual running times for both ? Can you tell me the size of the strings you were using ? Also, as the use of 'String s : stringsList' made it take longer, this is my primary fear in using the higher abstractions in Java in general. – euphoria83 Apr 04 '09 at 16:28
  • It doesn't really matter how long the strings are for this microbenchmark. There is no gc, and the `char[]` is not touched (this is not C). – Tom Hawtin - tackline Apr 04 '09 at 19:01
  • Typical times for me were ~25ms for the array version, ~35ms for the ArrayList version. The strings were 15-20 chars long. As Tom says, the string size doesn't make much of a difference, with a ~100-char string the timings were about same. – Chris May Apr 05 '09 at 13:35
  • 3
    How did you measure? Naive measuring in Java micro benchmarks usually generates more misinformation than information. Beware of above statement. – jmg Jul 27 '11 at 05:53
7

No, because technically, the array only stores the references to the strings. The strings themselves are allocated in a different location. For a thousand items, I would say a list would be better: it is slower, but it offers more flexibility and is easier to use, especially if you are going to resize it.

CookieOfFortune
7

If you have thousands, consider using a trie. A trie is a tree-like structure that merges the common prefixes of the stored strings.

For example, if the strings were

intern
international
internationalize
internet
internets

The trie would store:

intern
 -> \0
 international
 -> \0
 -> ize\0
 net
 ->\0
 ->s\0

The strings require 57 characters (including the null terminator, '\0') for storage, plus whatever the size of the String object that holds them. (In truth, we should probably round all sizes up to multiples of 16, but...) Call it 57 + 5 = 62 bytes, roughly.

The trie requires 29 (including the null terminator, '\0') for storage, plus sizeof the trie nodes, which are a ref to an array and a list of child trie nodes.

For this example, that probably comes out about the same; for thousands, it probably comes out less as long as you do have common prefixes.

Now, when using the trie in other code, you'll have to convert to String, probably using a StringBuffer as an intermediary. If many of the strings are in use at once as Strings, outside the trie, it's a loss.

But if you're only using a few at the time -- say, to look up things in a dictionary -- the trie can save you a lot of space. Definitely less space than storing them in a HashSet.

You say you're accessing them "serially" -- if that means sequentially and alphabetically, the trie also obviously gives you alphabetical order for free, if you iterate it depth-first.
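
A minimal trie sketch along the lines described above. Everything here is an assumption made for illustration: the names (`TrieSketch`, `Node`, `sortedWords`), the use of one node per character (an uncompressed trie, simpler than the prefix-merged drawing above), and the `TreeMap` for children, which is what makes the depth-first walk come out in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; names and layout are assumptions.
final class TrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted by character
        boolean endOfWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.endOfWord = true;
    }

    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.children.get(word.charAt(i));
        }
        return node != null && node.endOfWord;
    }

    // Depth-first walk; shared prefixes are stored once and rebuilt on the fly.
    List<String> sortedWords() {
        List<String> out = new ArrayList<>();
        collect(root, new StringBuilder(), out);
        return out;
    }

    private static void collect(Node node, StringBuilder prefix, List<String> out) {
        if (node.endOfWord) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            prefix.append(e.getKey().charValue());
            collect(e.getValue(), prefix, out);
            prefix.setLength(prefix.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        TrieSketch trie = new TrieSketch();
        for (String w : new String[] {"internet", "intern", "internets", "internationalize", "international"}) {
            trie.add(w);
        }
        System.out.println(trie.sortedWords());
        System.out.println(trie.contains("inter"));
    }
}
```

Note that a node-per-character trie built from ordinary Java objects can easily use more memory than the strings it replaces; the space savings discussed above depend on a compact node representation.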

tpdi
6

Since there are already a lot of good answers here, I would like to add some practical information: a comparison of insertion and iteration performance for a primitive array vs a LinkedList in Java.

This is just a simple performance check, so the results will depend on the machine.

Source code used for this is below :

import java.util.Iterator;
import java.util.LinkedList;

public class Array_vs_LinkedList {

    private final static int MAX_SIZE = 40000000;

    public static void main(String[] args) {

        LinkedList<Integer> lList = new LinkedList<>();

        /* insertion performance check */

        long startTime = System.currentTimeMillis();

        for (int i=0; i<MAX_SIZE; i++) {
            lList.add(i);
        }

        long stopTime = System.currentTimeMillis();
        long elapsedTime = stopTime - startTime;
        System.out.println("[Insert]LinkedList insert operation with " + MAX_SIZE + " number of integer elapsed time is " + elapsedTime + " millisecond.");

        int[] arr = new int[MAX_SIZE];

        startTime = System.currentTimeMillis();
        for(int i=0; i<MAX_SIZE; i++){
            arr[i] = i; 
        }

        stopTime = System.currentTimeMillis();
        elapsedTime = stopTime - startTime;
        System.out.println("[Insert]Array Insert operation with " + MAX_SIZE + " number of integer elapsed time is " + elapsedTime + " millisecond.");


        /* iteration performance check */

        startTime = System.currentTimeMillis();

        Iterator<Integer> itr = lList.iterator();

        while(itr.hasNext()) {
            itr.next();
            // System.out.println("Linked list running : " + itr.next());
        }

        stopTime = System.currentTimeMillis();
        elapsedTime = stopTime - startTime;
        System.out.println("[Loop]LinkedList iteration with " + MAX_SIZE + " number of integer elapsed time is " + elapsedTime + " millisecond.");


        startTime = System.currentTimeMillis();

        int t = 0;
        for (int i=0; i < MAX_SIZE; i++) {
            t = arr[i];
            // System.out.println("array running : " + i);
        }

        stopTime = System.currentTimeMillis();
        elapsedTime = stopTime - startTime;
        System.out.println("[Loop]Array iteration with " + MAX_SIZE + " number of integer elapsed time is " + elapsedTime + " millisecond.");
    }
}

Performance results (posted as an image, not reproduced here).

boraseoksoon
6

I came here to get a better feeling for the performance impact of using lists over arrays. I had to adapt code here for my scenario: array/list of ~1000 ints using mostly getters, meaning array[j] vs. list.get(j)

Taking the best of 7 to be unscientific about it (the first few runs with the list were 2.5x slower), I get this:

array Integer[] best 643ms iterator
ArrayList<Integer> best 1014ms iterator

array Integer[] best 635ms getter
ArrayList<Integer> best 891ms getter (strange though)

- so, very roughly 30% faster with array

The second reason for posting now is that no-one mentions the impact if you do math/matrix/simulation/optimization code with nested loops.

Say you have three nested levels and the inner loop is twice as slow you are looking at 8 times performance hit. Something that would run in a day now takes a week.

*EDIT Quite shocked here, for kicks I tried declaring int[1000] rather than Integer[1000]

array int[] best 299ms iterator
array int[] best 296ms getter

Using Integer[] vs. int[] represents a double performance hit; ArrayList with an iterator is 3x slower than int[]. I really thought Java's list implementations were similar to native arrays...

Code for reference (call multiple times):

    public static void testArray()
    {
        final long MAX_ITERATIONS = 1000000;
        final int MAX_LENGTH = 1000;

        Random r = new Random();

        //Integer[] array = new Integer[MAX_LENGTH];
        int[] array = new int[MAX_LENGTH];

        List<Integer> list = new ArrayList<Integer>()
        {{
            for (int i = 0; i < MAX_LENGTH; ++i)
            {
                int val = r.nextInt();
                add(val);
                array[i] = val;
            }
        }};

        long start = System.currentTimeMillis();
        int test_sum = 0;
        for (int i = 0; i < MAX_ITERATIONS; ++i)
        {
//          for (int e : array)
//          for (int e : list)          
            for (int j = 0; j < MAX_LENGTH; ++j)
            {
                int e = array[j];
//              int e = list.get(j);
                test_sum += e;
            }
        }

        long stop = System.currentTimeMillis();

        long ms = (stop - start);
        System.out.println("Time: " + ms);
    }
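
The gap between int[] and the Integer-based variants reported in this answer comes down to boxing. The sketch below isolates that difference; the class and method names are hypothetical, and the explicit `intValue()` call only spells out what the compiler would otherwise insert automatically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; names are assumptions for illustration only.
final class BoxingSketch {

    // Reads primitive values directly from the array.
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Every element is a reference to an Integer object that must be
    // followed and unboxed before it can be added.
    static long sumBoxed(List<Integer> values) {
        long total = 0;
        for (Integer boxed : values) {
            total += boxed.intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] primitives = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitives[i] = i;
            boxed.add(i); // autoboxing: the int is wrapped in an Integer
        }
        System.out.println(sumPrimitive(primitives));
        System.out.println(sumBoxed(boxed));
    }
}
```

Both methods return the same sum; only the memory layout and the per-element work differ.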
Xult
  • Please read https://www.oracle.com/technical-resources/articles/java/architect-benchmarking.html before attempting to write benchmarks. (TL;DR: don't use timed loops, use something like JMH *after* reading its documentation carefully.) – René Jan 04 '21 at 10:11
5

UPDATE:

As Mark noted, there is no significant difference after JVM warm-up (several test passes). I checked this with a re-created array, and even with each new pass starting on a new row of a matrix. This strongly suggests that a plain array with index access should not be preferred over collections.

Still, during the first 1-2 passes the plain array is 2-3 times faster.

ORIGINAL POST:

Too many words for a subject that is simple to check. Without any question, an array is several times faster than any container class. I ran into this question while looking for alternatives for my performance-critical section. Here is the prototype code I built to check the real situation:

import java.util.List;
import java.util.Arrays;

public class IterationTest {

    private static final long MAX_ITERATIONS = 1000000000;

    public static void main(String [] args) {

        Integer [] array = {1, 5, 3, 5};
        List<Integer> list = Arrays.asList(array);

        long start = System.currentTimeMillis();
        int test_sum = 0;
        for (int i = 0; i < MAX_ITERATIONS; ++i) {
//            for (int e : array) {
            for (int e : list) {
                test_sum += e;
            }
        }
        long stop = System.currentTimeMillis();

        long ms = (stop - start);
        System.out.println("Time: " + ms);
    }
}

And here is the answer:

Based on array (line 16 is active):

Time: 7064

Based on list (line 17 is active):

Time: 20950

Any more comments on 'faster'? This is well understood. The question is when being about 3 times faster matters more to you than the flexibility of List, but that is another question. By the way, I also checked this with a manually constructed ArrayList, with almost the same result.
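
A sketch of how the test above could be restructured so that warm-up becomes visible: the measured loops live in their own methods and are timed over several passes. The names, the pass count and the iteration count are arbitrary assumptions, and this is still a naive timed loop rather than a proper benchmark harness such as JMH.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; names and constants are assumptions.
final class WarmupSketch {

    static long sumArray(Integer[] array, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            for (int e : array) {
                sum += e;
            }
        }
        return sum;
    }

    static long sumList(List<Integer> list, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            for (int e : list) {
                sum += e;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Integer[] array = {1, 5, 3, 5};
        List<Integer> list = Arrays.asList(array);
        int iterations = 10_000_000;

        // Early passes include interpretation and JIT compilation time;
        // later passes are closer to steady state.
        for (int pass = 1; pass <= 5; pass++) {
            long t0 = System.nanoTime();
            long a = sumArray(array, iterations);
            long t1 = System.nanoTime();
            long b = sumList(list, iterations);
            long t2 = System.nanoTime();
            // Printing the sums keeps the loops from being optimized away.
            System.out.println("pass " + pass
                    + ": array " + (t1 - t0) / 1_000_000 + " ms (sum " + a + ")"
                    + ", list " + (t2 - t1) / 1_000_000 + " ms (sum " + b + ")");
        }
    }
}
```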

Roman Nikitchenko
  • 2
    `3` times faster true, but insignificantly so. `14ms` is not a long time – 0x6C38 Aug 19 '13 at 00:54
  • 1
    Benchmark is not considering JVM warm up. Change main() to test() and call test from main repeatedly. By the 3rd or 4th run of test, it runs many times faster. At that point, I'm seeing that array is about 9 times faster than array. – Mike Oct 24 '13 at 03:50
4

Lists are slower than arrays. If you need efficiency, use arrays. If you need flexibility, use a list.

Warrior
4

Remember that an ArrayList encapsulates an array, so there is little difference compared to using a primitive array (except for the fact that a List is much easier to work with in java).

Pretty much the only time it makes sense to prefer an array to an ArrayList is when you are storing primitives, i.e. byte, int, etc., and you need the particular space-efficiency you get by using primitive arrays.

Nuoji
4

The array vs. List choice is not so important (as far as performance goes) when storing String objects, because both an array and a list store references to the String objects, not the objects themselves.

  1. If the number of strings is almost constant then use an array (or ArrayList). But if the number varies too much then you'd better use LinkedList.
  2. If there is (or will be) a need for adding or deleting elements in the middle, then you certainly have to use LinkedList.
Emre
3

If you can live with a fixed size, arrays will be faster and need less memory.

If you need the flexibility of the List interface with adding and removing elements, the question remains which implementation you should choose. Often ArrayList is recommended and used for any case, but ArrayList also has performance problems if elements at the beginning or in the middle of the list must be removed or inserted.

You therefore may want to have a look at http://java.dzone.com/articles/gaplist-%E2%80%93-lightning-fast-list which introduces GapList. This new list implementation combines the strengths of both ArrayList and LinkedList resulting in very good performance for nearly all operations.

3

If you know in advance how large the data is then an array will be faster.

A List is more flexible. You can use an ArrayList which is backed by an array.
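As a rough sketch of how the two approaches can be combined when the size is known in advance: the ArrayList constructor accepts an initial capacity, so the backing array is allocated once. The class name, the load() helper and the "line N" contents are placeholders for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class PresizedList {
    // Builds a list whose backing array is allocated once, up front.
    static List<String> load(int expectedSize) {
        List<String> lines = new ArrayList<>(expectedSize);
        for (int i = 0; i < expectedSize; i++) {
            lines.add("line " + i); // placeholder data
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = load(5_000);

        // Serial access, just as with an array.
        int totalChars = 0;
        for (String line : lines) {
            totalChars += line.length();
        }
        System.out.println(lines.size() + " strings, " + totalChars + " chars");

        // If a plain array is needed afterwards, copy once.
        String[] asArray = lines.toArray(new String[0]);
        System.out.println(asArray.length);
    }
}
```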

TofuBeer
  • 58,140
  • 15
  • 111
  • 160
  • The ArrayList has an ensureCapacity() method which preallocates the backing array to the specified size. – JesperE Apr 04 '09 at 06:39
  • Or you can specify the size at construction time. Also "faster" here means "a few microseconds to allocate two memory areas instead of one" – Aaron Digulla Apr 04 '09 at 09:10
2

List is the preferred way in Java 1.5 and beyond, as it can use generics. You cannot create an array of a generic type. Arrays also have a predefined length and cannot grow dynamically. Initializing an array with a large size is not a good idea. ArrayList is the way to get an array with generics, and it can grow dynamically. But if deletes and inserts are used more frequently, then a linked list is the fastest data structure to use.

Shehan Simen
  • 586
  • 10
  • 18
2

Arrays are recommended wherever you can use them instead of a list, especially if you know the item count and the size will not change.

See Oracle Java best practice: http://docs.oracle.com/cd/A97688_16/generic.903/bp/java.htm#1007056

Of course, if you need to add and remove objects from the collection many times, lists are easier to use.

Nik
  • 6,525
  • 7
  • 48
  • 73
  • The documentation you linked to is more than 10 years old, i.e. applies to java 1.3. Major performance improvements have been made since then... – assylias Oct 07 '14 at 15:01
  • @assylias see the answers above; they contain performance tests which say that arrays are faster – Nik Oct 08 '14 at 04:14
  • 3
    I know, I wrote one of them. But I don't think that "*arrays are recommended everywhere you can use them instead of lists*" is good advice. ArrayList should be the default choice in most situations unless you are dealing with primitives and your code is performance sensitive. – assylias Oct 08 '14 at 06:37
2

A lot of microbenchmarks given here have found numbers of a few nanoseconds for things like array/ArrayList reads. This is quite reasonable if everything is in your L1 cache.

An access to a higher-level cache or main memory can take something like 10 ns to 100 ns, versus more like 1 ns for L1 cache. Accessing an ArrayList has an extra memory indirection, and in a real application you could pay this cost anything from almost never to every time, depending on what your code is doing between accesses. And, of course, if you have a lot of small ArrayLists this might add to your memory use and make it more likely you'll have cache misses.

The original poster appears to be using just one and accessing a lot of contents in a short time, so it should be no great hardship. But it might be different for other people, and you should watch out when interpreting microbenchmarks.

Java Strings, however, are appallingly wasteful, especially if you store lots of small ones (just look at them with a memory analyzer, it seems to be > 60 bytes for a string of a few characters). An array of strings has an indirection to the String object, and another from the String object to a char[] which contains the string itself. If anything's going to blow your L1 cache it's this, combined with thousands or tens of thousands of Strings. So, if you're serious - really serious - about scraping out as much performance as possible then you could look at doing it differently. You could, say, hold two arrays, a char[] with all the strings in it, one after another, and an int[] with offsets to the starts. This will be a PITA to do anything with, and you almost certainly don't need it. And if you do, you've chosen the wrong language.
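For the curious, the two-array layout could look roughly like this. It is only a sketch of the idea: PackedStrings and its methods are hypothetical names, and I have added one extra trailing offset so that each length falls out as a difference.

```java
class PackedStrings {
    private final char[] chars;  // all strings, one after another
    private final int[] offsets; // offsets[i] = start of string i; last entry = total length

    PackedStrings(String[] source) {
        int total = 0;
        for (String s : source) {
            total += s.length();
        }
        chars = new char[total];
        offsets = new int[source.length + 1];
        int pos = 0;
        for (int i = 0; i < source.length; i++) {
            offsets[i] = pos;
            // Copy the characters of string i into the shared buffer.
            source[i].getChars(0, source[i].length(), chars, pos);
            pos += source[i].length();
        }
        offsets[source.length] = pos;
    }

    int size() {
        return offsets.length - 1;
    }

    int length(int i) {
        return offsets[i + 1] - offsets[i];
    }

    // Materializes string i; this allocates, so avoid it on hot paths.
    String get(int i) {
        return new String(chars, offsets[i], length(i));
    }

    public static void main(String[] args) {
        PackedStrings packed = new PackedStrings(new String[] {"alpha", "beta", "gamma"});
        for (int i = 0; i < packed.size(); i++) {
            System.out.println(packed.get(i));
        }
    }
}
```

Note that get() creates a new String each time, which gives back much of the saving; real code would have to work on the char[] ranges directly, and that is where the pain starts.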

Alex Hayward
  • 216
  • 1
  • 4
2

Depending on the implementation, it's possible that an array of primitive types will be smaller and more efficient than an ArrayList. This is because the array stores the values directly in a contiguous block of memory, while the simplest ArrayList implementation stores pointers to each value. On a 64-bit platform especially, this can make a huge difference.

Of course, it's possible for the JVM implementation to have a special case for this situation, in which case the performance will be the same.

JRalph
  • 329
  • 2
  • 4
2

None of the answers had information that I was interested in - repetitive scan of the same array many many times. Had to create a JMH test for this.

Results (Java 1.8.0_66 x32, iterating plain array is at least 5 times quicker than ArrayList):

Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.testArrayForGet  avgt   10   8.121 ± 0.233  ms/op
MyBenchmark.testListForGet   avgt   10  37.416 ± 0.094  ms/op
MyBenchmark.testListForEach  avgt   10  75.674 ± 1.897  ms/op

Test

package my.jmh.test;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MyBenchmark {

    public final static int ARR_SIZE = 100;
    public final static int ITER_COUNT = 100000;

    String arr[] = new String[ARR_SIZE];
    List<String> list = new ArrayList<>(ARR_SIZE);

    public MyBenchmark() {
        for( int i = 0; i < ARR_SIZE; i++ ) {
            list.add(null);
        }
    }

    @Benchmark
    public void testListForEach() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( String str : list ) {
                if( str != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

    @Benchmark
    public void testListForGet() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( int j = 0; j < ARR_SIZE; j++ ) {
                if( list.get(j) != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

    @Benchmark
    public void testArrayForGet() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( int j = 0; j < ARR_SIZE; j++ ) {
                if( arr[j] != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

}
Xtra Coder
  • 2,891
  • 2
  • 30
  • 52
2

"Thousands" is not a large number. A few thousand paragraph-length strings are on the order of a couple of megabytes in size. If all you want to do is access these serially, use an immutable singly-linked List.

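Java's standard library has no immutable singly-linked list, so one would have to be written by hand or taken from a library. A minimal hand-rolled sketch, with a made-up class name and only the operations needed for serial access, might look like this:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class ImmutableLinkedList<T> implements Iterable<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Node<T> first; // null for the empty list

    private ImmutableLinkedList(Node<T> first) {
        this.first = first;
    }

    static <T> ImmutableLinkedList<T> empty() {
        return new ImmutableLinkedList<>(null);
    }

    // O(1): the new list shares every node of the old one.
    ImmutableLinkedList<T> prepend(T value) {
        return new ImmutableLinkedList<>(new Node<>(value, first));
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Node<T> current = first;

            @Override
            public boolean hasNext() {
                return current != null;
            }

            @Override
            public T next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                T value = current.value;
                current = current.next;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        ImmutableLinkedList<String> list =
                ImmutableLinkedList.<String>empty().prepend("c").prepend("b").prepend("a");
        for (String s : list) {
            System.out.println(s);
        }
    }
}
```

Because nothing is ever modified, the list can be shared between threads without locking; the price is one extra node object per string.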
Apocalisp
  • 33,619
  • 8
  • 100
  • 150
1

It depends on how you have to access it.

After storing, if you mainly want to look elements up, with little or no insert/delete, then go for an array (lookup by index is O(1) in arrays, whereas add/delete may need re-ordering of the elements).

After storing, if your main purpose is to add/delete strings, with little or no search operation, then go for List.

Vikram
  • 3,624
  • 8
  • 33
  • 56
1

ArrayList internally uses an array object to add (or store) the elements. In other words, ArrayList is backed by an array data structure. The array of an ArrayList is resizable (or dynamic).

An array is faster than an ArrayList because the ArrayList uses an array internally. We can add elements to an array directly, or indirectly through an ArrayList, and the direct mechanism is always faster than the indirect one.

There are two overloaded add() methods in the ArrayList class:

  1. add(Object): adds an object to the end of the list.
  2. add(int index, Object ): inserts the specified object at the specified position in the list.

How does the size of an ArrayList grow dynamically?

public boolean add(E e)        
{       
     ensureCapacity(size+1);
     elementData[size++] = e;         
     return true;
}

An important point to note from the above code is that the capacity of the ArrayList is checked before the element is added. ensureCapacity() compares the number of elements currently in use with the current length of the array. If the number of elements, including the new element to be added, is greater than the length of the array, the array has to grow. But the size of an array cannot be increased dynamically, so what happens internally is that a new array is created with the capacity

Till Java 6

int newCapacity = (oldCapacity * 3)/2 + 1;

(Update) From Java 7

 int newCapacity = oldCapacity + (oldCapacity >> 1);

The data from the old array is also copied into the new array.

These extra method calls add overhead, which is why an array is faster than an ArrayList.
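To get a feel for how often that copy actually happens, the Java 7 formula quoted above can be simulated. This is a standalone sketch, not the JDK source; the class and method names are made up, and the starting capacity of 10 is an assumption matching ArrayList's documented default.

```java
class GrowthSketch {
    // Growth rule quoted above for Java 7 and later: grow by about 50%.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Counts how often the backing array would be reallocated (and copied)
    // while appending n elements, starting from initialCapacity.
    static int resizesFor(int n, int initialCapacity) {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < n) {
            // Guard for tiny capacities, where 50% growth rounds down to nothing.
            capacity = Math.max(nextCapacity(capacity), capacity + 1);
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int capacity = 10; // assumed default capacity
        for (int i = 0; i < 6; i++) {
            System.out.print(capacity + " ");
            capacity = nextCapacity(capacity);
        }
        System.out.println();
        System.out.println(resizesFor(1_000, 10) + " resizes to hold 1000 elements");
    }
}
```

The capacity grows geometrically, so the number of reallocations grows only logarithmically with the element count, and the amortized cost per add() stays constant.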

Abhinav
  • 500
  • 8
  • 18
Vipin Jain
  • 5,031
  • 1
  • 21
  • 35
1

Arrays: always the better choice when fast retrieval of results is needed.

Lists: perform better on insertion and deletion, since these can be done in O(1), and they also provide methods to add, fetch, and delete data easily. Much easier to use.

But always remember that fetching data from an array is fast only when the index at which the data is stored is known.

That could be achieved by sorting the array, but sorting adds to the total time needed to fetch the data (storing the data + sorting the data + seeking the position where the data is found). So arrays incur additional latency here, even though they are good at fetching data quickly.

This can be solved with a trie or a ternary search tree. A trie is very efficient for searching: looking up a particular word takes time proportional to the length of the word, regardless of how many words are stored. When time matters, i.e. if you have to search and retrieve data quickly, you may go with a trie.

If you want to consume less memory and still get good performance, go with a ternary search tree. Both are suitable for storing a huge number of strings (e.g. the words contained in a dictionary).
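A bare-bones trie, to show the shape of the idea. It is a sketch with invented names, using a HashMap per node for simplicity; a production version would use a more compact child representation.

```java
import java.util.HashMap;
import java.util.Map;

class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Walks down one node per character, creating nodes as needed.
    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    // Lookup cost depends on the word length, not on how many words are stored.
    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return false;
            }
        }
        return node.isWord;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        trie.insert("car");
        trie.insert("cat");
        System.out.println(trie.contains("car"));
        System.out.println(trie.contains("ca"));
    }
}
```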

Tom Taylor
  • 2,378
  • 1
  • 27
  • 48
1

Don't get into the trap of optimizing without proper benchmarking. As others have suggested use a profiler before making any assumption.

The different data structures that you have enumerated have different purposes. A list is very efficient at inserting elements at the beginning and at the end but suffers a lot when accessing random elements. An array has fixed storage but provides fast random access. Finally, an ArrayList improves the interface to an array by allowing it to grow. Normally the data structure to be used should be dictated by how the stored data will be accessed or added.

About memory consumption: you seem to be mixing some things up. An array only gives you a contiguous chunk of memory for the type of data that it holds. Don't forget that Java has a fixed set of data types: the primitives (boolean, byte, char, short, int, long, float, double) and Object (this includes all objects; even an array is an Object). This means that if you declare an array such as String[] strings = new String[1000] or MyObject[] myObjects = new MyObject[1000], you only get 1000 memory boxes big enough to store the locations (references or pointers) of the objects. You don't get 1000 memory boxes big enough to fit the objects themselves. Don't forget that your objects are first created with "new". This is when the memory allocation is done, and later a reference (the memory address) is stored in the array. The object doesn't get copied into the array, only its reference.
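A small sketch to illustrate the point (the class and helper names are made up). A freshly created String[] contains only null references, and storing the same object in two slots stores two references, not two copies:

```java
class ReferenceArrayDemo {
    // Returns true when both slots refer to the very same object.
    static boolean sameObject(Object[] array, int i, int j) {
        return array[i] == array[j];
    }

    public static void main(String[] args) {
        String[] strings = new String[1000]; // 1000 reference slots, all null
        System.out.println(strings[0]);      // no String object exists yet

        String s = new String("hello");      // the object is allocated here
        strings[0] = s;                      // only the reference is stored
        strings[1] = s;                      // same object, second reference
        System.out.println(sameObject(strings, 0, 1));
    }
}
```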

potyl
  • 1,534
  • 10
  • 9
1

I don't think it makes a real difference for Strings. What is contiguous in an array of strings is the references to the strings; the strings themselves are stored at arbitrary places in memory.

Arrays vs. Lists can make a difference for primitive types, not for objects. If you know the number of elements in advance and don't need flexibility, an array of millions of integers or doubles will be more efficient in memory, and marginally in speed, than a list, because the values really are stored contiguously and accessed directly. That's why Java still uses arrays of chars for strings, arrays of ints for image data, etc.

PhiLho
  • 38,673
  • 6
  • 89
  • 128
1

Array is faster - all memory is pre-allocated in advance.

Yakov Fain
  • 10,540
  • 4
  • 29
  • 35
0

Which one to use depends on the problem. We need to look at the Big O.

Big O Values of List and Array

image source: https://github.com/egonSchiele/grokking_algorithms

Oguzhan Cevik
  • 380
  • 3
  • 14
-4

A List is more flexible, so it is better to use a List than an array.

RV.
  • 2,596
  • 7
  • 36
  • 49