0

First and foremost-- I have a file of strings. The smallest file is about 20 strings. The largest file is currently 12,000 strings of varying lengths (anywhere from one character to about 80). I suspect I may have up to a 60,000 string file in the future.

Initially I made a standard array of strings with a default size of 200 and doubled the size and copied the array to a new array if needed (while reading the file into the array). This method was pretty fast. However, the readability and extra coding for methods like search or contains was not appealing. I tried a List interface instead-- and read the file in using the typical list.add(line) until there were no more lines.

My question is: What is the default size of an ArrayList<>, and does this method result in too many allocations/resizes? Are there any performance points I should know about these two methods, and which would be better?

  • 1
    List is an interface.... – Jesus Ramos Jul 27 '11 at 01:38
  • 1
    The title of your question doesn't match with what you have asked. As List is just an interface to ArrayList and others. – Vishal Jul 27 '11 at 01:42
  • Right. Are you talking about LinkedList, @Google? – Marvo Jul 27 '11 at 01:42
  • I think I fixed it now-- my main concern is using an ArrayList or a standard String array as I originally used. I hope this makes sense. –  Jul 27 '11 at 01:48
  • 1
    This may help with your decision: [arrayOrList](http://stackoverflow.com/questions/716597/array-or-list-in-java-which-is-faster). In my opinion I would use a List; it's easier to maintain and gives you more flexibility in general. – Hassek Jul 27 '11 at 01:53

4 Answers

3

ArrayList defaults to an initial capacity of 10 (its size starts at 0). The amortized cost of growing is not very expensive, even if you start with a capacity of 1. You could turn the cost down to nearly 0 if you initialize it with a high capacity:

List<String> myList = new ArrayList<String>(100000);

Also, you should realize that the List interface doesn't intrinsically have any performance standards. Its implementations like LinkedList and ArrayList do.

Edit: I'm lazy and would never use a straight array. ArrayList pretty much is the array with all of the functions like add() and remove() built in. The traditional list implementation, the LinkedList, is the alternative that I would usually consider, but if you are going to be searching the thing I'd suggest sorting it once after you're done loading it, and using an ArrayList so that you can take advantage of binary search.
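A minimal sketch of the "sort once, then binary search" idea, assuming the strings can be compared in their natural order. The class and method names (`SortedLookupSketch`, `containsSorted`) are made up for illustration; `Collections.sort` and `Collections.binarySearch` are the standard library calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedLookupSketch {
    // Hypothetical helper: 'sorted' must already be in natural (String.compareTo) order.
    static boolean containsSorted(List<String> sorted, String key) {
        // binarySearch returns a non-negative index only when the key is present
        return Collections.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(16);
        words.add("pear");
        words.add("apple");
        words.add("fig");

        Collections.sort(words); // sort once, after loading is finished

        System.out.println(containsSorted(words, "fig"));
        System.out.println(containsSorted(words, "kiwi"));
    }
}
```

Each lookup is then O(log n) instead of the linear scan that `contains` performs, at the price of one O(n log n) sort after loading. This only pays off if the list is not modified between lookups.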

  • 1
    ArrayList gets `(oldCapacity * 3)/2 + 1;` every time it needs more space, so if you start at 100000 and it needs more capacity you will end up with a lot of unused space. – Hassek Jul 27 '11 at 02:01
  • @Hassek I thought it just doubled the old capacity. But that doesn't matter, because in the worst case you'll end up using 2*n space, if you have n inputs. Unless you know n beforehand, you'll end up wasting space no matter what the initial capacity is. –  Jul 27 '11 at 02:03
  • That's true. I would just set a smaller number; as it is, it doesn't waste that much computation time and it saves memory. The problem of having to grow the list more frequently will persist that way. I guess it just depends on your needs :) – Hassek Jul 27 '11 at 02:09
  • Thanks, this answered one of my questions. –  Jul 27 '11 at 02:25
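To put rough numbers on the resize question, here is a small simulation that applies the growth formula quoted in the comments above, `(oldCapacity * 3)/2 + 1`. It is only a sketch: the formula is taken from the comment, the real growth policy depends on the JDK or Android version in use, and `GrowthSketch` and `resizesNeeded` are hypothetical names.

```java
class GrowthSketch {
    // Counts how many times the backing array would be reallocated while
    // growing from 'initialCapacity' until it can hold 'elements' items,
    // assuming the growth formula quoted in the comments.
    static int resizesNeeded(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < elements) {
            capacity = (capacity * 3) / 2 + 1;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesNeeded(10, 12000));    // prints 18
        System.out.println(resizesNeeded(10, 60000));    // prints 22
        System.out.println(resizesNeeded(60000, 60000)); // prints 0
    }
}
```

Under that assumption, a 60,000-line file starting from the default capacity of 10 costs about 22 array copies in total, and presizing removes them entirely.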
2

Most collections have a constructor that allows you to set an initial capacity. I know that ArrayList also has a method that allows you to increase the capacity of the list to a set minimum number, ensureCapacity, and that setting these appropriately can have a significant effect on the time cost of using the collection.
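As a rough illustration of `ensureCapacity`, the sketch below grows the list once, up front, before adding a known number of items. Note that `ensureCapacity` is declared on `ArrayList`, not on the `List` interface, so the variable has to be typed as `ArrayList`. The class name and the `loadAll` helper are hypothetical.

```java
import java.util.ArrayList;

class EnsureCapacitySketch {
    // Hypothetical helper: copies 'lines' into a list, reserving room first.
    static ArrayList<String> loadAll(String[] lines) {
        ArrayList<String> list = new ArrayList<String>(); // default initial capacity
        list.ensureCapacity(lines.length);                // at most one reallocation, done up front
        for (String line : lines) {
            list.add(line);
        }
        return list;
    }

    public static void main(String[] args) {
        String[] lines = { "alpha", "beta", "gamma" };
        System.out.println(loadAll(lines).size()); // prints 3
    }
}
```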

Hovercraft Full Of Eels
  • Are there any widely used methods to determine a good initial capacity? Also, can we define "how" the expansion size is determined? I think it would be best to do a kind of exponential increase. –  Jul 27 '11 at 01:54
  • Half of this question was answered by Hassek-- "ArrayList gets (oldCapacity * 3)/2 + 1; every time it needs more space" –  Jul 27 '11 at 02:26
0

I'm assuming what you're trying to differentiate between is using a LinkedList and an ArrayList.

And judging from your question, it looks like you care about the functions add and search.

If you're doing a lot of one-off adding, LinkedList is going to be faster since it always has an O(1) cost for adds, while an ArrayList has to grow and copy its backing array periodically (its adds are O(1) only when amortized). As @bdares pointed out, you could just specify a large initial capacity, but you could end up with a lot of wasted memory doing this.

As far as contains goes, an ArrayList will be faster due to cache locality. Though both employ a linear search, the ArrayList will loop faster.

Might I suggest that, if you don't care about the order in which you retrieve things and you're looking to do a lot of contains calls, you go with a hash-based collection such as a HashMap (or a HashSet, if you only need membership tests). This will be significantly faster.
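A minimal sketch of the hash-based lookup. It uses a `HashSet` rather than the `HashMap` named above, on the assumption that only membership tests are needed and no values are attached to the strings. `MembershipSketch` and `index` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MembershipSketch {
    // Hypothetical helper: builds a set for fast contains() checks.
    // Insertion order and duplicates are not preserved.
    static Set<String> index(List<String> lines) {
        return new HashSet<String>(lines);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("pear", "apple", "fig", "apple");
        Set<String> seen = index(lines);

        System.out.println(seen.contains("fig"));  // prints true
        System.out.println(seen.contains("kiwi")); // prints false
        System.out.println(seen.size());           // prints 3 (duplicate "apple" collapsed)
    }
}
```

The trade-off is extra memory per entry and the loss of ordering, in exchange for `contains` calls that take roughly constant time on average.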

themaestro
  • Actually, I think the OP is curious about the performance of lists vs raw arrays. – Perception Jul 27 '11 at 01:52
  • Thanks, this helps a lot. I have been debating what to turn my standard array into. I tried the ArrayList and I have been toying around with other implementations. So yea-- I have been debating LinkedList and others as well. Thanks~ –  Jul 27 '11 at 01:52
0

This sounds like premature optimization to me (unless you're coding for mobile or very underpowered hardware). Short answer: always use ArrayList unless you have a very clear reason not to.

You're no doubt going to get responses talking about the costs of resizing, initial allocation sizes, etc., but in reality, loading and manipulating 60k strings is absolute peanuts in terms of processing time on today's hardware. Many old-school Java people still have hangovers from the days in which object allocation and general memory operations were super slow.

In general, you can almost always get at least a slight performance boost by rolling your own implementation that is more "aware" of your problem domain than java.util is, but the effort is rarely worth it. I'd just start with an ArrayList sized to, say, 60k elements (which is also absolute peanuts in terms of memory usage).

I recently worked on a project which managed complex data structures of 1-2 GB worth of millions of strings, and the standard out-of-the-box ArrayList and HashMap were more than sufficient.

jkraybill
  • Actually, this is for Android. I have a working application and I have entered a 'refactor and optimize' stage. I didn't tag the problem as an Android problem because I thought it's pretty general but the fact it's for mobile seems like it could sway opinions. Nice post~ has me thinking a lot –  Jul 27 '11 at 02:20
  • Ah, well in that case, optimize the living phooey out of it. For most classes of usage, though, ArrayList doesn't incur much more overhead than a raw array (when properly sized); the logic that deals with the many strings is quite possibly a better candidate for optimization. Watch out for the default re-sizing strategy AL uses, though; if your code can't know in advance the numbers of elements it's dealing with, you may be better rolling your own re-sizing strategy. There are also other optimizations you can use if the list is read-only, like not resizing at all but just creating a 2nd array. – jkraybill Jul 27 '11 at 02:28
  • 1
    I can't know the exact size in advance but I can get a general idea depending on which activity is using class-- I know roughly if the file will be large or small. I may be able to add in a default size parameter to the constructor and give an educated guess when I instantiate the object. edit: I suppose I could tack the size of the file on the first line of the file.. that feels sloppy though-- looked down upon? –  Jul 27 '11 at 02:55
  • Not if doing that is a) cheap and b) helps make your mobile app run faster. Lots of protocols (e.g. HTTP) use such a mechanism. – jkraybill Jul 27 '11 at 04:07
  • Thanks-- I think I should go with that. –  Jul 27 '11 at 04:12
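The "size on the first line" idea from the comments could look roughly like the sketch below. The file layout (an integer count on line one, then one string per line) is an assumption for illustration, as are the names `CountHeaderSketch`, `read`, and `parse`. The count is treated only as a capacity hint, so a wrong header costs some resizing but does not lose data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CountHeaderSketch {
    // Reads a count header, presizes the list, then reads the remaining lines.
    static List<String> read(BufferedReader in) throws IOException {
        String header = in.readLine();
        int expected = (header == null) ? 0 : Integer.parseInt(header.trim());
        // The header is only a hint; guard against a negative value.
        List<String> lines = new ArrayList<String>(Math.max(0, expected));
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Convenience wrapper so the sketch can run without a real file.
    static List<String> parse(String contents) {
        try {
            return read(new BufferedReader(new StringReader(contents)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("3\nalpha\nbeta\ngamma\n")); // prints [alpha, beta, gamma]
    }
}
```

In a real application the `BufferedReader` would wrap the file or asset stream instead of a `StringReader`, and a malformed header would need handling, since `Integer.parseInt` throws `NumberFormatException`.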