204

In all my time programming, I haven't come across an instance where an array was a better choice for storing information than some other data structure. I had figured that the added "features" in programming languages had improved upon arrays and thereby replaced them. I see now that they aren't replaced but rather given new life, so to speak.

So, basically, what's the point of using arrays?

This is not so much about why we use arrays from a computer standpoint, but rather about why we would use arrays from a programming standpoint (a subtle difference). What the computer does with the array was not the point of the question.

Xesaniel
  • Why not consider what the computer does with the array? We have a house numbering system because we have **STRAIGHT** streets. So it is with arrays. – lcn Aug 28 '13 at 05:04
  • What "*other data structures*" or "*another form*" do you mean? And for what purpose? – tevemadar Nov 02 '19 at 10:54

4 Answers

784

Time to go back in time for a lesson. While we don't think about these things much in our fancy managed languages today, they are built on the same foundation, so let's look at how memory is managed in C.

Before I dive in, a quick explanation of what the term "pointer" means. A pointer is simply a variable that "points" to a location in memory. It doesn't contain the actual value stored at that location; it contains the memory address of it. Think of a block of memory as a mailbox. The pointer would be the address of that mailbox.
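
To make that concrete, here is a minimal C sketch (the variable names are just for illustration):

    #include <stdio.h>

    int main(void) {
        int value = 42;       /* the "mailbox" contents        */
        int *ptr  = &value;   /* the address of that mailbox   */

        printf("address: %p\n", (void *)ptr);  /* where it lives    */
        printf("value:   %d\n", *ptr);         /* what it contains  */
        return 0;
    }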

In C, an array is simply a pointer with an offset; the offset specifies how far into memory to look. This provides O(1) access time.

  MyArray   [5]
     ^       ^
  Pointer  Offset

All other data structures either build upon this or do not use adjacent memory for storage, resulting in poor random-access lookup time (though there are other benefits to not using sequential memory).

For example, let's say we have an array with 6 numbers (6,4,2,3,1,5) in it, in memory it would look like this:

=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================

In an array, we know that the elements sit next to each other in memory. A C array (called MyArray here) is simply a pointer to the first element:

=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
   ^
MyArray

If we wanted to look up MyArray[4], internally it would be accessed like this:

   0     1     2     3     4 
=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
                           ^
MyArray + 4 ---------------/
(Pointer + Offset)

Because we can directly access any element in the array by adding the offset to the pointer, we can look up any element in the same amount of time, regardless of the size of the array. This means that getting MyArray[1000] would take the same amount of time as getting MyArray[5].
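
In C this pointer-plus-offset view is literal: MyArray[4] is defined to mean *(MyArray + 4). A small sketch using the same example values:

    #include <stdio.h>

    int main(void) {
        int MyArray[6] = {6, 4, 2, 3, 1, 5};

        /* By definition in C, MyArray[4] is *(MyArray + 4):      */
        /* start at the pointer, step 4 elements in, dereference. */
        printf("%d\n", MyArray[4]);      /* prints 1 */
        printf("%d\n", *(MyArray + 4));  /* prints 1 */
        return 0;
    }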

An alternative data structure is the linked list. This is a linear chain of nodes, each holding a pointer to the next node:

========    ========    ========    ========    ========
| Data |    | Data |    | Data |    | Data |    | Data |
|      | -> |      | -> |      | -> |      | -> |      | 
|  P1  |    |  P2  |    |  P3  |    |  P4  |    |  P5  |        
========    ========    ========    ========    ========

P(X) stands for Pointer to next node.

Note that I made each "node" into its own block. This is because they are not guaranteed to be (and most likely won't be) adjacent in memory.

If I want to access P3, I can't directly access it, because I don't know where it is in memory. All I know is where the root (P1) is, so instead I have to start at P1, and follow each pointer to the desired node.

This is O(N) lookup time (the lookup cost grows with each element added). It is much more expensive to get to P1000 than to get to P4.
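
Sketched in C (the struct and function names are just illustrative), the walk looks something like this:

    /* A rough sketch of the node layout and the O(N) walk described
       above (no bounds checking; just an illustration). */
    struct Node {
        int          data;
        struct Node *next;   /* the P(X) pointer to the next node */
    };

    int get(struct Node *root, int i) {
        struct Node *current = root;
        while (i-- > 0)
            current = current->next;   /* one hop per element: O(N) */
        return current->data;
    }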

Higher level data structures, such as hashtables, stacks and queues, all may use an array (or multiple arrays) internally, while Linked Lists and Binary Trees usually use nodes and pointers.

You might wonder why anyone would use a data structure that requires linear traversal to look up a value instead of just using an array, but they have their uses.

Take our array again. This time, I want to find the array element that holds the value '5'.

=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
   ^     ^     ^     ^     ^   FOUND!

In this situation, I don't know what offset to add to the pointer to find it, so I have to start at 0, and work my way up until I find it. This means I have to perform 6 checks.

Because of this, searching for a value in an array is considered O(N). The cost of searching increases as the array gets larger.
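
Sketched in C (a hypothetical helper, not anything from the question), that scan is just a loop over every offset:

    /* Illustrative linear search: check each offset until the value is found. */
    int find(const int *array, int length, int value) {
        for (int i = 0; i < length; i++) {
            if (array[i] == value)
                return i;   /* the offset where the value lives */
        }
        return -1;          /* not found, after N checks */
    }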

Remember up above where I said that sometimes using a non-sequential data structure can have advantages? Searching for data is one of those advantages, and one of the best examples is the Binary Tree.

A Binary Tree is a data structure similar to a linked list; however, instead of linking to a single node, each node can link to two child nodes.

         ==========
         |  Root  |         
         ==========
        /          \ 
  =========       =========
  | Child |       | Child |
  =========       =========
                  /       \
            =========    =========
            | Child |    | Child |
            =========    =========

 Assume that each connector is really a Pointer

When data is inserted into a binary tree, it uses several rules to decide where to place the new node. The basic concept is that if the new value is greater than the parent's, it is inserted to the left; if it is lower, it is inserted to the right.

This means that the values in a binary tree could look like this:

         ==========
         |   100  |         
         ==========
        /          \ 
  =========       =========
  |  200  |       |   50  |
  =========       =========
                  /       \
            =========    =========
            |   75  |    |   25  |
            =========    =========

When searching a binary tree for the value of 75, we only need to visit 3 nodes ( O(log N) ) because of this structure:

  • Is 75 less than 100? Look at Right Node
  • Is 75 greater than 50? Look at Left Node
  • There is the 75!

Even though there are 5 nodes in our tree, we did not need to look at the remaining two, because we knew that they (and their children) could not possibly contain the value we were looking for. This gives us a search time that at worst case means we have to visit every node, but in the best case we only have to visit a small portion of the nodes.
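
A sketch of that search in C (hypothetical names, and following the left/right convention used in this answer, which is mirrored from the usual textbook one):

    struct TreeNode {
        int              value;
        struct TreeNode *left;
        struct TreeNode *right;
    };

    /* Walk down from the root, discarding half of the remaining
       subtree at each step: greater goes left, smaller goes right. */
    struct TreeNode *search(struct TreeNode *node, int target) {
        while (node != NULL && node->value != target) {
            if (target > node->value)
                node = node->left;    /* greater: go left  */
            else
                node = node->right;   /* smaller: go right */
        }
        return node;   /* the matching node, or NULL if absent */
    }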

That is where arrays get beaten: they provide linear O(N) search time, despite O(1) access time.

This is an incredibly high-level overview of data structures in memory, skipping over a lot of details, but hopefully it illustrates an array's strengths and weaknesses compared to other data structures.

FlySwat
  • @Jonathan: You updated the diagram to point to the 5th element but you also changed MyArray[4] to MyArray[5], so it is still incorrect; change the index back to 4 and keep the diagram as-is and you should be good. – Robert Gamble Dec 25 '08 at 06:26
  • This is what bugs me about "community wiki": this post is worth "proper" rep. – Quibblesome Dec 25 '08 at 18:18
  • This was a very nice answer, very well thought out and explained. I'm sure I am not the only one who has benefited from this. Very simply, great work. – Xesaniel Dec 26 '08 at 04:46
  • Nice answer. But the tree you describe is a binary search tree - a binary tree is just a tree where every node has at most two children. You can have a binary tree with the elements in any order. The binary search tree is organized as you describe. – gnud Jan 02 '09 at 20:37
  • Good explanation, but I can't help but nitpick... if you are allowed to reorder the items into a binary search tree, why can't you reorder the elements in the array so a binary search would work on it, too? You might go into more detail regarding O(n) insert/delete for a tree, but O(n) for an array. – markets Jan 03 '09 at 02:01
  • Alex B seems to have messed up the arrows; after his edit, the arrows look strange. – Johannes Schaub - litb Jan 08 '09 at 19:10
  • @Mark Santesson: And an ordered array never gets unbalanced! Okay, insertion time is somewhat slower when inserting into an array than into a tree. But if you use pointer arrays when the elements themselves are big, then you should get the best compromise of lookup time, random access and insertion time. – mmmmmmmm Jun 13 '09 at 14:20
  • Since you were originally comparing arrays with linked lists, I would've stayed with that comparison - lists have constant-time prepend operations, for instance. – Xiong Chiamiov Aug 25 '10 at 21:19
  • Isn't the binary tree representation O(log n) because the access time increases logarithmically in relation to the size of the data set? – Evan Plaice Feb 14 '11 at 11:38
74

For O(1) random access, which cannot be beaten.

jason
  • Would you care to elaborate please? – Xesaniel Dec 25 '08 at 01:16
  • On which point? What is O(1)? What is random access? Why can't it be beaten? Another point? – jason Dec 25 '08 at 01:24
  • O(1) random access, is this the O(N) as described in Holland's reply? – Xesaniel Dec 25 '08 at 02:03
  • O(1) means constant time; for example, if you want to get the nth element of an array, you just access it directly through its indexer (array[n-1]). With a linked list, for example, you have to find the head and then go to the next node sequentially n-1 times, which is O(n), linear time. – Christian C. Salvadó Dec 25 '08 at 02:04
  • Big-O notation describes how the speed of an algorithm varies based on the size of its input. An O(n) algorithm will take twice-ish as long to run with twice as many items and 8-ish times as long to run with 8 times as many items. In other words, the speed of an O(n) algorithm varies with the [cont...] – Gareth Dec 25 '08 at 02:06
  • size of its input. O(1) implies that the size of the input ('n') doesn't factor into the speed of the algorithm; it's a constant speed regardless of the input size. – Gareth Dec 25 '08 at 02:07
  • O(1) random access means that, for any k, it takes a constant amount of time to access the kth element of the array, regardless of the length N of the array. When Holland refers to O(N) he is referring to the time it takes to search for a specific element in an array, assuming it is in random order. – jason Dec 25 '08 at 02:12
  • I like your answer, since it just explains what was asked for :) – Johannes Schaub - litb Dec 25 '08 at 10:26
  • I see your O(1), and raise you O(0). – Chris Conway Dec 26 '08 at 04:55
  • I do appreciate answers that satisfy the original question with the least amount of words. But if I understand correctly, then we are talking about C arrays, not C++ vectors or Java ArrayLists. So you should add that their size is constant and specified at compile time. – wilhelmtell Jan 01 '09 at 07:08
  • I mean, there are the kinds of arrays that grow dynamically, but they are a completely different kind of beast than C arrays. C arrays don't have those obscure expensive operations in the background you aren't aware of, rare as they may be. – wilhelmtell Jan 01 '09 at 07:15
  • C arrays, when compared to dynamically growing arrays, just block you from exposing yourself to expensive implicit growth. Dynamic arrays have advantages in every other respect, but they may be occasionally expensive on insertions if you are not aware of the implicit growth feature. – wilhelmtell Jan 01 '09 at 07:20
  • @Jason: Random array access is physically O(log N) in the best case. You can treat it as O(1) only if you are ignoring an abstraction level. –  Oct 20 '11 at 11:26
24

Not all programs do the same thing or run on the same hardware.

This is usually the answer to why various language features exist. Arrays are a core computer science concept. Replacing arrays with lists/matrices/vectors/whatever advanced data structure would severely impact performance, and would be downright impractical in a number of systems. There are any number of cases where one of these "advanced" data collection objects should be used instead, because of the program in question.

In business programming (which most of us do), we can target hardware that is relatively powerful. Using a List in C# or a Vector in Java is the right choice in these situations, because those structures allow the developer to accomplish the goals faster, which in turn allows this type of software to be more feature-rich.

When writing embedded software or an operating system, an array is often the better choice. While an array offers less functionality, it takes up less RAM, and the compiler can optimize code more efficiently for lookups into arrays.
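
For instance, a small fixed table like the hypothetical sketch below costs exactly its declared bytes and compiles down to a plain indexed load (the table name and values are made up for illustration):

    /* Hypothetical embedded-style lookup table: a fixed-size array laid
       out at compile time, with no heap use and no hidden bookkeeping. */
    static const unsigned char response_table[8] = {
        3, 7, 12, 20, 33, 54, 88, 143
    };

    unsigned char lookup(unsigned int index) {
        return response_table[index & 7];   /* a plain indexed load, O(1) */
    }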

I am sure I am leaving out a number of the benefits for these cases, but I hope you get the point.

Jason Jackson
  • Ironically, in Java you should use an ArrayList (or a LinkedList) instead of a Vector. This has to do with Vector being synchronised, which is usually unnecessary overhead. – ashirley Jan 05 '09 at 11:02
1

One way to look at the advantages of arrays is to see where the O(1) access capability of arrays is required, and hence capitalized on:

  1. In lookup tables in your application (a static array for accessing certain categorical responses)

  2. Memoization (caching already-computed results of an expensive function, say log x, so that you don't calculate the same value again; see the sketch after this list)

  3. High-speed computer vision applications requiring image processing (https://en.wikipedia.org/wiki/Lookup_table#Lookup_tables_in_image_processing)
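
As mentioned in point 2, here is a minimal C sketch of the memoization idea; the function name, cache size, and use of log x are made up for illustration:

    #include <math.h>

    #define CACHE_SIZE 1000

    static double log_cache[CACHE_SIZE];
    static int    computed[CACHE_SIZE];   /* 0 = not cached yet */

    /* Memoized log(x) for 1 <= x < CACHE_SIZE: compute each value once,
       then serve every later request with an O(1) array access. */
    double memo_log(int x) {
        if (!computed[x]) {
            log_cache[x] = log((double)x);
            computed[x] = 1;
        }
        return log_cache[x];
    }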

priya khokher