2

I know that all arrays in .NET are limited to 2 GB. Under this premise, I try not to allocate more than n = ((2^31) - 1) / 8 doubles in an array. Nevertheless, that number of elements still doesn't seem to be valid. Does anyone know how I can determine, at run time, the maximum number of elements given sizeof(T)?
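
To make the observation concrete (in C#; the exact cutoff may differ per machine and CLR version, the 7-element figure is what I measure in my own answer below):

int n = Int32.MaxValue / 8;             // 268,435,455 doubles, just under 2 GB of payload
// double[] a = new double[n];          // this throws OutOfMemoryException for me
double[] b = new double[n - 7];         // but this succeeds: 7 doubles (56 bytes) fewer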

I know that any quantity approaching that number is already a lot of elements but, for all intents and purposes, let's say I need it.

Note: I'm in a 64-bit environment, my application targets AnyCPU, and I have at least 3100 MB of free RAM.

Update: Thank you all for your contributions, and sorry I was so quiet. I apologise for the inconvenience. I have not been able to rephrase my question, but I can add that what I am looking for is something that solves this:

template <class T>
array<T>^ allocateAnUsableArrayWithTheMostElementsPossible(){
    return gcnew array<T>( ... );
}
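
A brute-force way to fill in such a stub, sketched here in C# (most of the code in this thread is C#), is to probe downward from the theoretical ceiling and catch OutOfMemoryException. This is the costly approach discussed in the comments below, not the cheap formula I am after:

using System;
using System.Runtime.InteropServices;

static class Probe
{
    // Purely illustrative: Marshal.SizeOf only approximates sizeof(T) (it is exact for
    // primitives and blittable structs), and the result is only valid until the next
    // allocation changes the memory situation. Stepping down one element at a time is
    // exact but can be very slow when free memory, not the 2 GB cap, is the limit.
    public static T[] AllocateLargestPossible<T>()
    {
        int elementSize = Marshal.SizeOf(typeof(T));
        int count = Int32.MaxValue / elementSize;      // theoretical ceiling: ~2 GB worth of T
        while (count > 0)
        {
            try
            {
                return new T[count];
            }
            catch (OutOfMemoryException)
            {
                count--;
            }
        }
        return new T[0];
    }
}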

The results in my own answer are kinda satisfactory, but not good enough. Furthermore, I haven't tested it on another machine (it's kind of hard to find another machine with more than 4 GB). Besides, I have been doing some research on my own and it seems there's no cheap way to calculate this at run time. Anyhow, that was just a plus; none of the users of what-I-am-trying-to-accomplish can expect to use the feature I am trying to implement without having the capacity.

So, in other words, I just want to understand why the maximum number of elements of an array doesn't add up to 2 GB, ceteris paribus. A top maximum is all I need for now.

Anzurio
  • @Abel, I appreciate your help, I've reread my post over and I can't rephrase it. What I am looking for is the maximum number of elements in an array in .NET. – Anzurio Dec 05 '09 at 03:15
  • Finding the amount of consecutive bytes, or finding the maximum available non-consecutive bytes you can do using my code (though it is rather costly). Integer-divide the result with `sizeof(T)`. I didn't know you were on C++.NET, but it shouldn't be too hard to translate C# to C++. I'm sorry that I don't understand your own answer and didn't see how it fit the story, but I'll try again. – Abel Dec 05 '09 at 14:52
  • The prev. comment was about an earlier attempt to answer, which explains how to find the size of the maximum available memory (either contiguous or scattered). See this entry in the history: http://stackoverflow.com/revisions/1842456/list#rev363bf0ff-a736-49de-90f9-201d215388bc – Abel Dec 10 '09 at 12:15

5 Answers

2

Update: answer COMPLETELY rewritten. Original answer contained methods to find the largest possible addressable array on any system by divide and conquer, see history of this answer if you're interested. The new answer attempts to explain the 56 bytes gap.

In his own answer, AZ explained that the maximum array size is limited to less than the 2GB cap, and with some trial and error (or another method?) found the following (summary):

  • If the size of the type is 1, 2, 4 or 8 bytes, the maximum occupiable size is 2GB - 56 bytes;
  • If the size of the type is 16 bytes, the max is 2GB - 48 bytes;
  • If the size of the type is 32 bytes, the max is 2GB - 32 bytes.

I'm not entirely sure about the 16-byte and 32-byte situations. The total available size for the array might be different if it's an array of structs or a built-in type. I'll focus on the 1-8 byte type sizes (of which I'm not that sure either, see conclusion).

Data layout of an array

To understand why the CLR does not allow exactly 2GB / IntPtr.Size elements we need to know how an array is structured. A good starting point is this SO article, but unfortunately, some of the information seems false, or at least incomplete. This in-depth article on how the .NET CLR creates runtime objects proved invaluable, as well as this Arrays Undocumented article on CodeProject.

Taking all the information in these articles, it comes down to the following layout for an array in 32 bit systems:

Single dimension, built-in type
SSSSTTTTLLLL[...data...]0000
^ sync block
    ^ type handle
        ^ length array
                        ^ NULL 

Each part is one system DWORD in size. On 64-bit Windows, it looks as follows:

Single dimension, built-in type
SSSSSSSSTTTTTTTTLLLLLLLL[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                                    ^ NULL 

The layout looks slightly different when it's an array of objects (e.g., strings or class instances). As you can see, a type handle for the objects in the array is added.

Single dimension, object type
SSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                        ^ type handle array element type
                                            ^ NULL 

Looking further, we find that a built-in type, or actually any struct type, gets its own specific type handle (all uint arrays share the same one, but an int array has a different type handle than a uint or byte array). All arrays of objects share the same type handle, but have an extra field that points to the type handle of the objects.
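
You can get a rough hint of this from managed code; the RuntimeTypeHandle is only a window onto the internal MethodTable and does not expose the layout itself:

// Different element types give arrays with different type handles:
Console.WriteLine(typeof(int[]).TypeHandle.Value);
Console.WriteLine(typeof(uint[]).TypeHandle.Value);   // not the same handle as int[]
Console.WriteLine(typeof(byte[]).TypeHandle.Value);   // different again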

A note on struct types: padding may not always be applied, which may make it hard to predict the actual size of a struct.
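
For example (Marshal.SizeOf reports the unmanaged layout, which usually, but not always, matches what the CLR does for the managed layout):

struct FiveBytesOfFields
{
    public int A;    // 4 bytes
    public byte B;   // 1 byte
}

// Typically reported as 8, not 5, because of alignment padding:
Console.WriteLine(Marshal.SizeOf(typeof(FiveBytesOfFields)));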

Still not 56 bytes...

To get to the 56 bytes of AZ's answer, I have to make a few assumptions. I assume that:

  1. the syncblock and type handle count towards the size of an object;
  2. the variable holding the array reference (object pointer) counts towards the size of an object;
  3. the array's null terminator counts towards the size of an object.

A syncblock is placed before the address the variable points at, which makes it look like it's not part of the object. But in fact, I believe it is and it counts towards the internal 2GB limit. Adding all these, we get, for 64 bit systems:

ObjectRef + 
Syncblock +
Typehandle +
Length +
Null pointer +
--------------
40  (5 * 8 bytes)

Not 56 yet. Perhaps someone can have a look with Memory View while debugging to check what the layout of an array looks like under 64-bit Windows.

My guess is something along these lines (take your pick, mix and match):

  • 2GB will never be possible, as that is one byte into the next segment. The largest block should be 2GB - sizeof(int). But this is silly, as mem indexes should start at zero, not one;

  • Any object larger than 85016 bytes will be put on the LOH (large object heap). This may include an extra pointer, or even a 16-byte struct holding LOH information. Perhaps this counts towards the limit;

  • Aligning: assuming the objectref does not count (it is in another mem segment anyway), the total gap is 32 bytes. It is quite possible that the system prefers 32-byte boundaries. Take a new look at the memory layout. If the starting point needs to be on a 32-byte boundary, and it needs room for the syncblock before it, the syncblock will end up at the end of the first 32-byte block. Something like this:

      XXXXXXXXXXXXXXXXXXXXXXXXSSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
    

    where XXX.. stands for skipped bytes.

  • multi-dimensional arrays: if you create your arrays dynamically with Array.CreateInstance with 1 or more dimensions, a single-dimension array will be created with two extra DWORDs containing the size and the lower bound of the dimension (even if you have only one dimension, but only if the lower bound is specified as non-zero; a quick check follows right after this list). I find this highly unlikely, as you would probably have mentioned it if this were the case in your code. But it would bring the total to 56 bytes of overhead ;).
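
The quick check promised above (the printed type name shows such an array is not a normal vector):

// A single-dimension array created with a non-zero lower bound:
Array a = Array.CreateInstance(typeof(double), new[] { 10 }, new[] { 1 });
Console.WriteLine(a.GetType());        // System.Double[*], not System.Double[]
Console.WriteLine(a.GetLowerBound(0)); // 1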

Conclusion

From all I gathered during this little research, I think that the Overhead + Aligning - Objectref is the most likely and most fitting conclusion. However, a "real" CLR guru might be able to shed some extra light on this peculiar subject.

None of these conclusions explain why 16 or 32 byte datatypes have a 48 and 32 byte gap respectively.

Thanks for a challenging subject, learned something along my way. Perhaps some people can take the downvote off when they find this new answer more related to the question (which I originally misunderstood, and apologies for the clutter this may have caused).

Abel
  • The questioner is on x64 so this issue is *not* process address space headroom, it's .NET's limits after those issues have been removed. The questioner may or may not be comfortable with inducing GC into his workflow. – Ruben Bartelink Dec 04 '09 at 10:02
  • Well, the question title says *"[..]ACTUAL maximum [..] a .NET array [..] can be allocated?"*. With *actual*, I understand it is what you *can* use in your array, not what the system possibly has but that you cannot use. The code above gives, regardless the architecture, what **actually is available** to you. I don't see why that would not be an answer to the question. – Abel Dec 04 '09 at 11:34
  • Careful, I believe OutOfMemoryExceptions are uncatchable in future .NET versions. – user7116 Dec 04 '09 at 15:23
  • Yes and no. OOMs are not always uncatchable. In this case, because you *reserve* the memory, but not *use* it, it is easy to recover. Future versions of .NET have no plans on disallowing catching OOM. The problem is of a different kind: an OOM is usually very hard to recover from. I don't say the code is a good idea in practice, but it answers the "How" of the question. – Abel Dec 05 '09 at 14:40
  • To the down-voters: I don't mind, but please be so cordial to explain what's the misinformation in my code. – Abel Dec 05 '09 at 14:41
  • Rewritten answer completely to reflect my new understanding of the question. – Abel Dec 09 '09 at 01:54
  • Thank you very much, Abel! I'm accepting your answer. It really helped me a lot and I hope it didn't create too much inconvenience! :) – Anzurio Dec 09 '09 at 19:06
  • No inconvenience whatsoever, was a fun ride to find out these nitty gritty details about arrays. Any additional thoughts on the 16 or 24 bytes we couldn't fully explain? Have you tried Memory View of an array while debugging on 64 bits (just type the variable name in the memory view window)? – Abel Dec 10 '09 at 12:10
2

So, I ran a li'l program to find out some hard values and this is what I found:

  • Given a type T, f(sizeof(T)) = N - d

    • Where f is the actual maximum number of elements in an array of Ts.
    • N is the theoretical maximum number of elements, that is: Int32::MaxValue / sizeof(T).
    • And d is the difference between N and f.

Results:

  • f(1) = N - 56
  • f(2) = N - 28
  • f(4) = N - 14
  • f(8) = N - 7
  • f(16) = N -3
  • f(32) = N - 1

I can see that every time the element size doubles, the difference between the actual and the theoretical maximum roughly halves, but not exactly in powers of 2. Any ideas why?

Edit: d is a number of elements of type T. To get d in bytes, multiply by sizeof(T).
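
For reference, a minimal sketch of what such a measuring program can look like (this probes a single element size, here double; the exact result is machine- and CLR-specific):

using System;

class MaxArrayProbe
{
    static void Main()
    {
        int n = Int32.MaxValue / sizeof(double);   // theoretical maximum N for T = double
        int d = 0;
        while (true)
        {
            try
            {
                double[] probe = new double[n - d];
                GC.KeepAlive(probe);
                Console.WriteLine("f(8) = N - " + d);   // prints "f(8) = N - 7" on my machine
                break;
            }
            catch (OutOfMemoryException)
            {
                d++;                                    // one element fewer and try again
            }
        }
    }
}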

Anzurio
  • Your li'l program sounds somewhat interesting (though OT), but even more so if we can see it. *About the difference:* `Int32.MaxValue` is an odd number. If `32` is the size in bytes of a value type in the last example, then `d(32) == Int32.MaxValue - (Int32.MaxValue / 32) * 32` which yields `31`, a power of two minus one for `d`. Btw, I have a hard time understanding how this is related; you asked for the *actual* max in your question, while your own answer is highly theoretical (and false to some extent, because it is implementation dependent: Mono (or even Micro?) has other maxes). – Abel Dec 04 '09 at 14:43
  • @abel, what's OT? I don't understand why this is not related and highly theoretical. I ask for the actual maximum "number of elements" (sorry that I quote this but in your comments you twice omitted it). I don't mean to be rude or anything but some of your comments kinda depict me as a "duh questioner", which is not appreciated, or at least, so I feel. – Anzurio Dec 05 '09 at 03:33
  • @AZ, I very much apologize if you feel like that. I love this question, would I otherwise spend so much time? OT means "off topic", which is, of course, my opinion (not being native, I sometimes sound harsh, sorry). Either I didn't understand your code or I didn't understand your question, I wrote "OT" because I thought this was a side-track (an interesting one nonetheless). I figured you need to know how many bytes are available before you can divide that by the size of your elements, to find the total. I didn't see that in your code. Still, I'm wondering how to interpret it. What did I miss? – Abel Dec 05 '09 at 14:47
  • @Abel, I'm thrilled you liked my question. But in general, I think my question is simpler than you think. Say I have enough physical memory available (Right now 4 GB free), that'd make me think that I can easily allocate 2 GB in an 1D array, nevertheless, if I do something like new double[Int32::MaxValue / 8], it will throw a memory exception but (Int32::MaxValue / 8) - d, won't for d >= 7. That's 56 bytes shy of 2 GB. – Anzurio Dec 05 '09 at 16:52
  • And another note, just for sanity, I created two arrays of ((Int32::MaxValue / 8) - 7) doubles just to make sure that, when I tried the first time, I didn't run out of physical memory. Given that I could have these two arrays at the same time, made me think that the physical memory wasn't an issue :). – Anzurio Dec 05 '09 at 17:05
  • @AZ: you are absolutely right, I was totally on the wrong track. And I misunderstood `d`, which I took as amount of bytes, not `bytes * sizeof(T)`. I can explain both 56 bytes and 32 bytes cap, but not entirely satisfactorily. I'll need your help with the last details ;-). See my (again updated) answer. – Abel Dec 09 '09 at 00:32
  • Altogether I haven't come up with a magic function for you, but I've tried to explain how you could find the gap-size, which can help in building this function. Obviously, the conclusions may be different for different CLR functions, even between two updates of the same version. I tested everything with .NET CLR 3.5 SP1. – Abel Dec 09 '09 at 01:57
0

Your process space is limited to 2GB unless you're [compiled anycpu or x64] and running in an x64 process [on an x64 machine]. This is what you're probably actually running into. Calculating the headroom you have in the process is not an exact science by any means.

(Nitpickers corner: There is a /3GB switch and stacks of other edge cases which impact this. Also, the process needs to have virtual or physical space to be allocated into too. The point is that at the present time, most people will more often run into the OS per process limit than any .NET limit)
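
A quick sanity check of which situation you're actually in (Environment.Is64BitProcess only exists from .NET 4 onward, so the IntPtr.Size check is the one that works on the runtimes discussed here):

// 8-byte pointers mean the process really is running as 64-bit,
// regardless of what the build settings claim:
Console.WriteLine(IntPtr.Size == 8 ? "64-bit process" : "32-bit process");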

Ruben Bartelink
  • This was obviously a much more relevant answer prior to you editing in your x64 environment (I take it you've checked you're definitely being loaded as x64?) – Ruben Bartelink Dec 03 '09 at 15:34
  • 1
    @Ruben: As far as I'm aware, the CLR has a 2GB max object size limit regardless of whether you're running in 32-bit or 64-bit. Even if the OP has 3GB of free RAM, I'm not terribly surprised that they can't find 2GB of contiguous space to allocate an array. – LukeH Dec 03 '09 at 15:48
  • @Luke: My point was exactly that (and this was before x64 was mentioned by the OP) - the chances of there being 2GB free in a process are low (I was assuming a simple test and hence contiguity within the CLR process's Large Object Heap not being a concern) given there is only a max of 2GB of addressable space before putting in OS overhead, thread stacks and the CLR's usage. But in the question, he says he's x64, so assuming it's an x64 process and there is free [phys + virtual] memory to meet the need, we're down to CLR constraints. My points were only relevant to x86. – Ruben Bartelink Dec 03 '09 at 16:00
  • Even with a 3 GB process, btw, your largest contiguous block is still something under 2 GB. Reason being, the libraries that are normally loaded to the top of the 2 GB space are still loaded there in 3 GB processes, thereby partitioning it into a chunk somewhat under 2 GB, and a chunk of about 1 GB. – DrPizza Dec 03 '09 at 16:13
  • @DrPizza: Nice, I didn't know that (and am glad I didn't!) I'm glad I put in the nitpickers corner though! – Ruben Bartelink Dec 03 '09 at 16:31
  • @Ruben: maybe not an exact science, but you can determine it for any particular situation rather easily, as I found out while experimenting. See my new answer for a code example of the "scientific" approach. – Abel Dec 03 '09 at 19:59
  • @Abel: My post is for x86 contexts. In x86 contexts, the limiting factor (assuming the .NET limit is >1.999 GB) is going to be the max contiguous block available to allocate, which is largely influenced by the headroom available in the process address space. This depends on what's in use at any time. If you start a new thread, boom goes 1MB for the stack. If something else, like say an IO completion port thread, does something... If a GC happens, things might change. – Ruben Bartelink Dec 04 '09 at 09:58
  • 1
    What you're talking about is different. The questioner will not be able to say var myDoubles = new double[ MyMagicFunctionWhichAlwaysTellsMeHowMuchICanAllocate()]. Even if the right value was computed, there's a race condition in that the available space might be used up. Does this make any sense in terms of explaining my angle when saying that? – Ruben Bartelink Dec 04 '09 at 09:59
  • I see now what you're getting at. My (new) post, which you also commented on, shows that you're better off using jagged arrays. You are not limited to a contiguous block of memory. Finding the maximum contiguous block of memory available is possible with Win32 API functions and rather trivial, but rather useless in the light of .NET. If you want to pass the 2GB limit, you need to split into jagged (or similar) arrays. You can use up all memory, far above the 2GB limit. With a contiguous block, this is hardly ever possible. – Abel Dec 04 '09 at 11:49
  • About the race condition: there's always a chance that others use up memory before you (re)claim it. My suggestion to the asker: let your array grow dynamically (see my answer) and don't GC at the OOM but just remove a few elements to have some room to play. – Abel Dec 04 '09 at 11:51
  • 2
    @Abel: Yes, jagged arrays cope with [relatively] low memory better and your example is interesting. But that's not the asker's context; he's on x64 and wants to understand exactly what the .NET limit is *assuming he is on x64 and available process VM is not the bottleneck*. But I might be wrong. I'm sure @AZ will be stepping in with comments and votes any time now (his answer is the best IMO - it's the only one anyone has voted for either!) – Ruben Bartelink Dec 04 '09 at 12:24
  • Missed that answer before. I don't see how it is related, but hopefully AZ comes around some time to explain. If it is really that theoretical, the answer to the question becomes totally trivial, really, since we have to assume the 2GB object limit, take the object overhead, find the .NET internal mem alignment, and you're about done. But he mentions "at runtime", so theoretical limits are of little value then. Anyway, I think I'm totally lost as to what the actual intent of the question is. – Abel Dec 04 '09 at 14:51
0

Update: my other answer contains the solution but I leave this in for the info about Mono, C#, the CLR links and the discussion thread

The maximum size of an array is limited by the size of an integer, not by the size of the objects it contains. But any object in .NET is limited to 2GB, period (thanks to Luke and see EDIT), which limits the total size of your array, which is the sum of the individual elements plus a bit of overhead.

The reason that it chokes your system is, well, the system's available memory. And a Win32 process only allows you to use 2GB of memory, of which your program and the CLR already use quite a bit before you even start your array. The rest is what you can use for your array:

int alot = 640000000;                      // rough allowance for the program and the CLR
byte[] xxx = new byte[(1U << 31) - alot];  // note the parentheses: shift first, then subtract

It depends on how your CLR is configured whether or not you run out of memory. For instance, under ASP.NET you are bound by default to 60% of the total available memory of the machine.

EDIT: This answer to a related post goes a bit deeper into the subject and the problems with 64 bit. It is possible on 64 bit systems, but only using workarounds. It points to this excellent blog post on the subject which explains BigArray<T>.
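
The core idea behind such a BigArray<T>, sketched minimally here (this is the chunking concept, not the blog's actual implementation), is to spread one huge logical array over many smaller CLR arrays so that no single object comes near the 2GB cap:

using System;

public sealed class ChunkedArray<T>
{
    private const int ChunkSize = 1 << 20;        // 1M elements per chunk (arbitrary choice)
    private readonly T[][] chunks;
    private readonly long length;

    public ChunkedArray(long length)
    {
        this.length = length;
        int chunkCount = (int)((length + ChunkSize - 1) / ChunkSize);
        chunks = new T[chunkCount][];
        for (int i = 0; i < chunkCount; i++)
        {
            long remaining = length - (long)i * ChunkSize;
            chunks[i] = new T[(int)Math.Min(ChunkSize, remaining)];
        }
    }

    public long Length { get { return length; } }

    public T this[long index]
    {
        get { return chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)]; }
        set { chunks[(int)(index / ChunkSize)][(int)(index % ChunkSize)] = value; }
    }
}

Indexing costs one extra lookup, but none of the chunks is itself subject to the 2GB-per-object limit.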

NOTE 1: other CLRs, e.g. Mono's, simply allow larger than 2GB objects.

NOTE 2: it is not the language that limits you. The following compiles just fine in C#, but trying to find a machine that doesn't throw on it is a rather futuristic thought (and frankly, the field in the Array class holding the length is an int, which means this will always throw on 32-bit, and extremely likely, though not necessarily, on any 64-bit implementation):

int[] xxx = new int[0xFFFFFFFFFFFFFFFF];  // 2^64-1
Abel
  • 1
    If it's a ref type, that makes sense. The questioner has doubles, which are value types and get inlined into the array object – Ruben Bartelink Dec 03 '09 at 15:24
  • And even with references, you'll still be limited by the actual size taken up by the array - either 4*count or 8*count depending on the CLR. – Jon Skeet Dec 03 '09 at 15:26
  • 1
    @Abel: The size of the elements *does* limit the maximum number of elements, at least in theory. Although the current max number of elements is limited to `int.MaxValue`, you're also limited by .NET's 2GB max object size limit. So *theoretically* an array of `bool` or `byte` could have somewhere close to `2^31` elements before it hits the 2GB limit, whereas an array of `long` could only have a theoretical max of roughly `2^28` elements before it approaches 2GB. – LukeH Dec 03 '09 at 15:29
  • Agreed, you are. The question says "2GB is limit for .NET". This depends on the system. If the system is 32 bits, then yes. If it is 64 bits then no. And... when UAE (term) is enabled, more memory may be available, but I'm uncertain whether .NET supports UAE. – Abel Dec 03 '09 at 15:29
  • @Luke: is that really correct? I thought the object size limit is dependent on CLR and bitness of the system. – Abel Dec 03 '09 at 15:31
  • 1
    As far as I'm aware the max object size in .NET is restricted to 2GB. This was certainly the case in the past and I'm not aware of any recent changes, although I'm no expert and it's difficult to find any definitive information. – LukeH Dec 03 '09 at 15:38
  • @Luke: this seems rather definitive: http://stackoverflow.com/questions/1087982/single-objects-still-limited-to-2-gb-in-size-in-clr-4-0/1088044#1088044 (added it to my answer as well) – Abel Dec 03 '09 at 15:42
  • On note 2: It might compile but I am thinking it might throw an overflow exception. – Anzurio Dec 03 '09 at 16:04
  • @AZ: it _might_ throw an overflow? The system with that much memory has yet to be invented! ;-) – Abel Dec 03 '09 at 17:10
  • @Luke: just a note, while reading on on the subject, I found that the 2GB object size limit *does not count for arrays of ref types*. You won't need too much magic to change arrays of doubles into an array of refs (while keeping speed or computability, suggestion: jagged arrays). See also my new answer. – Abel Dec 03 '09 at 20:01
  • @Abel: The 2GB limit still applies to arrays of ref types. Each object in the array can (theoretically) be 2GB in size, but the array object itself is also limited to 2GB, so the max number of elements is limited by the system's reference size. So on 32-bit an array of refs can have up to `2^29` elements, and up to `2^28` elements on 64-bit. – LukeH Dec 03 '09 at 23:20
  • @Luke: maybe we misunderstood each other. Indeed, the references themselves count towards the limit. Using jagged arrays, you can have unlimited space. The referenced objects can have a certain size, which does not count towards the array limit. The total size == array items * (pointer size + object size), which can exceed the 2GB limit. – Abel Dec 04 '09 at 11:43
-1

You also need to add the pointer size (System.IntPtr.Size) to each sizeof(T) to account for the pointer to the object in any given array element.

x0n
  • Not sure what you're driving at. If it's a value type (the OP's talking doubles), there isn't such an overhead on a per-item basis - though there's a .Length and other small bits of bookkeeping baggage for the array, but none of that is on a per-item basis. If it's a ref type, then you're storing units of `IntPtr.Size`. Can you clarify? – Ruben Bartelink Dec 03 '09 at 15:23
  • Sorry, I wasn't around - I think I misunderstood the question. I didn't notice that the allocations were Doubles (value types). – x0n Dec 04 '09 at 16:08