
I'm fairly familiar with the layout of objects on the heap in HotSpot, but not so much for Android.

For example, in a 32-bit HotSpot JVM, an object on the heap is implemented as an 8-byte header, followed by the object's fields (one byte for boolean, four bytes for a reference, and everything else as expected), laid out in some specific order (with some special rules for fields from superclasses), and padded out to a multiple of 8 bytes.

I've done some research, but I can't find any Android-specific information.

(I'm interested in optimizing some extremely widely used data structures to minimize memory consumption on Android.)

Sergey Ponomarev
Louis Wasserman
  • Have you read/seen this: https://sites.google.com/site/io/dalvik-vm-internals/ – Morrison Chang Feb 06 '13 at 21:09
  • I hadn't, but that doesn't have the information I want. It talks about the layout of the APK and the bytecode, but not about the objects on the heap. – Louis Wasserman Feb 06 '13 at 21:18
  • Dalvik seems mostly identical: 4 bytes class ptr, 4 bytes lock, then data: https://android.googlesource.com/platform/dalvik/+/master/vm/oo/Object.h – Chris Feb 06 '13 at 21:27
  • @Chris: I can't tell from that whether 1) a class with two `short` fields will get them both packed into one u4, 2) whether it's rounded up to a multiple of 8 bytes or a multiple of 4 bytes, etc. – Louis Wasserman Feb 07 '13 at 21:17
  • I think it's a "silly question": the whole *point* of an OS, of a JVM and of a garbage-collected environment is to *HIDE* implementation details like the byte layout of objects on the heap! Nevertheless, you can always "Use the Source" to answer *ANY* question: http://code.google.com/p/dalvik/ – paulsm4 Feb 08 '13 at 21:58
  • PS: You can also send an e-mail to Dan Bornstein. I'll bet the odds are pretty good he'd reply - honest! – paulsm4 Feb 08 '13 at 22:07
  • @paulsm4: And I approve of hiding implementation details like that, but let's suppose you could rewrite Android's implementation of `java.util.HashSet` to reduce memory consumption per element by, say, 20%. For resource-constrained environments like mobile phones, for a massively common data structure like `HashSet`, that's a huge win. Knowledge of details like alignment constraints is massively helpful in optimizing data structures to minimize memory consumption. – Louis Wasserman Feb 08 '13 at 22:08
  • I applaud "curiosity". But I also agree with Don Knuth's observation that "Premature optimization is the root of all evil" ;) SUGGESTION: look at the source, and/or contact the original architect. It sounds like you're astute enough to understand the source, and I think Mr. Bornstein might be willing to share his insights. IMHO... – paulsm4 Feb 08 '13 at 22:14
  • @LouisWasserman, do you have any knowledge of how the members of a Java class are laid out in Android memory? The header is quite simple, but what are the rules or logic of the structures referenced from `instanceData`? – Damian Leszczyński - Vash Jun 23 '14 at 10:39

1 Answer


dalvik/vm/oo/Object.h is your friend here. The comment for struct Object says:

```cpp
/*
 * There are three types of objects:
 *  Class objects - an instance of java.lang.Class
 *  Array objects - an object created with a "new array" instruction
 *  Data objects - an object that is neither of the above
 *
 * We also define String objects.  At present they're equivalent to
 * DataObject, but that may change.  (Either way, they make some of the
 * code more obvious.)
 *
 * All objects have an Object header followed by type-specific data.
 */
```

java.lang.Class objects are special; their layout is defined by the ClassObject struct in Object.h. Array objects are simple:

```cpp
struct ArrayObject : Object {
    /* number of elements; immutable after init */
    u4              length;

    /*
     * Array contents; actual size is (length * sizeof(type)).  This is
     * declared as u8 so that the compiler inserts any necessary padding
     * (e.g. for EABI); the actual allocation may be smaller than 8 bytes.
     */
    u8              contents[1];
};
```

For arrays, the element widths are defined in `vm/oo/Array.cpp`. Booleans take 1 byte, object references take `sizeof(Object*)` bytes (usually 4), and all other primitive types have their expected (packed) width.
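As a rough illustration of the widths listed above, here is a minimal sketch that estimates the size of an array's contents. The names `elementWidth`, `arrayContentsBytes`, and `kAssumedRefSize` are hypothetical and do not come from the Dalvik source; the 4-byte reference size is an assumption for a 32-bit build, and the object header and any alignment padding are deliberately left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Assumption: references are 4 bytes, as on a 32-bit build ("usually 4").
constexpr std::size_t kAssumedRefSize = 4;

// Element width in bytes for a JVM type descriptor character.
// Hypothetical helper for illustration; not a function from Array.cpp.
inline std::size_t elementWidth(char descriptor) {
    switch (descriptor) {
        case 'Z': case 'B': return 1;                // boolean, byte
        case 'C': case 'S': return 2;                // char, short
        case 'I': case 'F': return 4;                // int, float
        case 'J': case 'D': return 8;                // long, double
        case 'L': case '[': return kAssumedRefSize;  // object or array reference
        default: throw std::invalid_argument("unknown type descriptor");
    }
}

// Bytes occupied by the array contents alone (length * width).
// The Object header, the length field, and padding are not included.
inline std::size_t arrayContentsBytes(char descriptor, std::uint32_t length) {
    return static_cast<std::size_t>(length) * elementWidth(descriptor);
}
```

Under these assumptions, a `short[10]` needs 20 bytes of contents and a `boolean[10]` needs 10, so primitive arrays are packed, unlike instance fields.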

Data objects are really simple:

```cpp
/*
 * Data objects have an Object header followed by their instance data.
 */
struct DataObject : Object {
    /* variable #of u4 slots; u8 uses 2 slots */
    u4              instanceData[1];
};
```

The layout of a `DataObject` (that is, any ordinary class instance that is neither a `java.lang.Class` nor an array) is governed by `computeFieldOffsets` in `vm/oo/Class.cpp`. According to the comment there:

```cpp
/*
 * Assign instance fields to u4 slots.
 *
 * The top portion of the instance field area is occupied by the superclass
 * fields, the bottom by the fields for this class.
 *
 * "long" and "double" fields occupy two adjacent slots.  On some
 * architectures, 64-bit quantities must be 64-bit aligned, so we need to
 * arrange fields (or introduce padding) to ensure this.  We assume the
 * fields of the topmost superclass (i.e. Object) are 64-bit aligned, so
 * we can just ensure that the offset is "even".  To avoid wasting space,
 * we want to move non-reference 32-bit fields into gaps rather than
 * creating pad words.
 *
 * In the worst case we will waste 4 bytes, but because objects are
 * allocated on >= 64-bit boundaries, those bytes may well be wasted anyway
 * (assuming this is the most-derived class).
 *
 * Pad words are not represented in the field table, so the field table
 * itself does not change size.
 *
 * The number of field slots determines the size of the object, so we
 * set that here too.
 *
 * This function feels a little more complicated than I'd like, but it
 * has the property of moving the smallest possible set of fields, which
 * should reduce the time required to load a class.
 *
 * NOTE: reference fields *must* come first, or precacheReferenceOffsets()
 * will break.
 */
```

So, superclass fields come first (as usual), followed by this class's reference-type fields. Next comes a single 32-bit non-reference field used as a gap filler (only when one is available and the 64-bit fields would otherwise start at a misaligned offset, for example after an odd number of 32-bit reference fields), then the 64-bit fields, and finally the remaining 32-bit fields. Note that every field occupies either 32 or 64 bits: shorter primitives are padded. In particular, at this time, the VM does not store byte/char/short/boolean fields in less than 4 bytes, though it could certainly support this in theory.
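The ordering described above can be modelled with a short sketch. This is a simplified illustration, not the actual `computeFieldOffsets` code: the types `FieldKind`, `Field`, and `Layout` and the function `sketchLayout` are hypothetical names, references are assumed to be 4 bytes, and `startOffset` stands for wherever the superclass fields end (8 bytes for a direct subclass of `Object`, if the header is 8 bytes as described in the comments on the question).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reference = 4-byte object pointer (assumed), Narrow = any primitive that
// fits in one 4-byte slot, Wide = long or double (two slots).
enum class FieldKind { Reference, Narrow, Wide };

struct Field {
    std::string name;
    FieldKind kind;
};

struct Layout {
    std::vector<std::pair<std::string, std::size_t>> offsets;  // name -> byte offset
    std::size_t endOffset;  // first byte past the last field (before final padding)
};

// Simplified model of the ordering: references, optional gap filler, wide, narrow.
inline Layout sketchLayout(const std::vector<Field>& fields, std::size_t startOffset) {
    assert(startOffset % 4 == 0);  // fields are assigned to 4-byte slots
    Layout out{{}, startOffset};
    std::vector<const Field*> narrow, wide;

    // 1. Reference fields come first.
    for (const Field& f : fields) {
        if (f.kind == FieldKind::Reference) {
            out.offsets.emplace_back(f.name, out.endOffset);
            out.endOffset += 4;
        } else if (f.kind == FieldKind::Wide) {
            wide.push_back(&f);
        } else {
            narrow.push_back(&f);
        }
    }

    // 2. If wide fields would be misaligned, fill the gap with one narrow
    //    field when available; otherwise insert a 4-byte pad word.
    std::size_t nextNarrow = 0;
    if (!wide.empty() && out.endOffset % 8 != 0) {
        if (!narrow.empty()) {
            out.offsets.emplace_back(narrow[0]->name, out.endOffset);
            nextNarrow = 1;
        }
        out.endOffset += 4;
    }

    // 3. Wide (64-bit) fields, now 8-byte aligned.
    for (const Field* f : wide) {
        out.offsets.emplace_back(f->name, out.endOffset);
        out.endOffset += 8;
    }

    // 4. Remaining narrow fields, one 4-byte slot each.
    for (std::size_t i = nextNarrow; i < narrow.size(); ++i) {
        out.offsets.emplace_back(narrow[i]->name, out.endOffset);
        out.endOffset += 4;
    }
    return out;
}
```

In this model, a class with one reference, two `int`-sized fields, and one `long`, starting at offset 8, places the reference at 8, one `int` at 12 as the gap filler, the `long` at 16, and the second `int` at 24. The practical consequence for memory tuning is that replacing an `int` with a `byte` or `boolean` field saves nothing under this layout, because each still takes a full 4-byte slot.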

Note that all of this is based on reading the Dalvik source code as of commit 43241340 (Feb 6, 2013). Since this aspect of the VM doesn't appear to be publicly documented, you should not rely on this being a stable description of the VM's object layout: it may change over time.

nneonneo