62

I noticed that string literals have very different addresses in memory than other constants and variables (Linux OS): they have many leading zeroes (not printed).

Example:

const char *h = "Hi";
int i = 1;
printf ("%p\n", (void *) h);
printf ("%p\n", (void *) &i);

Output:

0x400634
0x7fffc1ef1a4c

I know they are stored in the .rodata part of the executable. Is there a special way the OS handles it afterwards, so the literals end up in a special area of memory (with leading zeroes)? Are there any advantages of that memory location or is there something special about it?

smci
Noidea
  • http://stackoverflow.com/questions/4560720/why-does-the-stack-address-grow-towards-decreasing-memory-addresses – John Zwinck Nov 18 '16 at 12:55
  • 5
    It's all up to the operating system where it loads the code and where it allocates the stack. – Some programmer dude Nov 18 '16 at 12:55
  • 8
    Obviously implementation-specified, but RO data (your literal) is often loaded into separate page(s) marked for protected-mode exception-on-write triggering. Meaning: writing to it raises a structured exception. – WhozCraig Nov 18 '16 at 13:00
  • 2
    Is your question specifically about Linux, hosted systems (with an OS) in general, or also freestanding systems (typically embedded, with no OS)? If Linux only, you should add the `[linux]` tag. If something else, please clarify. – user694733 Nov 18 '16 at 13:01
  • @user694733 mm.. I don't really know how huge the difference is, so I added linux. – Noidea Nov 18 '16 at 13:03
  • @WhozCraig thanks! Does it mean that this place likely was reserved even before I started (even wrote!) my program? – Noidea Nov 18 '16 at 13:07
  • @user694733 even with freestanding, you'll normally have .rodata near .code, not near .data (excluding Harvard architectures, I guess). – domen Nov 18 '16 at 17:10
  • 4
    Your question is back to front. You will find that *all* addresses have 'many leading zeros' *except* addresses of local variables, which are on the stack, which is allocated in your case from the top of the address space down. – user207421 Nov 18 '16 at 21:49
  • @EJP thanks, I didn't know it. – Noidea Nov 18 '16 at 21:53
  • 1
    To have your string more like your `int i = 1`, you may want to try `char h[] = "Hi"` – Hagen von Eitzen Nov 20 '16 at 10:38

5 Answers

73

Here's how process memory is laid out on Linux (from http://www.thegeekstuff.com/2012/03/linux-processes-memory-layout/):

[Image: Linux process memory layout]

The .rodata section is a write-protected subsection of the Initialized Global Data block. (The section that ELF executables call .data is its writable counterpart, holding writable globals initialized to nonzero values. Writable globals initialized to zero go to the .bss block. By globals I mean global variables and all static variables, regardless of where they are declared.)

The picture should explain the numerical values of your addresses.

If you want to investigate further, then on Linux you can inspect the /proc/$pid/maps virtual files which describe the memory layout of running processes. You won't get the reserved (starting with a dot) ELF section names, but you can guess which ELF section a memory block originated from by looking at its memory protection flags. For example, running

$ cat /proc/self/maps #cat's memory map

gives me

00400000-0040b000 r-xp 00000000 fc:00 395465                             /bin/cat
0060a000-0060b000 r--p 0000a000 fc:00 395465                             /bin/cat
0060b000-0060d000 rw-p 0000b000 fc:00 395465                             /bin/cat
006e3000-00704000 rw-p 00000000 00:00 0                                  [heap]
3000000000-3000023000 r-xp 00000000 fc:00 3026487                        /lib/x86_64-linux-gnu/ld-2.19.so
3000222000-3000223000 r--p 00022000 fc:00 3026487                        /lib/x86_64-linux-gnu/ld-2.19.so
3000223000-3000224000 rw-p 00023000 fc:00 3026487                        /lib/x86_64-linux-gnu/ld-2.19.so
3000224000-3000225000 rw-p 00000000 00:00 0
3000400000-30005ba000 r-xp 00000000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
30005ba000-30007ba000 ---p 001ba000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
30007ba000-30007be000 r--p 001ba000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
30007be000-30007c0000 rw-p 001be000 fc:00 3026488                        /lib/x86_64-linux-gnu/libc-2.19.so
30007c0000-30007c5000 rw-p 00000000 00:00 0
7f49eda93000-7f49edd79000 r--p 00000000 fc:00 2104890                    /usr/lib/locale/locale-archive
7f49edd79000-7f49edd7c000 rw-p 00000000 00:00 0
7f49edda7000-7f49edda9000 rw-p 00000000 00:00 0
7ffdae393000-7ffdae3b5000 rw-p 00000000 00:00 0                          [stack]
7ffdae3e6000-7ffdae3e8000 r--p 00000000 00:00 0                          [vvar]
7ffdae3e8000-7ffdae3ea000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

The first r-xp block definitely came from .text (executable code), the first r--p block from .rodata, and the following rw-p blocks from .bss and .data. (The blocks between the heap and the stack were loaded from dynamically linked libraries by the dynamic linker.)


Note: To comply with the standard, you should cast the int* for "%p" to (void*) or else the behavior is undefined.

PSkocik
  • Thanks, that's useful! But if I have multiple processes this still happens. So it doesn't lay them out one after another, but takes all "Initialized Global Data" from multiple processes and stores it together? – Noidea Nov 18 '16 at 13:21
  • 7
    @Noidea Different processes have different address spaces. 0xDEADBEEF in one process is (usually) completely unrelated to 0xDEADBEEF in another. There are some obvious minor advantages to the above layout related to debugging and block growth (for the heap block in particular, although it's not a big deal to fragment the heap with mmap if it can't grow up anymore). Also, the actual mapped addresses will usually be somewhat random for security reasons. – PSkocik Nov 18 '16 at 13:40
  • 6
    @Noidea : Do not conflate physical addresses (corresponding to addresses at the RAM) with virtual memory addresses (addresses in the process). It is the job of the [memory management unit](https://en.wikipedia.org/wiki/Memory_management_unit) to convert virtual to physical and all addresses used by a process are translated via the MMU. Each process has its own MMU tables, managed by the OS. – Eric Towers Nov 18 '16 at 14:10
  • 1
    Several platforms' default linker scripts also merge `.rodata` with `.text`. – Simon Richter Nov 19 '16 at 17:05
15

That's because string literals have static storage duration. That is, they live for the whole run of the program. Such objects may be stored in a special memory location which is neither on the so-called heap nor on the stack. Hence the difference in addresses.

Armen Tsirunyan
7

Remember that where a pointer is stored is different from where it points to. A more realistic (apples-to-apples) comparison would be

printf ("%p\n", (void *) &h);
printf ("%p\n", (void *) &i);

I suspect you will find that h and i have similar addresses. Or, another more-realistic comparison would be

static int si = 123;
int *ip = &si;
printf ("%p\n", (void *) h);
printf ("%p\n", (void *) ip);

I suspect you'll find that h and ip point to a similar region of memory.

Steve Summit
  • 3
    No, `h` is already a pointer-to-char, so `&h` does nothing useful. Writing `h` and `&i` is correct as then both are the addresses of the referred string and `int` respectively. – underscore_d Nov 18 '16 at 16:41
  • 1
    @underscore_d I think you completely misunderstood the question and my answer, then. There's nothing "correct" or "incorrect" about writing `h` and `&i`; the OP was merely puzzled about why the actual addresses on his system were so different. My point was if you write `&h` and `&i`, or `h` and `ip`, you're likely to see more similar addresses, and this exercise will (hopefully) help you understand why the numbers in `h` and `&i` are so different. – Steve Summit Nov 18 '16 at 16:52
  • 3
    @SteveSummit Pointer to a string literal will be another stack variable. But I was wondering why address of a string literal is so different from addresses of stack variables. Not why the addresses of two stack variables are similar ;) – Noidea Nov 18 '16 at 17:16
  • @Noidea And now you know, from the other answers: because string literals are never stored on the stack. – Steve Summit Nov 18 '16 at 17:18
  • @SteveSummit well I already knew, that they are not on stack, because the addresses are so different. – Noidea Nov 18 '16 at 17:29
1

Consider that literals are read-only objects, and that there is also the concept of a literal pool. The literal pool is a collection of the program's unique literals: duplicate constants are discarded and references to them are merged into one.

There is one literal pool for each source file, and depending on the sophistication of the linker, the literal pools can be placed next to each other to create one .rodata.

There is also no guarantee that the literal pool is actually write-protected. Language and compiler designs nevertheless treat it as read-only.

Consider my code fragment. I could have

const char *cp="hello world";
const char *cp1="hello world";

A good compiler will recognize that, within that source file, the pointers cp and cp1 refer to identical read-only strings, and will make cp1 point to cp's literal, discarding the second copy.

One more point. The size of the literal pool may be a multiple of 256 bytes or of some other value. If the pool data is smaller than that, the slack is padded with zero bytes.

Different compilers follow common conventions, which permits a module compiled from C to be linked with a module written in assembly language or another language. The two literal pools are then placed consecutively in .rodata.

0
printf ("%p\n", (void *) h);  // h holds the address of "Hi", which lives in .rodata or another read-only segment of the application.
printf ("%p\n", (void *) &i); // i is presumably not a global variable, so &i is on the stack of main. By convention, the stack sits near the top of the process's address space.
user2760751
  • 1
    This does not appear to answer the question that was asked. As a reminder, the question asked "Does the OS handle it specially? Are there any advantages of this way of handling it?" Your answer doesn't appear to address those questions. Would you like to edit your answer to more directly address what was asked? – D.W. Nov 18 '16 at 15:58