
When I use the function getenv() from the Standard C Library, my program inherits the environment variables from its parent.

Example:

$ export FOO=42
$ <<< '#include <stdio.h>
#include <stdlib.h>
int main() {printf("%s\n", getenv("FOO"));}' gcc -w -xc - && ./a.exe
42

In libc, the environ variable is defined in environ.c. I expected it to be empty when the program starts, but I get 42.

Going a bit further, getenv can be simplified as follows:

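#include <stdint.h>
#include <string.h>

extern char **__environ;   /* glibc's internal alias for environ */

/* Simplified from glibc's getenv: assumes strlen(name) >= 2. The real
   code first returns NULL if __environ is NULL or name is empty, and
   handles one-character names separately. */
char * getenv (const char *name)
{
    size_t len = strlen (name);
    char **ep;
    uint16_t name_start;

    /* Compare the first two bytes of the name with one 16-bit load. */
    name_start = *(const uint16_t *) name;
    len -= 2;
    name += 2;

    for (ep = __environ; *ep != NULL; ++ep)
    {
        uint16_t ep_start = *(uint16_t *) *ep;

        /* Match "NAME=" and return a pointer just past the '='. */
        if (name_start == ep_start && !strncmp (*ep + 2, name, len)
                && (*ep)[len + 2] == '=')
            return &(*ep)[len + 3];
    }
    return NULL;
}
libc_hidden_def (getenv)   /* glibc-internal macro, only defined inside glibc's build */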

Here getenv just reads the contents of the __environ variable, yet I never initialized it.

This confuses me, because environ should still be NULL unless my main function is not the real entry point of my program. Perhaps gcc is tricking me by adding an _init function that is part of the standard C library.

Where is environ initialized?

Peter Mortensen
nowox
  • Then what would be the point of **environment variables**? – Iharob Al Asimi Jun 24 '15 at 19:13
  • This looks correct to me. You define `FOO` before you run the program, so it is accessible. @iharob there are many uses, just somewhat specific to the program running. – Snappawapa Jun 24 '15 at 19:13
  • These variables are meant to be present in any program that runs in a given environment. You can create a process with a clean environment, but the point of environment variables is that they are accessible to any child process, so the shell of course creates the new process with a copy of its own **environment**. – Iharob Al Asimi Jun 24 '15 at 19:17
  • Exactly which C library implementation are you quoting here? – zwol Jun 24 '15 at 19:38
  • @zwol [glibc](http://www.gnu.org/software/libc/download.html) – nowox Jun 24 '15 at 19:40

5 Answers


The environment variables are passed down from the parent process as a third argument to main. The easiest way to discover this is to read the documentation for the system call execve, particularly this bit:

int execve(const char *filename, char *const argv[], char *const envp[]);

Description

execve() executes the program pointed to by filename. [...] argv is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed. envp is an array of strings, conventionally of the form key=value, which are passed as environment to the new program. Both argv and envp must be terminated by a NULL pointer. The argument vector and environment can be accessed by the called program's main function, when it is defined as:

int main(int argc, char *argv[], char *envp[])

The C library copies the envp argument into the environ global variable somewhere in its startup code, before it calls main: for instance, GNU libc does this in _init and musl libc does it in __init_libc. (You may find musl libc's code easier to trace through than GNU libc's.) Conversely, if you start a program using one of the exec wrapper functions that don't take an explicit environment vector, the C library supplies environ as the third argument to execve. Inheritance of environment variables is thus strictly a user-space convention. As far as the kernel is concerned, each program receives two argument vectors, and it doesn't care what's in them.

(Note that three-argument main is an extension to the C language. The C standard only specifies int main(void) and int main(int argc, char **argv), but it permits implementations to define additional forms (C11 Annex J.5.1 Environment Arguments). Three-argument main has been how environment variables work since Unix V7, if not earlier, and it is documented by Microsoft too; see What should main() return in C and C++?.)

Jonathan Leffler
zwol
  • So `gcc` is tricking me and `main` is not the real main function? I thought only the operating system could run code before main, not a library. If I build a program without including any library, I would not expect to use libc. – nowox Jun 24 '15 at 19:30
  • I like this answer: new knowledge! I just printed out the environment strings passed to `main` the way you say they are (MSVC). – Weather Vane Jun 24 '15 at 19:33
  • @WeatherVane MSVC supports three-argument main??! I would never have guessed. – zwol Jun 24 '15 at 22:16
  • @nowox Indeed, the "entry point" where the kernel starts program execution is *not* `main`; it is somewhere in the C library. The usual name for *that* function is `_start`, and in the glibc source you will find it in the `csu` directory. (And with shared libraries there is a "program interpreter" that gets invoked even before that!) You have to use special arguments to the compiler (GCC: `-nostdlib -nostartfiles`) if you don't want the C library. You almost surely *do* want the C library, even if you think you don't. – zwol Jun 24 '15 at 22:21
  • @WeatherVane BTW, environment variables work differently on Windows. I am not familiar with the details, I just know it's not the same. – zwol Jun 24 '15 at 22:23
  • I expect they would, since they are a function of the OS. – Weather Vane Jun 24 '15 at 22:43

There is no mystery here.

First, the shell forks. The forked process obviously has the same environment. Then a new program is executed in the child. The syscall in question is execve, which amongst other things accepts a pointer to an environment.

So the environment a binary starts with after being exec'd depends entirely on the code that performed the exec.

All of this can easily be seen by running strace.

EDIT: since the question was edited to ask about environ:

When you execute a dynamically linked binary, the very first userspace code doing anything comes from the loader. The loader, amongst other things, sets up variables like argc, argv and environ, and only then hands control to the binary, whose startup code calls main().

Once more, sources for all this are freely available. While glibc's sources are rather hard to read due to atrocious formatting, the BSD ones are easy to follow and conceptually close enough.

http://code.metager.de/source/xref/freebsd/libexec/rtld-elf/rtld.c#389

Jonathan Leffler
employee of the month
  • I did not know about `strace`; this is pretty useful. – nowox Jun 24 '15 at 19:25
  • Can you elaborate on your sentence `When you execute a dynamically linked binary, the very first userspace code doing anything comes from the loader.`? Is the loader part of the executable? – nowox Jun 24 '15 at 19:53
  • The interesting fact here is that when the loader is called (at the entry point you indicate in your link), the stack in the newly-created memory image already has argc, argv and env pushed onto it. These must have been previously copied from the memory space in which exec() was invoked to the memory space created for the new executable, and that must have been done by the OS since only it has access to both memory spaces at the same time. – rici Jun 24 '15 at 20:15
  • Clearly, the setup code is common to all dynamically linked binaries, so the loader is shared as well. Binaries contain a path to the loader, which typically is /lib64/ld-linux-x86-64.so.2 (this depends on your system and on the binaries you have). While Linux will try to execute whatever loader is named there, FreeBSD, for example, will match it against a list of known loaders and error out otherwise. – employee of the month Jun 25 '15 at 00:26

Under Linux, when a program starts, its arguments and environment variables are stored on the stack. For C programs, the code that executes before main looks at this, locates the argv and envp arrays of pointers, and then calls main with these values (and argc).

When a program calls execvpe to turn into a new program (often after calling fork), it passes in an envp along with an argv. The kernel copies the strings these point to onto the new program's stack.

When any of the exec functions that take no environment argument are called, glibc passes the current program's environ as the new program's envp to execvpe (or directly to the execve system call).

nategoose
  • So each time I `exec` a program I will push on the stack the entire set of environment variables? This is a considerable amount of wasted memory (relatively speaking). – nowox Jun 24 '15 at 19:32
  • No. Each process has its own stack. When you `exec`, the kernel starts a new stack memory map for the new program that is taking over the process, copies the arguments and environment variables from the old memory map onto the new stack, and then frees the old memory. – nategoose Jun 24 '15 at 19:54

The question is really, How does the shell run commands?

The answer is by creating a new process probably using fork() and execl(), which creates a process with the same environment as the current process.

You can however create a new process with a custom environment using execvpe()/execle().

But in normal situations that isn't necessary, especially since many programs expect certain environment variables, PATH for example, to be defined; normally a child process inherits the environment variables of the environment in which it is invoked.

Iharob Al Asimi

The parent process that runs your program (your shell) defines FOO. The newly created process receives a copy from the parent.

Marged