What do 'real', 'user' and 'sys' mean in the output of time(1)?

Question

$ time foo
real        0m0.003s
user        0m0.000s
sys         0m0.004s
$

What do 'real', 'user' and 'sys' mean in the output of time?

Which one is meaningful when benchmarking my app?

how can i access just to one of them? for example just real time? — Mojtaba Ahmadi, Mar 06 '15 at 10:42
@Casillass Real - http://stackoverflow.com/questions/2408981/how-cant-i-redirect-the-output-of-time-command — ConcernedOfTunbridgeWells, Mar 06 '15 at 16:46
If your program exits that fast, none of them are meaningful, it's all just startup overhead. If you want to measure the whole program with `time`, have it do something that will take at least a second. — Peter Cordes, Nov 25 '17 at 00:01
It is really important to note that `time` is a bash keyword. So typing `man time` is _not_ giving you a man page for the bash `time`, rather it is giving the man page for `/usr/bin/time`. This has tripped me up. — irritable_phd_syndrome, Jul 03 '19 at 12:33

score 2279 · Accepted Answer · edited Jun 25 '19 at 11:28

Real, User and Sys process time statistics

One of these things is not like the other. Real refers to actual elapsed time; User and Sys refer to CPU time used only by the process.

Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).
User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.
Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process. See below for a brief description of kernel mode (also known as 'supervisor' mode) and the system call mechanism.

User+Sys will tell you how much actual CPU time your process used. Note that this is across all CPUs, so if the process has multiple threads (and this process is running on a computer with more than one processor) it could potentially exceed the wall clock time reported by Real (which usually occurs). Note that in the output these figures include the User and Sys time of all child processes (and their descendants) as well when they could have been collected, e.g. by wait(2) or waitpid(2), although the underlying system calls return the statistics for the process and its children separately.

Origins of the statistics reported by time (1)

The statistics reported by time are gathered from various system calls. 'User' and 'Sys' come from wait (2) (POSIX) or times (2) (POSIX), depending on the particular system. 'Real' is calculated from a start and end time gathered from the gettimeofday (2) call. Depending on the version of the system, various other statistics such as the number of context switches may also be gathered by time.

On a multi-processor machine, a multi-threaded process or a process forking children could have an elapsed time smaller than the total CPU time - as different threads or processes may run in parallel. Also, the time statistics reported come from different origins, so times recorded for very short running tasks may be subject to rounding errors, as the example given by the original poster shows.

A brief primer on Kernel vs. User mode

On Unix, or any protected-memory operating system, 'Kernel' or 'Supervisor' mode refers to a privileged mode that the CPU can operate in. Certain privileged actions that could affect security or stability can only be done when the CPU is operating in this mode; these actions are not available to application code. An example of such an action might be manipulation of the MMU to gain access to the address space of another process. Normally, user-mode code cannot do this (with good reason), although it can request shared memory from the kernel, which could be read or written by more than one process. In this case, the shared memory is explicitly requested from the kernel through a secure mechanism and both processes have to explicitly attach to it in order to use it.

The privileged mode is usually referred to as 'kernel' mode because the kernel is executed by the CPU running in this mode. In order to switch to kernel mode you have to issue a specific instruction (often called a trap) that switches the CPU to running in kernel mode and runs code from a specific location held in a jump table. For security reasons, you cannot switch to kernel mode and execute arbitrary code - the traps are managed through a table of addresses that cannot be written to unless the CPU is running in supervisor mode. You trap with an explicit trap number and the address is looked up in the jump table; the kernel has a finite number of controlled entry points.

The 'system' calls in the C library (particularly those described in Section 2 of the man pages) have a user-mode component, which is what you actually call from your C program. Behind the scenes, they may issue one or more system calls to the kernel to do specific services such as I/O, but they still also have code running in user-mode. It is also quite possible to directly issue a trap to kernel mode from any user space code if desired, although you may need to write a snippet of assembly language to set up the registers correctly for the call.

More about 'sys'

There are things that your code cannot do from user mode - things like allocating memory or accessing hardware (HDD, network, etc.). These are under the supervision of the kernel, and it alone can do them. Some operations like malloc orfread/fwrite will invoke these kernel functions and that then will count as 'sys' time. Unfortunately it's not as simple as "every call to malloc will be counted in 'sys' time". The call to malloc will do some processing of its own (still counted in 'user' time) and then somewhere along the way it may call the function in kernel (counted in 'sys' time). After returning from the kernel call, there will be some more time in 'user' and then malloc will return to your code. As for when the switch happens, and how much of it is spent in kernel mode... you cannot say. It depends on the implementation of the library. Also, other seemingly innocent functions might also use malloc and the like in the background, which will again have some time in 'sys' then.

@ron - According to the Linux man page, it aggregates the 'c' times with the process times, so I think it does. The parent times and child times are available separately from the times(2) call, though. I guess the Solaris/SysV version of time(1) does something similar. — ConcernedOfTunbridgeWells, Aug 30 '11 at 10:28
@ron I've just edited the answer: The time spent by child processes and their descendants count *only* when the time could have been collected by wait(2) or waitpid(2), and recursively with descendants. This implies that the child processes must have terminated. For instance, compare `time sh -c 'foo & sleep 1'` and `time sh -c 'foo & sleep 2'`, where `foo` is a command that takes CPU time between 1 and 2 seconds. The former outputs something around 0. — vinc17, Sep 10 '14 at 11:44
User+Sys lets you measure CPU usage of a process. You can use it to benchmark performance. This is particularly useful for multi-threaded code where more than one CPU core might be working on a computation. — ConcernedOfTunbridgeWells, Nov 09 '14 at 21:35
Not precisely on topic, nevertheless: Running "\time " is interesting - it provides more detail: (forgive poor formatting in the comment): $ time ps PID TTY TIME CMD 9437 pts/19 00:00:00 bash 11459 pts/19 00:00:00 ps real 0m0.025s user 0m0.004s sys 0m0.018s $ \time ps PID TTY TIME CMD 9437 pts/19 00:00:00 bash 11461 pts/19 00:00:00 time 11462 pts/19 00:00:00 ps 0.00user 0.01system 0:00.02elapsed 95%CPU (0avgtext+0avgdata 2160maxresident)k 0inputs+0outputs (0major+103minor)pagefaults 0swaps $ — kaiwan, May 09 '15 at 07:41
(Ran out of chars in the prev comment so): More detail? Use perf [1], [2]. [1] https://perf.wiki.kernel.org/index.php/Main_Page [2] http://www.brendangregg.com/perf.html — kaiwan, May 09 '15 at 07:46
Is there a simple way of finding out how the "Real" time is spent? Like, is it more time on network I/O than on disc I/O? — FORTRAN, Apr 19 '17 at 13:41
@Gunnar 'Real' time can be spent on other processes, so there isn't any way using these calls to ascertain what the computer was doing. You can get stats using the process accounting system on most unix systems, which may tell you useful information. Top (1) goes through kernel data structures via /dev/kmem to get its data. — ConcernedOfTunbridgeWells, May 23 '17 at 13:16
I find that sometimes the User + Sys does not equal real time. What would cause this? Also how does this generalise to GPU computation? — CMCDragonkai, May 10 '18 at 05:36
@CMCDragonkai the post discusses the relationship between user, sys and real in some detail. GPU time is not counted in process stats as far as I'm aware, as calls out to the GPU will either block the process (forcing a context switch) or will be non-blocking, in which case the process will continue and the system will measure the CPU time used by the process. You will need GPU-specific benchmarking tools to measure how much GPU time is being used by the process. — ConcernedOfTunbridgeWells, May 10 '18 at 11:19
Most of it is in the man page for time(1). The rest I learned from an operating systems course at university. There are books (e.g. Bach or Stevens) that cover this material. — ConcernedOfTunbridgeWells, Aug 19 '18 at 10:16
Not necessarily the case on multiprocessor machines. Multiple threads can rack up more CPU time than elapsed time. — ConcernedOfTunbridgeWells, Jan 14 '21 at 13:39
“various other statistics such as the number of context switches may also be gathered by time.” yes this is true. Add time.sleep(0) in to a large loop to test this, both the user and sys time increased — tinyhare, May 10 '21 at 03:58

score 314 · Answer 2 · edited May 23 '17 at 12:34

314

To expand on the accepted answer, I just wanted to provide another reason why real ≠ user + sys.

Keep in mind that real represents actual elapsed time, while user and sys values represent CPU execution time. As a result, on a multicore system, the user and/or sys time (as well as their sum) can actually exceed the real time. For example, on a Java app I'm running for class I get this set of values:

real    1m47.363s
user    2m41.318s
sys     0m4.013s

edited May 23 '17 at 12:34

Community

1
1

answered Apr 29 '10 at 05:29

lensovet

4,683
1
21
18

12

I'd always wondered about this. Since I know that my programs are single threaded, the difference between user and real time must be VM overhead, correct? – Quantum7 May 05 '10 at 00:13
12

not necessarily; the Sun JVM on Solaris machines as well as Apple's JVM on Mac OS X manages to use more than one core even in single-threaded apps. If you do a sample of a java process, you'll see that things like garbage collection run on separate threads (and some other stuff too that I don't remember off the top of my head). I don't know if you really want to term that "VM overhead" though. – lensovet May 07 '10 at 07:10
@Quantum7 - no, not necessarily. See my post above. Real is elapsed time, user and sys are accumulated time slice statistics from the CPU time the process actually uses. – ConcernedOfTunbridgeWells Apr 12 '11 at 16:20
4

I guess the amount of up-votes gave you enough reputation now :D. So what do you think about `real` exceeding `user` and `sys` total ? OS overhead such as thread context switching may be ? – Muhammad Gelbana Sep 26 '12 at 11:36
Context switching would add that much additional runtime only if you had a multithreaded application that was in reality running serially (i.e. due to locks). A single-threaded application (truly single-threaded, which the JVM isn't as we saw above) would also give you a similar result. – lensovet Sep 26 '12 at 18:35
24

Another potential issue could be I/O: if your application spends a good deal of time waiting to receive a file or stream, then obviously the real time would greatly exceed the user/sys time because no CPU time is used while waiting to get access to a file or something similar. – lensovet Sep 26 '12 at 18:39
I'd love to find out why people are sometimes down voting this answer. Is there something wrong with it? – lensovet Oct 19 '14 at 21:08
2

@MuhammadGelbana - this can happen if the application is blocked from executing for any reason. For example, if it is waiting on I/O, IPC or socket connections it will sit idle, accumulating no CPU time until the blocking call returns. – ConcernedOfTunbridgeWells Apr 21 '16 at 14:21
I came across this GC outputs. Its intuitive to say real = usr + sys but it is not always the case. This is also answered here http://unix.stackexchange.com/questions/40694/why-real-time-can-be-lower-than-user-time – RaviU Feb 14 '17 at 23:26
Wow this app looks really bug, looking at the executiuon time, you must be having a scanner in there rite? – omkaartg Jun 13 '18 at 13:36
@MuhammadGelbana It's not necessarily overhead at all. When the computer is hit with timer interrupts it'll decide to swap over to some daemons to keep them running, X11 to keep your windowing smooth, and then back to your process, in a fraction of a second. Most of that time will not be overhead, but still be time spent. appx <1% of that lost time will be "sys" time from other processes, so that'll be OS overhead like context switching, which counts towards real but not user/sys of the running process. Most of (real - (user+sys)) will be other program's "user" time however. – Nicholas Pipitone Jul 20 '18 at 18:27

score 51 · Answer 3 · answered Nov 24 '17 at 19:05

51

• real: The actual time spent in running the process from start to finish, as if it was measured by a human with a stopwatch

• user: The cumulative time spent by all the CPUs during the computation

• sys: The cumulative time spent by all the CPUs during system-related tasks such as memory allocation.

Notice that sometimes user + sys might be greater than real, as multiple processors may work in parallel.

answered Nov 24 '17 at 19:05

varun

3,910
29
28

`sys` is CPU-time spent in system calls (and page-fault handlers?) – Peter Cordes Nov 24 '17 at 23:56
2

`real` is often described as "wall-clock" time. – Peter Cordes Nov 24 '17 at 23:58

score 41 · Answer 4 · edited May 07 '21 at 20:51

Minimal runnable POSIX C examples

To make things more concrete, I want to exemplify a few extreme cases of time with some minimal C test programs.

All programs can be compiled and run with:

gcc -ggdb3 -o main.out -pthread -std=c99 -pedantic-errors -Wall -Wextra main.c
time ./main.out

and have been tested in Ubuntu 18.10, GCC 8.2.0, glibc 2.28, Linux kernel 4.18, ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB).

sleep

Non-busy sleep does not count in either user or sys, only real.

For example, a program that sleeps for a second:

#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    sleep(1);
    return EXIT_SUCCESS;
}

GitHub upstream.

outputs something like:

real    0m1.003s
user    0m0.001s
sys     0m0.003s

The same holds for programs blocked on IO becoming available.

For example, the following program waits for the user to enter a character and press enter:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("%c\n", getchar());
    return EXIT_SUCCESS;
}

GitHub upstream.

And if you wait for about one second, it outputs just like the sleep example something like:

real    0m1.003s
user    0m0.001s
sys     0m0.003s

For this reason time can help you distinguish between CPU and IO bound programs: What do the terms "CPU bound" and "I/O bound" mean?

Multiple threads

The following example does niters iterations of useless purely CPU-bound work on nthreads threads:

#define _XOPEN_SOURCE 700
#include <assert.h>
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

uint64_t niters;

void* my_thread(void *arg) {
    uint64_t *argument, i, result;
    argument = (uint64_t *)arg;
    result = *argument;
    for (i = 0; i < niters; ++i) {
        result = (result * result) - (3 * result) + 1;
    }
    *argument = result;
    return NULL;
}

int main(int argc, char **argv) {
    size_t nthreads;
    pthread_t *threads;
    uint64_t rc, i, *thread_args;

    /* CLI args. */
    if (argc > 1) {
        niters = strtoll(argv[1], NULL, 0);
    } else {
        niters = 1000000000;
    }
    if (argc > 2) {
        nthreads = strtoll(argv[2], NULL, 0);
    } else {
        nthreads = 1;
    }
    threads = malloc(nthreads * sizeof(*threads));
    thread_args = malloc(nthreads * sizeof(*thread_args));

    /* Create all threads */
    for (i = 0; i < nthreads; ++i) {
        thread_args[i] = i;
        rc = pthread_create(
            &threads[i],
            NULL,
            my_thread,
            (void*)&thread_args[i]
        );
        assert(rc == 0);
    }

    /* Wait for all threads to complete */
    for (i = 0; i < nthreads; ++i) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        printf("%" PRIu64 " %" PRIu64 "\n", i, thread_args[i]);
    }

    free(threads);
    free(thread_args);
    return EXIT_SUCCESS;
}

GitHub upstream + plot code.

Then we plot wall, user and sys as a function of the number of threads for a fixed 10^10 iterations on my 8 hyperthread CPU:

Plot data.

From the graph, we see that:

for a CPU intensive single core application, wall and user are about the same
for 2 cores, user is about 2x wall, which means that the user time is counted across all threads.

user basically doubled, and while wall stayed the same.
this continues up to 8 threads, which matches my number of hyperthreads in my computer.

After 8, wall starts to increase as well, because we don't have any extra CPUs to put more work in a given amount of time!

The ratio plateaus at this point.

Note that this graph is only so clear and simple because the work is purely CPU-bound: if it were memory bound, then we would get a fall in performance much earlier with less cores because the memory accesses would be a bottleneck as shown at What do the terms "CPU bound" and "I/O bound" mean?

Quickly checking that wall < user is a simple way to determine that a program is multithreaded, and the closer that ratio is to the number of cores, the more effective the parallelization is, e.g.:

multithreaded linkers: Can gcc use multiple cores when linking?
C++ parallel sort: Are C++17 Parallel Algorithms implemented already?

Sys heavy work with sendfile

The heaviest sys workload I could come up with was to use the sendfile, which does a file copy operation on kernel space: Copy a file in a sane, safe and efficient way

So I imagined that this in-kernel memcpy will be a CPU intensive operation.

First I initialize a large 10GiB random file with:

dd if=/dev/urandom of=sendfile.in.tmp bs=1K count=10M

Then run the code:

#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv) {
    char *source_path, *dest_path;
    int source, dest;
    struct stat stat_source;
    if (argc > 1) {
        source_path = argv[1];
    } else {
        source_path = "sendfile.in.tmp";
    }
    if (argc > 2) {
        dest_path = argv[2];
    } else {
        dest_path = "sendfile.out.tmp";
    }
    source = open(source_path, O_RDONLY);
    assert(source != -1);
    dest = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    assert(dest != -1);
    assert(fstat(source, &stat_source) != -1);
    assert(sendfile(dest, source, 0, stat_source.st_size) != -1);
    assert(close(source) != -1);
    assert(close(dest) != -1);
    return EXIT_SUCCESS;
}

GitHub upstream.

which gives basically mostly system time as expected:

real    0m2.175s
user    0m0.001s
sys     0m1.476s

I was also curious to see if time would distinguish between syscalls of different processes, so I tried:

time ./sendfile.out sendfile.in1.tmp sendfile.out1.tmp &
time ./sendfile.out sendfile.in2.tmp sendfile.out2.tmp &

And the result was:

real    0m3.651s
user    0m0.000s
sys     0m1.516s

real    0m4.948s
user    0m0.000s
sys     0m1.562s

The sys time is about the same for both as for a single process, but the wall time is larger because the processes are competing for disk read access likely.

So it seems that it does in fact account for which process started a given kernel work.

Bash source code

When you do just time <cmd> on Ubuntu, it use the Bash keyword as can be seen from:

type time

which outputs:

time is a shell keyword

So we grep source in the Bash 4.19 source code for the output string:

git grep '"user\b'

which leads us to execute_cmd.c function time_command, which uses:

gettimeofday() and getrusage() if both are available
times() otherwise

all of which are Linux system calls and POSIX functions.

GNU Coreutils source code

If we call it as:

/usr/bin/time

then it uses the GNU Coreutils implementation.

This one is a bit more complex, but the relevant source seems to be at resuse.c and it does:

a non-POSIX BSD wait3 call if that is available
times and gettimeofday otherwise

1: https://i.stack.imgur.com/qAfEe.png**Minimal runnable POSIX C examples**

To make things more concrete, I want to exemplify a few extreme cases of time with some minimal C test programs.

All programs can be compiled and run with:

gcc -ggdb3 -o main.out -pthread -std=c99 -pedantic-errors -Wall -Wextra main.c
time ./main.out

and have been tested in Ubuntu 18.10, GCC 8.2.0, glibc 2.28, Linux kernel 4.18, ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB).

sleep

Non-busy sleep does not count in either user or sys, only real.

For example, a program that sleeps for a second:

#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    sleep(1);
    return EXIT_SUCCESS;
}

GitHub upstream.

outputs something like:

real    0m1.003s
user    0m0.001s
sys     0m0.003s

The same holds for programs blocked on IO becoming available.

For example, the following program waits for the user to enter a character and press enter:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("%c\n", getchar());
    return EXIT_SUCCESS;
}

GitHub upstream.

And if you wait for about one second, it outputs just like the sleep example something like:

real    0m1.003s
user    0m0.001s
sys     0m0.003s

For this reason time can help you distinguish between CPU and IO bound programs: What do the terms "CPU bound" and "I/O bound" mean?

Multiple threads

The following example does niters iterations of useless purely CPU-bound work on nthreads threads:

#define _XOPEN_SOURCE 700
#include <assert.h>
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

uint64_t niters;

void* my_thread(void *arg) {
    uint64_t *argument, i, result;
    argument = (uint64_t *)arg;
    result = *argument;
    for (i = 0; i < niters; ++i) {
        result = (result * result) - (3 * result) + 1;
    }
    *argument = result;
    return NULL;
}

int main(int argc, char **argv) {
    size_t nthreads;
    pthread_t *threads;
    uint64_t rc, i, *thread_args;

    /* CLI args. */
    if (argc > 1) {
        niters = strtoll(argv[1], NULL, 0);
    } else {
        niters = 1000000000;
    }
    if (argc > 2) {
        nthreads = strtoll(argv[2], NULL, 0);
    } else {
        nthreads = 1;
    }
    threads = malloc(nthreads * sizeof(*threads));
    thread_args = malloc(nthreads * sizeof(*thread_args));

    /* Create all threads */
    for (i = 0; i < nthreads; ++i) {
        thread_args[i] = i;
        rc = pthread_create(
            &threads[i],
            NULL,
            my_thread,
            (void*)&thread_args[i]
        );
        assert(rc == 0);
    }

    /* Wait for all threads to complete */
    for (i = 0; i < nthreads; ++i) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        printf("%" PRIu64 " %" PRIu64 "\n", i, thread_args[i]);
    }

    free(threads);
    free(thread_args);
    return EXIT_SUCCESS;
}

GitHub upstream + plot code.

Then we plot wall, user and sys as a function of the number of threads for a fixed 10^10 iterations on my 8 hyperthread CPU:

Plot data.

From the graph, we see that:

for a CPU intensive single core application, wall and user are about the same
for 2 cores, user is about 2x wall, which means that the user time is counted across all threads.

user basically doubled, and while wall stayed the same.
this continues up to 8 threads, which matches my number of hyperthreads in my computer.

After 8, wall starts to increase as well, because we don't have any extra CPUs to put more work in a given amount of time!

The ratio plateaus at this point.

Note that this graph is only so clear and simple because the work is purely CPU-bound: if it were memory bound, then we would get a fall in performance much earlier with less cores because the memory accesses would be a bottleneck as shown at What do the terms "CPU bound" and "I/O bound" mean?

Quickly checking that wall < user is a simple way to determine that a program is multithreaded, and the closer that ratio is to the number of cores, the more effective the parallelization is, e.g.:

multithreaded linkers: Can gcc use multiple cores when linking?
C++ parallel sort: Are C++17 Parallel Algorithms implemented already?

Sys heavy work with sendfile

The heaviest sys workload I could come up with was to use the sendfile, which does a file copy operation on kernel space: Copy a file in a sane, safe and efficient way

So I imagined that this in-kernel memcpy will be a CPU intensive operation.

First I initialize a large 10GiB random file with:

dd if=/dev/urandom of=sendfile.in.tmp bs=1K count=10M

Then run the code:

#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv) {
    char *source_path, *dest_path;
    int source, dest;
    struct stat stat_source;
    if (argc > 1) {
        source_path = argv[1];
    } else {
        source_path = "sendfile.in.tmp";
    }
    if (argc > 2) {
        dest_path = argv[2];
    } else {
        dest_path = "sendfile.out.tmp";
    }
    source = open(source_path, O_RDONLY);
    assert(source != -1);
    dest = open(dest_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    assert(dest != -1);
    assert(fstat(source, &stat_source) != -1);
    assert(sendfile(dest, source, 0, stat_source.st_size) != -1);
    assert(close(source) != -1);
    assert(close(dest) != -1);
    return EXIT_SUCCESS;
}

GitHub upstream.

which gives basically mostly system time as expected:

real    0m2.175s
user    0m0.001s
sys     0m1.476s

I was also curious to see if time would distinguish between syscalls of different processes, so I tried:

time ./sendfile.out sendfile.in1.tmp sendfile.out1.tmp &
time ./sendfile.out sendfile.in2.tmp sendfile.out2.tmp &

And the result was:

real    0m3.651s
user    0m0.000s
sys     0m1.516s

real    0m4.948s
user    0m0.000s
sys     0m1.562s

The sys time is about the same for both as for a single process, but the wall time is larger because the processes are competing for disk read access likely.

So it seems that it does in fact account for which process started a given kernel work.

Bash source code

When you do just time <cmd> on Ubuntu, it use the Bash keyword as can be seen from:

type time

which outputs:

time is a shell keyword

So we grep source in the Bash 4.19 source code for the output string:

git grep '"user\b'

which leads us to execute_cmd.c function time_command, which uses:

gettimeofday() and getrusage() if both are available
times() otherwise

all of which are Linux system calls and POSIX functions.

GNU Coreutils source code

If we call it as:

/usr/bin/time

then it uses the GNU Coreutils implementation.

This one is a bit more complex, but the relevant source seems to be at resuse.c and it does:

a non-POSIX BSD wait3 call if that is available
times and gettimeofday otherwise

score 15 · Answer 5 · answered Nov 05 '10 at 04:34

15

Real shows total turn-around time for a process; while User shows the execution time for user-defined instructions and Sys is for time for executing system calls!

Real time includes the waiting time also (the waiting time for I/O etc.)

answered Nov 05 '10 at 04:34

susenj

151
1
2

score 2 · Answer 6 · answered Oct 30 '20 at 13:50

I want to mention some other scenario when the real-time is much much bigger than user + sys. I've created a simple server which respondes after a long time

real 4.784
user 0.01s
sys  0.01s

the issue is that in this scenario the process waits for the response which is not on the user site nor in the system.

Something similar happens when you run the find command. In that case, the time is spent mostly on requesting and getting a response from SSD.

score 1 · Answer 7 · answered Jul 19 '20 at 01:13

In very simple terms, I like to think about it like this:

real is the actual amount of time it took to run the command (as if you had timed it with a stopwatch)
user and sys are how much 'work' the CPU had to do to execute the command. This 'work' is expressed in units of time.

Generally speaking:

user is how much work the CPU did to run to run the command's code
sys is how much work the CPU had to do to handle 'system overhead' type tasks (such as allocating memory, file I/O, ect.) in order to support the running command

Since these last two times are counting 'work' done, they don't include time a thread might have spent waiting (such as waiting on another process or for disk I/O to finish).

real, however, is a measure of actual runtime and not 'work', so it does include any time spent waiting.

What do 'real', 'user' and 'sys' mean in the output of time(1)?

7 Answers7

Linked

Related