I'm trying to understand how to measure performance, so I wrote a very simple program:
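section .text
global _start
_start:
    mov rax, 60    ; syscall number 60 = exit on x86-64 Linux
    syscall        ; enter the kernel; the exit status is whatever is in rdi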
I ran the program with perf stat ./bin and was surprised by how high stalled-cycles-frontend is:
     0.038132 task-clock (msec)         #    0.148 CPUs utilized
            0 context-switches          #    0.000 K/sec
            0 cpu-migrations            #    0.000 K/sec
            2 page-faults               #    0.052 M/sec
      107,386 cycles                    #    2.816 GHz
       81,229 stalled-cycles-frontend   #   75.64% frontend cycles idle
       47,654 instructions              #    0.44  insn per cycle
                                        #    1.70  stalled cycles per insn
        8,601 branches                  #  225.559 M/sec
          929 branch-misses             #   10.80% of all branches

  0.000256994 seconds time elapsed
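To make sure I'm reading the derived columns correctly, here is a minimal Python sketch that recomputes the values printed after each # from the raw counts above. The variable and function names are my own, not part of perf, and I'm assuming the ratios are formed the obvious way (for example, frontend idle = stalled-cycles-frontend / cycles).

```python
# Raw counts copied from the perf stat output above.
TASK_CLOCK_MS = 0.038132
ELAPSED_S = 0.000256994
CYCLES = 107_386
STALLED_FRONTEND = 81_229
INSTRUCTIONS = 47_654
BRANCHES = 8_601
BRANCH_MISSES = 929
PAGE_FAULTS = 2


def derived_metrics():
    """Recompute the annotations perf prints after '#' (assumed formulas)."""
    task_clock_s = TASK_CLOCK_MS / 1000.0
    return {
        # Fraction of wall-clock time the task actually spent on a CPU.
        "cpus_utilized": task_clock_s / ELAPSED_S,
        # Cycles per second of CPU time, expressed in GHz.
        "ghz": CYCLES / task_clock_s / 1e9,
        # Share of all cycles counted as frontend stalls.
        "frontend_idle_pct": 100.0 * STALLED_FRONTEND / CYCLES,
        "insn_per_cycle": INSTRUCTIONS / CYCLES,
        "stalled_per_insn": STALLED_FRONTEND / INSTRUCTIONS,
        "branch_miss_pct": 100.0 * BRANCH_MISSES / BRANCHES,
        # Event rates are per second of task-clock, scaled to millions.
        "page_faults_m_per_s": PAGE_FAULTS / task_clock_s / 1e6,
        "branches_m_per_s": BRANCHES / task_clock_s / 1e6,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:>22}: {value:.3f}")
```

The recomputed ratios agree with the printed columns, so the 75.64% figure really is 81,229 stalled cycles out of 107,386 total. Note that the instruction count is far larger than the two instructions in my program.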
As I understand it, stalled-cycles-frontend counts the cycles in which the CPU front end has to wait for some operation (e.g. a bus transaction) to complete.
So what caused the CPU front end to wait most of the time in this simplest of cases?
And why are there 2 page faults? I don't read any memory pages.