Newer Linux kernels have a sysctl tunable, /proc/sys/kernel/perf_event_paranoid, which lets the user adjust how much perf_events functionality is available to non-root users. Higher numbers are more secure and offer correspondingly less functionality.

From the kernel documentation, we have the following behavior for the various values:

perf_event_paranoid:

Controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The default value is 2.

-1: Allow use of (almost) all events by all users
    Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK

>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
     Disallow raw tracepoint access by users without CAP_SYS_ADMIN

>=1: Disallow CPU event access by users without CAP_SYS_ADMIN

>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
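Because each threshold is written as `>=`, the restrictions accumulate as the value rises. A minimal Python sketch of that reading follows. The helper names (`restrictions_for`, `current_level`) are made up for illustration, and the restriction strings are abbreviated from the kernel text quoted above rather than taken from any API.

```python
from pathlib import Path
from typing import List, Optional

# Abbreviated from the kernel documentation quoted above. Each threshold
# applies to every value at or above it, so the list is cumulative.
RESTRICTIONS = [
    (0, "no ftrace function tracepoints or raw tracepoint access"),
    (1, "no CPU event access"),
    (2, "no kernel profiling"),
]

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def restrictions_for(level: int) -> List[str]:
    """Return the restrictions on users without CAP_SYS_ADMIN at `level`."""
    return [text for threshold, text in RESTRICTIONS if level >= threshold]


def current_level(path: Path = PARANOID_PATH) -> Optional[int]:
    """Read the current setting, or return None if the file is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = current_level()
    if level is None:
        print("perf_event_paranoid is not readable on this system")
    else:
        print(f"perf_event_paranoid = {level}")
        for line in restrictions_for(level):
            print(" -", line)
```

With a value of 1, this lists the tracepoint restriction and the "CPU event access" restriction, but not the kernel profiling one.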

I have 1 in my perf_event_paranoid file which should "Disallow CPU event access" - but what does that mean exactly?

A plain reading would imply no access to CPU performance counter events (such as Intel PMU events), but it seems I can access those just fine. For example:

$ perf stat sleep 1

 Performance counter stats for 'sleep 1':

          0.408734      task-clock (msec)         #    0.000 CPUs utilized          
                 1      context-switches          #    0.002 M/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                57      page-faults               #    0.139 M/sec                  
         1,050,362      cycles                    #    2.570 GHz                    
           769,135      instructions              #    0.73  insn per cycle         
           152,661      branches                  #  373.497 M/sec                  
             6,942      branch-misses             #    4.55% of all branches        

       1.000830821 seconds time elapsed

Here, several of the events are CPU PMU events (cycles, instructions, branches, branch-misses).

If these aren't the CPU events being referred to, what are they?

BeeOnRope
  • random guess: restricts non-generic uarch-specific events like `uops_issued.any`? – Peter Cordes Aug 18 '18 at 18:33
  • @PeterCordes - nope, `ocperf stat -e uops_issued.any sleep 1` also works. – BeeOnRope Aug 18 '18 at 18:34
  • Another indication that it's not plain PMU events is that value `2` prevents kernel profiling, and indeed any kernel CPU events (`:k` suffix) return zero when using 2. Since 2 is supposed to be strictly more secure than 1, it implies that user mode events are allowed in 1 and 2, and kernel mode events in 1 (indeed, `:k` works when using 1), so "CPU events" must mean something narrower or different than plain PMU events... – BeeOnRope Aug 18 '18 at 18:37
  • New guess: it's for system-wide profiling, or whole-CPU instead of my-process. You can profile kernel code invoked by your own process with `1`, but probably it tries to defend against directly timing *other* users' processes, leaving only hyperthreading timing side-channels. – Peter Cordes Aug 18 '18 at 19:14
  • A value of "3" was proposed in 2016: https://lkml.org/lkml/2016/1/11/587 "security,perf: Allow further restriction of perf_event_open". The option was added as https://github.com/torvalds/linux/commit/0764771dab80d7b84b9a271bee7f1b21a04a3f0c and https://github.com/torvalds/linux/commit/0fbdea19e9394a5cb5f2f5081b028c50b558910a in 2009. We should check LKML for the patches for more comments. – osgx Aug 18 '18 at 19:20

1 Answer

In this case, "CPU event" refers to monitoring events per CPU rather than per task. For the perf tools, this restricts the use of the following options:

-C, --cpu=
    Count only on the list of CPUs provided. Multiple CPUs can be provided as a comma-separated list with no space: 0,1.
    Ranges of CPUs are specified with -: 0-2. In per-thread mode, this option is ignored. The -a option is still necessary
    to activate system-wide monitoring. Default is to count on all CPUs.

-a, --all-cpus
    system-wide collection from all CPUs (default if no target is specified)

For perf_event_open, this concerns the following case:

pid == -1 and cpu >= 0
       This measures all processes/threads on the specified CPU.  This requires CAP_SYS_ADMIN capability or a
       /proc/sys/kernel/perf_event_paranoid value of less than 1.
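The rule above, together with the `>=2` level from the kernel documentation, can be sketched as a small predicate. This is a simplified model for illustration, not the kernel's actual check: the function and parameter names are hypothetical, and it ignores tracepoints, permissions on the target process, and any other security policy.

```python
def is_cpu_wide(pid: int, cpu: int) -> bool:
    """True for the pid == -1, cpu >= 0 case: all tasks on one CPU."""
    return pid == -1 and cpu >= 0


def open_permitted(
    pid: int,
    cpu: int,
    paranoid: int,
    has_cap_sys_admin: bool = False,
    exclude_kernel: bool = True,
) -> bool:
    """Simplified model of whether a perf_event_open request is allowed.

    Only two rules are modeled (an assumption of this sketch):
      * paranoid >= 1 blocks per-CPU (pid == -1, cpu >= 0) monitoring
      * paranoid >= 2 blocks events that also count kernel mode
    """
    if has_cap_sys_admin:
        return True
    if is_cpu_wide(pid, cpu) and paranoid >= 1:
        return False
    if not exclude_kernel and paranoid >= 2:
        return False
    return True
```

With a setting of 1, `open_permitted(0, -1, 1, exclude_kernel=False)` is true (profiling your own task, kernel mode included), while `open_permitted(-1, 0, 1)` is false (everything running on CPU 0).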

This may be version-specific; the cited documentation is from 4.17. This is another related question.

Zulan
  • For what it's worth, [the accepted answer](https://unix.stackexchange.com/a/14256/87246) on that other question seems wrong wrt `paranoid == 2`. With `paranoid == 2` I can definitely still use `perf stat` to get events, but I just can't see the kernel PMU counts (user only). – BeeOnRope Aug 19 '18 at 19:42
  • I can confirm that this answer is correct by looking at the code. Also, the check `sysctl_perf_event_paranoid > 0` is performed to determine whether it is allowed to use thread-shared (shared between logical processors) events on P4 processors (formally called TI events). – Hadi Brais Aug 19 '18 at 21:27
  • Confirmed through local testing. – BeeOnRope Aug 27 '18 at 16:31