
I have the following simple function to measure the CPU time used by a process:

#include <stdint.h>
#include <time.h>

// Returns the CPU time consumed by the calling process, in nanoseconds (Linux).
double get_cpu_time()
{
  const static int64_t NANOS_PER_SEC = 1000000000L;
  struct timespec time;
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
  return (((int64_t) time.tv_sec) * NANOS_PER_SEC) + ((int64_t) time.tv_nsec);
}


ini_time = get_cpu_time();

// intensive computation code

end_time = get_cpu_time();

end_time = end_time - ini_time;

This function returns the computation time of each process, which could be modeled with a simple equation such as:

Tcomp = Tcpu + Taccmem => inst * ILP + #cache misses * latency time

I am interested in obtaining only the Tcpu time (the time spent executing instructions, not counting the time spent fetching the data). Do you know of a function that returns this time, or one that returns the memory access time, so that I could subtract (Tcomp - Taccmem)?

best regards,

Jen

  • How about `getrusage`? – Carl Norum Feb 06 '13 at 22:09
  • I doubt that's possible without special functions in the CPU - modern x86 CPUs do a lot of measurements of various types [if enabled] in performance counters. But I'm not entirely sure even that will do exactly what you are asking for, as I'm not sure you can measure the exact things you want. There's definitely a "count of instructions completed" [or something to that effect]. – Mats Petersson Feb 06 '13 at 22:18
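For reference, the `getrusage` suggestion in the first comment could look roughly like the sketch below. The helper names `timeval_to_seconds` and `get_user_sys_time` are made up for illustration. Note that this splits CPU time into user and system time only; the user time still includes cycles spent stalled on cache misses, so it does not separate Tcpu from Taccmem.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval (seconds + microseconds) to seconds. */
static double timeval_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: store the user and system CPU time consumed so far
   by the calling process. Returns 0 on success, -1 if getrusage fails. */
int get_user_sys_time(double *user_s, double *sys_s)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    *user_s = timeval_to_seconds(usage.ru_utime);
    *sys_s = timeval_to_seconds(usage.ru_stime);
    return 0;
}
```

Calling it before and after the computation and subtracting the two user-time samples gives the user CPU time of that region.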

2 Answers


Use the `perf` command on Linux to get this kind of performance data.

For example, on an x86 platform:

perf stat -B sleep 5

Performance counter stats for 'sleep 5':

      0.344308 task-clock                #    0.000 CPUs utilized          
             1 context-switches          #    0.003 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
           154 page-faults               #    0.447 M/sec                  
        977183 cycles                    #    2.838 GHz                    
        586878 stalled-cycles-frontend   #   60.06% frontend cycles idle   
        430497 stalled-cycles-backend    #   44.05% backend  cycles idle   
        720815 instructions              #    0.74  insns per cycle        
                                         #    0.81  stalled cycles per insn
        152217 branches                  #  442.095 M/sec                  
          7646 branch-misses             #    5.02% of all branches        

   5.002763199 seconds time elapsed

This runs the `sleep 5` command and reports details gathered from the performance counters on the x86 processor. Of most interest to you are the counts of instructions executed and of cycles; their ratio is the instructions per cycle, which `perf` calculates for you. It also tells you how many cycles, on average, the processor was stalled per instruction. To get the number of cache references and the number of cache misses, you need to ask for them explicitly:

perf stat -B -e cache-references,cache-misses,cycles,instructions sleep 5

See "Why doesn't perf report cache misses?"
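If you want the same counters from inside the program, around only the region of interest, Linux exposes them through the `perf_event_open` system call. The following is a minimal sketch, not a definitive implementation: the names `open_counter`, `count_region`, `instructions_per_cycle` and `sample_work` are hypothetical, the two counters are opened separately rather than as a group (so they start and stop at slightly different moments), and the call can fail in virtual machines or when `perf_event_paranoid` restricts access.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() under strict -std modes */
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open one hardware counter for the calling process on any CPU.
   Returns a file descriptor, or -1 if the kernel refuses. */
static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;         /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;   /* count user-space work only */
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Run work() and report the instructions and cycles counted around it.
   Returns 0 on success, -1 if the counters are unavailable. */
int count_region(void (*work)(void), uint64_t *instructions, uint64_t *cycles)
{
    int fd_ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    int fd_cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int ok = -1;

    if (fd_ins >= 0 && fd_cyc >= 0) {
        ioctl(fd_ins, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_cyc, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_ins, PERF_EVENT_IOC_ENABLE, 0);
        ioctl(fd_cyc, PERF_EVENT_IOC_ENABLE, 0);
        work();
        ioctl(fd_ins, PERF_EVENT_IOC_DISABLE, 0);
        ioctl(fd_cyc, PERF_EVENT_IOC_DISABLE, 0);
        /* With read_format left at 0, each read yields one 64-bit count. */
        if (read(fd_ins, instructions, sizeof(*instructions)) == (ssize_t)sizeof(*instructions) &&
            read(fd_cyc, cycles, sizeof(*cycles)) == (ssize_t)sizeof(*cycles))
            ok = 0;
    }
    if (fd_ins >= 0)
        close(fd_ins);
    if (fd_cyc >= 0)
        close(fd_cyc);
    return ok;
}

/* Instructions per cycle, the ratio perf prints as "insns per cycle". */
double instructions_per_cycle(uint64_t instructions, uint64_t cycles)
{
    return cycles ? (double)instructions / (double)cycles : 0.0;
}

/* Hypothetical workload, present only to illustrate the call. */
void sample_work(void)
{
    volatile long sum = 0;
    for (long j = 0; j < 1000000; j++)
        sum += j;
}
```

Calling `count_region(sample_work, &ins, &cyc)` and then `instructions_per_cycle(ins, cyc)` gives the same kind of ratio as the `perf stat` output above; with the numbers from that output, 720815 instructions over 977183 cycles is about 0.74.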

amdn

... ran out of comment space...

If you are interested in profiling an algorithm and want to abstract out even memory access time, then you will have to put your test data directly in general-purpose registers, which means writing assembly. If you are all right with stack memory access times, you could stub out some stack variables and use those. You probably don't really need to do any of that, though; just do:

struct timespec time;
struct timespec time2;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time); 
something();
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);

Then worry about decoding the times after you are finished. You don't want to do a lot of calculation inside your get-time function, because that extra work will distort the measurement if you are trying to get meaningful data.
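The decoding step could be as small as the sketch below, run only after both samples have been taken. The function name `timespec_diff_ns` is just an illustrative choice.

```c
#include <stdint.h>
#include <time.h>

/* Elapsed nanoseconds between two clock_gettime samples (end - start). */
int64_t timespec_diff_ns(struct timespec start, struct timespec end)
{
    const int64_t NANOS_PER_SEC = 1000000000LL;
    return ((int64_t)end.tv_sec - (int64_t)start.tv_sec) * NANOS_PER_SEC
         + ((int64_t)end.tv_nsec - (int64_t)start.tv_nsec);
}
```

With the variables from the snippet above, that would be `timespec_diff_ns(time, time2)`. Subtracting the seconds and nanoseconds fields separately handles the case where the nanoseconds part of the second sample is smaller than that of the first.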

If you aren't worried about time, but rather about computational complexity and the number of instructions, then you should compile to assembly and try to follow it through (sometimes easier said than done):

void simple()
{
    int i = 0;
    for (int j=0;j<25;j++)
        i+=j;
}

//-> gcc -S -std=c99 simple.c

_simple:

Leh_func_begin1:
    pushq   %rbp
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movl    $0, -4(%rbp)
    movl    $0, -8(%rbp)
    jmp LBB1_2
LBB1_1:
    movl    -4(%rbp), %eax
    movl    -8(%rbp), %ecx
    addl    %ecx, %eax
    movl    %eax, -4(%rbp)
    movl    -8(%rbp), %eax
    addl    $1, %eax
    movl    %eax, -8(%rbp)
LBB1_2:
    movl    -8(%rbp), %eax
    cmpl    $24, %eax
    jle LBB1_1
    popq    %rbp
    ret
Leh_func_end1:

You can sort of make sense of it: LBB1_2 corresponds to the condition part of the for loop, which jumps back to LBB1_1 to run the body of the loop and then the increment part of the for loop, and then falls through to LBB1_2 again. Kind of cool.
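To make that mapping easier to see, here is the same control flow written back in C with `goto` and the label names from the listing. This is only an illustration: `simple_goto` is a made-up name, and unlike the original `simple()` it returns `i` so the result can be checked.

```c
/* Mirrors the jump structure of the generated assembly. */
int simple_goto(void)
{
    int i = 0;          /* movl $0, -4(%rbp) */
    int j = 0;          /* movl $0, -8(%rbp) */
    goto LBB1_2;        /* jmp LBB1_2: test the condition first */
LBB1_1:
    i += j;             /* loop body */
    j += 1;             /* increment part of the for loop */
LBB1_2:
    if (j <= 24)        /* cmpl $24 / jle: same test as j < 25 */
        goto LBB1_1;
    return i;
}
```

Each pass through LBB1_1 costs a fixed number of instructions, so counting them per iteration and multiplying by 25 gives a rough instruction count for the loop.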

Grady Player