103

I wish to calculate the time it took for an API call to return a value. The time taken for such an action is on the order of nanoseconds. As the API is a C++ class/function, I am using timer.h to measure it:

  #include <ctime>
  #include <iostream>

  using namespace std;

  int main(int argc, char** argv) {

      clock_t start;
      double diff;
      start = clock();
      // ... the API call being timed goes here ...
      diff = ( std::clock() - start ) / (double)CLOCKS_PER_SEC;
      cout << "printf: " << diff << '\n';

      return 0;
  }

The above code gives the time in seconds. How do I get the same in nanoseconds, and with more precision?

schoetbi
gagneet
  • the above code calculates in seconds, i want to get the answer in nano seconds... – gagneet Nov 08 '08 at 18:16
  • Need to add the platform to the question (and preferably to the title as well) to get a good answer. – Patrick Johnmeyer Nov 08 '08 at 19:57
  • Additionally to getting the time, one needs to look up issues with microbenchmarking (which is extremely complex) - just doing one execution, and getting the time at beginning and end, is unlikely to give enough precision. – Blaisorblade Apr 30 '12 at 16:27
  • @Blaisorblade: Especially since I've discovered in some of my tests that `clock()` is not nearly as fast as I thought it was. – Mooing Duck Apr 30 '12 at 16:31

17 Answers

85

What others have posted about running the function repeatedly in a loop is correct.

For Linux (and BSD) you want to use clock_gettime().

#include <time.h>   // clock_gettime() is declared here; link with -lrt on older glibc

int main()
{
   timespec ts;
   // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD (and on Linux too; see comments)
   clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
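
For instance, a minimal sketch of timing an interval in nanoseconds with it (my illustration; error checking omitted):

#include <time.h>
#include <stdio.h>

int main()
{
   timespec t0, t1;
   clock_gettime(CLOCK_REALTIME, &t0);
   // ... code to time goes here ...
   clock_gettime(CLOCK_REALTIME, &t1);
   long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
                + (t1.tv_nsec - t0.tv_nsec);
   printf("%lld ns\n", ns);
}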

For Windows you want to use QueryPerformanceCounter (QPC).
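
As a rough sketch (my illustration, using only the documented QPC/QPF calls):

#include <windows.h>
#include <iostream>

int main()
{
   LARGE_INTEGER freq, t0, t1;
   QueryPerformanceFrequency(&freq);   // counts per second
   QueryPerformanceCounter(&t0);
   // ... code to time goes here ...
   QueryPerformanceCounter(&t1);
   double ns = (t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart;
   std::cout << ns << " ns\n";
}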

Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMD CPUs may also cause a problem. See the second post by sebbbi, where he states:

QueryPerformanceCounter() and QueryPerformanceFrequency() offer a bit better resolution, but have different issues. For example in Windows XP, all AMD Athlon X2 dual core CPUs return the PC of either of the cores "randomly" (the PC sometimes jumps a bit backwards), unless you specially install AMD dual core driver package to fix the issue. We haven't noticed any other dual+ core CPUs having similar issues (p4 dual, p4 ht, core2 dual, core2 quad, phenom quad).

EDIT 2013/07/16:

It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx

...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for multiple processors, bugs in the BIOS or drivers may result in these routines returning different values as the thread moves from one processor to another...

However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.

This article shows that Windows 7 can determine whether the processor(s) have an invariant TSC, falling back to an external timer if they don't: http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.

For other fine reading related to timers, and for more details, see the comments.

grieve
  • I've seen TSC clock skew on an older dual Xeon PC, but not nearly as bad as on an Athlon X2 with C1 clock ramping enabled. With C1 clock ramping, executing a HLT instruction slows down the clock, causing the TSC on idle cores to increment more slowly than on active cores. – bk1e Nov 09 '08 at 05:33
  • CLOCK_MONOTONIC works on the versions of Linux I have available. – Bernard Feb 15 '09 at 23:14
  • @Bernard - That must be newly added since I last looked at this. Thanks for the heads up. – grieve Feb 26 '09 at 21:15
  • In fact, you have to use `CLOCK_MONOTONIC_RAW`, if it is available, in order to get hardware time not adjusted by NTP. –  Mar 01 '12 at 00:51
  • As discussed here, correct implementation of QPC do not use the TSC counter, at least where it is known to be unreliable: http://stackoverflow.com/q/510462/53974 – Blaisorblade Apr 30 '12 at 16:25
  • @Blaisorblade: that's a pretty useless assurance when many current realworld systems do suffer from cross-core drift, sometimes of several seconds magnitude even without power saving states involved. The notion of "correct" is just a way for MS to blame the HAL/BIOS firmware authors, with nobody fixing it. – Tony Delroy Jul 10 '13 at 11:40
  • @grieve: this answer should be changed - the cross-core post-boot mis-sync and drift issues with the TSC registers used by QPC definitely aren't limited to old AMD dual core machines, and persist with recent Intel Core i7s and Xeons. – Tony Delroy Jul 10 '13 at 13:22
  • @TonyD: Do you have any supporting links for that? If so I will be happy to add them to the answer. Or feel free to add them yourself. – grieve Jul 10 '13 at 19:52
  • @TonyD: I might have been unclear. I'm talking about correctness of Microsoft code - their implementation of QPC now (as of Windows XP Service Pack 2, according to the source I linked) workarounds the problems of the TSC by not using it. Do you have evidence of *later* systems with this problem on QPC? If so, you should not challenge my reasoning, but my source (with supporting links). If what I linked is instead correct, using QPC is robust if and only if your application runs on XP SP 2 or later. grieve, if this turns out to be correct, would you add this note to the answer? – Blaisorblade Jul 13 '13 at 13:38
  • @Blaisorblade: Yes I would be happy to. – grieve Jul 13 '13 at 16:41
  • @grieve: The QPC API page - http://msdn.microsoft.com/en-us/library/ms644904(v=vs.85).aspx - says "you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL)." MS blaming firmware - still unreliable. Another page, "Build date: 6/12/2013", says reliable QPC use requires "3.Compute all timing on a single thread." / "4... it's best to keep the thread on a single processor.": http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx - no mentions of being ok on recent Windows versions. – Tony Delroy Jul 16 '13 at 06:49
70

This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with <chrono>. Additionally I'll demonstrate how you can templatize the testing code on the clock, so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime(), and/or QueryPerformanceCounter).

Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I also use two standard clocks, std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are cross-platform.

First, here is how you create a C++11-compatible clock out of the x86 rdtsc assembly instruction. I'll call it x::clock:

#include <chrono>

namespace x
{

struct clock
{
    typedef unsigned long long                 rep;
    typedef std::ratio<1, 2800000000>          period; // My machine is 2.8 GHz
    typedef std::chrono::duration<rep, period> duration;
    typedef std::chrono::time_point<clock>     time_point;
    static const bool is_steady =              true;

    static time_point now() noexcept
    {
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return time_point(duration(static_cast<rep>(hi) << 32 | lo));
    }
};

}  // x

All this clock does is count CPU cycles and store them in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).
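
For example, a sketch of now() using the __rdtsc intrinsic (from <intrin.h> with MSVC, or <x86intrin.h> with gcc/clang):

    // drop-in replacement for x::clock::now() above, using the intrinsic
    static time_point now() noexcept
    {
        return time_point(duration(__rdtsc()));
    }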

To build a clock you have to give it a representation (storage type). You must also supply the clock period, which must be a compile-time constant, even though your machine may change clock speed in different power modes. From those you can then easily define your clock's "native" time duration and time point.

If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. And in that case, the more accurately you are able to supply the clock speed, the more accurate the conversion to nanoseconds (milliseconds, whatever) will be.

Below is example code which shows how to use x::clock. Actually I've templated the code on the clock as I'd like to show how you can use many different clocks with the exact same syntax. This particular test is showing what the looping overhead is when running what you want to time under a loop:

#include <iostream>

template <class clock>
void
test_empty_loop()
{
    // Define real time units
    typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
    // or:
    // typedef std::chrono::nanoseconds nanoseconds;
    // Define double-based unit of clock tick
    typedef std::chrono::duration<double, typename clock::period> Cycle;
    using std::chrono::duration_cast;
    const int N = 100000000;
    // Do it
    auto t0 = clock::now();
    for (int j = 0; j < N; ++j)
        asm volatile("");
    auto t1 = clock::now();
    // Get the clock ticks per iteration
    auto ticks_per_iter = Cycle(t1-t0)/N;
    std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
    // Convert to real time units
    std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
              << "ps per iteration\n";
}

The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example there is a pre-made std::chrono::nanoseconds unit I could have used.

As another example I want to print out the average number of clock cycles per iteration as a floating point, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).

The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:

typename clock::time_point t0 = clock::now();

(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).

To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per iteration value, divide that duration by the number of iterations.

You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.

Using this code is simple:

int main()
{
    std::cout << "\nUsing rdtsc:\n";
    test_empty_loop<x::clock>();

    std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
    test_empty_loop<std::chrono::high_resolution_clock>();

    std::cout << "\nUsing std::chrono::system_clock:\n";
    test_empty_loop<std::chrono::system_clock>();
}

Above I exercise the test using our home-made x::clock, and compare those results with using two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:

Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration

Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration

Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration

This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).

Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:

  1. The clock speed of my machine in order to define x::clock.
  2. The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.
Cody Gray
Howard Hinnant
  • By "RDTSC is Intel-only", you're really referring to the x86 architecture and derivatives, aren't you? [AMD, Cyrix, Transmeta x86 chips have the instruction](http://stackoverflow.com/a/8605960/103167), and Intel RISC and ARM processors don't. – Ben Voigt Oct 17 '12 at 15:44
  • @BenVoigt: +1 Yes, your correction is quite correct, thank you. – Howard Hinnant Oct 18 '12 at 02:12
  • How will CPU throttling affect this? Doesn't the clock speed change based on cpu load? – Tejas Kale Apr 25 '16 at 06:45
  • @TejasKale: This is described in the answer in the two consecutive paragraphs starting with "To build a clock you...". Typically timing code does not measure work which blocks a thread (but it can). And so typically your CPU won't throttle. But if you are measuring code involving sleep, mutex lock, condition_variable wait, etc, the `rdtsc` clock is likely to have inaccurate conversions to other units. It is a good idea to set your measurements up so that you can easily change and compare clocks (as shown in this answer). – Howard Hinnant Apr 25 '16 at 14:23
29

With that level of accuracy, it would be better to reason in CPU ticks rather than in system calls like clock(). And do not forget that when a single instruction takes on the order of a nanosecond to execute, nanosecond accuracy is pretty much impossible.

Still, something like that is a start:

Here's the actual code to retrieve the number of 80x86 CPU clock ticks that have passed since the CPU was last started. It works on the Pentium and above (the 386/486 are not supported, as they lack RDTSC). This code is MS Visual C++ specific, but can probably be ported very easily to anything else that supports inline assembly.

inline __int64 GetCpuClocks()
{

    // Counter
    struct { __int32 low, high; } counter;

    // Use RDTSC instruction to get clocks count
    __asm push EAX
    __asm push EDX
    __asm __emit 0fh __asm __emit 031h // RDTSC
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    // Return result
    return *(__int64 *)(&counter);

}

This function also has the advantage of being extremely fast; it usually takes no more than 50 CPU cycles to execute.

Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().
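
As a sketch of that conversion (tsc_hz is a placeholder for your chip's measured speed, not a value from the original):

// Convert a TSC delta to seconds, assuming a known, fixed clock speed.
const double tsc_hz = 2.8e9;        // placeholder: substitute your measured speed
__int64 t0 = GetCpuClocks();
// ... code to time goes here ...
__int64 t1 = GetCpuClocks();
double seconds = (double)(t1 - t0) / tsc_hz;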

Brock Adams
VonC
  • thanks for the information, this is useful. i did not think of the cpu cycles to compute the time, i think that is a very good point to keep in mind :-) – gagneet Nov 09 '08 at 03:29
  • Using QueryPerformanceFrequency() to turn TSC counts into elapsed time may not work. QueryPerformanceCounter() uses the HPET (High Precision Event Timer) on Vista when available. It uses the ACPI power management timer if the user adds /USEPMTIMER to boot.ini. – bk1e Nov 09 '08 at 05:21
23

To do this correctly you can use one of two ways: either go with RDTSC or with clock_gettime(). The second is about two times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors and may yield incorrect timing values on certain processors).

inline uint64_t rdtsc()
{
    uint32_t lo, hi;
    __asm__ __volatile__ (
      "xorl %%eax, %%eax\n"
      "cpuid\n"
      "rdtsc\n"
      : "=a" (lo), "=d" (hi)
      :
      : "%ebx", "%ecx" );
    return (uint64_t)hi << 32 | lo;
}

and for clock_gettime: (I chose microsecond resolution arbitrarily)

#include <stdint.h>
#include <time.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}
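
Since the question asks for nanoseconds, the same pattern at full nanosecond resolution would be (ClockGetTimeNs being an illustrative name, not from the original):

uint64_t ClockGetTimeNs()
{
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}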

The timing and values produced:

Absolute values:
rdtsc           = 4571567254267600
clock_gettime   = 1278605535506855

Processing time: (10000000 runs)
rdtsc           = 2292547353
clock_gettime   = 1031119636
Marius
22

I am using the following to get the desired results:

#include <time.h>
#include <iostream>
using namespace std;

int main (int argc, char** argv)
{
    // take a first reading of the per-process CPU-time clock
    // (note: clock_settime() cannot reset CPU-time clocks on modern kernels)
    timespec t0, t1;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    ...
    ... <code to check for the time to be put here>
    ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);
    long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    cout << "Time taken is: " << ns << " ns" << endl;

    return 0;
}
iammilind
gagneet
  • I downvoted because trying to apply this code I had to first google why timespec is not defined. Then I had to google what POSIX is... and as I understand it, this code is not relevant for Windows users who want to stick with the standard library. – Daniel Katz May 16 '15 at 19:31
8

For C++11, here is a simple wrapper:

#include <iostream>
#include <chrono>

class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const {
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};

Or for C++03 on *nix,

#include <time.h>  // clock_gettime(); link with -lrt on older glibc

class Timer
{
public:
    Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

    double elapsed() {
        clock_gettime(CLOCK_REALTIME, &end_);
        return end_.tv_sec - beg_.tv_sec +
            (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
    }

    void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

private:
    timespec beg_, end_;
};

Example of usage:

int main()
{
    Timer tmr;
    double t = tmr.elapsed();
    std::cout << t << std::endl;

    tmr.reset();
    t = tmr.elapsed();
    std::cout << t << std::endl;
    return 0;
}

From https://gist.github.com/gongzhitaao/7062087

Peter Mortensen
gongzhitaao
5

In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions and you don't know how long that takes.

For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
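
In modern C++ that might look like the following sketch, where api_call is a hypothetical stand-in for the function under test:

#include <chrono>
#include <iostream>

void api_call() { /* hypothetical stand-in for the function being measured */ }

int main()
{
    const long long N = 10000000;
    auto t0 = std::chrono::steady_clock::now();
    for (long long i = 0; i < N; ++i)
        api_call();   // beware: a trivial call may be optimized away entirely
    auto t1 = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::cout << ns / N << " ns per call (average)\n";
}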

Greg Hewgill
  • actually, i am trying to get the performance of the api for a particular call. for each run, it might give a different time, this may affect the graph i make for the performance improvement... hence the time in nano seconds. but yeah, this is a great idea, will consider it. – gagneet Nov 08 '08 at 18:56
5

You can use the following function with gcc running on x86 processors:

unsigned long long rdtsc()
{
  // RDTSC leaves the low 32 bits of the counter in EAX and the high 32 bits in EDX
  unsigned int low, high;
  __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));
  return ((unsigned long long)high << 32) | low;
}

with Digital Mars C++:

unsigned long long rdtsc()
{
   _asm
   {
        rdtsc
   }
}

which reads the time stamp counter (TSC) on the chip. I use this when doing profiling.

Blaisorblade
Walter Bright
  • this is useful, i will check if the processor is x86, as i am using a apple mac for experimentation... thanks :-) – gagneet Nov 09 '08 at 03:26
  • What values is the user supposed to give for high and low? Why do you define a macro inside the body of a function? Also, ulonglong, presumably typedef'd to unsigned long long, isn't a standard type. I'd like to use this but I'm not sure how ;) – Joseph Garvin Jun 11 '09 at 21:07
  • unsigned long is not the right thing to use under linux. You may want to consider using int instead as long and long long are both 64-bit on 64-bit Linux. – Marius Jul 08 '10 at 15:37
  • The TSC counter is nowadays often unreliable: it changes its speed on many processor when the frequency is changed, and is inconsistent across different cores, hence the TSC does not always grow. – Blaisorblade Apr 30 '12 at 16:30
  • @Marius: I implemented your comment, using `unsigned int` as the internal type. – Blaisorblade Apr 30 '12 at 16:34
3

Using Brock Adams's method, with a simple class:

// Requires <windows.h>, <cstdio>, and <cstring>.
int get_cpu_ticks()
{
    LARGE_INTEGER ticks;
    QueryPerformanceFrequency(&ticks);  // counts per second
    return ticks.LowPart;
}

__int64 get_cpu_clocks()
{
    struct { __int32 low, high; } counter;

    __asm push EAX
    __asm push EBX
    __asm push ECX
    __asm push EDX
    __asm cpuid              // serialize the instruction stream before reading the TSC
    __asm rdtsc
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop ECX
    __asm pop EBX
    __asm pop EAX

    return *(__int64 *)(&counter);
}

class cbench
{
public:
    cbench(const char *desc_in) 
         : desc(strdup(desc_in)), start(get_cpu_clocks()) { }
    ~cbench()
    {
        printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
        if(desc) free(desc);
    }
private:
    char *desc;
    __int64 start;
};

Usage Example:

int main()
{
    {
        cbench c("test");
        ... code ...
    }
    return 0;
}

Result:

test took: 0.0002 ms

It has some function call overhead, but should still be more than fast enough :)

Thomas
3

You can use Embedded Profiler (free for Windows and Linux), which has an interface to a multiplatform timer (based on a processor cycle count) and can give you the number of cycles per second:

EProfilerTimer timer;
timer.Start();

... // Your code here

const uint64_t number_of_elapsed_cycles = timer.Stop();
const uint64_t nano_seconds_elapsed =
    number_of_elapsed_cycles / (double) timer.GetCyclesPerSecond() * 1000000000;

Converting a cycle count to time is a potentially dangerous operation with modern processors, where the CPU frequency can change dynamically. Therefore, to be sure that the converted times are correct, it is necessary to fix the processor frequency before profiling.

Peter Mortensen
Mi-La
3

If you need subsecond precision, you need to use system-specific extensions, and will have to check the documentation for your operating system. POSIX supports up to microseconds with gettimeofday; the POSIX real-time extension clock_gettime goes down to nanoseconds.

If you are using Boost, you can check boost::posix_time.
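
A minimal boost::posix_time sketch (my illustration) might look like this:

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main()
{
    boost::posix_time::ptime t0 = boost::posix_time::microsec_clock::universal_time();
    // ... code to time goes here ...
    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::universal_time();
    std::cout << (t1 - t0).total_microseconds() << " us\n";
}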

Raymond Martineau
  • want to keep the code portable, will see the boost library and check if i can bundle this with the code. thanks :-) – gagneet Nov 09 '08 at 03:28
3

I'm using Borland. Here is the code; ti_hund sometimes gives me a negative number, but the timing is fairly good.

#include <dos.h>
#include <stdio.h>
#include <conio.h>

void main()
{
    struct time t;
    int Hour, Min, Sec, Hun;

    gettime(&t);
    Hour = t.ti_hour;
    Min  = t.ti_min;
    Sec  = t.ti_sec;
    Hun  = t.ti_hund;
    printf("Start time is: %2d:%02d:%02d.%02d\n",
           t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);

    // ... your code to time goes here ...

    // read the time again; drop Hour and Min if the run is under a second
    gettime(&t);
    printf("\nTime Hour:%d Min:%d Sec:%d Hundredths:%d\n",
           t.ti_hour - Hour, t.ti_min - Min, t.ti_sec - Sec, t.ti_hund - Hun);
    printf("\n\nAll done, press a key\n\n");
    getch();
} // end main
sth
2

What do you think about this:

    int iceu_system_GetTimeNow(long long int *res)
    {
      static struct timespec buffer;
    #ifdef __CYGWIN__
      if (clock_gettime(CLOCK_REALTIME, &buffer))
        return 1;
    #else
      if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
        return 1;
    #endif
      *res=(long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
      return 0;
    }
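
Usage would then be along these lines (error returns ignored for brevity):

    long long int t0, t1;
    iceu_system_GetTimeNow(&t0);
    // ... code to time goes here ...
    iceu_system_GetTimeNow(&t1);
    printf("%lld ns\n", t1 - t0);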
icegood
2

Here is a nice Boost timer that works well:

//Stopwatch.hpp

#ifndef STOPWATCH_HPP
#define STOPWATCH_HPP

//Boost
#include <boost/chrono.hpp>
//Std
#include <cstdint>

class Stopwatch
{
public:
    Stopwatch();
    virtual         ~Stopwatch();
    void            Restart();
    std::uint64_t   Get_elapsed_ns();
    std::uint64_t   Get_elapsed_us();
    std::uint64_t   Get_elapsed_ms();
    std::uint64_t   Get_elapsed_s();
private:
    boost::chrono::high_resolution_clock::time_point _start_time;
};

#endif // STOPWATCH_HPP


//Stopwatch.cpp

#include "Stopwatch.hpp"

Stopwatch::Stopwatch():
    _start_time(boost::chrono::high_resolution_clock::now()) {}

Stopwatch::~Stopwatch() {}

void Stopwatch::Restart()
{
    _start_time = boost::chrono::high_resolution_clock::now();
}

std::uint64_t Stopwatch::Get_elapsed_ns()
{
    boost::chrono::nanoseconds nano_s = boost::chrono::duration_cast<boost::chrono::nanoseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(nano_s.count());
}

std::uint64_t Stopwatch::Get_elapsed_us()
{
    boost::chrono::microseconds micro_s = boost::chrono::duration_cast<boost::chrono::microseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(micro_s.count());
}

std::uint64_t Stopwatch::Get_elapsed_ms()
{
    boost::chrono::milliseconds milli_s = boost::chrono::duration_cast<boost::chrono::milliseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(milli_s.count());
}

std::uint64_t Stopwatch::Get_elapsed_s()
{
    boost::chrono::seconds sec = boost::chrono::duration_cast<boost::chrono::seconds>(boost::chrono::high_resolution_clock::now() - _start_time);
    return static_cast<std::uint64_t>(sec.count());
}
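
An example of usage (a minimal sketch, not part of the original gist):

//main.cpp

#include "Stopwatch.hpp"
#include <iostream>

int main()
{
    Stopwatch stopwatch;
    // ... code to time goes here ...
    std::cout << stopwatch.Get_elapsed_ns() << " ns" << std::endl;
    return 0;
}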
Peter Mortensen
Patrick K
2

If this is for Linux, I've been using the function gettimeofday, which returns a struct giving the seconds and microseconds since the Epoch. You can then use timersub to subtract the two readings to get the difference in time, and convert it to whatever precision you want. However, you asked for nanoseconds, and it looks like the function clock_gettime() is what you're looking for: it puts the time, in terms of seconds and nanoseconds, into the structure you pass in.
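
A sketch of the gettimeofday/timersub approach (my illustration):

#include <sys/time.h>
#include <stdio.h>

int main()
{
    struct timeval t0, t1, dt;
    gettimeofday(&t0, NULL);
    // ... code to time goes here ...
    gettimeofday(&t1, NULL);
    timersub(&t1, &t0, &dt);   // dt = t1 - t0 (a BSD/glibc macro)
    printf("%ld.%06ld s\n", (long)dt.tv_sec, (long)dt.tv_usec);
}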

Will Mc
2

Minimalistic copy&paste-struct + lazy usage

If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste it anywhere in your C++ file right after the #includes. This is the only instance in which I sacrifice Allman-style formatting.

You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.

#include <chrono>
#include <iostream>
#include <vector>
struct MeasureTime
{
    using precision = std::chrono::microseconds;
    std::vector<std::chrono::steady_clock::time_point> times;
    std::chrono::steady_clock::time_point oneLast;
    void p() {
        std::cout << "Mark " 
                << times.size()/2
                << ": " 
                << std::chrono::duration_cast<precision>(times.back() - oneLast).count() 
                << std::endl;
    }
    void m() {
        oneLast = times.back();
        times.push_back(std::chrono::steady_clock::now());
    }
    void t() {
        m();
        p();
        m();
    }
    MeasureTime() {
        times.push_back(std::chrono::steady_clock::now());
    }
};

Usage

MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...

Standard output result

Mark 1: 123
Mark 2: 32
Mark 3: 433234

If you want summary after execution

If you want the report afterwards (because, for example, your code in between also writes to standard output), add the following function to the struct (just before MeasureTime()):

void s() { // summary
    int i = 0;
    std::chrono::steady_clock::time_point tprev;
    for(auto tcur : times)
    {
        if(i > 0)
        {
            std::cout << "Mark " << i << ": "
                    << std::chrono::duration_cast<precision>(tcur - tprev).count()
                    << std::endl;
        }
        tprev = tcur;
        ++i;
    }
}

So then you can just use:

MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();

Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s() and m.t().

Yeti
0

plf::nanotimer is a lightweight option for this. It works on Windows, Linux, macOS, BSD, etc., and has roughly microsecond accuracy, depending on the OS:

  #include "plf_nanotimer.h"
  #include <iostream>

  int main(int argc, char** argv)
  {
      plf::nanotimer timer;

      timer.start();

      // Do something here

      double results = timer.get_elapsed_ns();
      std::cout << "Timing: " << results << " nanoseconds." << std::endl;    
      return 0;
  }
metamorphosis