c++ - Benchmarking code - am I doing it right?
I want to benchmark C/C++ code. I want to measure CPU time, wall time, and cycles/byte. I wrote some measurement functions, but I have a problem with cycles/byte.
To get CPU time I wrote a function that uses getrusage() with RUSAGE_SELF, for wall time I use clock_gettime() with CLOCK_MONOTONIC, and for cycles/byte I use rdtsc.
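For reference, the three measurement helpers described above could look roughly like the following. This is only a sketch and not the asker's actual code: the names cputime_seconds and walltime_seconds are hypothetical, the functions return absolute readings rather than elapsed time, and the non-x86 branch of cyclecount is a stand-in that returns nanoseconds instead of cycles. It assumes Linux with GCC or Clang.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/resource.h>
#include <sys/time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// CPU time (user + system) consumed by this process so far, in seconds.
double cputime_seconds() {
    rusage ru{};
    const int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return static_cast<double>(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + static_cast<double>(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) * 1e-6;
}

// Monotonic wall-clock reading in seconds (the epoch is arbitrary).
double walltime_seconds() {
    timespec ts{};
    const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Time-stamp counter, kept as a full 64-bit value so it does not wrap
// within a run of a few seconds.
std::uint64_t cyclecount() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    // Assumption for non-x86 builds only: fall back to monotonic
    // nanoseconds. These are NOT CPU cycles.
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL
         + static_cast<std::uint64_t>(ts.tv_nsec);
#endif
}
```

Elapsed values are obtained by subtracting two readings taken before and after the code under test.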
I process an input buffer of some size, for example 1024: char buffer[1024]. This is how I benchmark:
- First I do a warm-up phase and simply call fun2measure(args) 1000 times:
for(int i=0; i<1000; i++) fun2measure(args);
Then comes the real timing benchmark, for wall time:
`unsigned long i; double timetaken; double timetotal = 3.0; // process 3 seconds
for (timetaken=(double)0, i=0; timetaken <= timetotal; timetaken = walltime(1), i++) fun2measure(args); `
And for CPU time (almost the same):
for (timetaken=(double)0, i=0; timetaken <= timetotal; timetaken = cputime(1), i++) fun2measure(args);
But when I want the CPU cycle count for the function, I use this piece of code (one variant per clock):
`unsigned long s = cyclecount(); for (timetaken=(double)0, i=0; timetaken <= timetotal; timetaken = walltime(1), i++) { fun2measure(args); } unsigned long e = cyclecount();`
`unsigned long s = cyclecount(); for (timetaken=(double)0, i=0; timetaken <= timetotal; timetaken = cputime(1), i++) { fun2measure(args); } unsigned long e = cyclecount();`
And then I compute cycles/byte: (e - s) / (i * inputssize). Here inputssize is 1024 because that is the length of the buffer. But when I raise timetotal to 10s I get strange results:
For 10s:
did fun2measure 1148531 times in 10.00 seconds 1024 bytes, 0 cycles/byte [cpu]
did fun2measure 1000221 times in 10.00 seconds 1024 bytes, 3.000000 cycles/byte [wall]
For 5s:
did fun2measure 578476 times in 5.00 seconds 1024 bytes, 0 cycles/byte [cpu]
did fun2measure 499542 times in 5.00 seconds 1024 bytes, 7.000000 cycles/byte [wall]
For 4s:
did fun2measure 456828 times in 4.00 seconds 1024 bytes, 4 cycles/byte [cpu]
did fun2measure 396612 times in 4.00 seconds 1024 bytes, 3.000000 cycles/byte [wall]
My questions:
- Are the results OK?
- Why do I get 0 cycles/byte for CPU time when I increase the time?
- How can I measure the average time, mean, standard deviation, and similar statistics for such a benchmark?
- Is my benchmarking method 100% OK?
Cheers!
1st edit:
After changing i to double:
did fun2measure 1138164.00 times in 10.00 seconds 1024 bytes, 0.410739 cycles/byte [cpu]
did fun2measure 999849.00 times in 10.00 seconds 1024 bytes, 3.382036 cycles/byte [wall]
My results seem OK now. Question #2 isn't a question anymore :)
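The effect of that change can be illustrated with a small sketch: with integer operands the quotient is truncated toward zero, so any value below 1 cycle/byte prints as 0, while a floating-point division keeps the fraction. The function names and the cycle count used in the note below are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Integer division: the fractional part of the result is discarded.
std::uint64_t cycles_per_byte_truncated(std::uint64_t cycles,
                                        std::uint64_t iterations,
                                        std::uint64_t bytes_per_call) {
    assert(iterations > 0 && bytes_per_call > 0);
    return cycles / (iterations * bytes_per_call);
}

// Floating-point division: convert every operand before dividing.
double cycles_per_byte(std::uint64_t cycles,
                       std::uint64_t iterations,
                       std::uint64_t bytes_per_call) {
    assert(iterations > 0 && bytes_per_call > 0);
    return static_cast<double>(cycles)
         / (static_cast<double>(iterations) * static_cast<double>(bytes_per_call));
}
```

For a made-up total of 478000000 cycles over 1138164 calls of 1024 bytes each, the truncated version returns 0 and the floating-point version returns roughly 0.41.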
Your cyclecount benchmark is flawed because it includes the cost of the walltime/cputime function calls. In general, though, I would urge you to use a proper profiler instead of trying to reinvent the wheel. Performance counters will give you numbers you can rely on. Also note that cycles are unreliable because the CPU may not be running at a fixed frequency, or the kernel may task switch and halt your app for some time.
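One way to keep the timing calls out of the measured region is to run a fixed number of iterations and read the clock only before and after the loop. The following is a minimal sketch of that idea, not a replacement for a profiler; the helper name seconds_per_call is hypothetical, and it uses std::chrono::steady_clock rather than the asker's own walltime function.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Calls fn() exactly n times and returns the average wall time per call
// in seconds. The clock is read once before and once after the loop, so
// its cost is spread over n calls instead of being paid on every iteration.
template <typename Fn>
double seconds_per_call(Fn&& fn, std::size_t n) {
    assert(n > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t k = 0; k < n; ++k) {
        fn();
    }
    const auto stop = clock::now();
    return std::chrono::duration<double>(stop - start).count()
         / static_cast<double>(n);
}
```

A call such as seconds_per_call([&] { fun2measure(args); }, 1000000) would then time one million calls with only two clock reads.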
I personally write benchmarks such that they run a given function N times, with N being large enough that you get enough samples. Then I externally apply a profiler such as Linux perf to get me some hard numbers to reason about. By repeating the benchmark a given number of times you can then calculate stddev/avg values, which you can do in a script that runs the benchmark a few times and evaluates the output of the profiler.
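The stddev/avg calculation itself is small. As a sketch, assuming the per-run measurements have already been collected into a vector (the Summary type and summarize function are hypothetical names), the mean and sample standard deviation can be computed in one pass with Welford's method:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;
    double stddev;  // sample standard deviation (divides by n - 1)
};

// One-pass mean and sample standard deviation (Welford's method).
Summary summarize(const std::vector<double>& samples) {
    assert(!samples.empty());
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t n = 0;
    for (double x : samples) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }
    const double variance = (n > 1) ? m2 / static_cast<double>(n - 1) : 0.0;
    return {mean, std::sqrt(variance)};
}
```

Each sample would be the result of one complete benchmark run, for example the seconds or cycles reported for N calls of the function.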