**TimedExec** is a small utility for *benchmarking* programs. It will *execute* the specified program with the specified command-line arguments and *measure* the time that it takes for the execution to complete. Because the execution time of a program is unavoidably subject to certain variations (e.g. due to environmental noise), each measurement is repeated *multiple* times. The number of metering passes can be configured as desired. Optionally, a number of “warm-up” passes can be performed *prior to* the metering passes. The warm-up passes prevent caching effects from interfering with the measurement.
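
The warm-up and metering scheme described above can be illustrated with a minimal Python sketch. This is not TimedExec's actual implementation; the function name `measure`, its parameters, and the stand-in workload are assumptions made purely for illustration.

```python
import subprocess
import sys
import time


def measure(command, passes=5, warmup=1):
    """Run `command` (warmup + passes) times and return the wall-clock
    durations, in seconds, of the metering passes only."""
    timings = []
    for i in range(warmup + passes):
        start = time.perf_counter()
        # Discard the child's output so that printing does not add noise.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if i >= warmup:  # warm-up passes are executed but not recorded
            timings.append(elapsed)
    return timings


if __name__ == "__main__":
    # Hypothetical workload: a Python interpreter that exits immediately.
    print(measure([sys.executable, "-c", "pass"]))
```

The warm-up passes are executed exactly like the metering passes; their timings are simply thrown away.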
Once all metering passes have been completed, TimedExec will compute the ***mean*** execution time as well as the ***median*** execution time of the program. It will also record the *fastest* and *slowest* execution times that have been observed. Furthermore, TimedExec computes the *standard error*, in order to determine [***confidence intervals***](http://www.uni-siegen.de/phil/sozialwissenschaften/soziologie/mitarbeiter/ludwig-mayerhofer/statistik/statistik_downloads/konfidenzintervalle.pdf) from the benchmarking results.
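
For readers who want to reproduce these figures from their own timings, the following Python sketch computes the same quantities. It assumes the standard error is the sample standard deviation divided by the square root of the number of passes; whether TimedExec uses exactly this formula is not stated here. The timings are made-up values.

```python
import math
import statistics


def summarize(timings):
    """Return summary statistics for a list of execution times (seconds)."""
    n = len(timings)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(timings)
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "fastest": min(timings),
        "slowest": max(timings),
        "std_error": stdev / math.sqrt(n),
    }


# Made-up timings from five metering passes.
summary = summarize([11.2, 11.4, 11.3, 11.9, 11.2])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```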
*TimedExec* uses a very simple command-line syntax. Just type **`TimedExec`**, followed by the program that you want to benchmark. Optionally, any number of arguments can be appended; these arguments will be passed to the program.
In the following example we use *TimedExec* to benchmark the program **`ping.exe`** with the arguments **`-n 12 www.google.com`**. By default, the command will be executed *five* times, preceded by a single "warm-up" pass:
When comparing measurement results, the [***mean***](https://en.wikipedia.org/wiki/Arithmetic_mean) (average) execution time may seem like the most obvious choice. However, the *mean* of a data sample is highly sensitive to “outliers” and can therefore be misleading! This is especially true when there is a lot of variation in the data sample. Consequently, comparing the [***median***](https://en.wikipedia.org/wiki/Median) of the execution times is often the better choice, because the *median* of a data sample is much more robust against outliers.
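
A small numeric example makes the difference visible. The timings below are made up; the last pass stands in for a run that was disturbed, for instance by a background task.

```python
import statistics

# Made-up timings in seconds; the last value is an outlier.
timings = [10.1, 10.2, 10.0, 10.3, 25.0]

mean = statistics.mean(timings)
median = statistics.median(timings)

print(f"mean:   {mean:.2f} s")    # pulled upwards by the single outlier
print(f"median: {median:.2f} s")  # stays close to the typical run
```

Here a single disturbed pass shifts the mean to about 13.12 seconds, while the median remains at 10.2 seconds.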
Furthermore, it is important to keep in mind that the *mean* (or *median*) execution time computed from a limited number of metering passes only yields an ***estimate*** of the program's “real” average execution time (expected value). The “real” value could only be determined exactly from an *infinite* number of metering passes – which is **not** possible in practice. In this situation, we can have a look at the [***confidence intervals***](http://www.uni-siegen.de/phil/sozialwissenschaften/soziologie/mitarbeiter/ludwig-mayerhofer/statistik/statistik_downloads/konfidenzintervalle.pdf). These intervals contain the “real” value *with very high probability*. The most commonly used *confidence interval* is the “95%” one. A higher confidence level means a broader interval, and vice versa.
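
As a rough sketch of how such an interval follows from the mean and the standard error, the Python snippet below uses the normal approximation. This is an assumption made for illustration: for a handful of passes, Student's t distribution would give somewhat wider intervals, and the exact method used by TimedExec is not described here. The function name and the input values are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(mean, std_error, confidence=0.95):
    """Two-sided confidence interval around `mean` (normal approximation)."""
    # z is the quantile that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * std_error
    return mean - margin, mean + margin


# Made-up values: mean of 11.4 s with a standard error of 0.13 s.
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(11.4, 0.13, level)
    print(f"{level:.0%}: [{low:.3f}, {high:.3f}]")
```

The printed intervals grow wider as the confidence level increases.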
Simply put, as long as the confidence intervals of the runtime of program “A” and the runtime of program “B” *overlap*, we **must not** conclude that either of these programs runs faster (or slower). In fact, **no** real conclusion can be drawn in that case!
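
The overlap rule can be written down as a simple check. The helper name `intervals_overlap` and the interval bounds are hypothetical; each interval is given as a `(low, high)` pair in seconds.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up 95% confidence intervals for two programs.
program_a = (11.15, 11.65)
program_b = (11.50, 12.10)
program_c = (12.30, 12.90)

print(intervals_overlap(program_a, program_b))  # True: no conclusion possible
print(intervals_overlap(program_a, program_c))  # False: the intervals are disjoint
```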
This tool measures the runtime of *processes*. Because creating a process has a certain overhead, and because the system timer has a limited precision – usually in the range of a few milliseconds, though it can be worse – this tool is **not** suitable for benchmarking programs or functions with a *very short* runtime! The process to be measured should run for *at least* a couple of seconds in order to yield useful benchmark results. If you need to benchmark functions with a *very short* runtime, it is recommended to use [*high-precision timers*](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter) directly inside your program code, rather than launching separate processes.
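
To illustrate in-process timing, here is a minimal Python sketch that uses the standard library's high-resolution clock `time.perf_counter_ns()` instead of the Win32 API linked above. The helper `time_function` and the workload are hypothetical. The function under test is called many times in a loop, so that the total duration is well above the timer's resolution.

```python
import time


def time_function(func, repeats=1000):
    """Return the average duration of one call to `func`, in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        func()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / repeats


# Hypothetical short-running workload.
average_ns = time_function(lambda: sum(range(100)))
print(f"average: {average_ns:.0f} ns per call")
```

Note that the result includes the small overhead of the loop and the function call itself.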