**TimedExec** is a small utility for *benchmarking* command-line programs. It will *execute* the specified program with the specified command-line arguments and then *measure* the time that it takes for the execution to complete. In order to obtain *accurate* results, all measurements are implemented via *high-resolution* performance timers. And, since program execution times are unavoidably subject to certain variations (e.g. due to environmental noise), each test will be repeated *multiple* times. The number of metering passes can be configured as desired. Optionally, a number of "warm-up" passes can be performed *prior to* the first metering pass. The warm-up passes prevent caching effects from interfering with the measured execution times.
TimedExec will then compute the ***mean*** execution time as well as the ***median*** execution time of all metering passes. It will also record the *fastest* and the *slowest* execution times that were measured. Furthermore, TimedExec computes the *standard error* in order to determine [***confidence intervals***](http://www.uni-siegen.de/phil/sozialwissenschaften/soziologie/mitarbeiter/ludwig-mayerhofer/statistik/statistik_downloads/konfidenzintervalle.pdf) from the benchmarking results. These are the *ranges* which contain the program's “real” average execution time (expected value), *with very high probability*. All results will be saved to a log file.
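To make these statistics concrete, here is a minimal Python sketch of how the same figures can be derived from a list of measured times. This is purely illustrative and is *not* TimedExec's actual implementation; the sample values are made up:

```python
import math
import statistics

def summarize(times, z=1.96):
    """Summarize a list of measured execution times (in seconds).

    Returns the mean, median, fastest and slowest time, the standard
    error, and the (approximate) confidence interval for the "real"
    mean. The default z = 1.96 corresponds to a 95% confidence level.
    """
    mean = statistics.mean(times)
    median = statistics.median(times)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_err = statistics.stdev(times) / math.sqrt(len(times))
    interval = (mean - z * std_err, mean + z * std_err)
    return mean, median, min(times), max(times), std_err, interval

# Example: five metering passes of a hypothetical program
passes = [11.32, 11.09, 11.41, 11.18, 12.05]
mean, median, fastest, slowest, std_err, ci = summarize(passes)
print(f"Mean: {mean:.3f}s  Median: {median:.3f}s")
print(f"Fastest: {fastest:.3f}s  Slowest: {slowest:.3f}s")
print(f"95% confidence interval: [{ci[0]:.3f}s, {ci[1]:.3f}s]")
```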
*TimedExec* uses a very simple command-line syntax. Just type **`TimedExec`**, followed by the program that you want to benchmark. Optionally, any number of arguments can be appended; these parameters will be passed to the program.
In the following example we use *TimedExec* to benchmark the program **`ping.exe`** with the arguments **`-n 12 www.google.com`**. By default, the command will be executed *five* times, preceded by a single "warm-up" pass:
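```
TimedExec.exe ping.exe -n 12 www.google.com
```

Note that the invocation shown above is an assumption based on the syntax described earlier; substitute the actual name of the TimedExec binary on your system.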
When comparing measurement results, the ***mean*** (average) execution time may seem like the most obvious choice. However, it has to be noted that the *mean* of a data sample is highly sensitive to “outliers” and can therefore be misleading! This is especially true when there is a lot of variation in the data sample. Consequently, comparing the ***median*** execution times is often the better choice. That is because the *median* of a data sample is much more robust against outliers.
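A small, made-up example illustrates the point: a single disturbed pass shifts the *mean* considerably, while the *median* barely moves.

```python
import statistics

# Five measurements; the last pass was disturbed (e.g. by background load)
times = [10.1, 10.3, 9.9, 10.2, 19.8]

print(statistics.mean(times))    # 12.06 -> dragged up by the outlier
print(statistics.median(times))  # 10.2  -> barely affected
```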
Furthermore, it is important to keep in mind that the *mean* (or *median*) execution time computed from a limited number of metering passes only yields an ***estimate*** of the program's “real” average execution time (expected value). The “real” value can only be determined accurately from an *infinite* number of metering passes – which is **not** possible in practice. In this situation, we can have a look at the ***confidence intervals***. These intervals contain the “real” value, *with very high probability*. The most commonly used *confidence interval* is the “95%” one (higher confidence means broader interval, and vice versa).
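For example (with made-up numbers): if the measured *mean* is 10.00 seconds and the *standard error* is 0.20 seconds, then the 95% confidence interval is approximately 10.00 ± 1.96 × 0.20, i.e. the range from 9.61 to 10.39 seconds. A 99% interval would use the factor 2.576 instead, giving the broader range from 9.48 to 10.52 seconds.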
Simply put, as long as the confidence intervals of program A and program B *overlap* (at least partially), we **must not** conclude that either of these programs runs faster (or slower) in the average case. ***No*** conclusion can be drawn in that case!
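Put into code, that decision rule could look like the following minimal Python sketch (the interval values are hypothetical):

```python
def intervals_overlap(a, b):
    """Check whether two confidence intervals (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical 95% confidence intervals for two programs (in seconds)
prog_a = (9.61, 10.39)
prog_b = (10.25, 11.10)

if intervals_overlap(prog_a, prog_b):
    print("Intervals overlap -> no conclusion can be drawn!")
else:
    print("Intervals are disjoint -> the difference is significant.")
```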