A tool for benchmarking compile-time (type-checking time) performance of Agda programs. It is implemented as an Agda backend, which means that it functions as a full Agda compiler, with additional arguments for the benchmarking. The benchmarking is done using the Criterion library.
A criterion benchmark is created for each top-level value in the given Agda
module whose name starts with bench, and calling the tool with an Agda file
will run each of the benchmarks and measure the time it takes to reduce them to
weak-head normal form (or normal form if the --nf flag is given). For
instance, given the module
module Example where
open import Agda.Builtin.Nat
open import Agda.Builtin.List
open import Agda.Builtin.Equality
downFrom : Nat → List Nat
downFrom zero = []
downFrom (suc n) = n ∷ downFrom n
sum-rec : List Nat → Nat
sum-rec [] = 0
sum-rec (x ∷ xs) = x + sum-rec xs
sum-acc : Nat → List Nat → Nat
sum-acc z [] = z
sum-acc z (x ∷ xs) = sum-acc (z + x) xs
sum-acc! : Nat → List Nat → Nat
sum-acc! z [] = z
sum-acc! 0 (x ∷ xs) = sum-acc! x xs
sum-acc! z (x ∷ xs) = sum-acc! (z + x) xs
n = 10000
bench-rec = sum-rec (downFrom n)
bench-acc = sum-acc 0 (downFrom n)
bench-acc! = sum-acc! 0 (downFrom n)we can invoke agda-bench as
$ agda-bench Example.agda
Benchmarks:
bench-rec = 49995000
bench-acc = 49995000
bench-acc! = 49995000
benchmarking bench-rec
time 16.71 ms (16.12 ms .. 17.60 ms)
0.986 R² (0.968 R² .. 0.997 R²)
mean 17.19 ms (16.78 ms .. 17.76 ms)
std dev 1.203 ms (822.0 μs .. 1.709 ms)
variance introduced by outliers: 29% (moderately inflated)
benchmarking bench-acc
time 31.69 ms (30.80 ms .. 32.71 ms)
0.997 R² (0.994 R² .. 0.999 R²)
mean 32.35 ms (31.67 ms .. 33.92 ms)
std dev 2.064 ms (1.061 ms .. 3.637 ms)
variance introduced by outliers: 22% (moderately inflated)
benchmarking bench-acc!
time 10.61 ms (10.52 ms .. 10.71 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 10.83 ms (10.73 ms .. 10.93 ms)
std dev 266.5 μs (203.0 μs .. 381.2 μs)
Individual benchmarks can be run by giving the -B name-of-benchmark flag. The
-B flag passes along its argument to the Criterion driver. Use -B --help
for more information.
To benchmark an expression that isn't bound to a top-level bench definition you can
use the -C EXPR flag:
$ agda-bench Example.agda -C "sum-rec (downFrom 100)"
benchmarking sum-rec (downFrom 100)
time 107.0 μs (105.5 μs .. 108.7 μs)
0.998 R² (0.997 R² .. 0.999 R²)
mean 107.6 μs (106.7 μs .. 109.0 μs)
std dev 3.532 μs (2.379 μs .. 4.833 μs)
variance introduced by outliers: 32% (moderately inflated)
Instead to measuring evaluation time you can measure type checking time of an
expression using the -T (or --type-check) flag:
$ agda-bench Example.agda -T "refl : sum-rec (downFrom 100) ≡ 4950"
time 252.0 μs (248.6 μs .. 255.0 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 249.7 μs (247.1 μs .. 253.3 μs)
std dev 9.637 μs (7.431 μs .. 13.56 μs)
The full list of options (not including the Criterion options) are:
benchmark backend options
-B ARGS --bench-options=ARGS Benchmarking options. Use -B --help for more information.
-s EXPR --single=EXPR Evaluate a single expression
-C EXPR --custom=EXPR Add a custom benchmark for EXPR
-T EXPR : TYPE --type-check=EXPR : TYPE Add a custom benchmark to type check EXPR against TYPE
-n --nf Full normalisation instead of weak-head reduction
--call-by-name Use call-by-name