Caliper: Right-size your CI runners
The problem: CI runners are a black box
How do you know if you're overpaying for CI runners? Is it actually more expensive to run longer on a smaller runner than to run shorter on a larger one? You pick a runner size more or less arbitrarily, builds run, and you pay the bill. But is a 32-core runner actually faster than a 16-core one for your builds? Does more RAM help? Without data, you can only guess.
We built Caliper to answer these questions with actual measurements.
What Caliper does
Caliper is a CLI tool that benchmarks your build commands across different CPU/RAM configurations. It uses Docker containers with resource limits to simulate different runner sizes, runs multiple iterations with a warm-up run, and calculates build time statistics: mean, median, standard deviation, P90, P95, and success rate.
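To make those statistics concrete, here is a minimal Python sketch of how timings from repeated runs could be summarized. It is not Caliper's implementation: the `summarize` and `percentile` names, the nearest-rank percentile method, and the choice to compute timing stats over successful runs only are all assumptions made for illustration.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile (an assumed method, not necessarily Caliper's)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(runs, warmup=1):
    """Summarize (seconds, succeeded) tuples given in execution order.

    The first `warmup` runs are discarded. Timing stats use successful
    runs only, so at least one measured run must have succeeded.
    """
    measured = runs[warmup:]
    times = sorted(seconds for seconds, ok in measured if ok)
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "p90": percentile(times, 90),
        "p95": percentile(times, 95),
        "success_rate": len(times) / len(measured),
    }
```

Dropping the warm-up run keeps one-off costs such as cold caches from skewing the mean and the tail percentiles.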
The key feature is matrix mode: give Caliper a list of CPU and RAM values, and it will test every combination automatically and report statistics for each one.
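The idea behind a matrix run can be sketched in a few lines of Python: take the Cartesian product of the CPU and RAM lists and build one resource-limited container invocation per pair. The helper names here are hypothetical, and the exact command Caliper issues is not documented in this post. The sketch relies only on Docker's standard `--cpus` and `--memory` flags for `docker run`.

```python
from itertools import product


def matrix(cpus_list, rams_gb_list):
    """Every (cpus, ram_gb) combination, in CPU-major order."""
    return list(product(cpus_list, rams_gb_list))


def docker_args(image, command, cpus, ram_gb):
    """Argument list for one container capped at the given CPU and RAM."""
    return [
        "docker", "run", "--rm",
        f"--cpus={cpus}",        # CPU limit
        f"--memory={ram_gb}g",   # memory limit in gigabytes
        image, "sh", "-c", command,
    ]


if __name__ == "__main__":
    for cpus, ram_gb in matrix([2, 4, 8, 16], [8, 16, 32, 64]):
        args = docker_args("ubuntu-2404-go-rust", "cargo clean && cargo build", cpus, ram_gb)
        print(" ".join(args))
```

Five CPU values and five RAM values give the 25 configurations used in the benchmark below.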
Real results: Benchmarking InfluxDB
We benchmarked the InfluxDB Rust build (cargo clean && cargo build) across 25 configurations on a Hetzner AX162-R dedicated server, 10 runs per configuration:
| CPUs | RAM | Mean | Median | Std Dev | Min | Max | Success |
|---|---|---|---|---|---|---|---|
| 2 | 8 GB | 6m2s | 6m2s | 152ms | 6m1s | 6m2s | 100% |
| 2 | 16 GB | 6m0s | 6m1s | 142ms | 6m0s | 6m1s | 100% |
| 2 | 32 GB | 6m1s | 6m1s | 545ms | 5m59s | 6m1s | 100% |
| 2 | 64 GB | 6m0s | 6m0s | 184ms | 6m0s | 6m0s | 100% |
| 2 | 128 GB | 6m1s | 6m2s | 637ms | 6m0s | 6m2s | 100% |
| 4 | 8 GB | 3m30s | 3m30s | 601ms | 3m29s | 3m31s | 100% |
| 4 | 16 GB | 3m28s | 3m28s | 684ms | 3m27s | 3m29s | 100% |
| 4 | 32 GB | 3m29s | 3m29s | 572ms | 3m28s | 3m30s | 100% |
| 4 | 64 GB | 3m29s | 3m30s | 966ms | 3m28s | 3m30s | 100% |
| 4 | 128 GB | 3m29s | 3m29s | 861ms | 3m28s | 3m30s | 100% |
| 8 | 8 GB | 2m41s | 2m41s | 1.2s | 2m38s | 2m43s | 100% |
| 8 | 16 GB | 2m39s | 2m40s | 2.0s | 2m36s | 2m41s | 100% |
| 8 | 32 GB | 2m40s | 2m40s | 1.4s | 2m37s | 2m42s | 100% |
| 8 | 64 GB | 2m39s | 2m41s | 3.5s | 2m33s | 2m42s | 100% |
| 8 | 128 GB | 2m41s | 2m41s | 2.2s | 2m34s | 2m42s | 100% |
| 16 | 8 GB | 2m14s | 2m14s | 829ms | 2m13s | 2m15s | 100% |
| 16 | 16 GB | 2m13s | 2m12s | 901ms | 2m11s | 2m15s | 100% |
| 16 | 32 GB | 2m12s | 2m12s | 499ms | 2m11s | 2m13s | 100% |
| 16 | 64 GB | 2m13s | 2m14s | 761ms | 2m12s | 2m15s | 100% |
| 16 | 128 GB | 2m13s | 2m13s | 800ms | 2m12s | 2m14s | 100% |
| 32 | 8 GB | 2m12s | 2m12s | 831ms | 2m11s | 2m13s | 100% |
| 32 | 16 GB | 2m11s | 2m11s | 1.0s | 2m9s | 2m12s | 100% |
| 32 | 32 GB | 2m9s | 2m11s | 2.6s | 2m6s | 2m13s | 100% |
| 32 | 64 GB | 2m13s | 2m12s | 638ms | 2m12s | 2m14s | 100% |
| 32 | 128 GB | 2m11s | 2m12s | 1.2s | 2m8s | 2m13s | 100% |
CPUs scale with diminishing returns
Going from 2 to 4 CPUs cuts build time nearly in half (6m to 3.5m). Going from 4 to 8 CPUs gives another ~23% improvement, and 8 to 16 gives ~17%. Beyond 16 CPUs, there's almost no improvement.
The sweet spot is 4-8 CPUs. A 4-core runner costs twice as much per minute as a 2-core one but finishes ~1.7x faster, so each build costs only about 16% more and feedback arrives much sooner. If you really care about speed, go to 16. Beyond that, you're burning money for no benefit.
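The cost argument can be checked with a little arithmetic. The sketch below uses the mean build times from the 8 GB rows of the table above and assumes runner price scales linearly with core count, which is a simplification; real provider pricing may differ.

```python
# Mean build times (seconds) from the 8 GB rows of the table above.
MEAN_SECONDS = {2: 362, 4: 210, 8: 161, 16: 134, 32: 132}


def speedup(cpus, baseline=2):
    """How many times faster than the baseline runner."""
    return MEAN_SECONDS[baseline] / MEAN_SECONDS[cpus]


def relative_cost(cpus, baseline=2):
    """Cost per build relative to the baseline, assuming price is
    proportional to core count (an assumption, not provider pricing)."""
    return (cpus * MEAN_SECONDS[cpus]) / (baseline * MEAN_SECONDS[baseline])


for cpus in MEAN_SECONDS:
    print(f"{cpus:>2} CPUs: {speedup(cpus):.2f}x faster, "
          f"{relative_cost(cpus):.2f}x cost per build")
```

Under that pricing assumption, the 4-core runner comes out at about 1.16x the cost per build of the 2-core one, while the 32-core runner costs nearly 6x as much for a build that is only about 2.7x faster.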
RAM doesn't matter above 8GB
At 4 CPUs, build time was 3m 30s with 8GB and 3m 29s with 128GB. The difference is noise. We saw the same pattern across all CPU configurations: above 8GB, the smallest size we tested, RAM simply doesn't affect this Rust build.
Save your money: 8GB is enough.
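One quick way to sanity-check the "RAM is noise" claim is to look at how far the mean build time moves across RAM sizes at each CPU count. The sketch below uses the mean values from the table above, which are rounded to whole seconds; the `ram_spread_pct` helper is a name made up for illustration.

```python
# Mean build times (seconds) from the table above, keyed by CPU count,
# listed in RAM order: 8, 16, 32, 64, 128 GB.
MEANS_BY_CPU = {
    2: [362, 360, 361, 360, 361],
    4: [210, 208, 209, 209, 209],
    8: [161, 159, 160, 159, 161],
    16: [134, 133, 132, 133, 133],
    32: [132, 131, 129, 133, 131],
}


def ram_spread_pct(means):
    """Gap between slowest and fastest mean, as a percentage of the fastest."""
    return (max(means) - min(means)) / min(means) * 100


for cpus, means in MEANS_BY_CPU.items():
    print(f"{cpus:>2} CPUs: {max(means) - min(means)}s spread "
          f"({ram_spread_pct(means):.1f}%)")
```

At every CPU count the means stay within 4 seconds of each other, and the largest mean is not consistently at the smallest RAM size, which is what you would expect if RAM had no effect.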
Your builds will be different
This is a Rust build. JavaScript bundlers, Python test suites, Go compilers, and Java builds all behave differently. Some are memory-bound, some are I/O-bound, some parallelize better than others. The only way to know what's optimal for your builds is to benchmark them yourself.
Try it yourself
Install Caliper:
curl -sSL https://raw.githubusercontent.com/attunehq/caliper/main/install.sh | sh
Run a matrix benchmark (adjust image, command, and configs as needed):
caliper matrix all \
--image ubuntu-2404-go-rust \
--repo https://github.com/org/repo \
--runs 10 \
--command "cargo clean && cargo build" \
--cpus "2,4,8,16" \
--rams "8,16,32,64"
Full documentation and source code are available on GitHub.
About Attune
Attune is an applied AI company building the future of software engineering tools. We love the craft of making software, and we think AI can be a useful tool for serious engineers. You can see more of the things we are working on here.