Caliper: Right-size your CI runners

The problem: CI runners are a black box

How do you know if you're overpaying for CI runners? Is it cheaper to run longer on a smaller runner or shorter on a larger one? You pick a runner size more or less arbitrarily, builds run, and you pay the bill. But is a 32-core runner actually faster than a 16-core one for your builds? Does more RAM help? Without data, you can only guess.

We built Caliper to answer these questions with actual measurements.

What Caliper does

Caliper is a CLI tool that benchmarks your build commands across different CPU/RAM configurations. It uses Docker containers with resource limits to simulate different runner sizes, runs multiple iterations with a warm-up run, and calculates build time statistics: mean, median, standard deviation, P90, P95, and success rate.
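
To make those statistics concrete, here is a minimal sketch of how they could be computed from a list of run durations. It is not Caliper's actual code: the function name `summarize`, the nearest-rank percentile method, and the sample timings are all assumptions made for illustration.

```python
import math
import statistics

def summarize(durations_s, failures=0):
    """Summarize successful build durations (in seconds) plus a failure count.

    Assumes at least one successful run; the warm-up run is excluded by the caller.
    """
    ordered = sorted(durations_s)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample with at least p% of
        # all samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    total_runs = len(ordered) + failures
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std_dev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p90": percentile(90),
        "p95": percentile(95),
        "success_rate": len(ordered) / total_runs,
    }

# Hypothetical timings for ten measured runs (not taken from the results in this post).
sample = [210.1, 209.4, 210.8, 209.9, 210.3, 211.0, 209.7, 210.5, 210.2, 209.6]
stats = summarize(sample)
print(stats)
```

With only 10 runs per configuration, P90 and P95 land on the two slowest samples, so they are best read as "near-worst case" rather than as precise tail estimates.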

The key feature is matrix mode: give Caliper a list of CPU and RAM values, and it will test every combination automatically and report statistics for each one.
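
The idea behind matrix mode can be sketched as a Cartesian product over the CPU and RAM lists, with each pair turned into Docker resource limits. This is an illustration under stated assumptions, not Caliper's implementation: the helper name `docker_commands` and the `rust:latest` image are placeholders, while `--cpus` and `--memory` are standard `docker run` flags.

```python
from itertools import product

def docker_commands(image, command, cpus, rams_gb):
    """Build one `docker run` argument list per (CPU, RAM) combination."""
    return [
        [
            "docker", "run", "--rm",
            f"--cpus={cpu}",       # CPU limit for the container
            f"--memory={ram}g",    # memory limit in gigabytes
            image, "sh", "-c", command,
        ]
        for cpu, ram in product(cpus, rams_gb)
    ]

# Placeholder image and the build command used in this post.
matrix = docker_commands(
    "rust:latest",
    "cargo clean && cargo build",
    cpus=[2, 4, 8, 16],
    rams_gb=[8, 16, 32, 64],
)
print(len(matrix), "configurations")
print(" ".join(matrix[0]))
```

The sketch only builds the commands; a real harness would also run each one several times, time it, and record failures.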

Real results: Benchmarking InfluxDB

We benchmarked the InfluxDB Rust build (cargo clean && cargo build) across 25 configurations on a Hetzner AX162-R dedicated server, 10 runs per configuration:

CPUs  RAM     Mean    Median  Std Dev  Min     Max     Success
2     8 GB    6m2s    6m2s    152ms    6m1s    6m2s    100%
2     16 GB   6m0s    6m1s    142ms    6m0s    6m1s    100%
2     32 GB   6m1s    6m1s    545ms    5m59s   6m1s    100%
2     64 GB   6m0s    6m0s    184ms    6m0s    6m0s    100%
2     128 GB  6m1s    6m2s    637ms    6m0s    6m2s    100%
4     8 GB    3m30s   3m30s   601ms    3m29s   3m31s   100%
4     16 GB   3m28s   3m28s   684ms    3m27s   3m29s   100%
4     32 GB   3m29s   3m29s   572ms    3m28s   3m30s   100%
4     64 GB   3m29s   3m30s   966ms    3m28s   3m30s   100%
4     128 GB  3m29s   3m29s   861ms    3m28s   3m30s   100%
8     8 GB    2m41s   2m41s   1.2s     2m38s   2m43s   100%
8     16 GB   2m39s   2m40s   2.0s     2m36s   2m41s   100%
8     32 GB   2m40s   2m40s   1.4s     2m37s   2m42s   100%
8     64 GB   2m39s   2m41s   3.5s     2m33s   2m42s   100%
8     128 GB  2m41s   2m41s   2.2s     2m34s   2m42s   100%
16    8 GB    2m14s   2m14s   829ms    2m13s   2m15s   100%
16    16 GB   2m13s   2m12s   901ms    2m11s   2m15s   100%
16    32 GB   2m12s   2m12s   499ms    2m11s   2m13s   100%
16    64 GB   2m13s   2m14s   761ms    2m12s   2m15s   100%
16    128 GB  2m13s   2m13s   800ms    2m12s   2m14s   100%
32    8 GB    2m12s   2m12s   831ms    2m11s   2m13s   100%
32    16 GB   2m11s   2m11s   1.0s     2m9s    2m12s   100%
32    32 GB   2m9s    2m11s   2.6s     2m6s    2m13s   100%
32    64 GB   2m13s   2m12s   638ms    2m12s   2m14s   100%
32    128 GB  2m11s   2m12s   1.2s     2m8s    2m13s   100%

CPUs scale with diminishing returns

Build Time by CPU Count (8 GB RAM)

CPUs  Mean build time
2     6m 2s
4     3m 30s
8     2m 41s
16    2m 14s
32    2m 12s

Going from 2 to 4 CPUs cuts build time by about 42% (6m 2s to 3m 30s). Going from 4 to 8 CPUs gives another ~23% improvement, and 8 to 16 gives ~17%. Beyond 16 CPUs there's almost no improvement: 32 CPUs save only 2 more seconds.
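
The step-by-step improvements can be recomputed from the mean build times at 8 GB RAM. The sketch below converts the table's means to seconds by hand; the function name `step_reductions` is just an illustrative helper.

```python
# Mean build times at 8 GB RAM, in seconds, transcribed from the results table.
means_8gb = {2: 362, 4: 210, 8: 161, 16: 134, 32: 132}

def step_reductions(means):
    """Percent reduction in build time for each doubling of CPU count."""
    cpus = sorted(means)
    return {
        (small, large): (means[small] - means[large]) / means[small] * 100
        for small, large in zip(cpus, cpus[1:])
    }

reductions = step_reductions(means_8gb)
for (small, large), pct in reductions.items():
    print(f"{small:>2} -> {large:<2} CPUs: {pct:.1f}% faster")
# Prints 42.0%, 23.3%, 16.8%, and 1.5% for the four steps.
```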

The sweet spot is 4-8 CPUs. A 4-core runner costs 2x as much as a 2-core runner but runs ~1.7x faster, so each build costs only about 16% more while feedback arrives much sooner. If you really care about speed, go to 16. Beyond that, you're burning money for no benefit.
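
To see how cost per build moves with runner size, one rough approach is to multiply core count by build time. The sketch below assumes the runner price scales linearly with CPU count (an assumption that extends the 2x figure above to larger sizes; real pricing may differ) and uses the 8 GB means from the table. `relative_cost` is a hypothetical helper name.

```python
# Mean build times at 8 GB RAM, in seconds, transcribed from the results table.
means_8gb = {2: 362, 4: 210, 8: 161, 16: 134, 32: 132}

def relative_cost(means, baseline_cpus=2):
    """Cost per build relative to the baseline size.

    Assumes price per minute is proportional to CPU count, so the cost of a
    build is proportional to cores * seconds.
    """
    base = baseline_cpus * means[baseline_cpus]
    return {cpus: cpus * secs / base for cpus, secs in means.items()}

costs = relative_cost(means_8gb)
for cpus, ratio in costs.items():
    print(f"{cpus:>2} CPUs: {ratio:.2f}x the cost of a 2-CPU build")
```

Swapping in your provider's actual per-minute prices for the `cpus` multiplier gives a more realistic comparison.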

RAM doesn't matter above 8GB

Build Time by RAM (4 CPUs)

RAM     Mean build time
8 GB    3m 30s
16 GB   3m 28s
32 GB   3m 29s
64 GB   3m 29s
128 GB  3m 29s

At 4 CPUs, build time was 3m 30s with 8GB and 3m 29s with 128GB. The difference is noise. We saw the same pattern across all CPU configurations: RAM simply doesn't affect this Rust build.
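
One way to check that pattern is to look at how far the mean build time moves across RAM sizes at each CPU count. The sketch below uses the means from the results table, converted to seconds by hand; `ram_spread` is an illustrative helper, not part of Caliper.

```python
# Mean build times in seconds, transcribed from the results table.
# Each list is ordered by RAM: 8, 16, 32, 64, 128 GB.
means_by_cpu = {
    2: [362, 360, 361, 360, 361],
    4: [210, 208, 209, 209, 209],
    8: [161, 159, 160, 159, 161],
    16: [134, 133, 132, 133, 133],
    32: [132, 131, 129, 133, 131],
}

def ram_spread(means):
    """Return (spread in seconds, spread as a percent of the fastest mean)."""
    fastest, slowest = min(means), max(means)
    return slowest - fastest, (slowest - fastest) / fastest * 100

for cpus, means in means_by_cpu.items():
    seconds, percent = ram_spread(means)
    print(f"{cpus:>2} CPUs: {seconds}s spread across RAM sizes ({percent:.1f}%)")
```

In this data the spread never exceeds 4 seconds, and the fastest mean is not consistently at the largest RAM size.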

Save your money: 8GB is enough.

Your builds will be different

This is a Rust build. JavaScript bundlers, Python test suites, Go compilers, and Java builds all behave differently. Some are memory-bound, some are I/O-bound, some parallelize better than others. The only way to know what's optimal for your builds is to benchmark them yourself.

Try it yourself

Install Caliper:

curl -sSL https://raw.githubusercontent.com/attunehq/caliper/main/install.sh | sh

Run a matrix benchmark (adjust image, command, and configs as needed):

caliper matrix all \
  --image ubuntu-2404-go-rust \
  --repo https://github.com/org/repo \
  --runs 10 \
  --command "cargo clean && cargo build" \
  --cpus "2,4,8,16" \
  --rams "8,16,32,64"

Full documentation and source code are available on GitHub.

About Attune

Attune is an applied AI company building the future of software engineering tools. We love the craft of making software, and we think AI can be a useful tool for serious engineers. You can see more of the things we are working on here.