21 changes: 21 additions & 0 deletions data/planet/jonludlam/caching-opam-solutions---part-2.md
@@ -0,0 +1,21 @@
---
title: Caching opam solutions - part 2
description:
url: https://jon.recoil.org/blog/2025/09/caching-opam-solutions2.html
date: 2025-09-23T00:00:00-00:00
preview_image:
authors:
- Jon Ludlam
source:
ignore:
---

<section><h1><a href="https://jon.recoil.org/atom.xml#caching-opam-solutions---part-2" class="anchor"></a>Caching opam solutions - part 2</h1><ul class="at-tags"><li class="published"><span class="at-tag">published</span> <p>2025-09-23</p></li></ul><ul class="at-tags"><li class="notanotebook"><span class="at-tag">notanotebook</span> </li></ul><p>Some results from the <a href="https://jon.recoil.org/caching-opam-solutions.html" title="caching-opam-solutions">previous post</a>. This time I've run day10 on 144 or so commits from opam-repository to see how well the cache performs. The results are quite interesting.</p><p>First let's talk about the "examination map". This is a map from package name to a list of other packages whose solutions should be recalculated if the package in question is altered. It's built in two steps: first, record the packages that the solver asks about while solving for each package; then take <em>all</em> of the solutions and 'invert' that map. For example, if both packages 'a' and 'b' ask about package 'c' during their solutions, then altering 'c' means that the solutions for both 'a' and 'b' need to be recalculated. The examination map entry for 'c' would then be <code>'a'; 'b'</code> (the 'observers' of 'c'). We can plot the histogram of the sizes of each entry in the examination map:</p>
<div class="chart-container">

<object data="examination_map_histogram.svg" type="image/svg+xml" width="100%">

<img src="https://jon.recoil.org/examination_map_histogram.png" alt="Package Examiner Distribution Histogram" style="max-width: 100%; height: auto;">
</object>
</div>
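<p>As an aside, the inversion that builds the examination map can be sketched in a few lines. This is only an illustration in Python, not the code day10 actually uses: the function names and the shape of the input (a dictionary from each package to the set of packages the solver asked about while solving it) are assumptions made for the example.</p>

```python
from collections import defaultdict


def build_examination_map(examined):
    """Invert {package: packages the solver asked about while solving it}
    into {package: sorted list of packages whose solutions observed it}.

    `examined` is a hypothetical input shape chosen for this sketch.
    """
    examination_map = defaultdict(set)
    for package, asked_about in examined.items():
        for name in asked_about:
            examination_map[name].add(package)
    return {name: sorted(observers) for name, observers in examination_map.items()}


def packages_to_recalculate(examination_map, changed):
    """Union of the observers of every changed package."""
    affected = set()
    for name in changed:
        affected.update(examination_map.get(name, []))
    return sorted(affected)


# The example from the text: solving 'a' and 'b' both asks about 'c'.
examined = {"a": {"a", "c"}, "b": {"b", "c"}}
examination_map = build_examination_map(examined)
affected_by_c = packages_to_recalculate(examination_map, ["c"])
```

<p>With this input the entry for 'c' is <code>['a', 'b']</code>, while 'a' and 'b' are each observed only by their own solution, so they have a single observer.</p>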
<p>Some interesting features from these data:</p><ul><li>The most common number of observers is 1, meaning that the package is not involved in the solution of any other package. There are approximately 2,000 such packages.</li><li>Most packages (~80%) have fewer than 100 observers. This means that if we alter one of these packages, we only need to recalculate the solutions for fewer than 100 other packages.</li><li>A <em>very</em> small number of packages are observed in all 4,400 solutions. This is actually a bit artificial, as the solver adds the ocaml-compiler package as an input to all solves to ensure we get the correct compiler version. There's another way to do this which would avoid this particular problem.</li><li>A small number of packages have a very large number of observers, around 3,800. This mostly corresponds with <code>dune</code> and its dependencies and associated packages. There are around 350 such packages, and any change to these means we need to recalculate most of the solutions.</li></ul><p>This last point doesn't mean that we actually <em>recompile</em> 3,800 packages, just that we need to recalculate their solutions, which might then lead to a cache hit of the layer and no actual compilation. However, recalculating the solutions of all of the packages takes (on my computer) around 10,000 seconds in total, or roughly 5 minutes of wall-clock time as I've got 32 threads.</p><p>If the package that changes <i>isn't</i> one of those 350 packages, then the number of solutions that need to be recalculated is dramatically reduced. I ran the logic over the last few weeks of commits to opam-repository, from commit <code>109398e2fd61803126becd398df0f1eabc9f3ca2</code> of the 10th September up until commit <code>3f21ebe342ce440d9c9142ffe1185d8e5a326085</code> from the 22nd. In this time there were 144 commits (counting only those from <code>git log --first-parent</code>). 
Of these, only 4 resulted in a full re-solve: the first commit, since obviously we have no cache at that point, the <a href="https://github.com/ocaml/opam-repository/commit/40283204789e7116e1c99466de902cd565d121cf">release of OCaml 5.4.0 beta2</a> by <a href="https://perso.quaesituri.org/florian.angeletti/">Florian Angeletti</a>, a fix of <a href="https://github.com/ocaml/opam-repository/commit/6ef6813522b6ea29933f6451236a1639bdbaec61">ocaml-base-compiler for MSVC</a> by <a href="https://www.dra27.uk/blog/">David</a> and a fix for <a href="https://github.com/ocaml/opam-repository/commit/d141887ab0b4fc0836ad0787f1f806585a260bc8">BER-OCaml</a> by <a href="https://www.cl.cam.ac.uk/~jdy22/">Jeremy Yallop</a>. Then 25 commits resulted in recalculating solutions for 3,800 packages as they hit dune-adjacent packages, 5 commits resulted in recalculating between 100 and 300 packages, and the remaining 110 commits resulted in recalculating fewer than 100 packages, the majority of which resulted in recalculating fewer than 5 packages.</p><p>Overall, at a rough estimate, this means that over this period, using this caching strategy gave us a 5x speedup in the solver!</p></section><p>Continue reading <a href="https://jon.recoil.org/blog/2025/09/caching-opam-solutions2.html">here</a></p>
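<p>The rough 5x estimate can be reproduced from the commit counts quoted in the post. The sketch below is a back-of-the-envelope check in Python, not the author's own calculation, and it rests on stated assumptions: every solve costs about the same, a full re-solve covers 4,400 packages, the "between 100 and 300" bucket is taken at 200, and the "fewer than 100" bucket is taken at its upper bound of 100 (which understates the saving, since most of those commits touched fewer than 5 packages).</p>

```python
TOTAL_SOLUTIONS = 4_400  # "all 4,400 solutions" in the post

# (number of commits, assumed solves per commit); the per-commit figures for
# the last two buckets are assumptions made for this estimate.
buckets = [
    (4, TOTAL_SOLUTIONS),  # full re-solves
    (25, 3_800),           # dune-adjacent changes
    (5, 200),              # "between 100 and 300": midpoint assumed
    (110, 100),            # "fewer than 100": upper bound assumed
]

commits = sum(count for count, _ in buckets)
solves_with_cache = sum(count * solves for count, solves in buckets)
solves_without_cache = commits * TOTAL_SOLUTIONS
speedup = solves_without_cache / solves_with_cache

# Wall-clock time for one full re-solve, assuming the 10,000 seconds of
# total work divides evenly across 32 threads.
full_resolve_minutes = 10_000 / 32 / 60

print(f"{commits} commits, speedup about {speedup:.1f}x")
print(f"full re-solve about {full_resolve_minutes:.1f} minutes")
```

<p>Under these assumptions the ratio comes out just over 5, and the 25 dune-adjacent commits account for most of the remaining solver work.</p>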