My last post explained how I retrieved and corrected data from snapshot.debian.org so that dose3 was able to parse it. In this post I will cover some surprising results I found when using my tools on those Packages and Sources files from 2005 until today.
For each pair of Packages and Sources files I did the following:
- created a reduced distribution
- calculated the dependency graph
I call a reduced distribution the smallest set of binary and source packages with the following properties:
- all source packages can be built with the available binary packages
- all binary packages are built from the available source packages
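To make the two properties above concrete, here is a minimal sketch of how such a set could be computed as a closure. It is not the code I actually use: the seed set, the names `source_of` and `build_needs`, and the toy package names are all assumptions made for illustration. In particular, `build_needs` is assumed to be already resolved to a concrete installation set (a job that in reality needs a solver like dose3).

```python
def reduced_distribution(seed, source_of, build_needs):
    """Close a seed of binary packages under the two properties.

    seed        -- iterable of binary package names to start from (assumption)
    source_of   -- dict: binary package -> source package that builds it
    build_needs -- dict: source package -> set of binary packages that must
                   be installed to build it (assumed to be pre-resolved)
    """
    binaries, sources = set(), set()
    todo = list(seed)
    while todo:
        binary = todo.pop()
        if binary in binaries:
            continue
        binaries.add(binary)
        # Property 2: every binary package is built from an available source.
        source = source_of[binary]
        if source not in sources:
            sources.add(source)
            # Property 1: every source package can be built with the
            # available binary packages, so pull in what it needs.
            todo.extend(build_needs[source])
    return binaries, sources


# Hypothetical toy data, not real Debian metadata.
source_of = {"gcc": "gcc-src", "libc6": "glibc", "make": "make-src",
             "doc-tool": "doc-src"}
build_needs = {"gcc-src": {"gcc", "libc6", "make"},
               "glibc": {"gcc", "make"},
               "make-src": {"gcc", "libc6"},
               "doc-src": {"gcc", "doc-tool"}}

binaries, sources = reduced_distribution({"make"}, source_of, build_needs)
```

In this toy example `doc-tool` and `doc-src` never enter the result because nothing reachable from the seed needs them, which is the kind of reduction described above.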
Creating a reduced distribution first greatly increases the execution speed of my algorithms, as it reduces the number of binary and source packages by an order of magnitude while still preserving the dependency cycle situation of the core packages. In many cases, once the packages of a reduced distribution are available, all the rest of Debian can be compiled from them without any dependency cycles.
As also mentioned in earlier posts, there is always one central, big strongly connected component (SCC) in the dependency graph.
I am especially interested in how the size of the reduced distribution and the SCC change over time as both are an indication of:
- the number of interdependencies between core packages
- the number of dependency cycles in the dependency graph
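For readers unfamiliar with strongly connected components, the following is a minimal sketch of how the main SCC of a dependency graph can be found with Tarjan's algorithm. The graph layout (a dict from node to successors, with `src:` and `bin:` prefixes) and the toy packages are assumptions for illustration and not the representation my tools use.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a node to an iterable of successors.

    Recursive for brevity, so very deep graphs may need a higher
    recursion limit or an iterative rewrite.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


# Hypothetical toy graph: a source node points to the binaries it needs
# for building, a binary node points to the source that builds it.
graph = {
    "src:a": ["bin:b"],
    "bin:b": ["src:b"],
    "src:b": ["bin:a"],
    "bin:a": ["src:a"],
    "src:c": ["bin:a"],
    "bin:c": ["src:c"],
}

components = strongly_connected_components(graph)
main_scc = max(components, key=len)
```

Here `a` and `b` need each other to be built and therefore end up in one SCC, while `c` only depends on that cycle and stays outside of it.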
Let's look at the plots I made from the data I gathered. The gray data points indicate that, at that point in time, one or more of the core source packages (the ones in the reduced distribution) in Debian Sid were not compilable. This means that the resulting values cannot be fully trusted. But as it is mostly only a single source package that doesn't compile, it doesn't influence the overall result much, and therefore I included them anyway. Red and green data points represent a fully successful run.
The only thing that I do not yet understand is what happened in 2007...
So while a potential porter in 2005 only had to look at a graph of 150 nodes, he now needs to solve a graph of nearly 1000 nodes. The number of edges in the dependency graph grew even more dramatically, from about 500 to over 8000.
While the dependency situation for Debian Sid in 2005 can easily be printed using xdot and solved visually, this is no longer possible in 2012.
While the dependencies of only a few dozen source packages had to be dropped manually in 2005, now even dropping build dependencies from a few hundred source packages doesn't solve the dependency situation.
So my assumption is that, due to a growing number of interdependencies between source and binary packages (as both gain more features), bootstrapping Debian for a new architecture becomes harder over time. Is this also the subjective impression of people who ported Debian in the past?
If my assumption is correct, then there is a growing need for official support of droppable build dependencies (or "stage builds" or "profile builds") to break dependency cycles during the bootstrapping process. The work of a porter would be much easier if source packages already contained information about which build dependencies can be dropped (if needed). In the best case, a machine could use those annotations to calculate a build order automatically.
As one can see in the graph above, there are currently 370 source packages in the main SCC. This means that no more than this number of packages (but probably far fewer) have to be annotated to break the SCC into a directed acyclic graph.
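How a machine could turn such annotations into a build order can be sketched as follows. This is a simplification under several assumptions: the graph is collapsed to source packages only, every droppable dependency is dropped unconditionally (rather than only where needed), and the names `build_deps` and `droppable` as well as the packages `a`, `b`, `c` are made up for illustration. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def build_order(build_deps, droppable):
    """Return a build order, or raise CycleError if cycles remain.

    build_deps -- dict: source package -> set of source packages that
                  must have been built before it
    droppable  -- dict: source package -> subset of its build
                  dependencies that is annotated as droppable
    """
    sorter = TopologicalSorter()
    for source, deps in build_deps.items():
        # Remove the annotated edges before sorting.
        remaining = set(deps) - set(droppable.get(source, ()))
        sorter.add(source, *remaining)
    return list(sorter.static_order())


# Hypothetical cycle: a needs b, b needs c, c needs a.
build_deps = {"a": {"b"}, "b": {"c"}, "c": {"a"}}

# Annotating a single edge (c can be built without a) breaks the cycle.
order = build_order(build_deps, {"c": {"a"}})
print(order)  # ['c', 'b', 'a']
```

Without the annotation, `build_order(build_deps, {})` raises `CycleError`, which corresponds to a dependency situation that cannot be solved without dropping something.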
Discussion about which syntax to use to mark potentially droppable build dependencies is currently happening in bug#661538 but should maybe be held with a wider audience. The currently favored solution was proposed in said bug report by Guillem Jover and is called "build profiles". It has the advantage that it is not only trivial to implement (a patch exists for dpkg and dose3 already supports them) but would also be useful for other purposes like embedded builds. The format is similar to how architecture restrictions for individual dependencies are specified but uses angle brackets:
Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny
The work Patrick McDermott did for his GSoC project over the summer already uses the above syntax.
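To show that the proposed syntax is easy to process, here is a minimal sketch of a parser for a line like the example above. It is neither the dpkg patch nor the dose3 implementation. The regular expression, the function names and the interpretation of the labels are my assumptions: a negated label such as `!bootstrap` is taken to mean "drop this dependency when the bootstrap profile is active", by analogy with architecture restrictions. Alternatives (`|`) and non-negated profile labels are not handled.

```python
import re

# One dependency: name, optional (version), optional [archs], optional <profiles>.
DEP_RE = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)"
    r"(?:\s*\((?P<version>[^)]*)\))?"
    r"(?:\s*\[(?P<archs>[^\]]*)\])?"
    r"(?:\s*<(?P<profiles>[^>]*)>)?\s*$"
)


def parse_build_depends(field):
    """Parse the value of a Build-Depends field into a list of dicts."""
    deps = []
    for item in field.split(","):
        match = DEP_RE.match(item)
        if match is None:
            raise ValueError(f"cannot parse dependency: {item!r}")
        deps.append({
            "name": match["name"],
            "version": match["version"],
            "archs": (match["archs"] or "").split(),
            "profiles": (match["profiles"] or "").split(),
        })
    return deps


def is_needed(dep, active_profiles):
    """Assumed semantics: a dependency is dropped if any of its negated
    profile labels is active. Non-negated labels are ignored here, so
    such a dependency is always treated as needed."""
    negated = [label[1:] for label in dep["profiles"] if label.startswith("!")]
    return not any(label in active_profiles for label in negated)


deps = parse_build_depends(
    "huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny"
)
needed_for_bootstrap = [d["name"] for d in deps if is_needed(d, {"bootstrap"})]
print(needed_for_bootstrap)  # ['tiny']
```

With no profile active, both `huge` and `tiny` are kept, so a normal build is unaffected by the annotation.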