My last update about ongoing development of botch, the bootstrap/build ordering tool chain, was four months ago and about several incremental updates. This post will be of similar nature. The most interesting news is probably the additional data that bootstrap.debian.net now provides. This is listed in the next section. All subsequent sections then list the changes under the hood that made the additions to bootstrap.debian.net possible.
The bootstrap.debian.net service used to have botch as a git submodule but now runs botch from its Debian package. This at least proves that the botch Debian package is mature enough to do useful stuff with it. In addition to the bootstrapping results by architecture, bootstrap.debian.net now also hosts the following additional services:
- History of graph size shows how the dependency graph developed over time for a normal self-contained repository plus for both minimizing strategies, updated every five days
- Crossbuild dependency satisfaction gives an overview of reasons why the crossbuild dependency situation cannot yet be analyzed with bug numbers where applicable
- Crossbuild order modifies the metadata of a repository so that the crossbuild dependency situation can be analyzed and then outputs an overview page as it is done for every architecture on the main page
- Source package importance calculates the port metric for source packages on a daily basis
Further improvements concern how dependency cycles are now presented in the
html overviews. While before, vertices in a cycle where separated by commas as
if they were simple package lists, vertices are now connected by unicode
arrows. Dashed arrows indicate build dependencies while solid arrows indicate
builds-from relationships. For what it's worth, installation set vertices now
contain their installation set in their
Botch has long depended on features of an unreleased version of
in turn depended on an unrelease version of
libcudf. Both projects have
recently made new releases so that I was now able to drop the
submodule and rely on the host system's
dose3 version instead. This also made
it possible to create a Debian package of botch which currently sits at Debian
mentors. Writing the package also finally made me create a usable
install target in the
Makefile as well as adding stubs for the manpages of
the 44 applications that botch currently ships. The actual content of these
manpages still has to be written. The only documentation botch currently ships
botch-doc package is an offline version of the wiki on gitorious.
The new page ExamplesGraphs even includes pictures.
By default, botch analyzes the native bootstrapping phase. That is, assume that
the initial set of
build-essential packages magically
exists and find out how to bootstrap the rest from there through native
compilation. But part of the bootstrapping problem is also to create the set of
build-essential packages from nothing via cross
compilation. Botch is unable to analyze the cross phase because too many
packages cannot satisfy their crossbuild dependencies due to multiarch
conflicts. This problem is only about the dependency metadata and not about
whether a given source package actually crosscompiles fine in practice.
Helmut Grohne has done great work with rebootstrap which is regularly run by jenkins.debian.net. He convinced me that we need an overview of what packages are blocking the analysis of the cross case and that it was useful to have a crossbuild order even if that was a fake order just to have a rough overview of the current situation in Debian Sid.
I wrote a couple of scripts which would run
dose-builddebcheck on a
repository, analyze which packages fail to satisfy their crossbuild
dependencies and why, fix those cases by adjusting package metadata accordingly
and repeat until all relevant source packages satisfy their crossbuild
dependencies. The result of this can then be used to identify the packages that
need to be modified as well as to generate a crossbuild order.
The fixes to the metadata are done in an automatic fashion and do not necessarily reflect the real fix that would solve the problem. Nevertheless, I ended up agreeing that it is better to have a slightly wrong overview than no overview at all.
Minimizing the dependency graph size
Installation sets in the dependency graph are calculated independent from each
other. If two binary packages provide
A, then dependencies on
different installation sets might choose different binary packages as providers
A. The same holds for disjunctive dependencies. If a package depends on
| C and another package depends on
C | A then there is no coordination to
C so to minimize the overall amount of vertices in the graph. I
implemented two methods to minimize the impact of cases where the dependency
solver has multiple options to satisfy a dependency through
The first method is inspired by Helmut Grohne. An algorithm goes through all disjunctive binary dependencies and removes all virtual packages, leaving only real packages. Of the remaining real packages, the first one is selected. For build dependencies, the algorithm drops all but the first package in every disjunction. This is also what sbuild does. Unfortunately this solution produces an unsatisfiable dependency situation in most cases. This is because oftentimes it is necessary to select the virtual disjunctive dependency because of a conflict relationship introduced by another package.
The second method involves
aspcud, a cudf solver which can optimize a
solution by a criteria. This solution is based on an idea by Pietro Abate who
implemented the basis for this idea back in 2012. In contrast to a usual cudf
problem, binary packages now also depend on the source packages they build
from. If we now ask
aspcud to find an installation set for one of the base
source packages (I chose
src:build-essential) then it will return an
installation set that includes source packages. As an optimization criteria the
number of source packages in the installation set is minimized. This solution
would be flawless if there were no conflicts between binary packages. Due to
conflicts not all binary packages that must be coinstallable for this strategy
to work can be coinstalled. The quick and dirty solution is to remove all
conflicts before passing the cudf universe to
aspcud. But this also means
that the solution does sometimes not work in practice.
Botch now finally has a
test target in its
tests two code paths of the
native.sh script and the
Running these two scripts covers testing most parts of botch. Given that I did
lots of refactoring in the past weeks, the test cases greatly helped to assure
that I didnt break anything in the process.
I also added autopkgtests to the Debian packaging which test the same
things as the
test target but naturally run the installed version of botch
instead. The autopkgtests were a great help in weeding out some lasts bugs
which made botch depend on being executed from its source directory.
Reading the suggestions in the Debian python policy I evaluated the
possibility to use Python 3 for the Python scripts in botch. While I was at it
I added transparent decompression for gzip, bz2 and xz based on the file magic,
replaced python-apt with python-debian because of bug#748922 and added
argparse argument parsing to all scripts.
Unfortunately I had to find out that Python 3 support does not yet seem to be possible for botch for the following reasons:
- no soap module for Python 3 in Debian (needed for bts access)
- hash randomization is turned on by default in Python 3 and therefore the graph output of networkx is not deterministic anymore (bug#749710)
Thus I settled for changing the code such that it would be compatible with
Python 2 as well as with Python 3. Because of the changed string handling and
sys.stdout properties in Python 3 this proved to be tricky. On the other
hand this showed me bugs in my code where I was wrongly relying on
deterministic dictionary key traversal.