botch updates

categories: debian

My last update about ongoing development of botch, the bootstrap/build ordering tool chain, was four months ago and about several incremental updates. This post will be of similar nature. The most interesting news is probably the additional data that bootstrap.debian.net now provides. This is listed in the next section. All subsequent sections then list the changes under the hood that made the additions to bootstrap.debian.net possible.

bootstrap.debian.net

The bootstrap.debian.net service used to have botch as a git submodule but now runs botch from its Debian package. This at least proves that the botch Debian package is mature enough to do useful stuff with it. In addition to the bootstrapping results by architecture, bootstrap.debian.net now also hosts the following additional services:

Further improvements concern how dependency cycles are now presented in the html overviews. While before, vertices in a cycle where separated by commas as if they were simple package lists, vertices are now connected by unicode arrows. Dashed arrows indicate build dependencies while solid arrows indicate builds-from relationships. For what it's worth, installation set vertices now contain their installation set in their title attribute.

Debian package

Botch has long depended on features of an unreleased version of dose3 which in turn depended on an unrelease version of libcudf. Both projects have recently made new releases so that I was now able to drop the dose3 git submodule and rely on the host system's dose3 version instead. This also made it possible to create a Debian package of botch which currently sits at Debian mentors. Writing the package also finally made me create a usable install target in the Makefile as well as adding stubs for the manpages of the 44 applications that botch currently ships. The actual content of these manpages still has to be written. The only documentation botch currently ships in the botch-doc package is an offline version of the wiki on gitorious. The new page ExamplesGraphs even includes pictures.

Cross

By default, botch analyzes the native bootstrapping phase. That is, assume that the initial set of Essential:yes and build-essential packages magically exists and find out how to bootstrap the rest from there through native compilation. But part of the bootstrapping problem is also to create the set of Essential:yes and build-essential packages from nothing via cross compilation. Botch is unable to analyze the cross phase because too many packages cannot satisfy their crossbuild dependencies due to multiarch conflicts. This problem is only about the dependency metadata and not about whether a given source package actually crosscompiles fine in practice.

Helmut Grohne has done great work with rebootstrap which is regularly run by jenkins.debian.net. He convinced me that we need an overview of what packages are blocking the analysis of the cross case and that it was useful to have a crossbuild order even if that was a fake order just to have a rough overview of the current situation in Debian Sid.

I wrote a couple of scripts which would run dose-builddebcheck on a repository, analyze which packages fail to satisfy their crossbuild dependencies and why, fix those cases by adjusting package metadata accordingly and repeat until all relevant source packages satisfy their crossbuild dependencies. The result of this can then be used to identify the packages that need to be modified as well as to generate a crossbuild order.

The fixes to the metadata are done in an automatic fashion and do not necessarily reflect the real fix that would solve the problem. Nevertheless, I ended up agreeing that it is better to have a slightly wrong overview than no overview at all.

Minimizing the dependency graph size

Installation sets in the dependency graph are calculated independent from each other. If two binary packages provide A, then dependencies on A in different installation sets might choose different binary packages as providers of A. The same holds for disjunctive dependencies. If a package depends on A | C and another package depends on C | A then there is no coordination to choose C so to minimize the overall amount of vertices in the graph. I implemented two methods to minimize the impact of cases where the dependency solver has multiple options to satisfy a dependency through Provides and dependency disjunctions.

The first method is inspired by Helmut Grohne. An algorithm goes through all disjunctive binary dependencies and removes all virtual packages, leaving only real packages. Of the remaining real packages, the first one is selected. For build dependencies, the algorithm drops all but the first package in every disjunction. This is also what sbuild does. Unfortunately this solution produces an unsatisfiable dependency situation in most cases. This is because oftentimes it is necessary to select the virtual disjunctive dependency because of a conflict relationship introduced by another package.

The second method involves aspcud, a cudf solver which can optimize a solution by a criteria. This solution is based on an idea by Pietro Abate who implemented the basis for this idea back in 2012. In contrast to a usual cudf problem, binary packages now also depend on the source packages they build from. If we now ask aspcud to find an installation set for one of the base source packages (I chose src:build-essential) then it will return an installation set that includes source packages. As an optimization criteria the number of source packages in the installation set is minimized. This solution would be flawless if there were no conflicts between binary packages. Due to conflicts not all binary packages that must be coinstallable for this strategy to work can be coinstalled. The quick and dirty solution is to remove all conflicts before passing the cudf universe to aspcud. But this also means that the solution does sometimes not work in practice.

Test cases

Botch now finally has a test target in its Makefile. The test target tests two code paths of the native.sh script and the cross.sh script. Running these two scripts covers testing most parts of botch. Given that I did lots of refactoring in the past weeks, the test cases greatly helped to assure that I didnt break anything in the process.

I also added autopkgtests to the Debian packaging which test the same things as the test target but naturally run the installed version of botch instead. The autopkgtests were a great help in weeding out some lasts bugs which made botch depend on being executed from its source directory.

Python 3

Reading the suggestions in the Debian python policy I evaluated the possibility to use Python 3 for the Python scripts in botch. While I was at it I added transparent decompression for gzip, bz2 and xz based on the file magic, replaced python-apt with python-debian because of bug#748922 and added argparse argument parsing to all scripts.

Unfortunately I had to find out that Python 3 support does not yet seem to be possible for botch for the following reasons:

  • no soap module for Python 3 in Debian (needed for bts access)
  • hash randomization is turned on by default in Python 3 and therefore the graph output of networkx is not deterministic anymore (bug#749710)

Thus I settled for changing the code such that it would be compatible with Python 2 as well as with Python 3. Because of the changed string handling and sys.stdout properties in Python 3 this proved to be tricky. On the other hand this showed me bugs in my code where I was wrongly relying on deterministic dictionary key traversal.

View Comments