port bootstrap build-ordering tool report 2

Timestamp: 2012-06-17 19:35
Tags: blog

A copy of this post is sent to soc-coordination@lists.alioth.debian.org as well as to debian-bootstrap@lists.mister-muffin.de.

Diary

June 4

I added the first version of basenocycles.ml to git. Given an initial set of cross built packages, it tries to compile as much as possible on the resulting system in multiple rounds.

June 5

During June 3, I discovered and error in my program that would only come up when using the Debian Sid package lists as the input:

Fatal error: exception Assert_failure("common/edosSolver.ml", 610, 11)

On this day, June 5, I wrote a minimal test case for this problem.

The same day, Pietro figured out that this is a bug in dose which will be fixed in the next release.

Begin writing code to figure out how important a binary package is for the further build process.

Try to use Depsolver.edos_install to find out what packages are needed to make debhelper available.

Restructure basenocycles.ml, exclude source packages that already have been built, still trouble with already existing binary packages and Cudf.mem_installed, comment stuff better.

June 6

I wrote some crude code (only estimate, not correct, fixed later) that would give a rough overview of how often a given binary package is directly required as a build dependency.

Debhelper came out as the most needed package. It is architecture:all, so it does not have to be built but it has unfulfilled runtime dependencies. To make those packages available, 13 (actually 11, fixed later) packages have to be compiled on Ubuntu Natty. But those packages all (except gettext) require debhelper itself to be built. The first dependency cycle.

This dependency cycle (actually, the 12 cycles) can be broken by either cross compiling those source packages or by making them build without debhelper. One goal of the program is to help decide what the easier option is, but this is not yet implemented.

To play around a bit, I created the possibility to specify a list of packages that are additionally to the minimal set of cross compiled packages also cross compiled. I added the 13 packages found above to the list, thus making the binary packages they build available. This made debhelper available in the system.

As a result, 1625 out of 3339 source packages can be built with just a minimal build system (priority:essential packages plus build-essential) and debhelper available.

The next package that blocks the most other source packages from being built is cdbs. The next nine packages in that list also require cdbs so it seems to be the next important package to be available.

Pietro’s suggestions make me:

 - do not open BootstrapCommon but ExtLib, Common, Algo, Debian
 - do proper option parsing and logging
 - use Debcudf.ignore_essential = true
 - do Debcudf.init_tables (binlist@srclist)
 - use @ with shorter list first
 - use more List.rev_append instead of @
 - use CudfAdd.who_provides to find if a package is available

June 7

Pietro and Patrick suggest that for solving the debhelper cycles, one could provide a minimal debhelper version so that the above list of 12 packages can be built without debhelper.

I try to figure out how to get a list of packages that are missing to make a package installable/buildable. This functionality should be provided in dose but I fail to find it.

June 8

Lacking a solution of the problem of June 7, I write a mail to Pietro.

I start my first graphs in ocaml using the ocamlgraph library.

The graph I generate, starts off at a binary package. For each binary package it connects new vertices as its runtime dependencies. If a binary package is not arch:all and also not yet otherwise compiled, its source package is also added.

The result is a graph in which set of source packages in it will make the initial package available, if those source packages would be cross compiled.

The graph is extended further than the source packages.

June 9

I refine a couple of functions, make univ_get_pkg_by_name return the package with the highest version number.

I wrote a rather lengthy (1027 words) email to the list that explains my status as of this day.

I can create graphviz dot files with ocaml, can create node and edge types and create the graph by an imperative pattern that I saw a lot in Pietro’s code.

Disjunctions are not yet handled correctly (see mail from June 8).

The graphs generated look like the following: http://mister-muffin.de/p/8nyc.png

June 11

I write a test case which shows how CudfAdd.who_provides doesnt find virtual packages.

Automate the process of finding the packages that, if cross compiled, would make another package available.

Add more predicates (identifying source packages) and improve input file reading code.

Move build_compile_rounds which compiles as many source packages as it can in multiple rounds on a given minimal system a toplevel function and thereby abstract it more.

Create a rudimentary text based menu to choose different actions to take for an analysis.

Start writing an extended version of simple_dependency_graph for deeper analysis.

Use xdot to show graphs from the text menu. Allow saving those graphs to a file.

June 12

Move functionality from the extended version of simple_dependency_graph over to the normal version and delete the extended version.

Add the new Closure vertex type.

Create extended_dependency_graph which is supposed to not contain single binary package vertices but handle a package and its installation set as one vertex.

The output of extended_dependency_graph is optionally reduced to the biggest (non degenerate) strongly connected component.

User gets the option of choosing the exploration depth.

June 13

Pietro replies to my mail from June 8 but apparently I failed to express myself well enough in my last mail, so I rephrase my question.

Pietro replies to my email from June 11 and explains how the effect I see is due to “a nuisance of the debian to cudf encoding”. As a result I change my code accordingly.

June 14

Another lengthy (1130 words) email to the list. I explain what was done in the past days, what parts work and how they work. I list some rationales on why I did things the way I did them.

The most important observation is, that after improving my code again and again, I ended up representing the dependency cycle problem in the same (very similar) way that Pietro suggested in the beginning. This is probably a good discovery.

Lots of data of that email is now of only little use as of June 16, I make lots of improvements in correctness.

As I dont have an answer to my other email to Pietro from June 13, I implement a very crude way to get an answer to the question of what packages are missing for a package to be available/compileable. I called it flatten_vpkgformula_best_effort and it suffers from many faults including disjunctions and package conflicts.

Patrick spots a problem. As a result, I make sure that at no point, the source package of an arch:all package can be listed.

June 15

As a reply to my mail from June 13, Pietro creates a new branch in the git and adds the code I needed to get a proper installation set.

June 16

As a result of Pietro’s efforts from June 15, I make great advancements on all fronts.

Details of the current status follow in the next section.

Results

A big leap was made on June 16 due to Pietro’s great help on making me understand how Depsolver.listcheck can be used for my purposes. My difficulties in finding the solution myself are rooted in many parts of the dose framework being poorly commented but Pietro did already a couple of documentation commits whenever things were unclear for me.

Using Depsolver.listcheck makes it possible to be more distribution agnostic and I dont have to handle vpkgs, virtual packages and constraints myself anymore. The code also doesnt suffer anymore by wrongly analyzed dependencies and conflicts. The only thing that is not yet taken care of, is that Depsolver.listcheck only chooses one out of several possible installation set. A final version should be able to take into account that a different installation set could provide a better solution.

Overall, in comparison to two weeks ago, I can now properly build, traverse and analyze graphs, can choose an installation set properly, understand more about dependencies, closures, dose and ocaml in general.

Finding the importance of binary packages for building

When calculating how many source packages are depending on the availability of a binary package I originally flattened the pkg.Cudf.depends list twice for a rough overview. This is of course wrong due to disjunctions and conflicts and also doesnt provide a deep dependency inspection. The new method is to calculate an installation set that is necessary to compile a source package for every source package. The resulting list of binary packages is then used to find out how often a binary package appears in an installation set.

I see three drawbacks though:

calculating an installation set for each source package in the archive is very slow
if X packages build depend on A then also X packages will build depend on the installation set of A, resulting in lots of duplication
only one installation set is selected though there are many

Removing simple graph

The simple graph which contained single binary and source packages was removed. I realized it doesnt really serve any purpose to look at it. As a result, Bin vertices and InstallDep edges are also not part of the graph anymore. Since it was responsible for generating the list of source packages that have to be cross built to make a package available, I created a new function get_cross_source_packages which uses an installation to serve the same purpose.

Fix extended_dependency_graph

extended_dependency_graph now uses installation sets for generating the list of packages that is needed to compile a source package or install a binary package. The list of build dependencies does not include packages that are already installable. The list of runtime dependencies does not include packages that are otherwise available (cross built, arch:all…). Instead of checking for list membership all the time, I created hash tables for the list of installable as well as for the list of available binary packages.

Future

There are two big tasks for the next two weeks:

Task one is to find a way to give hints on which packages to best consider for having reduced build dependencies. This would then probably finally make use of Pietro’s cycle algorithms.

Task two is to find a way to break cycles and create a build-DAG from a list of packages that already have staged build dependency information.

Patrick is currently working on patching dpkg with Build-Depends-StageN dependencies as making perl cross compilable. If he doesnt need the ability to decide which packages to modify to have staged build dependencies in the near future, then task one is probably less urgent and therefor of less importance right now?

On the other hand, I can easily generate fake reduced build dependencies so that doing task two right now would not be a problem. Also, having the solution for task two will make it possible to show the user what effect it would have to add reduced build dependencies to a package.

For the reasons above (it’s not urgent, task one profits from task two being solved) I will go and implement task two first (if there are no objections from my mentors).

Another idea, that I discussed with wookey and Patrick yesterday, was that due to multiarch being used for more and more packages, there should exist a set of packages that is cross compilable without any change to the package.

We agreed that I make a list of packages that, if cross compiled, would break dependency cycles and make other packages available. I created such a list of about 160 packages for Ubuntu Natty that, if cross compiled, made it possible to have 87% of Natty available (those numbers have to be treated with caution as I did not yet use the proper way of installation sets when generating that list, but the order of magnitude should be correct). Wookey can then try to cross compile those packages. If some packages of those “crucial” source packages are found to be cross compilable, then they should be cross compiled because it means that no work has to be done to break some cycles. Cross compiling all packages that are cross compilable out of the box is no solution, as only natively compiled packages can go into the archive. This is why the list of potentially additionally cross compiled source packages has to be kept as small as possible.