Automatically devising a build order that allows to bootstrap Debian, currently fails (amongst other reasons) because of the lack of metadata information about which build dependencies can potentially be dropped from source packages. If that information was available, an algorithm could decide which build dependencies to drop so that dependency cycles can be broken.
Finding droppable build dependencies of a source package is something only humans can do. This is because it involves to manually analyze and test the build system of a source package. Build systems are neither uniform nor do they encode their dependencies in a way that can directly be mapped to Debian packages. Therefor they are not machine readable.
One idea to solve the dilemma, is to find a Linux distribution that provides the following:
- allows to do "profile builds" of its source packages with different features enabled or disabled
- stores information about which feature requires which build dependency
- stores everything in a format that can be parsed and analyzed
- covers a similar range of software packages as Debian does
If such a distribution can be found then the information from it can be used to find dependencies that can also be dropped from Debian source packages.
Gentoo is a distribution that fulfills above requirements through so called USE flags that allow to enable or disable features during compilation. Dependencies of Gentoo source packages are stored in .ebuild files that control the build process. Since .ebuild files are bash scripts, parsing them is not trivial. I therefor used the emerge software package to extract that information. Thanks to the well written emerge code and to quick help in the Gentoo IRC channel, it didnt take long to make the code run on Debian. My sourcecode is downloadable here:
Before I list the results of using Gentoo USE flags to determine dependencies that can potentially be dropped from Debian source packages, let me list the problems that this method entails.
Only package name matching, no version matching
When writing the mapping from Debian to Gentoo packages and back I discard version information. There are just too many versions that either Debian or Gentoo have and are not present in the other. So the assumption is, that Debian Sid and Gentoo have both the most recent major versions of upstream software which has roughly the same requirements in terms of build dependencies.
Gentoo packages are matched to Debian source packages
In Gentoo there are only source packages and no binary packages. So I map Gentoo packages to Debian source packages. But Gentoo source packages build depend on other source packages while Debian source packages depend on binary packages. So at some point I have to translate Gentoo packages to Debian source packages and those source packages to Debian binary packages. I do this by analyzing the original binary package build dependencies of a Debian source package and then filter out those binary packages as being droppable that are built by the Debian source packages that were found to be droppable.
Not the exact same package set
There is some software that is only in Gentoo and some that is only in Debian. Debian and Gentoo also split some source packages differently.
Gentoo has more direct dependencies
Many build dependencies in Debian are indirectly pulled in through dependencies of direct build dependencies. In Gentoo source packages directly depend on most things they need to build successfully. This leads to the list of dependencies in Gentoo to be much larger than the list of dependencies in the corresponding Debian source package. It also means that lots of dependencies that can be dropped in Gentoo are not found to be droppable in Debian because they are not direct dependencies of that source package.
There are no implicit dependencies
Gentoo will often drop dependencies that are essential or build-essential packages in Debian and are therefor implicit build dependencies that cannot be dropped.
Despite the many problems, the result doesnt look too wrong. I got some Debian source packages that were found to have droppable build dependencies from Thorsten Glaser and all dependencies that Gentoo found to be droppable were also dropped by him.
To put everything into numbers: the current 912 nodes big SCC in Debian Sid can be reduced to 6 individual SCC with 422, 5, 5, 3, 2 and 2 nodes each. So using Gentoo cuts the size of the central component to more than half.
Surely, there will be a number of dependencies that were found to be droppable in Gentoo but are actually not droppable in Debian. The point is, that it is better to have "some" data even if it contains false positives than no data at all. It is easier for a human to verify if some suggested droppable build dependencies are actually correct than going through hundreds of source packages with thousands of dependencies manually.