visit at Paris IRILL

categories: debian

Last week, I was invited to give a talk about my Debian bootstrapping efforts at IRILL in Paris. The slides of my talk are online as pdf.

The time I spent in Paris with Pietro Abate was very fruitful. I have to thank him and Roberto di Cosmo for inviting me and even covering my travel expenses.

The things we actually managed to implement during my visit:

  • removed huge chunks of code that were not needed anymore, making everything more concise and pretty: ended up removing over 1600 lines
  • basebuildsystem.ml can now fill add-cross-sources.list with sources for debhelper (as debhelper is quite build-essential)
  • start evaluating Gentoo as a source for reduced build dependencies
  • add unit test skeleton and material
  • compile with dose3 master
  • add graphML output
  • feed graphs into analysis tools for visualization

The two most important things (in my opinion) that we came up with for future implementation were the idea to harvest reduced build dependency information from Gentoo and the discovery of a flaw in the way the current dependency graph relates to binary packages and their installation sets.

I am still busy evaluating the output of my trials with Gentoo, so I will cover this topic in a later blog post once I have measured the actual impact on the dependency graph. The current status is that Gentoo USE flags allow me to find possibly droppable build dependencies for 250 out of the 350 interesting Debian source packages that are part of the main strongly connected component (scc).

Pietro also found a flaw in how the dependency graph is currently generated. While source packages A and B might both depend on binary package C (and its installation set), it is wrong to add a dependency on C from both source packages without further verification. Due to virtual packages and disjunctive dependencies, C might have many possible installation sets, but the current code chooses only one of them. The problem is that this one chosen set might conflict with the other build dependencies of the source packages depending on C. Therefore, there must exist multiple binary package nodes C, each with a different installation set, dynamically generated as they are needed. Each source package must point to the node for C whose installation set does not conflict with its own build dependencies.
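
The idea can be illustrated with a small Python sketch. It is not the OCaml code of the tool; the package names, the conflict table and both helper functions are hypothetical, and real installation sets would come from a dependency solver rather than from a plain cartesian product.

```python
from itertools import product

def installation_sets(alternatives):
    """Enumerate candidate installation sets from disjunctive dependencies.

    `alternatives` is a list of disjunctions; one package is picked from each.
    """
    return [frozenset(choice) for choice in product(*alternatives)]

def compatible_set(candidates, build_deps, conflicts):
    """Return the first candidate set that conflicts with no build dependency."""
    for candidate in candidates:
        clash = any(
            frozenset((a, b)) in conflicts
            for a in candidate
            for b in build_deps
        )
        if not clash:
            return candidate
    return None

# Hypothetical toy data: C depends on "libfoo1 | libfoo2".
c_sets = installation_sets([["C"], ["libfoo1", "libfoo2"]])
conflicts = {frozenset(("libfoo1", "libbar"))}

# Source package A build-depends on libbar, which conflicts with libfoo1,
# so A has to point to the node for C that uses libfoo2.
node_for_a = compatible_set(c_sets, {"libbar"}, conflicts)
# Source package B has no such conflict and can use the first set.
node_for_b = compatible_set(c_sets, {"libbaz"}, conflicts)
```

In this toy example A and B end up pointing to two different nodes for C, which is exactly the situation the single-node graph cannot express.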

Other TODO notes that we came up with and that I will be implementing are:

  • integrating dose3 as a git submodule
  • create a proper build system
  • try using a different cudf solver
  • unit tests
  • finally coming up with a name (suggestions welcome - I'm bad at name-finding)
  • building a Debian package (depends on having a name first)
  • formalize/visualize/document the current algorithms
  • to break the main scc, use additional heuristics like:
    • the order induced by reduced_dist to classify nodes in the graph
    • centrality/distance in graph
    • comparing scc of different Debian snapshots with each other

As I cannot always depend on new dose3 versions being pushed to Debian Sid right after their release, I will do the dose3 git submodule integration over the weekend. This will allow me to evaluate the Gentoo USE flag data that I gathered over the past week.


Bootstrappable Debian - How to help

categories: debian

TLDR: multiarch, multiarch, multiarch, cross buildability, staged build dependencies, wiki page, corrections/hints/requests to debian-bootstrap at lists.mister-muffin.de

This summer (and this year's GSoC) is nearing its end and to make it easier for people to make use of the information my tools produced so far, I created a page in the Debian wiki. It lists not only the open issues I see but also statistics that I gathered using the output of my GSoC project. I want to use this blog post to make people aware of that page as well as to get some feedback on it and anything related to it.

The biggest blocker my tools face is that many packages are still missing multiarch information. As long as at least the basic packages do not have their cross build dependencies satisfied via multiarch for an existing foreign architecture, automated tools cannot properly analyze the dependency situation in the bootstrapping case, when many packages of the new foreign architecture do not even exist yet.

If Debian is supposed to be bootstrappable, then the first stage is to make a set of basic packages cross compile for an existing foreign architecture. Once this is possible, a tool of mine can analyze the cyclic build dependency situation that might occur when cross compiling for an architecture that does not exist yet. Then, staged cross builds can be used to cross compile a minimal foreign system. Due to missing multiarch classification, it is not known yet how big the cyclic build dependency situation is for the base packages.

It is not only the conversion of packages to multiarch that is needed but also the addition of the :any (and in rare cases :native) qualifier to build dependencies on M-A: allowed packages. Prominent build dependencies that should be (but are not yet) M-A: allowed are python and gettext. Both are needed as a build dependency by many packages of the base system.

Unfortunately, wanna-build does not understand qualifiers like :any and :native yet. Until it does, no build dependency can be marked :any or :native, and cross compilation of many base packages cannot succeed.

Once the point is reached where a base system can be cross compiled from nothing, native compilation can start. Since native compilation does not depend on multiarch, the dependency situation when trying to natively compile all of Debian from nothing is understood much better. Unfortunately, the cyclic build dependency situation is also much worse in the native case: there exists a big strongly connected component of about 1000 binary and source packages that all depend on each other.

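This dependency mess can be solved using three approaches:

  • moving build dependencies to Build-Depends-Indep
  • building packages with staged (reduced) build dependencies
  • cross building source packages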

The wiki page gives many hints on how to find packages that each method can be applied to.

Stage building is a tool that might be useful for cross building (we do not know for sure yet) but is definitely needed for native compilation. It is needed for native compilation because, after all possible dependencies are moved to Build-Depends-Indep, the only alternative to stage building for breaking dependency cycles is to cross build source packages. Since building a package "staged", that is without one of its build dependencies, is often much easier than making the package in question cross compile, it is the preferred alternative. Once more packages have been made multiarch, it might be possible to prove that there is no alternative to introducing a notion of staged builds.

Some people (wookey, Patrick McDermott, Guillem Jover, myself) decided that the following format to mark staged build dependencies would be preferred over others:

Build-Depends: huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny

The <> format was proposed by Guillem Jover in bug#661538. Patches for dpkg and dose3 are done. More people need to discuss this format before a final decision on how to indicate staged build dependencies can be made.
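
To make the proposed syntax more concrete, here is a minimal Python sketch that parses a field like the one above. It is not taken from the dpkg or dose3 patches: the regular expression, the function names and the reading of a negated term (the dependency is dropped when the named profile is active) are assumptions made for illustration, and alternatives separated by "|" are not handled.

```python
import re

# Assumed grammar for a single dependency, for illustration only.
DEP_RE = re.compile(
    r"""^\s*
        (?P<name>[a-z0-9][a-z0-9+.-]*)    # package name
        (?:\s*\((?P<version>[^)]*)\))?     # optional version, e.g. (>= 1.0)
        (?:\s*\[(?P<arches>[^\]]*)\])?     # optional architectures, e.g. [i386 arm]
        (?:\s*<(?P<profiles>[^>]*)>)?      # optional profiles, e.g. <!embedded>
        \s*$""",
    re.VERBOSE,
)

def parse_build_depends(field):
    """Split a Build-Depends value on commas and parse every dependency."""
    deps = []
    for item in field.split(","):
        match = DEP_RE.match(item)
        if match is None:
            raise ValueError("cannot parse dependency: %r" % item)
        deps.append({
            "name": match.group("name"),
            "version": match.group("version"),
            "arches": (match.group("arches") or "").split(),
            "profiles": (match.group("profiles") or "").split(),
        })
    return deps

def needed(dep, active_profiles):
    """Assumption: a negated term (!name) drops the dependency when that
    profile is active. Positive terms are ignored in this sketch."""
    return not any(
        term.startswith("!") and term[1:] in active_profiles
        for term in dep["profiles"]
    )

deps = parse_build_depends(
    "huge (>= 1.0) [i386 arm] <!embedded !bootstrap>, tiny")
staged = [d["name"] for d in deps if needed(d, {"bootstrap"})]
```

With the bootstrap profile active, only tiny remains in `staged`, which is the reduced build dependency set a staged build would use.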

For more information on the topic, have a look at the corresponding wiki page. Feel free to direct any comments/critique/hints to debian-bootstrap at lists.mister-muffin.de or directly to me.


port bootstrap build-ordering tool report 1

categories: debian

A copy of this post is sent to soc-coordination@lists.alioth.debian.org as well as to debian-bootstrap@lists.mister-muffin.de.

Diary

May 21

  • Cloned Dose3 and made it build
  • Retrieved bootstrap.ml and bootstrap2.ml from old revisions as they were deleted
  • Compiled, tested and investigated the functionality of bootstrap.ml and bootstrap2.ml on a theoretical level as no test data was available

May 22

  • Pietro sent me a tarball with his current version of bootstrap.ml and dummy as well as real test data
  • Created a gitorious account, project and repository
  • Compiled, tested and investigated his code
  • Ran into several runtime problems with the supplied dummy examples
  • Created Makefile to automatically fill ./examples/real/
  • Found that .dot files are too big to be rendered
  • Trying to figure out how hints work, how base-system was generated and why execution takes hours

May 23

  • Pietro made the examples work, which let me understand the code much better
  • Improvement of .dot output and output formatting
  • Refactored code into bootstrapCommon.ml for shared functionality and bootstrap.ml for option parsing and main()

May 24

  • Play with xdeb.py
  • Generate dot graphs with bootstrap.ml and analyze them with sccmap
  • Try to find a way to have a reduced package selection other than main archives of ubuntu/debian
  • Initial work on trying to find the list of minimal source packages that have to be cross compiled
  • Create debian-bootstrap@lists.mister-muffin.de mailinglist

May 25

  • Implement a replacement for apt-rdepends and grep-dctrl functionality in ocaml, both working on Package files
  • Retrieve list of packages with priority:required
  • Retrieve their runtime dependencies
  • Retrieve the packages that are added with build-essential and dependencies
  • Retrieve the list of source packages that are needed to build the above
  • Retrieve the list of binary packages that are additionally built from those source packages
  • some more functionality in the Makefile

May 29

  • Depsolver.dependency_closure replaces homebrew functionality in a better and faster way
  • Only consider those binary packages that can actually be installed, given the limited amount of available packages using Depsolver.edos_install
  • Create proper list diff by correctly comparing Cudf.package members

May 30

  • Big code restructuring
  • consider arch:all packages to be available by default
  • Got helpful sourcecode comments by Pietro

May 31

  • Use Depsolver.trim to reduce a universe to the installable packages
  • Compile with dose 2.9.17

June 1

  • Basebuildsystem now also writes output to min-cross-sources.list and base-system.list
  • Begin work on basenocycles.ml to see how much the minimal system can build without cycle breaking

June 2

  • Use Depsolver.trim to find source packages that can be built given the restricted universe
  • Find the final list of packages that are available without solving staged build dependencies for Natty
  • Many code simplifications

Results

I learned a good chunk of ocaml and how to use dose3 and libcudf.

I created a gitorious project and a git repository for all the sourcecode.

git clone git://gitorious.org/debian-bootstrap/botch.git

The repository currently contains 30 commits and 1197 lines of ocaml code.

So far, 62 emails have been exchanged between me and Pietro and Wookey.

I created a mailinglist for this project where all email exchange so far is publicly accessible in the archives. You can also download all of the email exchange in mbox format. Everybody is welcome to join and/or read the list.

What seems to be finished: the program that finds the minimal number of source packages that have to be cross compiled to end up with a minimal build system. What it does is:

  1. get all essential packages
  2. get their runtime dependencies
  3. get build-essential plus runtime dependencies
  4. get all source packages that are necessary to build 1.-3.; those are the packages that have to be cross compiled
  5. get a list of all packages that are built by source packages from 4.
  6. add all packages from 1.,2.,3. and 5. plus all arch:all packages to a universe
  7. use Depsolver.trim on that universe to figure out which of those packages are actually installable

The result of 7. will then contain a list of packages that are available automatically on the foreign system due to cross compiled source packages and arch:all packages.
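
The seven steps can be sketched in Python on toy data. This is not the OCaml implementation: the package table is invented, dependencies are plain names without alternatives or versions, and `installable` is only a naive stand-in for what Depsolver.trim does.

```python
def closure(names, binaries):
    """Names plus all transitive runtime dependencies."""
    seen, todo = set(), list(names)
    while todo:
        name = todo.pop()
        if name in seen or name not in binaries:
            continue
        seen.add(name)
        todo.extend(binaries[name]["depends"])
    return seen

def installable(universe, binaries):
    """Naive stand-in for Depsolver.trim: drop packages with missing deps."""
    universe = set(universe)
    changed = True
    while changed:
        changed = False
        for name in sorted(universe):
            if not set(binaries[name]["depends"]) <= universe:
                universe.discard(name)
                changed = True
    return universe

def minimal_system(binaries):
    required = {n for n, p in binaries.items()
                if p["priority"] == "required"}                    # step 1
    base = closure(required, binaries)                             # step 2
    base |= closure({"build-essential"}, binaries)                 # step 3
    cross_sources = {binaries[n]["source"] for n in base}          # step 4
    built = {n for n, p in binaries.items()
             if p["source"] in cross_sources}                      # step 5
    arch_all = {n for n, p in binaries.items() if p["arch"] == "all"}
    universe = base | built | arch_all                             # step 6
    return cross_sources, installable(universe, binaries)          # step 7

def pkg(priority, depends, source, arch="any"):
    return {"priority": priority, "depends": depends,
            "source": source, "arch": arch}

# Hypothetical miniature archive.
binaries = {
    "libc6": pkg("required", [], "glibc"),
    "libc6-dev": pkg("optional", ["libc6"], "glibc"),
    "bash": pkg("required", ["libc6"], "bash"),
    "gcc": pkg("optional", ["libc6"], "gcc"),
    "build-essential": pkg("optional", ["gcc"], "build-essential"),
    "python": pkg("optional", ["libc6"], "python"),
    "docs": pkg("optional", [], "docs", arch="all"),
    "plugin": pkg("optional", ["python"], "plugin", arch="all"),
}

cross_sources, available = minimal_system(binaries)
```

In this toy archive, libc6-dev becomes available for free because glibc has to be cross compiled anyway, while the arch:all package plugin is dropped in step 7 because python is not part of the universe.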

For Debian Sid, the output of my program is:

# (1) number of packages with priority:required: 62
# (2) plus, number of dependencies of priority:required packages: 20
# (3) plus, build-essential and dependencies: 31
# number of source packages to build the above: 71
# number of additional packages built from the above source packages: 292
# (4) number of packages of those plus arch:all packages that are installable: 6421
# total number of installable packages (1)+(2)+(3)+(4): 6534

For Ubuntu Natty it is:

# (1) number of packages with priority:required: 96
# (2) plus, number of dependencies of priority:required packages: 7
# (3) plus, build-essential and dependencies: 31
# number of source packages to build the above: 87
# number of additional packages built from the above source packages: 217
# (4) number of packages of those plus arch:all packages that are installable: 2102
# total number of installable packages (1)+(2)+(3)+(4): 2236

So for Debian, 71 source packages definitely have to be made cross compilable while for Natty, the number is 87.

For the last two days I was toying around with these minimal systems to see how many source packages can be built on top of them without running into dependency cycles. After installing the binary packages that were built, I checked again, repeating this until no new packages could be built.

For Natty, I was only able to find 28 additional packages that can be built on top of the 2236 existing ones. This means that a number of dependency cycles prevent building anything else.
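
The repeated check described above is a fixpoint computation. A minimal Python sketch, using made-up package names and a simplified model in which build dependencies are plain binary package names, could look like this:

```python
def build_fixpoint(available, sources):
    """Build every source package whose build dependencies are available,
    install its binaries, and repeat until nothing new can be built.

    `sources` maps a source package name to a pair
    (build dependencies, binary packages it produces).
    """
    available = set(available)
    built = []
    progress = True
    while progress:
        progress = False
        for name, (build_deps, produces) in sorted(sources.items()):
            if name not in built and set(build_deps) <= available:
                built.append(name)
                available |= set(produces)
                progress = True
    return built, available

# Hypothetical data: "a" and "b" need each other's binaries and form a cycle.
sources = {
    "zlib": (["gcc"], ["zlib1g"]),
    "libpng": (["gcc", "zlib1g"], ["libpng16"]),
    "a": (["b-bin"], ["a-bin"]),
    "b": (["a-bin"], ["b-bin"]),
}
built, available = build_fixpoint({"gcc"}, sources)
stuck = set(sources) - set(built)
```

Here zlib and libpng get built in two rounds, while a and b stay in `stuck`: neither can ever be built without breaking the cycle between them.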

In the coming two weeks I will focus on coming up with a tool that cleverly helps the user to identify packages that would be useful to have for building more packages (probably determined by how many packages depend on it - debhelper is an obvious candidate). The tool would then show why that crucial package is not available (in case of debhelper because some of its runtime dependencies are not available and require debhelper to be built) and how the situation can best be resolved. The possible methods to do so are to identify a package that is part of a cycle and either cross compile it or let it have staged build dependencies.


cross-compilable and bootstrappable Debian

categories: debian

When packaging software for Debian, there exist two important assumptions:

  1. Compilation is done natively
  2. Potentially all of Debian is available at compile time

Both assumptions make the life of a package maintainer much easier and they do not create any problem unless you are one of the unlucky few who want to run Debian on an architecture that it does not yet exist for.

You will then have to either cross compile a set of base packages or use another distribution like OpenEmbedded or Gentoo, which you compiled (or otherwise retrieved) for the new architecture, to hack a core of Debian source packages until they build a minimal Debian system that you can chroot into to continue building the rest natively. Cross compiling is hard because packages are built and tested to build natively, not cross; perl is a big blocker for cross building the minimal set of packages, although multiarch makes other packages easier to cross build. But even if you manage to get that far, you will continue to be plagued by cyclic build and runtime dependencies. So you start to hack source packages so that they drop some dependencies, breaking enough cycles to advance step by step.

The Debian ports page lists 24 ports of Debian, so despite its unpleasant nature, porting is not a rare task.

The process as laid out above has a number of drawbacks:

  • The process is mostly manual and reinvented every time it is done.
  • If you can't cross compile something, then you need another distribution for the bootstrapping process. Debian itself should be sufficient.
  • Its complexity and manual nature prevents architectures with little workforce behind them from catching up to the main archive.
  • It also prevents Debian from being offered in CPU optimized sub-arch builds.

If Debian provided a set of core packages that are cross-compilable and that suffice for a minimal foreign build system, and if it also had enough source packages that provide a reduced build dependency set so that all dependency cycles can be broken, building Debian for a yet unknown architecture could be mostly automated.

The benefits would be:

  • Putting Debian on a foreign architecture would (in the best case) boil down to making the code cross-compile for and native-compile on that architecture.
  • Debian would not need any other distribution to be ported to a different architecture. This would make Debian even more "universal".
  • Lagging architectures can be more easily updated or rebooted than when they were initially created.
  • Debian optimized for specific CPUs (Raspberry Pi, OpenMoko...) would be more attractive.

With three of this year's GSoC projects, this dream seems to be coming within reach.

There is the "Multiarch Cross-Toolchains" project by Thibaut Girka, mentored by Hector Oron and Marcin Juszkiewicz. Cross-compiling toolchains need packages from the foreign architecture to be installed alongside the native libraries. Cross-compiler packages have been available through the emdebian repositories but were always more of a hack. With multiarch, it is now possible to install packages from multiple architectures at once, so that cross-compilation toolchains can be realized in a proper manner and can therefore also enter the main archives. Besides creating multiarch enabled toolchains, he will also be responsible for making them build on the Debian build system, as cross-architecture dependencies are not yet supported.

There is also the "Bootstrappable Debian" project by Patrick "P. J." McDermott and mentored by Wookey and Jonathan Austin. He will make a small set of source packages multiarch cross-compilable (using cross-compilers provided by Thibaut Girka) and add a Build-Depends-StageN header to critical packages so that they can be built with reduced build dependencies for breaking dependency cycles. He will also patch tools as necessary to recognize the new control header.

And then there is my project: "Port bootstrap build-ordering tool" (Application). It is mentored by Wookey and Pietro Abate. In contrast to the other two, my output will be more on the meta-level as I will not modify any actual Debian package or patch Debian tools with more functionality. Instead the goal of this project is threefold:

  1. find the minimal set of source packages that have to be cross compiled
  2. help the user to find packages that are good candidates for breaking build dependency cycles through added staged build dependencies or by making them cross-compilable
  3. develop a tool that takes the information about packages that can be cross compiled or have staged build dependencies to output an ordering with which packages must be built to go from nothing to a full archive
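
The third goal is essentially a topological ordering problem on a graph that may contain cycles. As a rough Python illustration (the package names are invented, and the standard library `graphlib` module stands in for whatever the actual tool will use), staged build dependencies can be modelled as edges that are removed before sorting:

```python
from graphlib import CycleError, TopologicalSorter

def build_order(build_graph):
    """Return a build order, or raise CycleError if the graph has a cycle.

    `build_graph` maps a source package to the set of source packages
    that must be built before it.
    """
    return list(TopologicalSorter(build_graph).static_order())

def with_staged(build_graph, droppable):
    """Remove the build dependencies that a staged build can do without."""
    return {pkg: set(deps) - droppable.get(pkg, set())
            for pkg, deps in build_graph.items()}

# Hypothetical acyclic example: a unique order exists.
order = build_order({
    "app": {"libfoo", "foo-utils"},
    "foo-utils": {"libfoo"},
    "libfoo": set(),
})

# Hypothetical cycle: a needs b and b needs a.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    build_order(cyclic)
    cycle_found = False
except CycleError:
    cycle_found = True

# If a can be built staged without b, the cycle is broken.
staged_order = build_order(with_staged(cyclic, {"a": {"b"}}))
```

A complete ordering would also have to schedule the full rebuild of a after b has become available; this sketch only shows how dropping one edge makes the graph sortable.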

More on that project in my follow-up post.


setting up mailman, postfix, lighttpd

categories: debian

I was worried about having to learn hundreds of configuration options to properly set up mailman, postfix and lighttpd on Debian Squeeze. It turned out that, except for lighttpd, everything works out of the box.

apt-get install postfix

When asked by debconf, I specified lists.mister-muffin.de as the fully qualified domain name.

apt-get install mailman
newlist mailman

The newlist command reminds me that I have to add its output to /etc/aliases. After doing so, I have to run:

newaliases

From now on, I can add any mailinglist by running newlist, editing /etc/aliases and running newaliases.

Mailinglists can also be added through the mailman webinterface but one still has to put the according entries into /etc/aliases.

The following lighttpd configuration works out of the box with the default settings of mailman on Debian Squeeze.

This was the only part that caused me some headaches.

server.modules += ("mod_alias", "mod_cgi", "mod_accesslog")

$HTTP["host"] == "lists.mister-muffin.de" {
    accesslog.filename = "/var/log/lighttpd/lists-access-log"

    alias.url += (
        "/cgi-bin/mailman/private/" => "/var/lib/mailman/archives/private/",
        "/cgi-bin/mailman/public/" => "/var/lib/mailman/archives/public/",
        "/pipermail/" => "/var/lib/mailman/archives/public/",
        "/cgi-bin/mailman/"=> "/var/lib/mailman/cgi-bin/",
        "/images/mailman/" => "/usr/share/images/mailman/",
    )

    cgi.assign = (
        "/admin" => "",
        "/admindb" => "",
        "/confirm" => "",
        "/create" => "",
        "/edithtml" => "",
        "/listinfo" => "",
        "/options" => "",
        "/private" => "",
        "/rmlist" => "",
        "/roster" => "",
        "/subscribe" => "")
}

server.document-root        = "/var/www"
server.errorlog             = "/var/log/lighttpd/error.log"
server.pid-file             = "/var/run/lighttpd.pid"
server.username             = "www-data"
server.groupname            = "www-data"
index-file.names            = ( "index.html" )
server.dir-listing          = "disable"
include_shell "/usr/share/lighttpd/create-mime.assign.pl"

As a bonus, I wanted to import my existing email exchange with my GSoC mentors into the mailinglist. First I was planning on manually sending the email messages to the list, but a much easier option is to just import them in mbox format.

To extract all email messages, I first wrote the following python snippet:

import itertools
import mailbox

def mentions(message, name):
    # True if the To, Cc or From header contains the given name
    return any(name in message.get(header, "").lower()
               for header in ("to", "cc", "from"))

box = mailbox.mbox('~/out')
sources = itertools.chain(mailbox.mbox('~/sent'),
                          mailbox.Maildir('~/Mail/Web/', factory=None))
for message in sources:
    subject = message.get('subject', "")
    if ((mentions(message, "wookey") or mentions(message, "abate"))
            # skip subjects that start with '[' or are exactly "multistrap"
            and not subject.startswith('[')
            and subject != "multistrap"):
        box.add(message)
# close() flushes the added messages to ~/out
box.close()

It iterates through messages in my mbox and maildir mailboxes, filters them for emails by wookey or pietro, strips away some messages I found to not be relevant and then saves the filtered result into the mbox mailbox ~/out.

It is important to specify factory=None for the Maildir parser, because in Python 2 it otherwise defaults to rfc822.Message instead of MaildirMessage.

Also do not forget to call box.close(). I initially forgot to do so and ended up with missing messages in ~/out.
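
One way to make forgetting close() impossible is to wrap the mailbox in `contextlib.closing`. The following is a small self-contained sketch, not the script I actually ran; `copy_matching`, `make_message` and the addresses are made up for the example, and it writes to a temporary directory instead of ~/out.

```python
import mailbox
import os
import tempfile
from contextlib import closing
from email.message import Message

def copy_matching(messages, out_path, keep):
    """Append every message accepted by `keep` to the mbox at out_path."""
    # closing() guarantees box.close() is called, even on errors
    with closing(mailbox.mbox(out_path)) as box:
        for message in messages:
            if keep(message):
                box.add(message)

def make_message(sender, subject):
    """Build a tiny message for the demonstration below."""
    msg = Message()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_payload("body\n")
    return msg

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out")
    inbox = [make_message("wookey@example.org", "hints"),
             make_message("someone@example.org", "unrelated")]
    copy_matching(inbox, out_path,
                  lambda m: "wookey" in m.get("from", "").lower())
    # Reopen the file to check what actually ended up on disk
    with closing(mailbox.mbox(out_path)) as reopened:
        stored = len(reopened)
```

After the run, `stored` holds the number of messages found when reopening the file, which is a quick way to spot missing messages.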

I then copy the archive in its place:

scp out lists.mister-muffin.de:/var/lib/mailman/archives/private/debian-bootstrap.mbox/debian-bootstrap.mbox

Another thing that initially caused me trouble was that the mbox did not have the correct permissions after the scp. Fixing them:

chown -R list:www-data /var/lib/mailman/archives/private/
chmod 664 /var/lib/mailman/archives/private/debian-bootstrap.mbox/debian-bootstrap.mbox

And update the mailman archive like this:

sudo -u list /usr/lib/mailman/bin/arch debian-bootstrap /var/lib/mailman/archives/private/debian-bootstrap.mbox/debian-bootstrap.mbox

Initially I was running the above command as root which screws up permissions as well.
