I have written a number of scripts to create Debian foreign architecture (mostly armel and armhf) rootfs images for SD cards or NAND flashing. I started with putting Debian on my Openmoko gta01 and gta02 and continued with devices like the qi nanonote, a marvel kirkwood based device, the Always Innovating Touchbook (close to the Beagleboard), the Notion Ink Adam and most recently the Golden Delicious gta04. Once it has been manufactured, I will surely also get my hands dirty with the Neo900 whose creators are currently looking for potential donors/customers to increase the size of the first batch and get the price per unit further down.
Creating a Debian rootfs disk image for all these devices basically follows the same steps:
- create an disk image file, partition it, format the partitions and mount the
/partition into a directory
multistrapto extract a selection of armel or armhf packages into the directory
- copy over
/usr/bin/qemu-arm-staticfor qemu user mode emulation
- chroot into the directory to execute package maintainer scripts with
dpkg --configure -a
- copy the disk image onto the sd card
It was not long until I started wondering why I had to run all of the above steps with superuser privileges even though everything except the final step (which I will not cover here) was in principle nothing else than writing some magic bytes to files I had write access to (the disk image file) in some more or less fancy ways.
So I tried using
fakechroot and after some initial troubles I
managed to build a foreign architecture rootfs without needing root
priviliges for steps two, three and four. I wrote about my solution which
still included some workarounds in another article here. These
workarounds were soon not needed anymore as upstream fixed the outstanding
issues. As a result I wrote the
polystrap tool which combines
fakechroot and qemu user mode emulation. Recently
I managed to integrate
proot support in a separate branch of
Last year I got the LEGO ev3 robot for christmas and since it runs Linux I also
wanted to put Debian on it by following the instructions given by the ev3dev
project. Even though ev3dev calls itself a "distribution" it only deviates
from pure Debian by its kernel, some configuration options and its initial
package selection. Otherwise it's vanilla Debian. The project also supplies
multistrap based scripts which create the rootfs and then
partition and populate an SD card. All of this is of course done as the
While the creation of the file/directory structure of the foreign Debian armel
rootfs can by now easily be done without superuser priviliges by running
proot, creating the SD card image
still seems to be a bit more tricky. While it is no problem to write a
partition table to a regular file, it turned out to be tricky to mount these
partition because tools like
losetup require superuser
permissions. Tools like
fuse-ext2 which otherwise would be
able to work on a regular file without superuser privileges do not seem to
allow to specify the required offsets that the partitions have within the disk
fuseloop there exists a tool which allows to "loop-mount"
parts of a file in userspace to a new file and thus allows tools like
fuse-ext2 to work as they normally do. But
fuseloop is not
packaged for Debian yet and thus also not in the current Debian stable. An
obvious workaround would be to create and fill each partition in a separate
file and concatenate them together. But why do I have to write my data twice
just because I do not want to become the superuser? Even worse: because
parted refuses to write a partition table to a file which is too small to
hold the specified partitions, one spends twice the disk space of the final
image: the image with the partition table plus the image with the main
So lets summarize: a bootable foreign architecture SD card disk image is nothing else than a regular file representing the contents of the SD card as a block device. This disk image is created in my home directory and given enough free disk space there is nothing stopping me from writing any possible permutation of bits to that file. Obviously I'm interested in a permutation representing a valid partition table and file systems with sensible content. Why do I need superuser privileges to generate such a sensible permutation of bits?
Gladly it seems that the (at least in my opinion) hardest part of faking chroot
and executing foreign architecture package maintainer scripts is already
possible without superuser privileges by using
proot together with qemu user mode emulation. But then there is still the
blocker of creating the disk image itself through some user mode loop mounting
of a filesystem occupying a virtual "partition" in the disk image.
Why has all this only become available so very recently and still requires a
number of workarounds to fully work in userspace? There exists a surprising
amount of scripts which wrap
multistrap. Most of them require
superuser privileges. Does everybody just accept that they have to put a
in front of every invocation and hope for the best? While this might be okay
for well tested code like
multistrap the countless wrapper
scripts might accidentally (be it a bug in the code or a typo in the given
command line arguments) write to your primary hard disk instead of your SD
card. Such behavior can easily be mitigated by not executing any such script
with superuser privileges in the first place.
Operations like loop mounting affect the whole system. Why do I have to touch
anything outside of my home directory (
/dev/loop in this case) to populate a
file in it with some meaningful bits? Virtualization is no option because every
virtualization solution again requires root privileges.
One might argue that a number of solutions just require some initial setup by
root to then later be used by a regular user (for example
configuration or the
schroot approach). But then again: why do I have to
write anything outside of my home directory (even if it is only once) to be
able to write something meaningful to a file in it?
The latter approach also does not work if one cannot become root in the first place or is limited by a virtualized environment. Imagine you are trying to build a Debian rootfs on a machine where you just have a regular user account. Or a situation I was recently in: I had a virtual server which denied me operations like loop mounting.
Given all these downsides, why is it still so common to just assume that one is
able and willing to use
sudo and be done with it in most cases?
I really wonder why technologies like
fakechroot have only
been developed this late. Has this problem not been around since the earliest
days of Linux/Unix?
Am I missing something and rambling around for nothing? Is this idea a lost cause or something that is worth spending time and energy on to extend and fix the required tools?
The following post is a verbatim copy of my message to the debian-devel list.
While botch produces loads of valuable data to help maintainers modifying the right source packages with build profiles and thus make Debian bootstrappable, it has so far failed at producing this data in a format which is:
- human readable (nobody wants to manually go through 12 MB of JSON data)
- generated automatically periodically and published somewhere (nobody wants to run botch on his own machine or update periodically update the TODO wiki page)
- available on a per-source-package-basis (no maintainer wants to know about the 500 source packages he does NOT maintain)
While human readability is probably still lacking (it's hard to write in a manner understandable by everybody about a complicated topic you are very familiar with), the bootstrapping results are now generated automatically (on a daily basis) and published in a per-source-package-basis as well. Thus let me introduce to you:
For now it is funny to see that the main architectures do not bootstrap (since July, for different reasons) while less popular ones like ia64 and s390x bootstrap right now without problems (at least in theory). According to the logs (also accessible at above link, here for amd64) this is because gcc-4.6 currently fails compiling due to a build-conflict (this has been reported in bug#724865). Once this bug is fixed, all arches should be bootstrappable again. Let me remind you here that the whole analysis is done on the dependency relationships only (not a single source package is actually compiled at any point) and compilation might fail for many other reasons in practice.
It has been the idea of Paul Wise to integrate this data into the pts so that maintainers of affected source packages can react to the heuristics suggested by botch. For this purpose, the website also publishes the raw JSON data from which the HTML pages are generated (here for amd64). The bugreport against the bts can be found in bug#728298.
I'm sure that a couple of things regarding understandability of the results are not yet sufficiently explained or outright missing. If you see any such instance, please drop me a mail, suggesting what to change in the textual description or presentation of the results.
I also created the following two wiki pages to give an overview of the utilized terminology:
Feel also free to tell me if anything in these pages is unclear.
Direct patches against the python code producing the HTML from the raw JSON data are also always welcome.
In October 2012, Joachim Breitner asked me whether botch (well, it
didnt have a name back then) can also be used to calculate a build order for
recompiling ~450 haskell-packages with a new ghc version (it was probably
the 7.6.1 release) to upload them to experimental. What is still blocking this
ability is the inability of botch to directly read
*.dsc files instead of
having to rely on
Sources files. On the other hand (in case
there exists a set of
Sources files) it is now much easier to
use botch for such a use case for which it was not originally designed.
To demonstrate how botch can be used to calculate the build order for library
transitions, I wrote the script
executes the individual botch tools in the correct order with the correct
arguments. To validate the correctness of botch, I compared its output to the
order which ben produces for the same transitions.
create-transition-order.sh script is called with a ben query string and
optionally a snapshot.debian.org timestamp as its arguments. The script
relies on ben being installed because ben query strings cannot be translated
grep-dctrl query strings as
ben query splits fields which contain
comma separated values (like
Build-Depends) at the comma before
it searches for matching strings. Unfortunately, the
create-transition-order.sh script also currently relies on a yet unreleased
ben feature which allows
ben download to create a
ben.cache file. You can
track the progress of this feature as bug#714703. Creating a
file is necessary because some queries rely on an association between binary
and source packages which is not present in all packages in a
For example the haskell transition query makes use of this feature by
.source field in its query.
The result of these trials was, that botch produced nearly the same build order for nearly all transitions. The only differences are due to shortcomings of ben and botch.
For example, ben is not able to create an order for transitions where involved
packages form one or more cycles. A prominent example is the haskell
ghc itself is only built during step 15 after many other
packages which would have needed ghc to be compiled first. Botch solves this
problem by reducing all strongly connected components in a cyclic graph to a
single vertex before creating the order. This operation makes the graph acyclic
and creating a build order trivially. The only remaining problems can then be
found within the strongly connected components (for Haskell they are of maximum
size two) but the overall build order is correct.
On the other hand botch has no notion of packages which are affected by a transition and thus creates a build order which in some cases is longer than the one created by ben. When generating the interdependencies between packages, ben only considers those which are part of the transition. Botch on the other hand considers all dependency relationships. It would be simple to solve this issue in botch by removing unaffected packages from the dependency graph through edge contraction (an operation already used by botch for other tasks).
This exercise also let me find another bug in
dose3 where libdose would
not associate a binary package with a binNMU version with its associated source
package but instead report a version mismatch error. This problem was also
reported in the dose bug tracker.
Another advantage of HTML is, that the data can now easily be shared with other developers without them having to run botch or install any other program first. So without further ado, please find the result of running our heuristics on Debian Sid as of 2013-01-01 here: http://mister-muffin.de/bootstrap/stats/
Here is a screenshot of what you can expect (you can see part of the table of edges with most cycles through them):
As a developer uses this information to add build profiles to source packages, this HTML page can be regenerated and will show which strongly connected components are now left in the graph and how to best make them acyclic as well.
Besides other improvements I also updated the data I used to generate the graphs from this post. I found this update to be necessary since we found and fixed several implementation bugs since October 2012, so this new plot should be more accurate. Here a graph which shows the amount of vertices in the biggest strongly connected component of Debian Sid by date.
Some datapoints are missing for dates for which important source packages in Sid were not compilable.
After nearly one year of development, the "Port bootstrap build-ordering tool" now finally has a name: botch. It stands for Bootstrap/Build Order Tool CHain. Now we dont have to call it "Port bootstrap build-ordering tool" anymore (nobody did that anyways) and also dont have to talk about it by referring to it as "the tools" in email, IRC or in my master thesis which is due in a few weeks. With this, the issue also doesnt block the creation of a Debian package anymore. Since only a handful of people have a clone of it anyway, I also renamed the gitorious git repository url (and updated all links in this blog and left a text file informing about the name change in the old location). The new url is: https://gitorious.org/debian-bootstrap/botch
Further improvements since my last post in January include:
- greatly improved speed of partial order calculation
- calculation of strong edges and strong subgraphs in build graphs and source graphs
- calculation of source graphs from scratch or from build graphs
- calculation of strong bridges and strong articulation points
- calculate betweenness of vertices and edges
- find self-cycles in the source graph
- add webselfcycle code to generate a HTML page of self-cycles
- allow to collapse all strongly connected components of the source graph into a single vertex to make it acyclic
- allow to create a build order for cyclic graphs (by collapsing SCC)
- allow to specify custom installation sets
- add more feedback arc set heuristics (eades, insertion sort, sifting, moving)
- improve the cycle heuristic
In the mean time a paper about the theoretical aspects of this topic which Pietro Abate and I submitted to the CBSE 2013 conference got accepted (hurray!) and I will travel to the conference in Canada in June.
Botch (yeey, I can call it by a name!) is also an integral part of one of the proposals for a Debian Google Summer of Code project this year, mentored by Wookey and co-mentored by myself. Lets hope it gets accepted and produces good results in the end of the summer!