Why do I need superuser privileges when I just want to write to a regular file

categories: debian, linux

I have written a number of scripts to create Debian foreign architecture (mostly armel and armhf) rootfs images for SD cards or NAND flashing. I started with putting Debian on my Openmoko gta01 and gta02 and continued with devices like the qi nanonote, a marvel kirkwood based device, the Always Innovating Touchbook (close to the Beagleboard), the Notion Ink Adam and most recently the Golden Delicious gta04. Once it has been manufactured, I will surely also get my hands dirty with the Neo900 whose creators are currently looking for potential donors/customers to increase the size of the first batch and get the price per unit further down.

Creating a Debian rootfs disk image for all these devices basically follows the same steps:

  1. create an disk image file, partition it, format the partitions and mount the / partition into a directory
  2. use debootstrap or multistrap to extract a selection of armel or armhf packages into the directory
  3. copy over /usr/bin/qemu-arm-static for qemu user mode emulation
  4. chroot into the directory to execute package maintainer scripts with dpkg --configure -a
  5. copy the disk image onto the sd card

It was not long until I started wondering why I had to run all of the above steps with superuser privileges even though everything except the final step (which I will not cover here) was in principle nothing else than writing some magic bytes to files I had write access to (the disk image file) in some more or less fancy ways.

So I tried using fakeroot+fakechroot and after some initial troubles I managed to build a foreign architecture rootfs without needing root priviliges for steps two, three and four. I wrote about my solution which still included some workarounds in another article here. These workarounds were soon not needed anymore as upstream fixed the outstanding issues. As a result I wrote the polystrap tool which combines multistrap, fakeroot, fakechroot and qemu user mode emulation. Recently I managed to integrate proot support in a separate branch of polystrap.

Last year I got the LEGO ev3 robot for christmas and since it runs Linux I also wanted to put Debian on it by following the instructions given by the ev3dev project. Even though ev3dev calls itself a "distribution" it only deviates from pure Debian by its kernel, some configuration options and its initial package selection. Otherwise it's vanilla Debian. The project also supplies some multistrap based scripts which create the rootfs and then partition and populate an SD card. All of this is of course done as the superuser.

While the creation of the file/directory structure of the foreign Debian armel rootfs can by now easily be done without superuser priviliges by running multistrap under fakeroot/fakechroot/proot, creating the SD card image still seems to be a bit more tricky. While it is no problem to write a partition table to a regular file, it turned out to be tricky to mount these partition because tools like kpartx and losetup require superuser permissions. Tools like mkfs.ext3 and fuse-ext2 which otherwise would be able to work on a regular file without superuser privileges do not seem to allow to specify the required offsets that the partitions have within the disk image. With fuseloop there exists a tool which allows to "loop-mount" parts of a file in userspace to a new file and thus allows tools like mkfs.ext3 and fuse-ext2 to work as they normally do. But fuseloop is not packaged for Debian yet and thus also not in the current Debian stable. An obvious workaround would be to create and fill each partition in a separate file and concatenate them together. But why do I have to write my data twice just because I do not want to become the superuser? Even worse: because parted refuses to write a partition table to a file which is too small to hold the specified partitions, one spends twice the disk space of the final image: the image with the partition table plus the image with the main partition's content.

So lets summarize: a bootable foreign architecture SD card disk image is nothing else than a regular file representing the contents of the SD card as a block device. This disk image is created in my home directory and given enough free disk space there is nothing stopping me from writing any possible permutation of bits to that file. Obviously I'm interested in a permutation representing a valid partition table and file systems with sensible content. Why do I need superuser privileges to generate such a sensible permutation of bits?

Gladly it seems that the (at least in my opinion) hardest part of faking chroot and executing foreign architecture package maintainer scripts is already possible without superuser privileges by using fakeroot and fakechroot or proot together with qemu user mode emulation. But then there is still the blocker of creating the disk image itself through some user mode loop mounting of a filesystem occupying a virtual "partition" in the disk image.

Why has all this only become available so very recently and still requires a number of workarounds to fully work in userspace? There exists a surprising amount of scripts which wrap debootstrap/multistrap. Most of them require superuser privileges. Does everybody just accept that they have to put a sudo in front of every invocation and hope for the best? While this might be okay for well tested code like debootstrap and multistrap the countless wrapper scripts might accidentally (be it a bug in the code or a typo in the given command line arguments) write to your primary hard disk instead of your SD card. Such behavior can easily be mitigated by not executing any such script with superuser privileges in the first place.

Operations like loop mounting affect the whole system. Why do I have to touch anything outside of my home directory (/dev/loop in this case) to populate a file in it with some meaningful bits? Virtualization is no option because every virtualization solution again requires root privileges.

One might argue that a number of solutions just require some initial setup by root to then later be used by a regular user (for example /etc/fstab configuration or the schroot approach). But then again: why do I have to write anything outside of my home directory (even if it is only once) to be able to write something meaningful to a file in it?

The latter approach also does not work if one cannot become root in the first place or is limited by a virtualized environment. Imagine you are trying to build a Debian rootfs on a machine where you just have a regular user account. Or a situation I was recently in: I had a virtual server which denied me operations like loop mounting.

Given all these downsides, why is it still so common to just assume that one is able and willing to use sudo and be done with it in most cases?

I really wonder why technologies like fakeroot and fakechroot have only been developed this late. Has this problem not been around since the earliest days of Linux/Unix?

Am I missing something and rambling around for nothing? Is this idea a lost cause or something that is worth spending time and energy on to extend and fix the required tools?

View Comments

announcing bootstrap.debian.net

categories: debian

The following post is a verbatim copy of my message to the debian-devel list.

While botch produces loads of valuable data to help maintainers modifying the right source packages with build profiles and thus make Debian bootstrappable, it has so far failed at producing this data in a format which is:

  • human readable (nobody wants to manually go through 12 MB of JSON data)
  • generated automatically periodically and published somewhere (nobody wants to run botch on his own machine or update periodically update the TODO wiki page)
  • available on a per-source-package-basis (no maintainer wants to know about the 500 source packages he does NOT maintain)

While human readability is probably still lacking (it's hard to write in a manner understandable by everybody about a complicated topic you are very familiar with), the bootstrapping results are now generated automatically (on a daily basis) and published in a per-source-package-basis as well. Thus let me introduce to you:

bootstrap.debian.net

Paul Wise encouraged me to set this up and also donated the debian.net CNAME to my server. Thanks a lot! The data is generated daily from the midnight packages/sources records of snapshot.debian.org (I hope it's okay to grab the data from there on a daily basis). The resulting data can be viewed in HTML format (with some javascript for to allow table sorting and paging in case you use javascript) per architecture (here for amd64). In addition it also produces HTML pages per source package for all source packages which are involved in a dependency cycle in any architecture (for example src:avahi or src:python2.7). Currently there are 518 source packages involved in a dependency cycle. While this number seems high, remember that estimations by calculating a feedback arc set suggest that only 50-60 of these source packages have to be modified with build profiles to make the whole graph acyclic.

For now it is funny to see that the main architectures do not bootstrap (since July, for different reasons) while less popular ones like ia64 and s390x bootstrap right now without problems (at least in theory). According to the logs (also accessible at above link, here for amd64) this is because gcc-4.6 currently fails compiling due to a build-conflict (this has been reported in bug#724865). Once this bug is fixed, all arches should be bootstrappable again. Let me remind you here that the whole analysis is done on the dependency relationships only (not a single source package is actually compiled at any point) and compilation might fail for many other reasons in practice.

It has been the idea of Paul Wise to integrate this data into the pts so that maintainers of affected source packages can react to the heuristics suggested by botch. For this purpose, the website also publishes the raw JSON data from which the HTML pages are generated (here for amd64). The bugreport against the bts can be found in bug#728298.

I'm sure that a couple of things regarding understandability of the results are not yet sufficiently explained or outright missing. If you see any such instance, please drop me a mail, suggesting what to change in the textual description or presentation of the results.

I also created the following two wiki pages to give an overview of the utilized terminology:

Feel also free to tell me if anything in these pages is unclear.

Direct patches against the python code producing the HTML from the raw JSON data are also always welcome.

View Comments

Using botch to generate transition build orders

categories: debian

In October 2012, Joachim Breitner asked me whether botch (well, it didnt have a name back then) can also be used to calculate a build order for recompiling ~450 haskell-packages with a new ghc version (it was probably the 7.6.1 release) to upload them to experimental. What is still blocking this ability is the inability of botch to directly read *.dsc files instead of having to rely on Packages and Sources files. On the other hand (in case there exists a set of Packages and Sources files) it is now much easier to use botch for such a use case for which it was not originally designed.

To demonstrate how botch can be used to calculate the build order for library transitions, I wrote the script create-transition-order.sh which executes the individual botch tools in the correct order with the correct arguments. To validate the correctness of botch, I compared its output to the order which ben produces for the same transitions.

The create-transition-order.sh script is called with a ben query string and optionally a snapshot.debian.org timestamp as its arguments. The script relies on ben being installed because ben query strings cannot be translated into grep-dctrl query strings as ben query splits fields which contain comma separated values (like Depends and Build-Depends) at the comma before it searches for matching strings. Unfortunately, the create-transition-order.sh script also currently relies on a yet unreleased ben feature which allows ben download to create a ben.cache file. You can track the progress of this feature as bug#714703. Creating a ben.cache file is necessary because some queries rely on an association between binary and source packages which is not present in all packages in a Packages file. For example the haskell transition query makes use of this feature by including the .source field in its query.

The result of these trials was, that botch produced nearly the same build order for nearly all transitions. The only differences are due to shortcomings of ben and botch.

For example, ben is not able to create an order for transitions where involved packages form one or more cycles. A prominent example is the haskell transition where ghc itself is only built during step 15 after many other packages which would have needed ghc to be compiled first. Botch solves this problem by reducing all strongly connected components in a cyclic graph to a single vertex before creating the order. This operation makes the graph acyclic and creating a build order trivially. The only remaining problems can then be found within the strongly connected components (for Haskell they are of maximum size two) but the overall build order is correct.

On the other hand botch has no notion of packages which are affected by a transition and thus creates a build order which in some cases is longer than the one created by ben. When generating the interdependencies between packages, ben only considers those which are part of the transition. Botch on the other hand considers all dependency relationships. It would be simple to solve this issue in botch by removing unaffected packages from the dependency graph through edge contraction (an operation already used by botch for other tasks).

This exercise also let me find another bug in dose3 where libdose would not associate a binary package with a binNMU version with its associated source package but instead report a version mismatch error. This problem was also reported in the dose bug tracker.

View Comments

New bootstrap analysis results

categories: debian

I wrote a small wiki page for botch. It includes links to some hopefully useful resources like my FOSDEM 2013 talk or my master thesis about botch.

The final goal of botch is to generate a build order with which Debian can be bootstrapped from zero to more than 18000 source packages. But such a build order can only be generated once enough source packages (by now the number is around 70) incorporate build profiles which allow to compile them with less build dependencies and thereby break all build dependency cycles. So an intermediary goal of botch is, to find a "good" (i.e. a close to minimal amount) selection of such build dependencies that should be dropped from their source packages. So far, the results of all heuristics we use for this task were just dumped to standard output. Since this textual format was hard to read for a human developer and even worse machine readable, that tool now outputs its results in JSON. This data is then converted by another tool to a more human readable format. One such format is HTML and with the help of javascript for sortable and pageable tables, the data can then be presented without producing too much visual clutter (at least compared to the initial textual format).

Another advantage of HTML is, that the data can now easily be shared with other developers without them having to run botch or install any other program first. So without further ado, please find the result of running our heuristics on Debian Sid as of 2013-01-01 here: http://mister-muffin.de/bootstrap/stats/

Here is a screenshot of what you can expect (you can see part of the table of edges with most cycles through them):

As a developer uses this information to add build profiles to source packages, this HTML page can be regenerated and will show which strongly connected components are now left in the graph and how to best make them acyclic as well.

Besides other improvements I also updated the data I used to generate the graphs from this post. I found this update to be necessary since we found and fixed several implementation bugs since October 2012, so this new plot should be more accurate. Here a graph which shows the amount of vertices in the biggest strongly connected component of Debian Sid by date.

Some datapoints are missing for dates for which important source packages in Sid were not compilable.

View Comments

botch - the debian bootstrap software has a name

categories: debian

After nearly one year of development, the "Port bootstrap build-ordering tool" now finally has a name: botch. It stands for Bootstrap/Build Order Tool CHain. Now we dont have to call it "Port bootstrap build-ordering tool" anymore (nobody did that anyways) and also dont have to talk about it by referring to it as "the tools" in email, IRC or in my master thesis which is due in a few weeks. With this, the issue also doesnt block the creation of a Debian package anymore. Since only a handful of people have a clone of it anyway, I also renamed the gitorious git repository url (and updated all links in this blog and left a text file informing about the name change in the old location). The new url is: https://gitorious.org/debian-bootstrap/botch

Further improvements since my last post in January include:

  • greatly improved speed of partial order calculation
  • calculation of strong edges and strong subgraphs in build graphs and source graphs
  • calculation of source graphs from scratch or from build graphs
  • calculation of strong bridges and strong articulation points
  • calculate betweenness of vertices and edges
  • find self-cycles in the source graph
  • add webselfcycle code to generate a HTML page of self-cycles
  • allow to collapse all strongly connected components of the source graph into a single vertex to make it acyclic
  • allow to create a build order for cyclic graphs (by collapsing SCC)
  • allow to specify custom installation sets
  • add more feedback arc set heuristics (eades, insertion sort, sifting, moving)
  • improve the cycle heuristic

In the mean time a paper about the theoretical aspects of this topic which Pietro Abate and I submitted to the CBSE 2013 conference got accepted (hurray!) and I will travel to the conference in Canada in June.

Botch (yeey, I can call it by a name!) is also an integral part of one of the proposals for a Debian Google Summer of Code project this year, mentored by Wookey and co-mentored by myself. Lets hope it gets accepted and produces good results in the end of the summer!

View Comments
« Older Entries -- Newer Entries »