unshare without superuser privileges

categories: code, debian, linux

TLDR: With the help of Helmut Grohne I finally figured out most of the bits necessary to unshare everything without becoming root (though one might say that this is still cheated because the suid root tools newuidmap and newgidmap are used). I wrote a Perl script which documents how this is done in practice. This script is nearly equivalent to using the existing commands lxc-usernsexec [opts] -- unshare [opts] -- COMMAND except that these two together cannot be used to mount a new proc. Apart from this problem, this Perl script might also be useful by itself because it is architecture independent and easily inspectable for the curious mind without resorting to sources.debian.net (it is heavily documented at nearly 2 lines of comments per line of code on average). It can be retrieved here at https://gitlab.mister-muffin.de/josch/user-unshare/blob/master/user-unshare

Long story: Nearly two years after my last last rant about everything needing superuser privileges in Linux, I'm still interested in techniques that let me do more things without becoming root. Helmut Grohne had told me for a while about unshare(), or user namespaces as the right way to have things like chroot without root. There are also reports of LXC containers working without root privileges but they are hard to come by. A couple of days ago I had some time again, so Helmut helped me to get through the major blockers that were so far stopping me from using unshare in a meaningful way without executing everything with sudo.

My main motivation at that point was to let dpkg-buildpackage when executed by sbuild be run with an unshared network namespace and thus without network access (except for the loopback interface) because like pbuilder I wanted sbuild to enforce the rule not to access any remote resources during the build. After several evenings of investigating and doctoring at the Perl script I mentioned initially, I came to the conclusion that the only place that can unshare the network namespace without disrupting anything is schroot itself. This is because unsharing inside the chroot will fail because dpkg-buildpackage is run with non-root privileges and thus the user namespace has to be unshared. But this then will destroy all ownership information. But even if that wasn't the case, the chroot itself is unlikely to have (and also should not) tools like ip or newuidmap and newgidmap installed. Unsharing the schroot call itself also will not work. Again we first need to unshare the user namespace and then schroot will complain about wrong ownership of its configuration file /etc/schroot/schroot.conf. Luckily, when contacting Roger Leigh about this wishlist feature in bug#802849 I was told that this was already implemented in its git master \o/. So this particular problem seems to be taken care of and once the next schroot release happens, sbuild will make use of it and have unshare --net capabilities just like pbuilder already had since last year.

With the sbuild case taken care of, the rest of this post will introduce the Perl script I wrote. The name user-unshare is really arbitrary. I just needed some identifier for the git repository and a filename.

The most important discovery I made was, that Debian disables unprivileged user namespaces by default with the patch add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch to the Linux kernel. To enable it, one has to first either do

echo 1 | sudo tee /proc/sys/kernel/unprivileged_userns_clone > /dev/null

or

sudo sysctl -w kernel.unprivileged_userns_clone=1

The tool tries to be like unshare(1) but with the power of lxc-usernsexec(1) to map more than one id into the new user namespace by using the programs newgidmap and newuidmap. Or in other words: This tool tries to be like lxc-usernsexec(1) but with the power of unshare(1) to unshare more than just the user and mount namespaces. It is nearly equal to calling:

lxc-usernsexec [opts] -- unshare [opts] -- COMMAND

Its main reason of existence are:

  • as a project for me to learn how unprivileged namespaces work
  • written in Perl which means:
    • architecture independent (same executable on any architecture)
    • easily inspectable by other curious minds
  • tons of code comments to let others understand how things work
  • no need to install the lxc package in a minimal environment (perl itself might not be called minimal either but is present in every Debian installation)
  • not suffering from being unable to mount proc

I hoped that systemd-nspawn could do what I wanted but it seems that its requirement for being run as root will not change any time soon

Another tool in Debian that offers to do chroot without superuser privileges is linux-user-chroot but that one cheats by being suid root.

Had I found lxc-usernsexec earlier I would've probably not written this. But after I found it I happily used it to get an even better understanding of the matter and further improve the comments in my code. I started writing my own tool in Perl because that's the language sbuild was written in and as mentioned initially, I intended to use this script with sbuild. Now that the sbuild problem is taken care of, this is not so important anymore but I like if I can read the code of simple programs I run directly from /usr/bin without having to retrieve the source code first or use sources.debian.net.

The only thing I wasn't able to figure out is how to properly mount proc into my new mount namespace. I found a workaround that works by first mounting a new proc to /proc and then bind-mounting /proc to whatever new location for proc is requested. I didn't figure out how to do this without mounting to /proc first partly also because this doesn't work at all when using lxc-usernsexec and unshare together. In this respect, this perl script is a bit more powerful than those two tools together. I suppose that the reason is that unshare wasn't written with having being called without superuser privileges in mind. If you have an idea what could be wrong, the code has a big FIXME about this issue.

Finally, here a demonstration of what my script can do. Because of the /proc bug, lxc-usernsexec and unshare together are not able to do this but it might also be that I'm just not using these tools in the right way. The following will give you an interactive shell in an environment created from one of my sbuild chroot tarballs:

$ mkdir -p /tmp/buildroot/proc
$ ./user-unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net \
    --uts --mount --fork -- sh -c 'ip link set lo up && ip addr && \
    hostname hoothoot-chroot && \
    tar -C /tmp/buildroot -xf /srv/chroot/unstable-amd64.tar.gz; \
    /usr/sbin/chroot /tmp/buildroot /sbin/runuser -s /bin/bash - josch && \
    umount /tmp/buildroot/proc && rm -rf /tmp/buildroot'
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ whoami
josch
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ hostname
hoothoot-chroot
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ ls -lha /proc | head
total 0
dr-xr-xr-x 218 nobody nogroup    0 Oct 25 19:06 .
drwxr-xr-x  22 root   root     440 Oct  1 08:42 ..
dr-xr-xr-x   9 root   root       0 Oct 25 19:06 1
dr-xr-xr-x   9 josch  josch      0 Oct 25 19:06 15
dr-xr-xr-x   9 josch  josch      0 Oct 25 19:06 16
dr-xr-xr-x   9 root   root       0 Oct 25 19:06 7
dr-xr-xr-x   9 josch  josch      0 Oct 25 19:06 8
dr-xr-xr-x   4 nobody nogroup    0 Oct 25 19:06 acpi
dr-xr-xr-x   6 nobody nogroup    0 Oct 25 19:06 asound

Of course instead of running this long command we can also instead write a small shell script and execute that instead. The following does the same things as the long command above but adds some comments for further explanation:

#!/bin/sh

set -exu

# I'm using /tmp because I have it mounted as a tmpfs
rootdir="/tmp/buildroot"

# bring the loopback interface up
ip link set lo up

# show that the loopback interface is really up
ip addr

# make use of the UTS namespace being unshared
hostname hoothoot-chroot

# extract the chroot tarball. This must be done inside the user namespace for
# the file permissions to be correct.
#
# tar will fail to call mknod and to change the permissions of /proc but we are
# ignoring that
tar -C "$rootdir" -xf /srv/chroot/unstable-amd64.tar.gz || true

# run chroot and inside, immediately drop permissions to the user "josch" and
# start an interactive shell
/usr/sbin/chroot "$rootdir" /sbin/runuser -s /bin/bash - josch

# unmount /proc and remove the temporary directory
umount "$rootdir/proc"
rm -rf "$rootdir"

and then:

$ mkdir -p /tmp/buildroot/proc
$ ./user-unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net --uts --mount --fork -- ./chroot.sh

As mentioned in the beginning, the tool is nearly equivalent to calling lxc-usernsexec [opts] -- unshare [opts] -- COMMAND but because of the problem with mounting proc (mentioned earlier), lxc-usernsexec and unshare cannot be used with above example. If one tries anyways one will only get:

$ lxc-usernsexec -m b:0:1000:1 -m b:1:558752:1 -- unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net --uts --mount --fork -- ./chroot.sh
unshare: mount /tmp/buildroot/proc failed: Invalid argument

I'd be interested in finding out why that is and how to fix it.

View Comments

Why do I need superuser privileges when I just want to write to a regular file

categories: debian, linux

I have written a number of scripts to create Debian foreign architecture (mostly armel and armhf) rootfs images for SD cards or NAND flashing. I started with putting Debian on my Openmoko gta01 and gta02 and continued with devices like the qi nanonote, a marvel kirkwood based device, the Always Innovating Touchbook (close to the Beagleboard), the Notion Ink Adam and most recently the Golden Delicious gta04. Once it has been manufactured, I will surely also get my hands dirty with the Neo900 whose creators are currently looking for potential donors/customers to increase the size of the first batch and get the price per unit further down.

Creating a Debian rootfs disk image for all these devices basically follows the same steps:

  1. create an disk image file, partition it, format the partitions and mount the / partition into a directory
  2. use debootstrap or multistrap to extract a selection of armel or armhf packages into the directory
  3. copy over /usr/bin/qemu-arm-static for qemu user mode emulation
  4. chroot into the directory to execute package maintainer scripts with dpkg --configure -a
  5. copy the disk image onto the sd card

It was not long until I started wondering why I had to run all of the above steps with superuser privileges even though everything except the final step (which I will not cover here) was in principle nothing else than writing some magic bytes to files I had write access to (the disk image file) in some more or less fancy ways.

So I tried using fakeroot+fakechroot and after some initial troubles I managed to build a foreign architecture rootfs without needing root priviliges for steps two, three and four. I wrote about my solution which still included some workarounds in another article here. These workarounds were soon not needed anymore as upstream fixed the outstanding issues. As a result I wrote the polystrap tool which combines multistrap, fakeroot, fakechroot and qemu user mode emulation. Recently I managed to integrate proot support in a separate branch of polystrap.

Last year I got the LEGO ev3 robot for christmas and since it runs Linux I also wanted to put Debian on it by following the instructions given by the ev3dev project. Even though ev3dev calls itself a "distribution" it only deviates from pure Debian by its kernel, some configuration options and its initial package selection. Otherwise it's vanilla Debian. The project also supplies some multistrap based scripts which create the rootfs and then partition and populate an SD card. All of this is of course done as the superuser.

While the creation of the file/directory structure of the foreign Debian armel rootfs can by now easily be done without superuser priviliges by running multistrap under fakeroot/fakechroot/proot, creating the SD card image still seems to be a bit more tricky. While it is no problem to write a partition table to a regular file, it turned out to be tricky to mount these partition because tools like kpartx and losetup require superuser permissions. Tools like mkfs.ext3 and fuse-ext2 which otherwise would be able to work on a regular file without superuser privileges do not seem to allow to specify the required offsets that the partitions have within the disk image. With fuseloop there exists a tool which allows to "loop-mount" parts of a file in userspace to a new file and thus allows tools like mkfs.ext3 and fuse-ext2 to work as they normally do. But fuseloop is not packaged for Debian yet and thus also not in the current Debian stable. An obvious workaround would be to create and fill each partition in a separate file and concatenate them together. But why do I have to write my data twice just because I do not want to become the superuser? Even worse: because parted refuses to write a partition table to a file which is too small to hold the specified partitions, one spends twice the disk space of the final image: the image with the partition table plus the image with the main partition's content.

So lets summarize: a bootable foreign architecture SD card disk image is nothing else than a regular file representing the contents of the SD card as a block device. This disk image is created in my home directory and given enough free disk space there is nothing stopping me from writing any possible permutation of bits to that file. Obviously I'm interested in a permutation representing a valid partition table and file systems with sensible content. Why do I need superuser privileges to generate such a sensible permutation of bits?

Gladly it seems that the (at least in my opinion) hardest part of faking chroot and executing foreign architecture package maintainer scripts is already possible without superuser privileges by using fakeroot and fakechroot or proot together with qemu user mode emulation. But then there is still the blocker of creating the disk image itself through some user mode loop mounting of a filesystem occupying a virtual "partition" in the disk image.

Why has all this only become available so very recently and still requires a number of workarounds to fully work in userspace? There exists a surprising amount of scripts which wrap debootstrap/multistrap. Most of them require superuser privileges. Does everybody just accept that they have to put a sudo in front of every invocation and hope for the best? While this might be okay for well tested code like debootstrap and multistrap the countless wrapper scripts might accidentally (be it a bug in the code or a typo in the given command line arguments) write to your primary hard disk instead of your SD card. Such behavior can easily be mitigated by not executing any such script with superuser privileges in the first place.

Operations like loop mounting affect the whole system. Why do I have to touch anything outside of my home directory (/dev/loop in this case) to populate a file in it with some meaningful bits? Virtualization is no option because every virtualization solution again requires root privileges.

One might argue that a number of solutions just require some initial setup by root to then later be used by a regular user (for example /etc/fstab configuration or the schroot approach). But then again: why do I have to write anything outside of my home directory (even if it is only once) to be able to write something meaningful to a file in it?

The latter approach also does not work if one cannot become root in the first place or is limited by a virtualized environment. Imagine you are trying to build a Debian rootfs on a machine where you just have a regular user account. Or a situation I was recently in: I had a virtual server which denied me operations like loop mounting.

Given all these downsides, why is it still so common to just assume that one is able and willing to use sudo and be done with it in most cases?

I really wonder why technologies like fakeroot and fakechroot have only been developed this late. Has this problem not been around since the earliest days of Linux/Unix?

Am I missing something and rambling around for nothing? Is this idea a lost cause or something that is worth spending time and energy on to extend and fix the required tools?

View Comments