unshare without superuser privileges
Sun, 25 Oct 2015 18:44 categories: code, debian, linux
TLDR: With the help of Helmut Grohne I finally figured out most of the bits
necessary to unshare everything without becoming root (though one might say
that this is still cheating because the suid root tools newuidmap and
newgidmap are used). I wrote a Perl script which documents how this is done in
practice. This script is nearly equivalent to using the existing commands
lxc-usernsexec [opts] -- unshare [opts] -- COMMAND except that these two
together cannot be used to mount a new proc. Apart from this problem, this Perl
script might also be useful by itself because it is architecture independent
and easily inspectable for the curious mind without resorting to
sources.debian.net (it is heavily documented at nearly 2 lines of comments per
line of code on average). It can be retrieved at
https://gitlab.mister-muffin.de/josch/user-unshare/blob/master/user-unshare
Long story: Nearly two years after my last rant about everything needing
superuser privileges in Linux, I'm still interested in techniques that let me
do more things without becoming root. Helmut Grohne had told me for a while
about unshare(), or user namespaces, as the right way to have things like
chroot without root. There are also reports of LXC containers working without
root privileges but they are hard to come by. A couple of days ago I had some
time again, so Helmut helped me to get through the major blockers that were so
far stopping me from using unshare in a meaningful way without executing
everything with sudo.
My main motivation at that point was to let dpkg-buildpackage, when executed
by sbuild, be run with an unshared network namespace and thus without network
access (except for the loopback interface) because I wanted sbuild, like
pbuilder, to enforce the rule not to access any remote resources during the
build. After several evenings of investigating and tinkering with the Perl
script I mentioned initially, I came to the conclusion that the only place that
can unshare the network namespace without disrupting anything is schroot
itself. This is because unsharing inside the chroot will fail:
dpkg-buildpackage is run with non-root privileges and thus the user namespace
has to be unshared as well, which in turn destroys all ownership information.
But even if that wasn't the case, the chroot itself is unlikely to have (and
also should not have) tools like ip or newuidmap and newgidmap installed.
Unsharing the schroot call itself will not work either. Again we first need to
unshare the user namespace and then schroot will complain about wrong ownership
of its configuration file /etc/schroot/schroot.conf. Luckily, when contacting
Roger Leigh about this wishlist feature in bug#802849 I was told that this was
already implemented in its git master \o/. So this particular problem seems to
be taken care of and once the next schroot release happens, sbuild will make
use of it and have unshare --net capabilities just like pbuilder already had
since last year.
With the sbuild case taken care of, the rest of this post will introduce the
Perl script I wrote.
The name user-unshare is really arbitrary. I just needed some identifier for
the git repository and a filename.
The most important discovery I made was that Debian disables unprivileged user
namespaces by default with the patch
add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch to the
Linux kernel. To enable them, one first has to run either
echo 1 | sudo tee /proc/sys/kernel/unprivileged_userns_clone > /dev/null
or
sudo sysctl -w kernel.unprivileged_userns_clone=1
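Before changing the setting it can help to see what it currently is. The following is a minimal sketch and not part of user-unshare: the function name userns_status is made up for illustration, and its optional path argument exists only so that the function can be pointed at a different file. On kernels without the Debian patch the file does not exist, which the sketch reports as "unknown".

```shell
#!/bin/sh
# Hypothetical helper (not part of user-unshare): report whether the
# Debian-specific sysctl permits unprivileged user namespaces.
userns_status() {
    # The optional argument only exists to make the function easy to try out.
    knob="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
    if [ ! -r "$knob" ]; then
        # Kernels without the Debian patch do not have this file at all.
        echo "unknown"
        return 0
    fi
    case "$(cat "$knob")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Print the state of the running kernel.
userns_status
```

Reading the file needs no special privileges; only writing to it does.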
The tool tries to be like unshare(1) but with the power of lxc-usernsexec(1) to
map more than one id into the new user namespace by using the programs
newgidmap and newuidmap. Or in other words: This tool tries to be like
lxc-usernsexec(1) but with the power of unshare(1) to unshare more than just
the user and mount namespaces. It is nearly equal to calling:
lxc-usernsexec [opts] -- unshare [opts] -- COMMAND
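To make "mapping more than one id" a bit more concrete: newuidmap takes the pid of the target process followed by one or more triples of (first id inside the namespace, first id outside, count), and the ranges a user may map beyond their own id are listed in /etc/subuid as name:start:count. The sketch below is an illustration and not an excerpt from user-unshare. The function names subid_range and uidmap_args are made up, and the sketch only prints the arguments rather than running newuidmap. It assumes the common arrangement of mapping the calling user to id 0 and the subordinate range to the ids from 1 upward.

```shell
#!/bin/sh
# Hypothetical helpers (not part of user-unshare).

# subid_range USER [FILE]: print "START COUNT" of the user's first
# subordinate id range. Lines in /etc/subuid look like name:start:count.
subid_range() {
    awk -F: -v u="$1" '$1 == u { print $2, $3; exit }' "${2:-/etc/subuid}"
}

# uidmap_args OUTER_UID START COUNT: print the mapping triples for newuidmap:
#   inside id 0      -> OUTER_UID, one id
#   inside ids 1...  -> START..., COUNT ids
uidmap_args() {
    echo "0 $1 1 1 $2 $3"
}

# Intended use (left as a comment because it needs a process that has already
# unshared its user namespace, and $pid is a placeholder for that process):
#   newuidmap "$pid" $(uidmap_args "$(id -u)" $(subid_range "$(id -un)"))
```

The same approach applies to newgidmap and /etc/subgid.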
Its main reasons for existing are:
- as a project for me to learn how unprivileged namespaces work
- written in Perl which means:
  - architecture independent (same executable on any architecture)
  - easily inspectable by other curious minds
- tons of code comments to let others understand how things work
- no need to install the lxc package in a minimal environment (perl itself might not be called minimal either but is present in every Debian installation)
- not suffering from being unable to mount proc
I hoped that systemd-nspawn could do what I wanted but it seems that its
requirement for being run as root will not change any time soon.
Another tool in Debian that offers to do chroot without superuser privileges is
linux-user-chroot but that one cheats by being suid root.
Had I found lxc-usernsexec earlier I would probably not have written this. But
after I found it I happily used it to get an even better understanding of the
matter and further improve the comments in my code. I started writing my own
tool in Perl because that's the language sbuild was written in and, as
mentioned initially, I intended to use this script with sbuild. Now that the
sbuild problem is taken care of, this is not so important anymore but I like
being able to read the code of simple programs I run directly from /usr/bin
without having to retrieve the source code first or use sources.debian.net.
The only thing I wasn't able to figure out is how to properly mount proc into
my new mount namespace. I found a workaround that works by first mounting a new
proc to /proc and then bind-mounting /proc to whatever new location for proc is
requested. I didn't figure out how to do this without mounting to /proc first,
partly also because this doesn't work at all when using lxc-usernsexec and
unshare together. In this respect, this Perl script is a bit more powerful than
those two tools together. I suppose that the reason is that unshare wasn't
written with being called without superuser privileges in mind. If you have an
idea what could be wrong, the code has a big FIXME about this issue.
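Expressed as shell commands, the workaround described above looks roughly like the following. This is a sketch of the two steps only: the Perl script may issue them differently, and the function name mount_proc_at and the RUN variable are made up for illustration. It has to run inside the new mount and pid namespaces. Setting RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper (not part of user-unshare).
# mount_proc_at TARGET: make a fresh proc available at TARGET.
mount_proc_at() {
    target="$1"
    # Step 1: mount a new proc instance over /proc of the new mount namespace.
    ${RUN:-} mount -t proc proc /proc || return 1
    # Step 2: bind-mount that instance to the location that was asked for.
    ${RUN:-} mount --bind /proc "$target"
}

# Dry run: show what would be executed for the chroot used in this post.
RUN=echo mount_proc_at /tmp/buildroot/proc
```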
Finally, here is a demonstration of what my script can do. Because of the /proc
bug, lxc-usernsexec and unshare together are not able to do this but it
might also be that I'm just not using these tools in the right way. The
following will give you an interactive shell in an environment created from one
of my sbuild chroot tarballs:
$ mkdir -p /tmp/buildroot/proc
$ ./user-unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net \
--uts --mount --fork -- sh -c 'ip link set lo up && ip addr && \
hostname hoothoot-chroot && \
tar -C /tmp/buildroot -xf /srv/chroot/unstable-amd64.tar.gz; \
/usr/sbin/chroot /tmp/buildroot /sbin/runuser -s /bin/bash - josch && \
umount /tmp/buildroot/proc && rm -rf /tmp/buildroot'
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ whoami
josch
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ hostname
hoothoot-chroot
(unstable-amd64-sbuild)josch@hoothoot-chroot:/$ ls -lha /proc | head
total 0
dr-xr-xr-x 218 nobody nogroup 0 Oct 25 19:06 .
drwxr-xr-x 22 root root 440 Oct 1 08:42 ..
dr-xr-xr-x 9 root root 0 Oct 25 19:06 1
dr-xr-xr-x 9 josch josch 0 Oct 25 19:06 15
dr-xr-xr-x 9 josch josch 0 Oct 25 19:06 16
dr-xr-xr-x 9 root root 0 Oct 25 19:06 7
dr-xr-xr-x 9 josch josch 0 Oct 25 19:06 8
dr-xr-xr-x 4 nobody nogroup 0 Oct 25 19:06 acpi
dr-xr-xr-x 6 nobody nogroup 0 Oct 25 19:06 asound
Of course, instead of running this long command we can also write a small shell script and execute that. The following does the same things as the long command above but adds some comments for further explanation:
#!/bin/sh
set -exu
# I'm using /tmp because I have it mounted as a tmpfs
rootdir="/tmp/buildroot"
# bring the loopback interface up
ip link set lo up
# show that the loopback interface is really up
ip addr
# make use of the UTS namespace being unshared
hostname hoothoot-chroot
# extract the chroot tarball. This must be done inside the user namespace for
# the file permissions to be correct.
#
# tar will fail to call mknod and to change the permissions of /proc but we are
# ignoring that
tar -C "$rootdir" -xf /srv/chroot/unstable-amd64.tar.gz || true
# run chroot and inside, immediately drop permissions to the user "josch" and
# start an interactive shell
/usr/sbin/chroot "$rootdir" /sbin/runuser -s /bin/bash - josch
# unmount /proc and remove the temporary directory
umount "$rootdir/proc"
rm -rf "$rootdir"
and then:
$ mkdir -p /tmp/buildroot/proc
$ ./user-unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net --uts --mount --fork -- ./chroot.sh
As mentioned in the beginning, the tool is nearly equivalent to calling
lxc-usernsexec [opts] -- unshare [opts] -- COMMAND but because of the problem
with mounting proc (mentioned earlier), lxc-usernsexec and unshare cannot
be used with the above example. If one tries anyway, one will only get:
$ lxc-usernsexec -m b:0:1000:1 -m b:1:558752:1 -- unshare --mount-proc=/tmp/buildroot/proc --ipc --pid --net --uts --mount --fork -- ./chroot.sh
unshare: mount /tmp/buildroot/proc failed: Invalid argument
I'd be interested in finding out why that is and how to fix it.
Why do I need superuser privileges when I just want to write to a regular file
Sat, 11 Jan 2014 01:21 categories: debian, linux
I have written a number of scripts to create Debian foreign architecture (mostly armel and armhf) rootfs images for SD cards or NAND flashing. I started with putting Debian on my Openmoko gta01 and gta02 and continued with devices like the qi nanonote, a Marvell Kirkwood based device, the Always Innovating Touchbook (close to the Beagleboard), the Notion Ink Adam and most recently the Golden Delicious gta04. Once it has been manufactured, I will surely also get my hands dirty with the Neo900 whose creators are currently looking for potential donors/customers to increase the size of the first batch and get the price per unit further down.
Creating a Debian rootfs disk image for all these devices basically follows the same steps:
- create a disk image file, partition it, format the partitions and mount the / partition into a directory
- use debootstrap or multistrap to extract a selection of armel or armhf packages into the directory
- copy over /usr/bin/qemu-arm-static for qemu user mode emulation
- chroot into the directory to execute package maintainer scripts with dpkg --configure -a
- copy the disk image onto the SD card
It was not long until I started wondering why I had to run all of the above steps with superuser privileges even though everything except the final step (which I will not cover here) was in principle nothing else than writing some magic bytes to files I had write access to (the disk image file) in some more or less fancy ways.
So I tried using fakeroot+fakechroot and after some initial troubles I
managed to build a foreign architecture rootfs without needing root
privileges for steps two, three and four. I wrote about my solution which
still included some workarounds in another article here. These
workarounds were soon not needed anymore as upstream fixed the outstanding
issues. As a result I wrote the polystrap tool which combines
multistrap, fakeroot, fakechroot and qemu user mode emulation. Recently
I managed to integrate proot support in a separate branch of polystrap.
Last year I got the LEGO ev3 robot for Christmas and since it runs Linux I also
wanted to put Debian on it by following the instructions given by the ev3dev
project. Even though ev3dev calls itself a "distribution" it only deviates
from pure Debian by its kernel, some configuration options and its initial
package selection. Otherwise it's vanilla Debian. The project also supplies
some multistrap based scripts which create the rootfs and then
partition and populate an SD card. All of this is of course done as the
superuser.
While the creation of the file/directory structure of the foreign Debian armel
rootfs can by now easily be done without superuser privileges by running
multistrap under fakeroot/fakechroot/proot, creating the SD card image
still seems to be a bit more tricky. While it is no problem to write a
partition table to a regular file, it turned out to be tricky to mount these
partitions because tools like kpartx and losetup require superuser
permissions. Tools like mkfs.ext3 and fuse-ext2, which otherwise would be
able to work on a regular file without superuser privileges, do not seem to
allow specifying the offsets that the partitions have within the disk
image. With fuseloop there exists a tool which allows one to "loop-mount"
parts of a file in userspace to a new file and thus allows tools like
mkfs.ext3 and fuse-ext2 to work as they normally do. But fuseloop is not
packaged for Debian yet and thus also not in the current Debian stable. An
obvious workaround would be to create and fill each partition in a separate
file and concatenate them together. But why do I have to write my data twice
just because I do not want to become the superuser? Even worse: because
parted refuses to write a partition table to a file which is too small to
hold the specified partitions, one spends twice the disk space of the final
image: the image with the partition table plus the image with the main
partition's content.
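For illustration, the concatenation workaround could look like the sketch below. The function name concat_image and the default gap of 1 MiB in front of the partition are assumptions made for this example, not values from any of the tools mentioned. The partition table would still have to be written into the zero-filled gap afterwards. The sketch also shows the cost complained about above: the partition data exists once in its own file and a second time inside the final image.

```shell
#!/bin/sh
# Hypothetical helper illustrating the "separate files, then concatenate"
# workaround.
# concat_image OUT PART_IMG [GAP_BYTES]
concat_image() {
    out="$1"
    part="$2"
    gap="${3:-1048576}"   # assumed: 1 MiB reserved in front of the partition
    # Zero-filled space into which a partition table can be written later.
    head -c "$gap" /dev/zero > "$out" || return 1
    # Second copy of the partition data, appended after the gap.
    cat "$part" >> "$out"
}

# Intended use (both file names are placeholders):
#   concat_image disk.img rootfs.ext3
```

Everything here only touches regular files, so no superuser privileges are involved.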
So let's summarize: a bootable foreign architecture SD card disk image is nothing else than a regular file representing the contents of the SD card as a block device. This disk image is created in my home directory and given enough free disk space there is nothing stopping me from writing any possible permutation of bits to that file. Obviously I'm interested in a permutation representing a valid partition table and file systems with sensible content. Why do I need superuser privileges to generate such a sensible permutation of bits?
Fortunately it seems that the (at least in my opinion) hardest part of faking
chroot and executing foreign architecture package maintainer scripts is already
possible without superuser privileges by using fakeroot and fakechroot or
proot together with qemu user mode emulation. But then there is still the
blocker of creating the disk image itself through some user mode loop mounting
of a filesystem occupying a virtual "partition" in the disk image.
Why has all this only become available so very recently and why does it still
require a number of workarounds to fully work in userspace? There exists a
surprising number of scripts which wrap debootstrap/multistrap. Most of them
require superuser privileges. Does everybody just accept that they have to put
a sudo in front of every invocation and hope for the best? While this might be
okay for well tested code like debootstrap and multistrap, the countless
wrapper scripts might accidentally (be it a bug in the code or a typo in the
given command line arguments) write to your primary hard disk instead of your
SD card. Such behavior can easily be mitigated by not executing any such script
with superuser privileges in the first place.
Operations like loop mounting affect the whole system. Why do I have to touch
anything outside of my home directory (/dev/loop in this case) to populate a
file in it with some meaningful bits? Virtualization is no option because every
virtualization solution again requires root privileges.
One might argue that a number of solutions just require some initial setup by
root to then later be used by a regular user (for example /etc/fstab
configuration or the schroot approach). But then again: why do I have to
write anything outside of my home directory (even if it is only once) to be
able to write something meaningful to a file in it?
The latter approach also does not work if one cannot become root in the first place or is limited by a virtualized environment. Imagine you are trying to build a Debian rootfs on a machine where you just have a regular user account. Or a situation I was recently in: I had a virtual server which denied me operations like loop mounting.
Given all these downsides, why is it still so common to just assume that one is
able and willing to use sudo and be done with it in most cases?
I really wonder why technologies like fakeroot and fakechroot have only
been developed this late. Has this problem not been around since the earliest
days of Linux/Unix?
Am I missing something and rambling around for nothing? Is this idea a lost cause or something that is worth spending time and energy on to extend and fix the required tools?