Privileged Docker Containers
That --privileged Flag Looks Pretty Practical
Quite some time ago, with Docker 0.6 in September 2013, Docker proudly announced that it was now possible to run Docker from within Docker. This was made possible by the new --privileged flag. I posted an explanation of how to do this as a Stack Overflow answer. The flag lifted a prior constraint of containers: they were unable to access the host system’s devices. A possible use case, aside from running Docker inside Docker, was making things like your webcam trivially usable from within a container.
Devices and Standard Containers (Safe and Secure)
You can get a quick idea of how isolated a standard container is by starting bash in a standard Debian container and checking the visible hard disks and devices.
Just open a standard bash shell:
$ docker run -it debian:jessie /bin/bash -l
root@e4746be1718c:/#
Then check which hard drives are visible in there. On my trusty old Linux laptop I see something like this:
root@e4746be1718c:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-8:1-28836281-7f45eabc67f92f0e056275f099a943bcf104c0b915e95b96fd31ba9144327b5c 99G 197M 94G 1% /
tmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/sda1 440G 364G 54G 88% /etc/hosts
shm 64M 0 64M 0% /dev/shm
So apparently even the standard container can see my hard drive at /dev/sda1 (df lists it because Docker bind-mounts /etc/hosts from the host file system). If I check the contents of /dev, though, this is what I get:
root@e4746be1718c:/# ls /dev
console fd full fuse kcore mqueue null ptmx pts random shm stderr stdin stdout tty urandom zero
The hard drive at /dev/sda1 is not there, which is a very good and important thing in almost all conceivable use cases for Docker, as you’ll see!
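Instead of eyeballing the ls output, you can ask the shell for block devices only, since disks and partitions such as /dev/sda1 are block special files. The following is a minimal sketch: the function name list_block_devices is made up for this example and is not part of Docker. Given the listing above, it would be expected to print nothing in a standard container.

```shell
# list_block_devices [DIR]: print every block device directly inside DIR
# (defaults to /dev). Hypothetical helper for illustration only.
list_block_devices() {
    dir="${1:-/dev}"
    for entry in "$dir"/*; do
        # -b is true only for block special files such as disks and partitions
        if [ -b "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
    return 0
}
```

Run `list_block_devices` at the container prompt; any output means the container has device nodes for disks in its /dev.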
Devices and Privileged Containers (You’re running something as root, you’d better know what you’re doing!)
So let’s move on to the privileged case. In all of this, please remember that the Docker daemon runs as root! As pointed out in this GitHub ticket for the Docker project, there was no way around that at the time of writing: even if you can run Docker commands as non-root, the daemon itself runs as root, and that’s what matters here. (Newer Docker releases offer a rootless mode, but a default installation still runs the daemon as root.) So let’s see what we allow the privileged container, started by a process owned by root, to see and do on our host system.
Again open a standard bash shell, but this time run it in a privileged container:
$ docker run -it --privileged debian:jessie /bin/bash -l
root@8f766733df83:/#
So far so good. Let’s see what we get for our hard drives this time around:
root@8f766733df83:/# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-8:1-28836281-59c1349e06bfd1b6939dfa667e0203cacaf1466eaf50c56001d767028326504e 99G 197M 94G 1% /
tmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/sda1 440G 364G 54G 88% /etc/hosts
shm 64M 0 64M 0% /dev/shm
No changes here, but let’s look at which devices we have available now:
root@8f766733df83:/# ls /dev
autofs dm-6 kvm mem rfkill stdout tty2 tty32 tty45 tty58 uhid vcs7 watchdog
bsg dm-7 loop-control memory_bandwidth rtc0 tty tty20 tty33 tty46 tty59 uinput vcsa watchdog0
btrfs-control dm-8 loop0 mqueue sda tty0 tty21 tty34 tty47 tty6 urandom vcsa1 zero
bus dri loop1 ndctl0 sda1 tty1 tty22 tty35 tty48 tty60 vboxdrv vcsa2
console fb0 loop2 net sda2 tty10 tty23 tty36 tty49 tty61 vboxdrvu vcsa3
cpu fd loop3 network_latency sda5 tty11 tty24 tty37 tty5 tty62 vboxnetctl vcsa4
cpu_dma_latency full loop4 network_throughput sg0 tty12 tty25 tty38 tty50 tty63 vboxusb vcsa5
cuse fuse loop5 null sg1 tty13 tty26 tty39 tty51 tty7 vcs vcsa6
dm-0 hidraw0 loop6 port shm tty14 tty27 tty4 tty52 tty8 vcs1 vcsa7
dm-1 hidraw1 loop7 ppp snapshot tty15 tty28 tty40 tty53 tty9 vcs2 vfio
dm-2 hpet mapper psaux snd tty16 tty29 tty41 tty54 ttyS0 vcs3 vga_arbiter
dm-3 input mcelog ptmx sr0 tty17 tty3 tty42 tty55 ttyS1 vcs4 vhci
dm-4 kcore media0 pts stderr tty18 tty30 tty43 tty56 ttyS2 vcs5 vhost-net
dm-5 kmsg mei0 random stdin tty19 tty31 tty44 tty57 ttyS3 vcs6 video0
Bam, that looks a little different from before, doesn’t it? Now our hard drive isn’t only there in name: we can actually access the device! This means we should be able to mount it inside the container. Let’s try it:
root@8f766733df83:/# mkdir /mountedhd
root@8f766733df83:/# mount /dev/sda1 /mountedhd/
And …
root@8f766733df83:~# ls /mountedhd/
app boot dev home initrd.img.old lib32 libx32 media opt root sbin src sys usr vmlinuz
bin data etc initrd.img lib lib64 lost+found mnt proc run screenshots srv tmp var vmlinuz.old
… we have the host file system mounted inside the “container”, which makes it not a container at all!
Depending on your Docker host configuration, you might even be able to see the contents of /proc/1/ns/pid and simply nsenter into a shell on the host. And well … if things aren’t configured the way the container likes, it should have an easy time making the config a little more pleasing to its nefarious plans.
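The mount works because a privileged container keeps the full set of Linux capabilities, including CAP_SYS_ADMIN, which mount requires. A quick way to see how much power a shell has is to inspect the CapEff line in /proc/self/status. The sketch below is only an illustration: the function names are invented for this example, and the sample masks in the comments are assumed typical values that vary with Docker and kernel versions.

```shell
# Bit position of CAP_SYS_ADMIN in the Linux capability mask.
CAP_SYS_ADMIN_BIT=21

# current_capeff: print the effective capability mask (hex) of this process.
# Hypothetical helper; reads the CapEff line from /proc/self/status.
current_capeff() {
    sed -n 's/^CapEff:[[:space:]]*//p' /proc/self/status
}

# has_sys_admin HEXMASK: succeed if CAP_SYS_ADMIN is set in HEXMASK.
# Assumed example masks: 00000000a80425fb (default container, bit unset),
# 0000003fffffffff (privileged container, bit set).
has_sys_admin() {
    [ $(( (0x$1 >> $CAP_SYS_ADMIN_BIT) & 1 )) -eq 1 ]
}
```

Inside a container, `has_sys_admin "$(current_capeff)" && echo "CAP_SYS_ADMIN is available"` reports whether operations like mount are permitted at the capability level.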
Bottom Line
Don’t use privileged containers unless you treat them the same way you treat any other process running as root. Newer Docker versions give you more fine-grained control over a container’s device access anyhow; see the Docker documentation on container capabilities and rights management.
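As a rough sketch of that finer-grained control, docker run accepts --device to expose a single host device and --cap-drop/--cap-add to trim or extend the capability set. The wrapper functions and the DOCKER variable below are made up for this example; only the docker run flags themselves come from Docker. The image and shell match the ones used earlier in this post.

```shell
# DOCKER can be overridden (for example with "echo") to preview the command
# without starting a container. Assumption for this sketch, not a Docker feature.
DOCKER="${DOCKER:-docker}"

# run_with_device DEVICE [IMAGE]: start a shell that sees only DEVICE,
# instead of every host device as with --privileged.
run_with_device() {
    "$DOCKER" run --rm -it --device="$1" "${2:-debian:jessie}" /bin/bash -l
}

# run_with_cap CAP [IMAGE]: drop all capabilities, then add back only CAP.
run_with_cap() {
    "$DOCKER" run --rm -it --cap-drop=ALL --cap-add="$1" "${2:-debian:jessie}" /bin/bash -l
}
```

For the webcam use case, `run_with_device /dev/video0` would hand the container that one device while leaving the host's disks out of its /dev.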
Corner Cases
If you want to try this out on EC2 or in a VM, additional steps might be needed to make the device descriptor appear. On EC2 all you need to do is run:
root@8f766733df83:/# file -s /dev/xvda1
and you get a descriptor of the host’s hard drive that you can mount.
On VirtualBox/Boot2Docker you need to make the logical volumes appear as proper devices:
root@8f766733df83:/# vgchange -ay
is your friend here. If this doesn’t work, you might need to load the kernel module needed for handling logical volumes the way we want here:
root@8f766733df83:/# modprobe dm-mod
The fact that this actually works should scare you straight enough to not use privileged containers unless you have to, by the way ;)