That --privileged Flag Looks Pretty Practical

Quite some time ago, in September 2013 with Docker 0.6, Docker proudly announced that it was now possible to run Docker from within Docker. This was made possible by the new --privileged flag. I posted an explanation of how to do this as a Stack Overflow answer, which you can find here. The flag lifted an earlier constraint on containers: they were unable to access the host system’s devices. Aside from running Docker inside Docker, a possible use case was trivially using things like your webcam from within a container.

Devices and Standard Containers (Safe and Secure)

You can get a quick idea of how isolated a standard container is by starting a bash shell in a stock Debian container and checking which hard disks and devices are visible.

Just open a standard bash shell:

$ docker run -it debian:jessie /bin/bash -l
root@e4746be1718c:/#

Then check which hard drives are visible in there. On my trusty old Linux laptop I see something like this:

root@e4746be1718c:/# df -h
Filesystem                                                                                        Size  Used Avail Use% Mounted on
/dev/mapper/docker-8:1-28836281-7f45eabc67f92f0e056275f099a943bcf104c0b915e95b96fd31ba9144327b5c   99G  197M   94G   1% /
tmpfs                                                                                             7.7G     0  7.7G   0% /dev
tmpfs                                                                                             7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/sda1                                                                                         440G  364G   54G  88% /etc/hosts
shm                                                                                                64M     0   64M   0% /dev/shm

So even the standard container can apparently see my hard drive at /dev/sda1. It shows up because Docker bind-mounts /etc/hosts from a file on the host. If I check the contents of /dev though, this is what I get:

root@e4746be1718c:/# ls /dev
console  fd  full  fuse  kcore	mqueue	null  ptmx  pts  random  shm  stderr  stdin  stdout  tty  urandom  zero

The hard drive at /dev/sda1 is not there, which is a very good and important thing in almost all conceivable use cases for Docker, as you’ll see!
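If you'd rather not eyeball the listing, here is a minimal sketch that prints every block device node found directly under a directory. The helper name list_block_devices is hypothetical, not something Docker or Debian ships. In a standard container it should print nothing for /dev.

```shell
#!/bin/sh
# Hypothetical helper: list block device nodes directly under a directory.
# Defaults to /dev when no argument is given.
list_block_devices() {
    dir="${1:-/dev}"
    for node in "$dir"/*; do
        # -b is true only for block special files such as /dev/sda1
        if [ -b "$node" ]; then
            printf '%s\n' "$node"
        fi
    done
    return 0
}

# Inside the standard container this is expected to print nothing.
list_block_devices /dev
```

Run the same function in a privileged container and the host's disks and partitions should show up in the output.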

Devices and Privileged Containers (You’re running something as root, you better know what you’re doing!)

So let’s move on to the privileged case. In all of this, please remember that the Docker daemon runs as root! As pointed out in this GitHub ticket for the Docker project, there is no way around that at the time of writing. Even if you can run Docker commands as non-root, the daemon is still running as root, and that’s what matters here. For technical reasons you simply cannot set the daemon to run as a non-root process. So let’s see what we allow the privileged container, started by a process owned by root, to see and do on our host system.

Again open a standard bash shell, but this time run it in a privileged container:

$ docker run -it --privileged debian:jessie /bin/bash -l
root@8f766733df83:/#
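A quick sanity check before poking around: one rough way to tell from the inside whether a container got extra privileges is the effective capability mask in /proc/self/status. The sketch below assumes that CAP_SYS_ADMIN (bit 21 of the mask) is a good enough proxy for "privileged". A container started with only that one capability added would pass the check too. Both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a hex capability mask includes
# CAP_SYS_ADMIN, which is capability number 21 on Linux.
has_sys_admin() {
    mask="$1"
    # An empty mask means we could not read it; treat that as "no".
    [ -n "$mask" ] || return 1
    [ $(( (0x$mask >> 21) & 1 )) -eq 1 ]
}

# Hypothetical helper: print the effective capability mask.
# awk is a child process and inherits the shell's capabilities.
current_cap_eff() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# Only run the check where /proc/self/status exists (i.e. on Linux).
if [ -r /proc/self/status ]; then
    if has_sys_admin "$(current_cap_eff)"; then
        echo "CAP_SYS_ADMIN present: treat this shell like root on the host"
    else
        echo "CAP_SYS_ADMIN absent: looks like a restricted container or user"
    fi
fi
```

In a container started with Docker's usual default capability set, the mask typically looks like 00000000a80425fb, which has bit 21 cleared. In a privileged container the mask is mostly f's.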

So far so good, so let’s see what we get for our hard drives this time around:

root@8f766733df83:/# df -h
Filesystem                                                                                        Size  Used Avail Use% Mounted on
/dev/mapper/docker-8:1-28836281-59c1349e06bfd1b6939dfa667e0203cacaf1466eaf50c56001d767028326504e   99G  197M   94G   1% /
tmpfs                                                                                             7.7G     0  7.7G   0% /dev
tmpfs                                                                                             7.7G     0  7.7G   0% /sys/fs/cgroup
/dev/sda1                                                                                         440G  364G   54G  88% /etc/hosts
shm                                                                                                64M     0   64M   0% /dev/shm

No changes here, but let’s look at what devices we have available now:

root@8f766733df83:/# ls /dev
  autofs		 dm-6	  kvm		mem		    rfkill    stdout  tty2   tty32  tty45  tty58  uhid	      vcs7	   watchdog
  bsg		 dm-7	  loop-control	memory_bandwidth    rtc0      tty     tty20  tty33  tty46  tty59  uinput      vcsa	   watchdog0
  btrfs-control	 dm-8	  loop0		mqueue		    sda       tty0    tty21  tty34  tty47  tty6   urandom     vcsa1	   zero
  bus		 dri	  loop1		ndctl0		    sda1      tty1    tty22  tty35  tty48  tty60  vboxdrv     vcsa2
  console		 fb0	  loop2		net		    sda2      tty10   tty23  tty36  tty49  tty61  vboxdrvu    vcsa3
  cpu		 fd	  loop3		network_latency     sda5      tty11   tty24  tty37  tty5   tty62  vboxnetctl  vcsa4
  cpu_dma_latency  full	  loop4		network_throughput  sg0       tty12   tty25  tty38  tty50  tty63  vboxusb     vcsa5
  cuse		 fuse	  loop5		null		    sg1       tty13   tty26  tty39  tty51  tty7   vcs	      vcsa6
  dm-0		 hidraw0  loop6		port		    shm       tty14   tty27  tty4   tty52  tty8   vcs1	      vcsa7
  dm-1		 hidraw1  loop7		ppp		    snapshot  tty15   tty28  tty40  tty53  tty9   vcs2	      vfio
  dm-2		 hpet	  mapper	psaux		    snd       tty16   tty29  tty41  tty54  ttyS0  vcs3	      vga_arbiter
  dm-3		 input	  mcelog	ptmx		    sr0       tty17   tty3   tty42  tty55  ttyS1  vcs4	      vhci
  dm-4		 kcore	  media0	pts		    stderr    tty18   tty30  tty43  tty56  ttyS2  vcs5	      vhost-net
  dm-5		 kmsg	  mei0		random		    stdin     tty19   tty31  tty44  tty57  ttyS3  vcs6	      video0

Bam, that looks a little different from before, doesn’t it? Now our hard drive isn’t only there in name. We can actually access the device! This means we should be able to mount it inside the container. Let’s try it:

root@8f766733df83:/# mkdir /mountedhd
root@8f766733df83:/# mount /dev/sda1 /mountedhd/

And …

root@8f766733df83:~# ls /mountedhd/
app  boot  dev	home	    initrd.img.old  lib32  libx32      media  opt   root  sbin	       src  sys  usr  vmlinuz
bin  data  etc	initrd.img  lib		    lib64  lost+found  mnt    proc  run   screenshots  srv  tmp  var  vmlinuz.old

we got the host file system mounted inside the “container”, making it not a container at all!
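Going back to the two ls /dev listings for a moment: to see exactly which nodes --privileged added, you can save both listings to files and compare them. A minimal sketch, assuming the raw ls output of each container was saved to the hypothetical files std.txt and priv.txt. The helper name new_devices is made up as well.

```shell
#!/bin/sh
# Hypothetical helper: print the names that appear only in the second listing.
# Each argument is a file holding raw `ls /dev` output (columns, tabs and all).
new_devices() {
    a=$(mktemp)
    b=$(mktemp)
    # Put one name per line, drop empty lines, then sort for comm.
    tr -s ' \t' '\n\n' < "$1" | grep -v '^$' | LC_ALL=C sort -u > "$a"
    tr -s ' \t' '\n\n' < "$2" | grep -v '^$' | LC_ALL=C sort -u > "$b"
    # -13 suppresses lines unique to the first file and lines common to both.
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage, once both listings have been saved:
#   new_devices std.txt priv.txt
```

With the listings from this post, sda, sda1, sda2 and sda5 would be among the names printed.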

Depending on your Docker host configuration, you might even be able to see the contents of /proc/1/ns/pid and simply nsenter into a shell on the host. And well … if things aren’t configured the way the container likes, it should have an easy time making the config a little more pleasing to its nefarious plans.

Bottom Line

Don’t use privileged containers unless you treat them the same way you treat any other process running as root. Newer Docker versions give you more fine-grained control over a container’s device access anyhow; check this out for more documentation on container capabilities and rights management.
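For the webcam case, for example, the --device flag of docker run exposes a single device node without handing over everything else. The sketch below only prints the command so you can review it before running it. device_run_cmd is a hypothetical helper, and /dev/video0 is assumed to be the webcam on your host.

```shell
#!/bin/sh
# Hypothetical helper: compose a `docker run` command line that exposes
# only the listed device nodes. It prints the command, it does not run it.
device_run_cmd() {
    image="$1"
    shift
    cmd="docker run -it --rm"
    for dev in "$@"; do
        # One --device flag per node; nothing else from /dev is passed through.
        cmd="$cmd --device=$dev"
    done
    printf '%s %s /bin/bash -l\n' "$cmd" "$image"
}

device_run_cmd debian:jessie /dev/video0
# prints: docker run -it --rm --device=/dev/video0 debian:jessie /bin/bash -l
```

The same idea applies to capabilities: docker run also accepts --cap-add and --cap-drop, so you can hand out individual rights instead of all of them.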

Corner Cases

If you want to try this out on EC2 or a VM, additional steps might be needed to make the device descriptor appear. On EC2 all you need to do is run:

root@8f766733df83:/# file -s /dev/xvda1

and you get a descriptor of the host’s hard disk that you can mount.

On VirtualBox/Boot2Docker you need to make the logical volumes appear as proper devices:

root@8f766733df83:/# vgchange -ay

is your friend here. If this doesn’t work, you might need to load the kernel module that handles logical volumes the way we want it here:

root@8f766733df83:/# modprobe dm-mod
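To check whether the module is already loaded, you can look at /proc/modules. A sketch with a hypothetical helper named module_loaded. Note that the kernel lists the module as dm_mod (with an underscore), and a module compiled directly into the kernel won't show up in that file at all.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module appears in a /proc/modules-style
# listing. Dashes are normalized to underscores (dm-mod is listed as dm_mod).
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    listing="${2:-/proc/modules}"
    # Each line starts with the module name followed by a space.
    [ -r "$listing" ] && grep -q "^${name} " "$listing"
}

if module_loaded dm-mod; then
    echo "dm_mod is already loaded"
else
    echo "dm_mod is not listed in /proc/modules"
fi
```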

The fact that this actually works should scare you straight enough to not use privileged containers unless you have to btw ;)