Introduction to Linux namespaces - Part 1: UTS | Yet another enthusiast blog!

As part of my job at OVH, I dealt with Linux Namespaces as a security mechanism in a “yet to be announced” product. I was astonished by both how powerful and how poorly documented they are.

[EDIT 2014-01-08] A Chinese translation of this post is available here:

Most of you have probably heard about LXC (LinuX Containers), the “chroot on steroids”. What it basically does is isolate applications from one another, a bit like chroot does by confining an application to a virtual private root, but taking the idea further. Internally, LXC relies on 3 main isolation mechanisms of the Linux kernel:

Chroot
Cgroups
Namespaces

I could have titled this series “How to build your own LXC” and probably earned a better Google rank, but that would have been more than a bit pretentious. In fact, LXC does a lot more than isolation: it also brings template management, freezing, and much more. This series is really about demystifying, not reinventing the wheel.

During this series, we will write a minimal C program that starts /bin/bash, adding more isolation step by step.

Let’s start.

What’s really interesting about Linux’s approach to containers is precisely that it does not provide a “black-box/magical” container solution. Instead, it provides individual isolation building blocks called “Namespaces”, with new ones appearing from release to release. It also lets you use only the ones you actually need for your specific application.

As of 3.12, Linux supports 6 Namespaces:

UTS: hostname (this post)
IPC: inter-process communication (in a future post)
PID: “chroot” process tree (in a future post)
NS: mount points, first to land in Linux (in a future post)
NET: network access, including interfaces (in a future post)
USER: map virtual, local user-ids to real local ones (in a future post)

Here is a complete skeleton for cleanly launching /bin/bash from a child process (error checking stripped for clarity/brevity):

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char child_stack[STACK_SIZE];
char* const child_args[] = {
  "/bin/bash",
  NULL
};

int child_main(void* arg)
{
  printf(" - World !\n");
  execv(child_args[0], child_args);
  printf("Ooops\n"); // only reached if execv failed
  return 1;
}

int main()
{
  printf(" - Hello ?\n");
  // pass the top of the stack: it grows downward on most architectures
  int child_pid = clone(child_main, child_stack+STACK_SIZE, SIGCHLD, NULL);
  waitpid(child_pid, NULL, 0);
  return 0;
}

Notice the use of the “clone” syscall instead of the more traditional “fork” syscall. This is where the magic (will) happen.

jean-tiare@jeantiare-Ubuntu:~/blog$ gcc -Wall main.c -o ns && ./ns
 - Hello ?
 - World !
jean-tiare@jeantiare-Ubuntu:~/blog$ # inside the container
jean-tiare@jeantiare-Ubuntu:~/blog$ exit
jean-tiare@jeantiare-Ubuntu:~/blog$ # outside the container

OK, cool. But without the comments, it is pretty hard to notice that we are in a child /bin/bash. Actually, while writing this post, I accidentally exited the parent shell a couple of times…

Wouldn’t it be cool if we could just change, let’s say, the hostname, with zero env var tricks? Just plain Namespaces? Easy, just:

add the “CLONE_NEWUTS” flag to clone
call “sethostname” from the child

// Needs root privileges (or the CAP_SYS_ADMIN capability)
//[...]
int child_main(void* arg)
{
  printf(" - World !\n");
  sethostname("In Namespace", 12);
  execv(child_args[0], child_args);
  printf("Ooops\n");
  return 1;
}

int main()
{
  printf(" - Hello ?\n");
  int child_pid = clone(child_main, child_stack+STACK_SIZE,
      CLONE_NEWUTS | SIGCHLD, NULL);
  waitpid(child_pid, NULL, 0);
  return 0;
}

Run it:

jean-tiare@jeantiare-Ubuntu:~/blog$ gcc -Wall main.c -o ns && sudo ./ns
 - Hello ?
 - World !
root@In Namespace:~/blog$ # inside the container
root@In Namespace:~/blog$ exit
jean-tiare@jeantiare-Ubuntu:~/blog$ # outside the container

And that’s all folks! (for this first article, at least). Getting started with namespaces is pretty damn easy: clone, set the appropriate “CLONE_NEW*” flags, set up the new env, done!

Want to go further? You might also be interested in reading the excellent LWN article series on namespaces.