Seccomp is a basic yet efficient way to filter the syscalls issued by a program. It is especially useful when running untrusted third-party programs. It was first introduced in Linux 2.6.12 as an essential building block of the “cpushare” program. The idea behind this project was to let anyone with the proper agent installed rent CPU cycles to third parties without compromising their own security.
The initial implementation, also known as “mode 1 seccomp”, only allowed the ‘read’, ‘write’, ‘_exit’ and ‘sigreturn’ syscalls, making it only possible to read from and write to already opened files and to exit. It is also trivial to get started with:
#include <stdio.h>         /* printf */
#include <sys/prctl.h>     /* prctl */
#include <linux/seccomp.h> /* seccomp's constants */
#include <unistd.h>        /* dup2: just for test */

int main() {
    printf("step 1: unrestricted\n");

    // Enable filtering
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
    printf("step 2: only 'read', 'write', '_exit' and 'sigreturn' syscalls\n");

    // Redirect stderr to stdout
    dup2(1, 2);
    printf("step 3: !! YOU SHOULD NOT SEE ME !!\n");

    // Success (well, not so in this case...)
    return 0;
}
Build, run, test:
gcc 01-nothing.c -o 01-nothing && ./01-nothing; echo "Status: $?"
Output:
step 1: unrestricted
step 2: only 'read', 'write', '_exit' and 'sigreturn' syscalls
Killed
Status: 137 <------ 128+9 ==> SIGKILL
See the return status? Whenever a forbidden syscall is issued, the program is immediately killed.
While this is really cool, it is also somewhat over-restrictive, which is why it saw so little adoption. Linus Torvalds even suggested axing it from the kernel!
Fortunately, since Linux 3.5, it is also possible to define advanced custom filters based on BPF (Berkeley Packet Filter). These filters may apply to any of the syscall arguments, but only to their value. In other words, a filter cannot dereference a pointer. For example, one could write a rule to forbid any call to ‘dup2’ that targets ‘stderr’ (fd=2), but could not restrict ‘open’ to a given set of files, nor ‘bind’ to a specific interface or port number.
Once the filter is installed, every syscall is sent to it, and the filter tells the kernel what action to take:
- SECCOMP_RET_KILL: Immediate kill with SIGSYS
- SECCOMP_RET_TRAP: Send a catchable SIGSYS, giving a chance to emulate the syscall
- SECCOMP_RET_ERRNO: Force errno value
- SECCOMP_RET_TRACE: Yield decision to ptracer or set errno to -ENOSYS
- SECCOMP_RET_ALLOW: Allow
Enough words. Let’s allow the program to redirect its stderr to stdout, but nothing else. Writing BPF directly is cumbersome and far beyond the scope of this post, so we’ll use the libseccomp helper to make the code easier to write… and read. Error checking is stripped for brevity.
Grab the library:
sudo apt-get install libseccomp-dev
Write the code:
#include <stdio.h>   /* printf */
#include <unistd.h>  /* dup2: just for test */
#include <seccomp.h> /* libseccomp */

int main() {
    printf("step 1: unrestricted\n");

    // Init the filter
    scmp_filter_ctx ctx;
    ctx = seccomp_init(SCMP_ACT_KILL); // default action: kill

    // setup basic whitelist
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigreturn), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);

    // setup our rule
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(dup2), 2,
                     SCMP_A0(SCMP_CMP_EQ, 1),
                     SCMP_A1(SCMP_CMP_EQ, 2));

    // build and load the filter
    seccomp_load(ctx);
    printf("step 2: only 'write' and dup2(1, 2) syscalls\n");

    // Redirect stderr to stdout
    dup2(1, 2);
    printf("step 3: stderr redirected to stdout\n");

    // Duplicate stderr to arbitrary fd
    dup2(2, 42);
    printf("step 4: !! YOU SHOULD NOT SEE ME !!\n");

    // Success (well, not so in this case...)
    return 0;
}
Build, run, test:
gcc 02-bpf-only-dup-sudo.c -o 02-bpf-only-dup-sudo -lseccomp && sudo ./02-bpf-only-dup-sudo; echo "Status: $?"
Output:
step 1: unrestricted
step 2: only 'write' and dup2(1, 2) syscalls
step 3: stderr redirected to stdout
Bad system call
Status: 159 <------ 128+31 ==> SIGSYS
Just as expected.
As you probably noticed, we ran the previous example as root, which somewhat limits the security benefit of syscall filtering, as we actually have MORE privileges than before…
This is where it really gets interesting: filters are inherited by child processes, so one could technically apply syscall filters to ‘sudo’ and maybe defeat some of its security measures and gain root on the machine. To prevent this, one must either have ‘CAP_SYS_ADMIN’ (read: root) or explicitly agree to never gain any more privileges. For example, the ‘setuid’ bit of ‘sudo’ would no longer be honored.
This can easily be achieved by adding this snippet before installing the filter:
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); // the unused arguments must be 0
Another security note: remember the SECCOMP_RET_TRACE filter return value? It instructs the kernel to notify the ptracer program, if any, so that it takes the final decision. Hence the “secured” program could be run under a malicious ptracer, possibly defeating the security measures. This is why another prctl is highly recommended to forbid any attempt to attach a ptracer:
prctl(PR_SET_DUMPABLE, 0);
Putting it all together we get:
#include <stdio.h>     /* printf */
#include <unistd.h>    /* dup2: just for test */
#include <seccomp.h>   /* libseccomp */
#include <sys/prctl.h> /* prctl */

int main() {
    printf("step 1: unrestricted\n");

    // ensure none of our children will ever be granted more priv
    // (via setuid, capabilities, ...); the unused arguments must be 0
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

    // ensure no escape is possible via ptrace
    prctl(PR_SET_DUMPABLE, 0);

    // Init the filter
    scmp_filter_ctx ctx;
    ctx = seccomp_init(SCMP_ACT_KILL); // default action: kill

    // setup basic whitelist
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(rt_sigreturn), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);

    // setup our rule
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(dup2), 2,
                     SCMP_A0(SCMP_CMP_EQ, 1),
                     SCMP_A1(SCMP_CMP_EQ, 2));

    // build and load the filter
    seccomp_load(ctx);
    printf("step 2: only 'write' and dup2(1, 2) syscalls\n");

    // Redirect stderr to stdout
    dup2(1, 2);
    printf("step 3: stderr redirected to stdout\n");

    // Duplicate stderr to arbitrary fd
    dup2(2, 42);
    printf("step 4: !! YOU SHOULD NOT SEE ME !!\n");

    // Success (well, not so in this case...)
    return 0;
}
Build, run, test:
gcc 03-bpf-only-dup.c -o 03-bpf-only-dup -lseccomp && ./03-bpf-only-dup; echo "Status: $?"
Output:
step 1: unrestricted
step 2: only 'write' and dup2(1, 2) syscalls
step 3: stderr redirected to stdout
Bad system call
Status: 159 <------ 128+31 ==> SIGSYS
There we are: no more “sudo” needed to run it.
Seccomp is an extremely powerful tool when dealing with untrusted programs on Linux (who said “in a shared hosting environment”?), and we only scratched its surface. Please keep in mind that seccomp is only a tool and should be used in combination with other Linux security building blocks such as namespaces and capabilities to unleash its full power.
Example applications:
- prevent “virtual priv esc” -> clone && unshare CLONE_NEWUSER
- prevent std{in,out,err} escape -> block close, dup2
- restrict read/write to std{in,out,err}
- change limits (rlimits)
- … -> see man 2 syscalls for more ideas 😉
What you still can’t do:
- filter based on filename: no pointer dereference
- filter based on port/ip: same reason
Going further:
kernel seccomp documentation and samples (low level BPF)
ptrace interaction: overcome the “What you still can’t do” section.