In previous parts, we’ve covered the basic principles of the bastion. We then explained how delegation was at the core of the system. This time, we’ll dig into some governing principles of how The Bastion is written.

In a nutshell, the main purpose of the bastion is to ensure security, auditability and reliability in all cases. To this end, the bastion is engineered in a very specific way, following principles that must be respected whenever a new feature is implemented. Today we're going to zoom in on how one of the bastion's features has been implemented to provide defense in depth. There are technical details ahead, so viewer discretion is advised!

The operating system is not just a scheduler

One of the engineering principles of the bastion is to leverage the underlying operating system’s security features, as additional guards on top of the code’s logic itself.

Usually, when developing a program, one doesn't really need to think about the OS it'll be running on, because all the business logic goes directly into the code. At its most basic level, the OS's job is to let the program run on the hardware it manages, by abstracting that hardware and sharing it with the other pieces of software that might be running alongside. In other words, most of the time the OS is mainly a scheduler, whose job is to ensure all the programs run properly and don't step on each other's toes.

To this end, an OS has the notion of a user (or “account”), who may own some running programs and some files on the filesystem, along with the notion of a group (of users), so that, for example, a folder can be written to by several users. We'll come back to this shortly.

Now, let's talk about applications. Most of the time, applications that need to handle users have a database with a “users” table, detailing the information about each user. In that case, the application's code logic handles all of the program's behaviour with respect to its users. For example, to authenticate a user, it stores a hash of each user's password in the database, and checks whether the hash of the entered password matches the stored one. If it does, the user is deemed successfully logged in. All this logic is expressed entirely in the code; the operating system plays no role in the process whatsoever.

There is, then, only one operating system user dedicated to the application, regardless of how many users exist in the application's database. The application runs under this OS user, and all files that, in the application's functional view, belong to different users are owned by this same OS user. This works because the segregation between functional users is done entirely by the code: even though the application can technically access all of its users' files, its code logic only grants each user access to their own files.

Code has bugs, but it shouldn’t matter

Now, let's imagine a program (let's name it MySuperCloudApp) whose job is to store files for its users, so that they can later fetch them from the cloud. Let's also imagine there is a flaw in the code (of course, this never happens) that doesn't properly sanitize the file name requested by the user. If, once logged in, I request a download of the file named myfile.txt, the application will allow it because I'm logged in.

But what happens if I request ../somebodyelse/herfile.txt instead? If the code hasn't been engineered to detect and reject this weird request, it'll just pass the read command to the underlying filesystem, which will allow it because, remember, the application runs under a single OS user and all the actual user logic is handled by the application itself. All the application's files are owned by the same OS user, so the request seems completely legitimate from the OS standpoint. I've just found a way to steal all the other users' files. This type of flaw is called a path traversal, and it is, unfortunately, pretty common.
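To make the flaw concrete, here is a minimal Python sketch, assuming a hypothetical storage root (/srv/mysupercloudapp/files) with one subdirectory per user. The naive version joins the requested name as-is, so the ../ sequence walks out of the user's directory; the checked version resolves the final path and verifies it is still inside that directory. This is one common mitigation, not MySuperCloudApp's (imaginary) code.

```python
import os.path

# Hypothetical storage layout: one subdirectory per functional user.
STORAGE_ROOT = "/srv/mysupercloudapp/files"


def naive_path(user: str, requested: str) -> str:
    """Flawed: the requested name is joined as-is, so '../' sequences survive."""
    return os.path.normpath(os.path.join(STORAGE_ROOT, user, requested))


def checked_path(user: str, requested: str) -> str:
    """Resolve the final path, then refuse anything outside the user's directory."""
    user_dir = os.path.realpath(os.path.join(STORAGE_ROOT, user))
    candidate = os.path.realpath(os.path.join(user_dir, requested))
    if os.path.commonpath([user_dir, candidate]) != user_dir:
        raise PermissionError(f"path traversal attempt: {requested!r}")
    return candidate
```

Note that even the checked version is still pure code logic: if it has a bug of its own, nothing else stands between the attacker and the other users' files.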

For the bastion, the OS is more than a scheduler: every bastion user is actually mapped to an operating system user underneath. Likewise, every bastion group is mapped to an operating system group, and so are all the group roles we talked about in the previous post. This is a strong design choice: we end up with an application that is deeply intertwined with the OS it's running on, and this comes with some cons. However, for a security asset such as the bastion, the pros vastly outweigh them.
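Because bastion groups and roles map to OS groups, a question such as "is this user an aclkeeper of this group?" can be answered by the operating system's own user and group databases. The Python sketch below illustrates this with the standard pwd and grp modules; it is not how The Bastion (written in Perl) performs its checks, and the group name in the usage line is only an example.

```python
import grp
import pwd


def is_member(user: str, group: str) -> bool:
    """True if the OS user belongs to the OS group, as primary or supplementary group."""
    try:
        gr = grp.getgrnam(group)
        pw = pwd.getpwnam(user)
    except KeyError:
        # Unknown user or group: no membership exists at the OS level.
        return False
    return pw.pw_gid == gr.gr_gid or user in gr.gr_mem
```

For example, is_member("guybrush", "island-aclkeeper") would ask the OS, rather than an application table, whether guybrush holds that role.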

Had MySuperCloudApp adopted this design, mapping its application users to actual OS users, the attack we've described above wouldn't have worked. Even if the application's code was flawed and passed the read request down to the OS, the OS would have denied it, because at the OS level, ../somebodyelse/herfile.txt is owned by a different user and isn't readable by anyone else. This is where the OS comes to the rescue of a flawed portion of code (which, of course, still needs to be fixed!).

To take a more Bastion-y example, if a user belongs to groupA and tricks the code into thinking he also belongs to groupB (because of a flaw in the bastion's code logic), it doesn't matter much, because the OS will deny this user access to groupB's keys: he doesn't have read access to those files at the OS level. So he still won't be able to access any of groupB's servers. Technically, this relies on offloading the authentication part to sshd, which is well known and does it quite well. When this phase succeeds, sshd creates a session under the proper OS user and starts the bastion code entry point within this session.
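The decision the OS makes here is the classic owner/group/other permission check. The following Python sketch is a simplified model of it (it ignores root, ACLs and capabilities), using hypothetical numbers: groupB's key file owned by a technical account with uid 2001, group gid 3001, mode 0640, and guybrush with uid 1001 who is only in groupA (gid 3000). Whatever the bastion code believes, the kernel only looks at these values.

```python
import stat
from typing import AbstractSet


def may_read(mode: int, file_uid: int, file_gid: int,
             uid: int, groups: AbstractSet[int]) -> bool:
    """Simplified POSIX read check: ignores root, ACLs and capabilities."""
    if uid == file_uid:
        # The owner class wins: only the owner bits apply, whatever the group bits say.
        return bool(mode & stat.S_IRUSR)
    if file_gid in groups:
        # Member of the file's group: only the group bits apply.
        return bool(mode & stat.S_IRGRP)
    # Everybody else: the "other" bits apply.
    return bool(mode & stat.S_IROTH)
```

With these hypothetical values, may_read(0o640, 2001, 3001, 1001, {3000}) is False: guybrush is neither the owner nor in groupB, and the file grants nothing to others.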

We use the OS as an additional safety net in case of a logic error or a vulnerability in the code: even if the code is tricked into making bad decisions, the underlying OS will be there to deny the action, nullifying its impact.

In practice, all the bastion's OS users have the bastion code declared as their login shell (instead of the usual /bin/sh). We're even going further than that: the code is engineered so that if a user managed to get a real shell on the bastion, i.e. to run any command he'd like on the OS itself, completely bypassing the bastion code's logic and checks, he still shouldn't be able to do much more than what the normal bastion code logic allows him to. That's another strong design principle, and it drastically reduces the impact of a security vulnerability, should one occur.
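The login shell is simply the seventh field of the user's passwd(5) entry, and it is what sshd starts once authentication succeeds. The Python sketch below reads that field; the entry point path /opt/bastion/bin/shell/osh.pl is an assumption based on the install prefix and the osh.pl name seen elsewhere in this post, so check it against your own installation.

```python
# Assumed path of the bastion entry point (hypothetical for this sketch).
BASTION_SHELL = "/opt/bastion/bin/shell/osh.pl"


def login_shell(passwd_line: str) -> str:
    """Return the login shell, i.e. the 7th field of a passwd(5) entry."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("not a passwd(5) entry")
    return fields[6]


def uses_bastion_shell(passwd_line: str) -> bool:
    """True if sshd would start the bastion code, not a regular shell, after login."""
    return login_shell(passwd_line) == BASTION_SHELL
```

For instance, a hypothetical entry guybrush:x:1001:1001::/home/guybrush:/opt/bastion/bin/shell/osh.pl would make sshd run the bastion code for guybrush.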

Trust no one

For some features to work correctly, the design choices outlined above imply that the bastion must sometimes create and delete users at the OS level. This can't be done from unprivileged accounts, hence some parts of the code need to run with elevated privileges.

In The Bastion jargon, those portions of the code are called helpers. They are kept separate from the rest of the code, which normally runs under the OS user corresponding to the functional bastion user running it.

The helpers don't trust the rest of the bastion code, so they never blindly trust their input, even though, in theory, it has already been validated by the bastion code launching the helper. Their elevated privileges are granted using the sudo command, with a very strict sudoers configuration ensuring that each caller can only run the helpers it's supposed to run, with the parameters it's supposed to be allowed to specify. Once the helper has finished, it reports information back to its caller using JSON.
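The pattern of re-validating everything and answering in JSON can be sketched in Python as follows. Everything here is hypothetical: the validation regexes, the function names and the JSON field names (status, message, value) are illustrative choices, not The Bastion's actual helper protocol.

```python
import json
import re

# Hypothetical validation rules; real helpers define their own.
GROUP_RE = re.compile(r"[a-z0-9][a-z0-9_.-]{0,31}")
HOST_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.:-]{0,254}")


def reply(status: str, message: str, value=None) -> str:
    """Serialize the helper's answer as a single JSON document for its caller."""
    return json.dumps({"status": status, "message": message, "value": value})


def add_server_helper(group: str, host: str) -> str:
    """Hypothetical helper body: re-validate everything, even if the caller already did."""
    if not GROUP_RE.fullmatch(group):
        return reply("ERR_INVALID_PARAMETER", "invalid group name")
    if not HOST_RE.fullmatch(host):
        return reply("ERR_INVALID_PARAMETER", "invalid host")
    # The privileged action (e.g. updating the group's server list) would go here.
    return reply("OK", "accepted", {"group": group, "host": host})
```

A structured reply lets the unprivileged caller parse a well-defined result instead of scraping free-form text, which keeps the boundary between the two privilege levels narrow.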

Let's take the example of the groupAddServer command. As its name implies, this command is used by a group aclkeeper to add a new server to a bastion group. Let's say the user guybrush is an aclkeeper of the bastion group island. At the OS level, the OS user guybrush will be a member of the island-aclkeeper system group. One part of the sudoers configuration will say this:

%island-aclkeeper ALL=(island) NOPASSWD: /usr/bin/env perl -T /opt/bastion/bin/helper/osh-groupAddServer --group island *

This line translates to:

all the members of the island-aclkeeper system group (i.e. all the aclkeepers of the island bastion group) can run, as the island system user, the osh-groupAddServer Perl script in taint mode (-T), but only with command-line options forced to start with --group island
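As a rough illustration of how the wildcard in this rule behaves, here is a Python approximation using fnmatch; it is not sudo's actual matching code. sudoers(5) documents that command-line arguments are matched as a single space-joined string, so the trailing * can span several words. The arguments below are those passed to /usr/bin/env in the rule above; the extra host and group values are made up for the example.

```python
import fnmatch

# Arguments allowed for /usr/bin/env by the sudoers rule shown above.
ALLOWED = "perl -T /opt/bastion/bin/helper/osh-groupAddServer --group island *"


def sudo_would_allow(env_args: list) -> bool:
    """Rough approximation of sudo's wildcard check on the joined argument string."""
    return fnmatch.fnmatchcase(" ".join(env_args), ALLOWED)


base = ["perl", "-T", "/opt/bastion/bin/helper/osh-groupAddServer"]
print(sudo_would_allow(base + ["--group", "island", "--host", "10.0.0.1"]))
print(sudo_would_allow(base + ["--group", "melee", "--host", "10.0.0.1"]))
```

Under this approximation, appending a second option such as --group melee after the forced --group island still matches the pattern, which is one reason the helper itself must validate its arguments instead of relying on the sudoers rule alone.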

The island system user is not mapped to a logical user of the bastion: it is a technical account representing the island bastion group. The file listing the servers of the island bastion group is owned by this system user, and only the aclkeepers, through this sudo rule, can impersonate this system user to add a server to their group. Also note that Perl's taint mode is used here (-T). In this mode, Perl immediately halts the program (here, the helper) if data coming from outside it (command-line arguments, environment variables, file contents...) is used in a potentially dangerous operation, such as running a command or writing to a file, without first being validated. This is an additional protection ensuring that improperly sanitized input can't make its way through the program's execution flow.
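Python has no equivalent of Perl's taint mode, so the following is only a loose, hand-rolled analogy of the idea, not a description of how Perl implements it. External input is wrapped in a hypothetical Tainted class that refuses to be turned into a string, and the only way out is the Perl idiom of matching it against a strict regex and keeping what a capture group extracted.

```python
import re


class Tainted:
    """Wraps a value that came from outside the program (argv, environment, files)."""

    def __init__(self, raw: str):
        self._raw = raw

    def __str__(self) -> str:
        # Loosely mimic taint mode: refuse to hand the raw value to other code.
        raise TypeError("refusing to use tainted data")


def untaint(value: Tainted, pattern: str) -> str:
    """Like the Perl idiom, release only what a regex capture group extracted."""
    match = re.fullmatch(pattern, value._raw)
    if match is None:
        raise ValueError("input failed validation")
    return match.group(1)
```

In Perl, the equivalent idiom is matching the variable against an anchored regex and using $1, which is the only value the interpreter then considers clean.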

Going down the rabbit hole with minijail

For some plugins, we went even one level deeper. For example, we have a plugin that allows users to connect to a PostgreSQL database with the classic psql client, directly from the bastion. The idea is that the password to access the database is known to the bastion, not to the user, so it can be extremely complex and change every day if necessary. This is completely transparent to the user, who just connects to the bastion and asks to run the database plugin. The scheme is the same as when using SSH on both sides: as seen in the first post of this series, the ingress connection is between the user and the bastion (SSH), and the egress connection is between the bastion and the remote server. The only difference is that, in this case, the egress connection is not SSH, but the PostgreSQL protocol.

But how can we secure psql so that, when it runs on the bastion, the user can't escape from it? The problem is the same with the mysql client. Those programs are designed to be run from the user's local computer, where the user can already run any command, so there's no real reason for them to offer a configuration option forbidding the local execution of arbitrary commands (shell escapes, such as psql's \! meta-command). On the bastion, however, we don't want to allow that. Of course, maintaining a fork of these SQL clients is a complete no-no, as the time spent maintaining these forks would be better used on other projects. Instead, we've used a tool named minijail, whose purpose is to make the (not so) recent features of the Linux kernel, such as namespaces, capabilities, seccomp, the no_new_privs prctl() flag, etc., readily available to any program. We're not going to detail each and every one of these features, as there's plenty of material about them online, but rather zoom in on how we've used them in the context of The Bastion.

Let’s start with the conclusion: here is how it looks on the bastion system itself, while somebody is using the database plugin:

Don't Panic yet: let's go through this line by line.

The first line (PID 16) is the sshd system daemon. Nothing fancy here, this is your usual friendly daemon, listening on port 22 for incoming SSH connections.

The second line (PID 413) is the privileged process spawned when guybrush successfully logged in to the server. This is also completely standard SSH behavior: when somebody logs in, the daemon spawns a privileged sshd process, which in turn spawns an unprivileged one. Both are dedicated to handling this user, while the parent (the daemon) keeps listening for new connections.

The third line (PID 417) is the corresponding unprivileged sshd process for guybrush. This one is responsible for starting guybrush's shell as soon as he's logged in. Note that from now on, and until further notice, all code is executed with the user's own (lack of) privileges.

The fourth line (PID 418) is guybrush's shell. This is where things start to differ from a usual server. In this case, the shell is not /bin/bash or /bin/zsh, but a portion of the bastion's code. As explained above, the bastion is declared as the user's shell, so when somebody logs in, this is what gets executed instead of a more regular POSIX shell. This portion of the code is responsible for parsing the command line the user specified, and executing the corresponding action, if this action is allowed. In this case, the user passed the -i parameter, which asks the bastion to start in interactive mode. This is a special mode that makes it easier to launch several bastion commands without having to re-authenticate each time. So, this process is listening for commands from the user. Note that, at this stage, the user has already been authenticated by the system, as this is entirely delegated to sshd. If authentication fails, the user's shell (here, the bastion code) is never executed.

The fifth line (PID 497) is a child of the interactive process, re-executing the user's shell (osh.pl) with new parameters: --osh db, which tells this instance of the shell that the user wants to run the db bastion command.

The sixth line (PID 502) is the bastion command the user is currently executing. This is the db plugin, and we can see part of its command line: --name lechuck, which tells the plugin that the user wants to connect to the database named lechuck.

The seventh line (PID 503) is the ttyrec parent process: as explained in the first post of this series, the entire console output of the session is recorded by the bastion, and this process is in charge of doing it.

The eighth line (PID 504) is the ttyrec child process, needed for pseudo-tty support, which in turn is needed for the recording. If you really want to know more about pseudo-ttys, head over to man openpty and/or the ttyrec code itself.

The ninth line (PID 505) is the sudo call to start minijail. This is needed because minijail must run as root to properly set up the jail, before downgrading itself to an unprivileged account.

The tenth line (PID 506) is sudo's child, which is in charge of starting the subcommand (minijail, in this case).

The eleventh line (PID 507) is the invocation of minijail. The complete command line we’re launching is:

/bin/minijail0 --logging=stderr -u guybrush -g guybrush -n -v --uts -d -P /tmp/chroot-guybrush-psql-wsvhp4 -S /etc/bastion/minijail/db-psql.seccomp -b /lib64 -b /lib -b /usr/lib -b /usr/share -k /home/guybrush/.psql /profile bind 0x10100E rw --set-env HOME=/ --set-env USER=guybrush --set-env LOGNAME=guybrush -- /usr/lib/postgresql/11/bin/psql --pset=pager=off -h dbserver.example.org -p 5432 -U lechuck -- lechuck

Quite a beast. But let’s go through this step by step.

First, we tell minijail to set up a new UTS namespace (--uts), and to set the no_new_privs flag (-n), so that the processes it creates (and their own children) will never be able to become root again, no matter what. Under a no_new_privs process, even a wildcard sudoers file, or knowing the root password and attempting to use su, is not enough to get back to UID 0. You just can't.
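The no_new_privs flag is a plain prctl() call that any process can make on itself, which is what minijail does on our behalf. The Linux-only Python sketch below calls it through ctypes in a forked child, so that the calling process is left untouched; the constants come from <linux/prctl.h>, and the function names are our own.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38  # from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39

libc = ctypes.CDLL(None, use_errno=True)  # Linux: libc symbols are already loaded


def set_no_new_privs() -> None:
    """Set the flag on the current process; it can never be cleared afterwards."""
    if libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_no_new_privs() -> int:
    return libc.prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)


def demo_in_child() -> int:
    """Set the flag in a forked child and report its value through the exit code."""
    pid = os.fork()
    if pid == 0:
        code = 1  # exit code used if anything fails in the child
        try:
            set_no_new_privs()
            code = 10 + get_no_new_privs()  # 11 once the flag is set
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

The flag is inherited across fork() and preserved across execve(), and once it is set, execve() no longer honors setuid or setgid bits or file capabilities, which is why sudo and su can no longer bring such a process back to UID 0.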

We also ask minijail to create a new mount namespace (-v), then pivot_root (-P) into a temporary empty directory, /tmp/chroot-guybrush-psql-wsvhp4, so that the whole host filesystem becomes inaccessible. As we still need to be able to run an SQL client in this environment, we bind-mount a few important directories into this new namespace, such as /lib64, /lib, /usr/lib and /usr/share, plus a single read-write directory located in the user's own home directory, so that from inside this jail, psql can still find its .psql_history and .psqlrc files from past sessions.

We also set a few environment variables (HOME, USER, LOGNAME) so that the SQL client isn't lost, then set up a seccomp policy on top of all that, to limit which syscalls can be made from this environment. For example, the execve() syscall is forbidden: the SQL client cannot execute any other program, and a process attempting to do so gets terminated. Last but not least, once all of this has been set up, minijail drops its privileges to the guybrush user (-u) and guybrush group (-g) before executing the psql binary.
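To tie the flags together, here is a Python sketch that rebuilds the minijail0 command line shown above as an argument list. The function name and its parameters are hypothetical (The Bastion's plugin code is Perl), and the argument list mirrors the command line above token for token rather than minijail0's documentation.

```python
def build_psql_jail_argv(user: str, chroot_dir: str, host: str, port: int,
                         db_user: str, db_name: str) -> list:
    """Hypothetical builder reproducing the minijail0 invocation shown above."""
    return [
        "/bin/minijail0", "--logging=stderr",
        "-u", user, "-g", user,          # privileges dropped to the user once set up
        "-n",                             # no_new_privs
        "-v", "--uts", "-d",             # new mount (-v) and UTS (--uts) namespaces
        "-P", chroot_dir,                 # pivot_root into an empty temporary directory
        "-S", "/etc/bastion/minijail/db-psql.seccomp",  # seccomp syscall policy
        "-b", "/lib64", "-b", "/lib", "-b", "/usr/lib", "-b", "/usr/share",
        # read-write bind of the user's ~/.psql, written as in the command line above
        "-k", f"/home/{user}/.psql", "/profile", "bind", "0x10100E", "rw",
        "--set-env", "HOME=/", "--set-env", f"USER={user}",
        "--set-env", f"LOGNAME={user}",
        "--",
        "/usr/lib/postgresql/11/bin/psql", "--pset=pager=off",
        "-h", host, "-p", str(port), "-U", db_user, "--", db_name,
    ]
```

Building the command as a list, and passing it as such to something like subprocess.run, means no shell ever interprets the host or database names, in line with the "trust no one" principle described earlier.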

The twelfth line (PID 508) is the psql process itself, running inside the jail we've just built. This way, it is extremely difficult to escape from psql and get out of the jail. The whole setup disappears as soon as the user disconnects; the only remains are his .psql_history and .psqlrc files. Of course, the ttyrec recording of his SQL session remains, too (as it is made outside of the jail).

This concludes this post, in which we've detailed how some design principles help deliver a resilient and secure system. Next week, in the final post of this series, we'll be announcing something special. Stay tuned!

Stéphane Lesimple


Head of Security Tools Squad