VGS Engineering Blog

The latest updates from our developer community

Secure Compute Part 2: gVisor Runtime on EKS

In this post, we begin building a platform that can run containers securely on an Amazon Elastic Kubernetes Service (EKS) cluster. The first component we will look at is a secure container runtime: we will explore the gVisor sandboxed container runtime and how to integrate it with EKS.

Packaging and Deploying an Application

Containers have become the standard unit for packaging and running an application and its dependencies. Container-orchestration systems such as Kubernetes are used to deploy and manage containerized applications, and we will use Amazon EKS to manage our Kubernetes deployments, services, and applications. In this way, we can develop our application, package it into a container, and deploy it on our Kubernetes cluster. Alternatively, as we will show in the next blog post, we can use a serverless function framework such as OpenFaaS to write functions and deploy them using prebuilt container images, without having to deal with the complexity of building containers from scratch. Either way, containers are at the heart of both solutions.
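To make this concrete, here is a minimal (hypothetical) Kubernetes manifest showing what "package it into a container and deploy it" looks like in practice; the image name is a stand-in for your own packaged application:

```yaml
# Minimal Deployment sketch: run two replicas of a prebuilt image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx   # replace with your packaged application image
        ports:
        - containerPort: 80
```

Applying this with kubectl apply -f is all it takes to get the containerized application running on the cluster.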

Container Isolation Technologies

To run containers securely, we need to isolate them from the host kernel and other processes running on the machine. Traditional containerization technologies such as Docker, containerd, and Linux Containers (LXC) do not provide proper isolation, as they run containers directly on the host OS kernel, leaving a large attack surface. This is especially critical in multi-tenant cloud environments and when running untrusted workloads on your cluster. Container isolation technologies such as Google gVisor, OpenStack Kata-containers, Amazon Firecracker, and IBM Nabla-containers have emerged to combat this problem.

Figure: Container Isolation

gVisor is an application kernel that provides a sandboxed environment for containers by implementing a sandboxed guest kernel which intercepts and services most system calls, isolating containers from the host. Similarly, Nabla-containers use a library OS to reduce the attack surface by customizing the OS constructs to the applications’ needs and reducing the number of system calls to the host. On the other hand, Kata-containers run containers inside lightweight VMs, relying on hardware virtualization for isolation. Finally, Firecracker creates and manages microVMs using hardware virtualization, and it can be used with tools such as Weave Ignite to run container images inside microVMs.

Both Kata-containers and Firecracker require the Linux Kernel-based Virtual Machine (KVM), which is only supported on bare-metal instances or through nested virtualization. Since EKS manages virtualized EC2 instances with nested virtualization disabled, neither option is viable. As for Nabla-containers, they only support container images built specifically for Nabla and do not support dynamic loading of libraries, among other discouraging limitations. gVisor, on the other hand, handles OCI images out-of-the-box, integrates easily with EKS, and has no hardware virtualization requirements (KVM can be used as a platform to intercept syscalls, but ptrace is the default).
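Since the choice above hinges on KVM, a minimal sketch for checking whether a node exposes the KVM device at all; on EKS's virtualized EC2 instances (nested virtualization disabled) you should expect it to be absent:

```shell
# Kata-containers and Firecracker need /dev/kvm;
# gVisor's default ptrace platform does not.
if [ -e /dev/kvm ]; then
  echo "KVM available"
else
  echo "KVM not available"
fi
```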

Kubernetes and Container Runtimes

Figure: Kubernetes Components

As mentioned earlier, we will use EKS to create and manage our Kubernetes clusters. A Kubernetes cluster is a collection of physical or virtual machines connected over a shared network. Typically, one machine is designated as the master server, where the cluster management components are deployed, and all other machines, called Nodes, are responsible for running the workloads. Kubernetes workloads are called Pods: groups of tightly coupled containers that represent a single application. Since we are interested in configuring the container runtime environment, we need to understand the environment in which containers are created and executed, and thus what the different components on nodes are and how they are configured.

Nodes run three main components: the container runtime, the kubelet, and the kube-proxy. The default container runtime is Docker, but other runtimes such as containerd and CRI-O are also supported. The kubelet is the main component on the node that communicates with the master server and is responsible for running and monitoring Pods; for our purposes, it is the component that identifies and controls the container runtime to be used. Finally, the kube-proxy, which maintains network rules, plays no role in configuring our container runtime environment, so we will not discuss it further.

Configuring gVisor with containerd on EKS

We are interested in configuring gVisor (whose OCI runtime is called runsc) as the Pod container runtime on our EKS cluster. gVisor integrates with Kubernetes through the containerd runtime and the gvisor-containerd-shim. Therefore, we need to install the containerd CRI on all nodes intended to use the runsc runtime handler, and configure the kubelet to use containerd as its container runtime.

The first step is installing containerd. In our EKS setup, we use the current default Amazon Linux 2 AMI (ami-03cb83c4dfe25bd99) for our nodes, which comes with containerd preinstalled. For other AMIs, or when constructing your own custom AMI, you can install containerd by following the Kubernetes guide, and use the following commands to verify that the prerequisites are correctly configured and that containerd is installed.

# check kernel modules
$ modinfo overlay br_netfilter

# check kernel params
$ sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables

# verify containerd running
$ sudo systemctl status containerd

Then, we need to configure the kubelet service (in /etc/systemd/system/kubelet.service.d) to point to the containerd socket, reload systemd, and restart the kubelet.

$ sudo sed -i 's;--container-runtime=docker;--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock;' /etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet

You can learn more about configuring kubelet here.

Finally, we need to install gVisor and the gvisor-containerd-shim, and configure containerd to add runsc runtime handler support. We can then use a Kubernetes RuntimeClass to deploy our Pods with gVisor.

Putting it all together: Demo on my Cluster

Awesome! So now we know what secure container runtimes are, and how to integrate them into our EKS cluster. Let us put it all together with a step-by-step guide on how to configure your EKS cluster to run untrusted workloads using gVisor from scratch!

0. Prerequisites: Make sure eksctl and kubectl are installed, and that your AWS credentials, a registered SSH key, and the proper IAM roles are set up. Follow the Getting started with Amazon EKS guide for more information.
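As a quick sanity check (a sketch, not part of the official setup), you can confirm the prerequisite CLIs are on your PATH before proceeding:

```shell
# Report whether each prerequisite CLI is installed.
for tool in eksctl kubectl aws; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```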

1. First, if you have not already, let us create a simple EKS cluster with two nodes and SSH access.

$ export eks_key=/path/to/key/eks_key_name.pem
$ eksctl create cluster --name my-secure-compute --nodes 2 --region us-west-2 --ssh-access --ssh-public-key eks_key_name

2. To ensure that Pods land on nodes that support gVisor, we need to label our nodes appropriately and later specify a nodeSelector in our RuntimeClass. We will only use the first node for our untrusted workloads, labeling it with runtime=gvisor.

$ export gVisor_node_name=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
$ kubectl label node $gVisor_node_name runtime=gvisor

3. Now, we will SSH into the gVisor node and configure it with containerd and the runsc runtime handler as previously described.

# SSH into gVisor node
$ export gVisor_node_EIP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}')
$ ssh -i $eks_key ec2-user@$gVisor_node_EIP

On gVisor node:

# Install dependencies
$ sudo yum install -y git

# Install Golang
$ wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
$ sudo tar -C /usr/local -xzf go1.14.4.linux-amd64.tar.gz

$ export GOROOT=/usr/local/go
$ export GOPATH=$HOME/go
$ export PATH=/usr/local/go/bin:$HOME/go/bin:$PATH

# Configure kubelet to point to containerd socket
$ sudo sed -i 's;--container-runtime=docker;--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock;' /etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet

# Install gVisor runsc
$ set -e
$ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc
$ sudo mv runsc /usr/local/bin
$ sudo chown root:root /usr/local/bin/runsc
$ sudo chmod 0755 /usr/local/bin/runsc

# Install gvisor-containerd-shim
$ git clone https://github.com/google/gvisor-containerd-shim.git
$ cd gvisor-containerd-shim
$ make
$ sudo make install

# Configure containerd with the runsc runtime handler (Pods will opt in to runsc via a RuntimeClass)
$ cat <<EOF | sudo tee /etc/containerd/config.toml
disabled_plugins = ["restart"]
[plugins.linux]
  shim_debug = true
[plugins.cri.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF

# Restart containerd
$ sudo systemctl restart containerd

4. Back on your local machine, you can verify that the container runtime of your gVisor node is now containerd.

$ kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.containerRuntimeVersion}'
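If you want this check to fail loudly in a script, you can wrap it in a small helper (check_runtime is our own hypothetical name, not a kubectl feature):

```shell
# check_runtime: succeed only if the reported runtime string is containerd.
check_runtime() {
  case "$1" in
    containerd://*) echo "containerd in use" ;;
    *) echo "unexpected runtime: $1"; return 1 ;;
  esac
}

# On the cluster (requires kubectl access), something like:
# check_runtime "$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.containerRuntimeVersion}')"
```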

5. Now let's create a custom RuntimeClass on our cluster called “gvisor” that uses runsc as its handler and schedules Pods on nodes labeled runtime=gvisor.

$ cat <<EOF | tee gvisor-runtime.yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    runtime: gvisor
EOF

$ kubectl apply -f gvisor-runtime.yaml

6. Perfect! We are ready to deploy our first application with the runsc runtime. Let us create an nginx Pod and set gvisor as its runtimeClassName.

$ cat <<EOF | tee nginx-gvisor.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-gvisor
spec:
  runtimeClassName: gvisor
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
EOF

$ kubectl apply -f nginx-gvisor.yaml

7. Verify: we can check that our nginx Pod is running, and send it a request to confirm its functionality.

# Verify pod running on gVisor node
$ kubectl get pod nginx-gvisor -o wide

# Verify pod running correctly
$ kubectl run --rm --restart=Never --stdin --tty --image=curlimages/curl curl -- curl $(kubectl get pod nginx-gvisor -o jsonpath='{.status.podIP}')

We can also verify that the Pod is running using the runsc runtime by verifying that the pod’s container ID is in the list of containers running with runsc on the gVisor node.

# Get nginx-gvisor container ID
$ export CID=$(kubectl get pod nginx-gvisor -o jsonpath='{.status.containerStatuses[0].containerID}' | cut -d '/' -f 3)

# List runsc running containers and find nginx-gvisor container ID on gVisor node
$ ssh -i $eks_key ec2-user@$gVisor_node_EIP 'sudo env "PATH=/usr/local/bin" runsc --root /run/containerd/runsc/k8s.io list' | grep $CID
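The cut invocation works because Kubernetes reports container IDs in the form <runtime>://<id>; a small sketch with a made-up ID shows the parsing:

```shell
# Container IDs look like "<runtime>://<id>"; splitting on "/" and
# taking field 3 yields the bare ID that runsc list reports.
cid="containerd://4f9d2e7c"   # hypothetical example value
echo "$cid" | cut -d '/' -f 3   # → 4f9d2e7c
```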

8. EXTRA: To configure runsc runtime options, we can create a runsc configuration file and point to it inside the containerd runtime configuration. For example, we can configure runsc to dump some logs for debugging.

# SSH into gVisor node
$ ssh -i $eks_key ec2-user@$gVisor_node_EIP

$ cat <<EOF | sudo tee /etc/containerd/config.toml
disabled_plugins = ["restart"]
[plugins.linux]
  shim_debug = true
[plugins.cri.containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
[plugins.cri.containerd.runtimes.runsc.options]
  TypeUrl = "io.containerd.runsc.v1.options"
  ConfigPath = "/etc/containerd/runsc.toml"
EOF
# Runsc options config
$ cat <<EOF | sudo tee /etc/containerd/runsc.toml
[runsc_config]
  debug="true"
  strace="true"
  debug-log="/tmp/runsc/%ID%/"
EOF

# Restart containerd
$ sudo systemctl restart containerd

And there you have it! Now you can deploy your applications and services on EKS running gVisor.


Next: Part 3 - Serverless Functions with OpenFaaS

In this blog we learned about gVisor and how to configure an EKS cluster to run kubernetes pods using gVisor’s runsc runtime. In part 3 of this series, we’ll look at how we can simplify application development using OpenFaaS serverless functions, and how to run them securely with gVisor!


References

  • gVisor info: https://gvisor.dev/docs/

  • Kata containers info: https://katacontainers.io/

  • Firecracker info: https://firecracker-microvm.github.io/

  • Nabla containers info: https://nabla-containers.github.io/

  • gVisor can use ptrace (default) or KVM as platform: https://gvisor.dev/docs/user_guide/platforms/

  • Kata containers require KVM: https://ubuntu.com/kubernetes/docs/kata

  • Firecracker requires KVM: https://firecracker-microvm.github.io/

  • Nabla containers only runs nabla specific images (https://nabla-containers.github.io/), and has many other limitations (https://github.com/nabla-containers/runnc#limitations)

  • More information about container isolation technologies: https://unit42.paloaltonetworks.com/making-containers-more-isolated-an-overview-of-sandboxed-container-technologies/

  • Kubernetes Components: https://kubernetes.io/docs/concepts/overview/components/

  • Components Deep Dive: https://www.digitalocean.com/community/tutorials/an-introduction-to-kubernetes

Mohamad El Hajj
Mohamad joined VGS in the summer of 2020 as an engineering intern to explore different avenues for potential secure serverless compute platforms. His work spanned different technologies including gVisor, Firecracker, OpenFaaS, AWS Lambda, and AWS Fargate. This blog series will demonstrate his findings on gVisor and OpenFaaS, part of a larger collaboration here at VGS in the area of confidential computing.