Kubernetes Operators II: Using Operator-Framework SDK on top of Kubebuilder
16 Sep 2021
The second in a two-part series on building a Kubernetes Operator implements a use case with real value for our Kubernetes cluster security.
In the first instalment of this two-post series, we described how to install and configure our environment to implement a simple Kubernetes operator skeleton using the Operator-Framework SDK. We continue our work today by implementing a real use case in which we use Clair as a backend for indexing and scanning our deployed Pod container images. Newly discovered Common Vulnerabilities and Exposures (CVEs) will trigger a notifier app via webhooks, which informs us about them on Slack.
As described above, our environment needs to be installed and configured. This includes kinD as our local Kubernetes cluster, a local container registry and the Operator-Framework itself. We expect to have an up-and-running kinD cluster and container registry.
For our use case, we need a Slack channel to send our notifications to and a registered Slack app to receive webhook messages. We won't go into how to create a Slack channel and a registered webhook Slack app here; just keep the Slack webhook URL in mind for later use.
For the sake of simplicity, we will run Clair and its dependencies in our cluster in combo mode. Make sure the KUBECONFIG environment variable points to the right Kubernetes config file.
Clair needs a config file for all its components: the notifier, matcher and indexer:
Manifests, vulnerabilities and notifications are stored in a database by Clair:
We will run Clair in combo mode for now to have all components in one place. Feel free to deploy all services in separate pods.
Unlike our part 1 Operator, our Clair-Operator will not define a scheduling resource. We will instead create a controller that watches for Pod change events, such as CREATE or UPDATE, and triggers our reconciler function for the affected Pod. Handling DELETE events is more difficult because the Pod reference is lost. Our Operator needs to register container images with Clair using container manifests and let Clair create vulnerability reports for them.
First, we initialise our project and create an API in group core and version v1alpha1 as well as our sample Scanner CR, as we did in part 1.
Set up controller
As described above, we need a Pod watcher to trigger our reconciler function. As we know from part 1, the SetupWithManager() function in controllers/scanner_controller.go builds our controller and registers it with the manager. The main() function calls it before setting up a signal handler and starting the controller manager.
Custom Resource definition
Let’s have a look at our Custom Resource, our ScannerSpec Type defined in api/v1alpha1/scanner_types.go:
The idea behind the Backend field is to allow different scanner backends. In our sample, we will only have a Clair-typed Scanner resource. Likewise, Slack is the only Notifier we provide; the field is included to show the flexibility of our Scanner type. ClairBaseUrl and SlackWebhookUrl are self-explanatory and point to the corresponding endpoints.
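To make the four fields concrete, here is a minimal sketch of such a spec written as plain Go, so it runs without the Kubernetes libraries. The field names come from the description above; the JSON tag names, the Validate() helper and the example URLs are assumptions for illustration, not the original type definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScannerSpec mirrors the four fields described in the text.
// The JSON tag names are assumptions, not the original definition.
type ScannerSpec struct {
	Backend         string `json:"backend"`
	Notifier        string `json:"notifier"`
	ClairBaseUrl    string `json:"clairBaseUrl"`
	SlackWebhookUrl string `json:"slackWebhookUrl"`
}

// Validate is a hypothetical helper that rejects backends and
// notifiers this sketch does not support.
func (s ScannerSpec) Validate() error {
	if s.Backend != "clair" {
		return fmt.Errorf("unsupported backend %q", s.Backend)
	}
	if s.Notifier != "slack" {
		return fmt.Errorf("unsupported notifier %q", s.Notifier)
	}
	if s.ClairBaseUrl == "" || s.SlackWebhookUrl == "" {
		return fmt.Errorf("clairBaseUrl and slackWebhookUrl must be set")
	}
	return nil
}

func main() {
	// Example values only; both URLs are placeholders.
	raw := []byte(`{"backend":"clair","notifier":"slack",` +
		`"clairBaseUrl":"http://clair.example:6060",` +
		`"slackWebhookUrl":"https://hooks.slack.example/services/T000/B000/XXX"}`)
	var spec ScannerSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		panic(err)
	}
	fmt.Println(spec.Backend, spec.Notifier, spec.Validate() == nil)
}
```

In the real CRD, a check like this would more likely be expressed with kubebuilder validation markers than with a method.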
We are now ready to implement our operator logic, which will be done in the Reconcile() function of our ScannerReconciler type.
First, we need to define variables for a list of Scanners and exactly one Pod: the Pod that triggered the event our Manager is watching for.
Because our reconciler is triggered for a specific Pod and not for Scanner objects, we have to request the Scanner objects for the specified namespace and store them in our scanners variable.
If no Scanner CRs are defined for the specified namespace, we can stop here and return to the Manager.
In case a Scanner was found, we need to fetch the Pod which is referenced by its name and namespace and save it into our pod variable.
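The order of these steps can be sketched without a cluster. The following is a stdlib-only model of the control flow, not the controller-runtime code itself: Store is a hypothetical in-memory stand-in for the Kubernetes client, and Scanner and Pod are reduced to name and namespace.

```go
package main

import (
	"errors"
	"fmt"
)

// Scanner and Pod are simplified stand-ins for the real API types.
type Scanner struct{ Name, Namespace string }
type Pod struct{ Name, Namespace string }

// Store is a hypothetical in-memory replacement for the Kubernetes client.
type Store struct {
	Scanners []Scanner
	Pods     map[string]Pod // keyed by "namespace/name"
}

var errPodNotFound = errors.New("pod not found")

// listScanners returns the Scanners defined in one namespace.
func (s Store) listScanners(namespace string) []Scanner {
	var out []Scanner
	for _, sc := range s.Scanners {
		if sc.Namespace == namespace {
			out = append(out, sc)
		}
	}
	return out
}

// reconcile follows the order described above: Scanners first, then the Pod.
// It returns the Pod to process, or nil when there is nothing to do.
func reconcile(s Store, namespace, name string) (*Pod, error) {
	scanners := s.listScanners(namespace)
	if len(scanners) == 0 {
		return nil, nil // no Scanner in this namespace: return to the Manager
	}
	pod, ok := s.Pods[namespace+"/"+name]
	if !ok {
		return nil, errPodNotFound
	}
	return &pod, nil
}

func main() {
	store := Store{
		Scanners: []Scanner{{Name: "clair", Namespace: "demo"}},
		Pods:     map[string]Pod{"demo/web": {Name: "web", Namespace: "demo"}},
	}
	pod, err := reconcile(store, "demo", "web")
	fmt.Println(pod != nil, err)
}
```

In the actual reconciler, the two lookups would be a List() call restricted to the request's namespace and a Get() call with the request's namespaced name.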
As described before, handling DELETE events is more difficult: we need to add a custom finalizer to the current Pod and remove it again once our cleanup has run.
The finalizePod() function implements the cleanup logic for Clair's index database and informs us on Slack. Currently, the Clair API does not provide a delete endpoint.
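The add-and-remove bookkeeping for the finalizer can be sketched on a plain string slice. The finalizer name below is hypothetical, and the deleting flag stands for "the Pod has a deletion timestamp set"; the sketch only decides what the finalizer list should look like next and whether cleanup has to run.

```go
package main

import "fmt"

// scannerFinalizer is a hypothetical finalizer name for this sketch.
const scannerFinalizer = "scanner.example.com/finalizer"

// containsString reports whether list holds s.
func containsString(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// removeString returns a copy of list without s.
func removeString(list []string, s string) []string {
	out := make([]string, 0, len(list))
	for _, item := range list {
		if item != s {
			out = append(out, item)
		}
	}
	return out
}

// syncFinalizers returns the finalizer list the Pod should carry next
// and whether the cleanup logic has to run now.
func syncFinalizers(finalizers []string, deleting bool) ([]string, bool) {
	has := containsString(finalizers, scannerFinalizer)
	switch {
	case !deleting && !has:
		// Live Pod without our finalizer: add it.
		next := append([]string{}, finalizers...)
		return append(next, scannerFinalizer), false
	case deleting && has:
		// Pod is being deleted: run cleanup, then drop the finalizer.
		return removeString(finalizers, scannerFinalizer), true
	default:
		return finalizers, false
	}
}

func main() {
	live, _ := syncFinalizers(nil, false)
	fmt.Println(live)
	gone, cleanup := syncFinalizers(live, true)
	fmt.Println(gone, cleanup)
}
```

Kubernetes keeps a deleted object around until its finalizer list is empty, which is what gives the reconciler a chance to see the Pod one last time.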
We don't need the Pod itself — only its referenced container images. So let’s iterate through all the containers. Iteration through InitContainers is not implemented here but can be done in the same way.
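A small sketch of that iteration, assuming simplified Pod and Container types in place of the corev1 ones. The uniqueImages() helper is hypothetical; it skips duplicates so that two containers running the same image lead to one indexing request.

```go
package main

import "fmt"

// Container and Pod are simplified stand-ins for the corev1 types.
type Container struct{ Name, Image string }

type Pod struct {
	Containers     []Container
	InitContainers []Container
}

// uniqueImages returns each referenced image once, in first-seen order.
func uniqueImages(containers []Container) []string {
	seen := make(map[string]bool)
	var images []string
	for _, c := range containers {
		if c.Image == "" || seen[c.Image] {
			continue
		}
		seen[c.Image] = true
		images = append(images, c.Image)
	}
	return images
}

func main() {
	pod := Pod{
		Containers: []Container{
			{Name: "web", Image: "nginx:1.21"},
			{Name: "sidecar", Image: "nginx:1.21"},
			{Name: "db", Image: "postgres:13"},
		},
		InitContainers: []Container{{Name: "init", Image: "busybox:1.34"}},
	}
	fmt.Println(uniqueImages(pod.Containers))
	// InitContainers can be passed to the same helper.
	fmt.Println(uniqueImages(pod.InitContainers))
}
```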
First, let’s extract the manifest definition to pass it to Clair’s indexing endpoint:
The docker.Inspect() function of our docker module, which returns a struct of type claircore.Manifest, needs to be implemented. You can find out how to do this in github.com/quay/clair. Inspect() just extracts the image name and repository to connect to the registry storing that image and gets the digests and layers to build the claircore.Manifest type.
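The first step of such a function, splitting an image string into registry, repository and tag or digest, can be sketched with the standard library alone. This is not the article's Inspect() implementation: parseImage() and ImageRef are hypothetical names, the defaults follow the usual Docker conventions (docker.io, library/, latest), and the registry calls that fetch digests and layers are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageRef holds the parts of an image reference.
type ImageRef struct {
	Registry   string
	Repository string
	Tag        string
	Digest     string
}

// parseImage splits an image string using the common Docker defaults.
func parseImage(image string) ImageRef {
	ref := ImageRef{Registry: "docker.io", Tag: "latest"}
	name := image

	// A digest follows "@"; without an explicit tag, no tag is assumed.
	if i := strings.Index(name, "@"); i >= 0 {
		ref.Digest = name[i+1:]
		name = name[:i]
		ref.Tag = ""
	}
	// A tag is a ":" after the last "/" (otherwise it is a registry port).
	if i := strings.LastIndex(name, ":"); i > strings.LastIndex(name, "/") {
		ref.Tag = name[i+1:]
		name = name[:i]
	}
	// The first path component is a registry if it looks like a host.
	if i := strings.Index(name, "/"); i >= 0 {
		first := name[:i]
		if strings.ContainsAny(first, ".:") || first == "localhost" {
			ref.Registry = first
			name = name[i+1:]
		}
	}
	// Official Docker Hub images live under "library/".
	if ref.Registry == "docker.io" && !strings.Contains(name, "/") {
		name = "library/" + name
	}
	ref.Repository = name
	return ref
}

func main() {
	for _, image := range []string{"nginx", "localhost:5000/app:1.0", "quay.io/projectquay/clair@sha256:abc"} {
		fmt.Printf("%+v\n", parseImage(image))
	}
}
```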
To avoid reindexing already processed Pods, we can define Annotations for better filtering. We use Patch() here instead of Update() so that the Pod does not have to be fetched again and we can continue working with the initialised Pod instance.
Now we can call Clair's index API endpoint using ClairBaseUrl from the current Scanner CR.
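One way to picture the annotation check is as a comparison between the stored annotation value and the Pod's current image list, plus a JSON merge patch body that updates only that annotation. The annotation key and both helpers below are assumptions for illustration; the real operator may store something different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// indexedAnnotation is a hypothetical annotation key for this sketch.
const indexedAnnotation = "scanner.example.com/indexed-images"

// alreadyIndexed reports whether the annotation matches the current images.
func alreadyIndexed(annotations map[string]string, images []string) bool {
	value, ok := annotations[indexedAnnotation]
	return ok && value == strings.Join(images, ",")
}

// annotationPatch builds a JSON merge patch body that sets only our
// annotation and leaves the rest of the Pod untouched.
func annotationPatch(images []string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]string{
				indexedAnnotation: strings.Join(images, ","),
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	images := []string{"nginx:1.21", "postgres:13"}
	fmt.Println(alreadyIndexed(nil, images))

	patch, err := annotationPatch(images)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Because the annotation value changes whenever the image list changes, an image update still triggers a new indexing run.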
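As a rough sketch of that request: Clair v4 accepts a manifest as JSON via POST on /indexer/api/v1/index_report. The Manifest and Layer types below are simplified local copies that mimic the JSON shape of the claircore types, and newIndexRequest() is a hypothetical helper; the example only builds the request and does not send it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// Layer and Manifest mimic the JSON shape of the claircore types.
// They are local stand-ins so the sketch needs no external modules.
type Layer struct {
	Hash    string              `json:"hash"`
	URI     string              `json:"uri"`
	Headers map[string][]string `json:"headers,omitempty"`
}

type Manifest struct {
	Hash   string  `json:"hash"`
	Layers []Layer `json:"layers"`
}

// newIndexRequest builds the POST request for Clair's index endpoint.
func newIndexRequest(clairBaseURL string, m Manifest) (*http.Request, error) {
	body, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(clairBaseURL, "/") + "/indexer/api/v1/index_report"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder digests and URLs, not real image data.
	m := Manifest{
		Hash: "sha256:abc",
		Layers: []Layer{{
			Hash: "sha256:def",
			URI:  "http://registry.example/v2/app/blobs/sha256:def",
		}},
	}
	req, err := newIndexRequest("http://clair.example:6060", m)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Passing the returned request to an http.Client's Do() method would perform the actual call.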
With an already indexed container image, we can request the vulnerability report endpoint.
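A sketch of the report side, assuming Clair v4's matcher endpoint at /matcher/api/v1/vulnerability_report/ followed by the manifest hash. Only two fields of the report are modelled here, and the sample JSON in main() is made-up placeholder data, not real scanner output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Vulnerability and VulnerabilityReport model only the fields this
// sketch uses; the real report contains more.
type Vulnerability struct {
	Name               string `json:"name"`
	NormalizedSeverity string `json:"normalized_severity"`
}

type VulnerabilityReport struct {
	ManifestHash    string                   `json:"manifest_hash"`
	Vulnerabilities map[string]Vulnerability `json:"vulnerabilities"`
}

// reportURL builds the matcher URL for one manifest hash.
func reportURL(clairBaseURL, manifestHash string) string {
	return strings.TrimRight(clairBaseURL, "/") +
		"/matcher/api/v1/vulnerability_report/" + manifestHash
}

// countBySeverity groups the reported vulnerabilities by severity.
func countBySeverity(report VulnerabilityReport) map[string]int {
	counts := make(map[string]int)
	for _, v := range report.Vulnerabilities {
		counts[v.NormalizedSeverity]++
	}
	return counts
}

func main() {
	// Placeholder response body for illustration only.
	sample := []byte(`{"manifest_hash":"sha256:abc","vulnerabilities":{` +
		`"1":{"name":"CVE-0000-0001","normalized_severity":"High"},` +
		`"2":{"name":"CVE-0000-0002","normalized_severity":"Low"}}}`)
	var report VulnerabilityReport
	if err := json.Unmarshal(sample, &report); err != nil {
		panic(err)
	}
	fmt.Println(reportURL("http://clair.example:6060/", report.ManifestHash))
	fmt.Println(countBySeverity(report))
}
```

A per-severity summary like this is one possible input for the Slack message about a new report.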
We have three types of Slack notifications to handle. Each time a new Container image is indexed (pod created or updated), we will notify Slack using Webhooks.
We will also send notifications about vulnerability reports for the newly indexed images.
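Both notifications boil down to posting a small JSON document to the Slack webhook URL. Here is a minimal sketch, assuming the simple {"text": ...} payload of Slack incoming webhooks; the message wording and helper names are hypothetical, and a local test server stands in for Slack so the example runs offline.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// slackMessage is the simplest payload an incoming webhook accepts.
type slackMessage struct {
	Text string `json:"text"`
}

// indexedText formats the "image indexed" notification (wording is made up).
func indexedText(namespace, pod, image string) string {
	return fmt.Sprintf("Indexed image %s for pod %s/%s", image, namespace, pod)
}

// slackPayload encodes the message text as a webhook body.
func slackPayload(text string) ([]byte, error) {
	return json.Marshal(slackMessage{Text: text})
}

// notifySlack posts the text to the webhook URL and checks the status.
func notifySlack(client *http.Client, webhookURL, text string) error {
	body, err := slackPayload(text)
	if err != nil {
		return err
	}
	resp, err := client.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// The fake server replaces Slack; in the operator the URL would
	// come from SlackWebhookUrl in the Scanner CR.
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer fake.Close()

	err := notifySlack(fake.Client(), fake.URL, indexedText("demo", "web", "nginx:1.21"))
	fmt.Println("sent:", err == nil)
}
```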
The third type of notification is not a direct part of the Operator. Clair itself can send notification webhooks for newly discovered vulnerabilities in already indexed container images. The payload of that webhook is fixed and cannot be customised via config parameters. So, for simplicity, let’s add a small web server written in Go and run it in our Kubernetes cluster to listen for these webhooks and transform them into Slack-compatible webhooks.
We need to build a container image using the docker CLI:
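The core of such a notifier is a single HTTP handler. The sketch below assumes Clair's webhook body carries a notification_id and a callback URL; the forward function is a placeholder for the actual Slack call, and deliver() is a hypothetical helper that exercises the handler in memory instead of opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// clairWebhook models the fixed payload Clair sends to a webhook target.
type clairWebhook struct {
	NotificationID string `json:"notification_id"`
	Callback       string `json:"callback"`
}

// slackText turns the Clair payload into a message (wording is made up).
func slackText(hook clairWebhook) string {
	return fmt.Sprintf("Clair notification %s is ready: %s", hook.NotificationID, hook.Callback)
}

// notifierHandler decodes Clair webhooks and passes a text to forward,
// which stands in for the function that posts to Slack.
func notifierHandler(forward func(text string) error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var hook clairWebhook
		if err := json.NewDecoder(r.Body).Decode(&hook); err != nil || hook.NotificationID == "" {
			http.Error(w, "invalid webhook body", http.StatusBadRequest)
			return
		}
		if err := forward(slackText(hook)); err != nil {
			http.Error(w, "forwarding failed", http.StatusBadGateway)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// deliver sends one body to the handler in memory and returns the status.
func deliver(h http.Handler, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	var sent []string
	h := notifierHandler(func(text string) error {
		sent = append(sent, text)
		return nil
	})
	// Placeholder notification; the callback URL is illustrative.
	code := deliver(h, `{"notification_id":"n-1","callback":"http://clair.example/notifier/api/v1/notification/n-1"}`)
	fmt.Println(code, sent)
}
```

In a deployed notifier, the same handler would be passed to http.ListenAndServe(), and the forward function would post to the Slack webhook URL.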
Let’s deploy our notifier app into our Kubernetes cluster. We use a Secret for our Slack Webhook URL:
What we've achieved
We have a running Clair scanner and notifier for Slack. Our scanner CR is Namespace scoped and will be triggered every time a Pod is created, updated or deleted within that Namespace. Furthermore, we defined Annotations for already existing Pods to prevent reprocessing without container image changes.
The Operator builds a container manifest from the image's registry metadata and asks Clair to index it and create a first vulnerability report. To inform users about newly created reports, we send a Slack webhook. Newly discovered vulnerabilities for running, already indexed container images trigger Clair notifications, which our notifier app receives and transforms into Slack webhooks. We did not handle multiple Scanners for the same Namespace, nor did we cover a fine-grained, structured CRD for different Scanner types and backends.