An easier approach to building Kubernetes Operators

Using Operator-Framework SDK on top of Kubebuilder

Kubernetes Operators are quite complex to implement from scratch. They are commonly used to automate management processes inside and outside of Kubernetes through reconciliation calls that recur as long as requests are queued into a control loop.

Today, we will outline an easy way to build an Operator using the Operator Framework SDK, which is based on Kubebuilder. We describe how to install and set up a template Operator project that can be built and deployed into a local Kubernetes cluster. In a later article, we will use that template to implement a real use case that can be deployed and run in a production environment.

You can find all the source code accompanying this article on GitHub.

Requirements

Before we can build our first Operator in Go, we need to prepare our local environment by installing KinD as a Kubernetes cluster and running a local Docker registry where we can push and pull our Controller images.

Our KinD configuration defines some essential parameters for our cluster, and also a containerd plugin configuration that tells the container runtime on each node where to pull Docker images from.

Setup KinD Cluster

Plain Text
$ cat <<EOF > config.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: dev
networking:
  apiServerAddress: "127.0.0.1"
  apiServerPort: 6443
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
  kubeProxyMode: "ipvs"
nodes:
- role: control-plane
- role: worker
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."localhost:5000"]
    endpoint = ["http://kind-registry:5000"]
EOF
$ kind create cluster --kubeconfig kind.kubeconf --config config.yml
$ export KUBECONFIG=$(pwd)/kind.kubeconf

Run Docker registry

Our Docker registry runs as a container and is connected to the KinD network.

Plain Text
$ docker run -d --restart=always -p "127.0.0.1:5000:5000" --name "kind-registry" registry:2
$ docker network connect "kind" "kind-registry" || true

Finally, we need a ConfigMap to let Pods such as our Controller know where to find our Docker registry:

Plain Text
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-registry-hosting
  namespace: kube-public
data:
  localRegistryHosting.v1: |
    host: "localhost:5000"
    help: "https://kind.sigs.k8s.io/docs/user/local-registry/"
EOF

Kubernetes Operators

Operators in Kubernetes are simply Pods, usually running on the data plane, that act as Controllers for Custom Resources (CRs). They follow the Kubernetes control loop principle and are clients of the Kubernetes API-Server.

Operator types

Kubernetes Operators can be implemented programmatically in any language that provides libraries to communicate with the Kubernetes API, declaratively with YAML to automate simple tasks where the kubectl command is not enough, or simply as bash scripts.

There are two Operator frameworks for developing Operators, one in Python and one in Go. We will focus here on the Go-based Operator Framework because Go is the language Kubernetes was originally implemented in and the most widely used for Operators. Furthermore, using Go brings additional developer benefits, such as community support and existing frameworks and SDKs.

In addition to Go, the SDK also provides declarative ways to define Operators and includes the Operator Lifecycle Manager (OLM), which provides a declarative way to install, manage and upgrade Operators and their dependencies in a cluster.

The Operator Framework SDK defines three types of Operators, but we will confine ourselves to the Go-based Operator to implement a Kubernetes native automation:

  1. Ansible-based Operators
  2. Go-based Operators
  3. Helm-based Operators

Operator setup

Project initialisation

Let’s create and initialise our project and let the framework put all components into place. You need to replace the domain and repo parameters with your own if you want to publish your Operator.

Plain Text
$ mkdir operator
$ cd operator
$ operator-sdk init --repo github.com/Nearform/operator --domain nearform.com

The --domain attribute is important: it is used as a suffix for all API groups. The --repo attribute, on the other hand, defines the Go module path used for package naming.

At this point, operator-sdk has created our project structure, a Dockerfile for our Controller, a Makefile with predefined commands to build and deploy, and a main.go file, which acts as the entry point. We can also see a config/ folder, which contains all the YAML manifests an Operator needs to work as expected. This includes a Service Account definition, Role and RoleBinding objects and other RBAC-related objects, a Deployment and a Service object for the Controller-Manager, as well as some resources for monitoring and metrics. All of these resources are managed via Kustomize.

Let’s now create the first API, Type and Controller of version v1alpha1 grouped in core with a Custom Resource Definition (CRD) named Template.

Plain Text
$ operator-sdk create api --group core --version v1alpha1 --kind Template --resource --controller

There are two more important folders under our project root:

The api folder includes our type definition v1alpha1/template_types.go, which corresponds to our CRD, and v1alpha1/groupversion_info.go, which defines our Schema and GroupVersion.

Each of the API files is generated inside a subfolder named after the version of our API. Our Controller can be found in controllers/template_controller.go, alongside the test suite for our Operator. We can also find a first generated sample of our Custom Resource, based on the API Template type we created, under config/samples/core_v1alpha1_template.yaml.
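
To make the layout easier to navigate, the generated project looks roughly like the following; the exact set of files can vary slightly between operator-sdk versions:

Plain Text
├── Dockerfile
├── Makefile
├── main.go
├── api
│   └── v1alpha1
│       ├── groupversion_info.go
│       ├── template_types.go
│       └── zz_generated.deepcopy.go
├── controllers
│   ├── suite_test.go
│   └── template_controller.go
└── config
    ├── crd
    ├── rbac
    ├── manager
    └── samples
        └── core_v1alpha1_template.yaml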

Operator Implementation

For a basic implementation, only three files are important for now. We define our Custom Resource (CR) Template object in api/v1alpha1/template_types.go. Let’s have a closer look at it.

Go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type TemplateSpec struct {
	Foo string `json:"foo,omitempty"`
}

type TemplateStatus struct {}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
type Template struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   TemplateSpec   `json:"spec,omitempty"`
	Status TemplateStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true
type TemplateList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []Template `json:"items"`
}

func init() {
	SchemeBuilder.Register(&Template{}, &TemplateList{})
}

The TemplateSpec struct defines our object attributes, which describe our CR itself. The Status of our object, which is later compared and changed by our Controller, is defined in the TemplateStatus struct.

The Template struct follows the common schema definition for Kubernetes objects and references the metadata and a JSON representation of our CR. The init() function acts like a constructor and registers our CR types with the scheme before they can be used.
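
For context, the registration done in init() only takes effect once our API group is added to the scheme used by the manager. Below is a simplified sketch of how the generated main.go typically wires this up; the real generated file includes more options such as metrics and leader election:

Go
package main

import (
	"os"

	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"

	corev1alpha1 "github.com/Nearform/operator/api/v1alpha1"
)

var scheme = runtime.NewScheme()

func init() {
	// Register built-in Kubernetes types and our own API group with the scheme.
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
	utilruntime.Must(corev1alpha1.AddToScheme(scheme))
}

func main() {
	// The manager is created with the combined scheme, so its client knows
	// how to serialise and deserialise our Template CR.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
	if err != nil {
		os.Exit(1)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}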

The corresponding Custom Resource then looks like the following example:

Plain Text
apiVersion: core.nearform.com/v1alpha1
kind: Template
metadata:
  name: template-sample
spec:
  foo: bar

Every time we add an attribute to TemplateSpec, we also need to add it to our CR manifest if it is declared as mandatory.
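
To illustrate the difference between optional and required fields, here is a small sketch; Bar is a hypothetical field added only for demonstration, not part of our Template. Fields with omitempty (and the +optional marker) stay optional in the generated CRD schema, while fields without it become required:

Go
// Sketch only: Bar is a hypothetical field used for illustration.
type TemplateSpec struct {
	// Foo is optional: omitempty plus the +optional marker.
	//+optional
	Foo string `json:"foo,omitempty"`

	// Bar would be generated as a required field in the CRD schema,
	// so every Template manifest would have to set it.
	Bar string `json:"bar"`
}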

The Template Controller under controllers/template_controller.go implements the logic of our Operator.

Go
package controllers

import (
	"context"
	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
	corev1alpha1 "github.com/Nearform/operator/api/v1alpha1"
)

type TemplateReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=core.nearform.com,resources=templates,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core.nearform.com,resources=templates/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=core.nearform.com,resources=templates/finalizers,verbs=update
func (r *TemplateReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)
	return ctrl.Result{}, nil
}

func (r *TemplateReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1alpha1.Template{}).
		Complete(r)
}

We inject the API client object as well as the Schema definition into our Reconciler struct. The Controller has only two functions. The SetupWithManager function registers our Controller with the Controller-Manager so that it watches our Template CR. The Reconcile function itself is the actual logic we have to implement. For now, nothing is happening here, but the Operator is already complete and can be built and deployed.
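
For completeness, SetupWithManager is called from the generated main.go. A hedged sketch of that wiring is shown below; in the generated code the error is logged via setupLog before exiting:

Go
package main

import (
	"os"

	ctrl "sigs.k8s.io/controller-runtime"

	"github.com/Nearform/operator/controllers"
)

// registerControllers shows roughly how the generated main.go wires our
// Reconciler into the manager before calling mgr.Start().
func registerControllers(mgr ctrl.Manager) {
	if err := (&controllers.TemplateReconciler{
		Client: mgr.GetClient(),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr); err != nil {
		// The generated code logs the error here before exiting.
		os.Exit(1)
	}
}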

Build and Deploy

We will use the Makefile intensively for all of our Build, Run and Deployment activities. To show all the defined targets, we can simply run:

Plain Text
$ make help

It’s possible to run the Operator Controller on our host, outside of Kubernetes. Make sure you have already set your $KUBECONFIG environment variable to the KinD-generated config file.

Before running the controller, the CRD must be deployed to Kubernetes.

Plain Text
$ make install

$(PROJ_ROOT)/bin/controller-gen "crd:trivialVersions=true,preserveUnknownFields=false" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
go: creating new go.mod: module tmp
Downloading sigs.k8s.io/kustomize/kustomize/v3@v3.8.7
$(PROJ_ROOT)/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/templates.core.nearform.com created

Now we can run the Controller locally and see what's going on there.

Plain Text
$ make run
$(PROJ_ROOT)/bin/controller-gen "crd:trivialVersions=true,preserveUnknownFields=false" rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
$(PROJ_ROOT)/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
api/v1alpha1/template_types.go
go vet ./...
go run ./main.go
[...]INFO controller-runtime.metrics	metrics server is starting to listen	{"addr": ":8080"}
[...]INFO setup starting manager
[...]INFO controller-runtime.manager	starting metrics server	{"path": "/metrics"}
[...]INFO controller-runtime.manager.controller.template Starting EventSource
[...]INFO controller-runtime.manager.controller.template	Starting Controller
[...]INFO controller-runtime.manager.controller.template Starting workers

But we don't have any CR defined and deployed to our cluster yet. Let’s do that in a second console, using our sample for now:

Plain Text
$ kubectl apply -f config/samples/core_v1alpha1_template.yaml

We can now find our Custom Resource in our cluster:

Plain Text
$ kubectl get templates --all-namespaces
NAMESPACE   NAME              AGE
default     template-sample   2m22s

But nothing happened in our Controller. Why is that? Let’s go back to it and get a better understanding of the Reconcile function. We can see that this function returns an empty Result struct from the sigs.k8s.io/controller-runtime package. What does it look like?

Go
type Result struct {
	Requeue bool
	RequeueAfter time.Duration
}
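
Both fields control whether and when the request is put back on the work queue. The sketch below shows the three typical ways of returning from Reconcile; the 30-second delay is an arbitrary example value:

Go
package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// reconcileOutcome sketches the three typical return values of Reconcile.
func reconcileOutcome(done bool, retryLater bool) (ctrl.Result, error) {
	switch {
	case done:
		// Don't requeue: reconcile again only when a new event for the object arrives.
		return ctrl.Result{}, nil
	case retryLater:
		// Requeue after a fixed delay, e.g. to poll an external system periodically.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	default:
		// Requeue immediately, subject to the controller's rate limiter back-off.
		return ctrl.Result{Requeue: true}, nil
	}
}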

We did not define Requeue, so it will be false by default. That means our Operator is only triggered once during the initial run and never queued again. Let’s change that: set Requeue: true, add a logging output as well, and then restart our Controller.

Go
func (r *TemplateReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var log = log.FromContext(ctx)
	log.Info("Hello World")
	return ctrl.Result{Requeue: true}, nil
}

Perfect! Now we see the output repeated again and again, because the request keeps being requeued into the control loop. But we never touched our CR or compared its status with any desired value.

In our last step for this part, we will add a new field max of type int32 to our CRD and set it to 5. We also need a field current of type int32 to hold the current value. Our Controller should then increase the current value in each loop run until it reaches our max value.

api/v1alpha1/template_types.go

Go
type TemplateSpec struct {
	Max int32 `json:"max,omitempty"`
}
type TemplateStatus struct {
	Current int32 `json:"current,omitempty"`
}

controllers/template_controller.go

Go
func (r *TemplateReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var log = log.FromContext(ctx)

	// Fetch the Template instance that triggered this reconciliation.
	var template corev1alpha1.Template
	if err := r.Get(ctx, req.NamespacedName, &template); err != nil {
		log.Error(err, "unable to fetch Template")
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// strconv has to be added to the import list for these log lines.
	log.Info("Current: " + strconv.Itoa(int(template.Status.Current)))
	log.Info("Max: " + strconv.Itoa(int(template.Spec.Max)))

	// Compare the observed status with the desired spec and move towards it.
	if template.Status.Current < template.Spec.Max {
		log.Info("Increase Current by 1")
		template.Status.Current += 1
		if err := r.Status().Update(ctx, &template); err != nil {
			log.Error(err, "unable to update Template status")
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}

config/samples/core_v1alpha1_template.yaml

Plain Text
apiVersion: core.nearform.com/v1alpha1
kind: Template
metadata:
  name: template-sample
spec:
  max: 5
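
As a side note, if we wanted the API server to validate max, we could add kubebuilder validation markers to the spec field and regenerate the CRD with make manifests. A hedged example, with an arbitrary upper bound of 100, is shown here:

Go
type TemplateSpec struct {
	// Max is the target value Current should be increased to.
	//+kubebuilder:validation:Minimum=0
	//+kubebuilder:validation:Maximum=100
	Max int32 `json:"max,omitempty"`
}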

Now let’s deploy the new version of our CRD and CR to our cluster and run our controller again.

Plain Text
$ make install
$ kubectl apply -f config/samples/core_v1alpha1_template.yaml
$ make run

What we can see here is that Status().Update() automatically requeues our CR object: the update generates a new event for our Template, which triggers another reconciliation. The Status.Current attribute is increased in each loop run until it reaches Spec.Max. We use Status().Update() instead of updating the Template object itself so that only the status subresource is changed and we don't accidentally overwrite other fields.

Conclusion

We now have created a complete Operator that can be built, run and deployed. In a later post, we will implement a real use case and show the nature of an Operator and how it can affect our Kubernetes cluster.
