Kubernetes Controller to Purge Completed Jobs

Rishi Raj Singh
Jan 29, 2021

Kubernetes’ controllers concept lets you extend the cluster’s behavior without modifying the code of Kubernetes itself. Operators are clients of the Kubernetes API that act as controllers for a Custom Resource. An Operator is a special kind of Kubernetes controller process that comes with its own custom-resource definition.

Some of the things that you can use an operator to automate include:

  • deploying an application on demand
  • taking and restoring backups of that application’s state
  • handling upgrades of the application code alongside related changes such as database schemas or extra configuration settings

To create our own operator we’ll use the Operator SDK (https://github.com/operator-framework/operator-sdk), a code generator that scaffolds a fully functional operator.

Let’s install the SDK (https://sdk.operatorframework.io/) and bootstrap the project.

brew install operator-sdk
operator-sdk init job-purger --domain my.github.com --repo github.com/xxx/job-watcher-operator
operator-sdk create api --group batch --kind JobWatcher --version v1 --resource true --controller true

Let’s review the generated api/v1/jobwatcher_types.go. The following have been defined in it:

  • A separate TTL for completed and for failed jobs, after which the job will be deleted.
  • Namespace and job-name patterns (regexes) to identify jobs that are candidates for deletion.
  • The delay between two deletion checks.
type JobWatcherSpec struct {
    // INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
    // Important: Run "make" to regenerate code after modifying this file

    // +kubebuilder:validation:Minimum=0
    // Time to live in seconds for a completed job
    CompletedTTL int64 `json:"completedTTL"`

    // +kubebuilder:validation:Minimum=0
    // Time to live in seconds for a failed job
    FailedTTL int64 `json:"failedTTL"`

    // +optional
    // +kubebuilder:validation:MinItems=0
    // List of namespaces to watch
    NamespacePatterns []string `json:"namespaces,omitempty"`

    // +optional
    // +kubebuilder:validation:MinItems=0
    // Job name pattern to watch
    JobNamePatterns []string `json:"jobNames,omitempty"`

    // +kubebuilder:validation:Minimum=10
    // Frequency of the TTL checks
    Frequency int64 `json:"frequency"`
}

// JobWatcherStatus defines the observed state of JobWatcher
type JobWatcherStatus struct {
    // INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
    // Important: Run "make" to regenerate code after modifying this file
    LastStarted  metav1.Time `json:"lastStarted,omitempty"`
    LastFinished metav1.Time `json:"lastFinished,omitempty"`
}

The main logic is in the reconciliation loop. It’ll receive a Request as an argument, which only contains the namespace and the name of a resource.

The implemented logic is as below:

  • Fetch the JobWatcher object matching the Request.
  • List the namespaces and retain those matching one of our namespace patterns.
  • In each retained namespace, identify jobs with matching names; if a job has terminated and its TTL has expired, delete it.
  • Finally, update the status information on the JobWatcher resource.
type JobWatcherReconciler struct {
    client.Client
    Log    logr.Logger
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=batch.esys.github.com,resources=jobwatchers,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.esys.github.com,resources=jobwatchers/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=batch.esys.github.com,resources=cronjobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch.esys.github.com,resources=cronjobs/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=batch,resources=jobs,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=batch,resources=jobs/status,verbs=get

func (r *JobWatcherReconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    ctx := context.Background()
    log := r.Log.WithValues("JobWatcher", req.NamespacedName)

    var watcher batchv1.JobWatcher
    if err := r.Get(ctx, req.NamespacedName, &watcher); err != nil {
        log.Error(err, "unable to fetch JobWatcher", "Request", req)
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    watcher.Status.LastStarted = metav1.Time{Time: time.Now()}

    var namespaces corev1.NamespaceList
    if err := r.List(ctx, &namespaces); err != nil {
        log.Error(err, "unable to list Namespaces")
        return ctrl.Result{}, err
    }
    for _, ns := range namespaces.Items {
        if err := r.processNamespace(ctx, watcher, ns); err != nil {
            r.Log.Error(err, "unable to compile spec job name as regex", "Patterns", watcher.Spec.NamespacePatterns)
        }
    }

    watcher.Status.LastFinished = metav1.Time{Time: time.Now()}
    if err := r.Status().Update(ctx, &watcher); err != nil {
        log.Error(err, "unable to update JobWatcher status")
        return ctrl.Result{}, err
    }

    return ctrl.Result{RequeueAfter: time.Duration(watcher.Spec.Frequency) * time.Second}, nil
}

Reconcile is called whenever one of the watched resources changes, or again after the delay requested in the returned ctrl.Result. The cluster role and binding manifests are generated automatically when you add the marker comments of the form // +kubebuilder:rbac:groups=...,resources=...,verbs=...
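The Reconcile loop above delegates the per-namespace work to a processNamespace method that the listing doesn’t show. Below is a minimal sketch of what it might look like, built only from the spec fields defined earlier; the matchesAny helper, the import aliases, and the module path github.com/xxx/job-watcher-operator are assumptions, not the article’s actual code.

import (
    "context"
    "regexp"
    "time"

    kbatch "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"

    batchv1 "github.com/xxx/job-watcher-operator/api/v1" // assumed module path from the init command
)

// processNamespace deletes terminated jobs in one namespace once their TTL has expired.
// This is a sketch; the real method may differ.
func (r *JobWatcherReconciler) processNamespace(ctx context.Context, watcher batchv1.JobWatcher, ns corev1.Namespace) error {
    // Skip namespaces that don't match any configured pattern.
    if ok, err := matchesAny(ns.Name, watcher.Spec.NamespacePatterns); err != nil || !ok {
        return err
    }

    var jobs kbatch.JobList
    if err := r.List(ctx, &jobs, client.InNamespace(ns.Name)); err != nil {
        return err
    }
    for i := range jobs.Items {
        job := &jobs.Items[i]
        ok, err := matchesAny(job.Name, watcher.Spec.JobNamePatterns)
        if err != nil {
            return err
        }
        if !ok {
            continue
        }

        // Work out when the job terminated and which TTL applies.
        var ttl int64
        var finished time.Time
        switch {
        case job.Status.Succeeded > 0 && job.Status.CompletionTime != nil:
            ttl = watcher.Spec.CompletedTTL
            finished = job.Status.CompletionTime.Time
        case job.Status.Failed > 0:
            ttl = watcher.Spec.FailedTTL
            for _, c := range job.Status.Conditions {
                if c.Type == kbatch.JobFailed && c.Status == corev1.ConditionTrue {
                    finished = c.LastTransitionTime.Time
                }
            }
        default:
            continue // job is still running
        }

        if !finished.IsZero() && time.Since(finished) > time.Duration(ttl)*time.Second {
            // Background propagation also garbage-collects the job's pods.
            if err := r.Delete(ctx, job, client.PropagationPolicy(metav1.DeletePropagationBackground)); err != nil {
                return err
            }
        }
    }
    return nil
}

// matchesAny reports whether name matches at least one regex pattern; an empty list matches everything.
func matchesAny(name string, patterns []string) (bool, error) {
    if len(patterns) == 0 {
        return true, nil
    }
    for _, p := range patterns {
        ok, err := regexp.MatchString(p, name)
        if err != nil || ok {
            return ok, err
        }
    }
    return false, nil
}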

In the method below the operator tells the controller manager, through the For method, that it manages the JobWatcher CRD.

func (r *JobWatcherReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&batchv1.JobWatcher{}).
        Complete(r)
}
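For reference, SetupWithManager is invoked from the scaffolded main.go that the SDK generates. The trimmed sketch below shows that wiring; the module and controllers package paths are assumed from the init command, and the real generated file contains additional flag and logging setup.

package main

import (
    "os"

    "k8s.io/apimachinery/pkg/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    ctrl "sigs.k8s.io/controller-runtime"

    batchv1 "github.com/xxx/job-watcher-operator/api/v1"      // assumed module path
    "github.com/xxx/job-watcher-operator/controllers"         // assumed package layout
)

func main() {
    // Register both the built-in Kubernetes types and our JobWatcher types.
    scheme := runtime.NewScheme()
    _ = clientgoscheme.AddToScheme(scheme)
    _ = batchv1.AddToScheme(scheme)

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
    if err != nil {
        os.Exit(1)
    }

    // Wire the reconciler into the manager; this is where SetupWithManager is called.
    if err := (&controllers.JobWatcherReconciler{
        Client: mgr.GetClient(),
        Log:    ctrl.Log.WithName("controllers").WithName("JobWatcher"),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        os.Exit(1)
    }

    // Start the manager; it blocks until the process receives a termination signal.
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        os.Exit(1)
    }
}

While iterating locally, make run executes this main against your current kubeconfig, which is handy before building an image.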

Build and Deploy

make install

The manifests, along with all the Kustomize patches, are generated under the config folder; make install applies the CRDs to the cluster.

For a real-life deployment, the alternative is to package the controller as a Docker image (run make docker-build docker-push) and generate a Deployment object with make deploy.
