Skip to main content

Health checks

Pre-requisites

You will need an installation of SKE for this section. Go to Configuring SKE and follow the appropriate guide if you haven't done so already.

You will also need a configured and deployed Health Agent. You can do that by following the guide on Configuring the Kubernetes Health Agent.

In this tutorial you will

  1. Generate a Health Definition in the Resource Workflow
  2. Write a Health Check Workflow
  3. Verify the results

How do Health Checks work in SKE?

Health Checks are implemented in SKE via HealthDefinition objects. A Health Definition is the outline of the task that will be performed on a Destination to verify the health of a Resource Request.

The diagram below shows how Health Checks are processed in SKE:

High-level diagram of how Health Check works in Kratix
How Health Checks are processed in SKE

Let's go through the diagram in detail:

  1. When a new resource is requested from a Promise, Kratix will do the usual and execute the Resource Configure Workflow for that Promise.
  2. As part of the outputs of that workflow, you can include a HealthDefinition object, including all the necessary information to execute the Health Check Workflow on the Destination.
  3. Once that HealthDefinition is picked up and applied at the target Destination, the Health Check agent will execute the Health Check Workflow and produce a HealthRecord document.
  4. The HealthRecord is then written to a State Store,
  5. The HealthRecord is then applied back to the Platform cluster via a GitOps agent listening to the State Store.
  6. The Health Check monitor within SKE will then process the HealthRecord and update the resource status with the results.
warning

You will need to set up the Health Agent (and its CRDs) on all Destinations that can run Health Checks.

With that in mind, let's get started!

Generate a Health Definition

In this tutorial, you will define a Resource Health Check for the Redis Promise available in the Kratix Marketplace.

The HealthDefinition object has the following specification:

apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: # unique name for the health definition
namespace: # the healthdefinition namespace
spec:
resourceRef:
name: # the resource name
namespace: # the resource namespace

promiseRef:
name: # the name of the promise

schedule: #cronjob schedule

input: # the inputs the workflow needs

workflow: # a Kratix Pipeline

In order to define a Resource Health Check, we will need to add a new container to the Resource Configure Workflow to generate the appropriate HealthDefinition.

A very basic Resource Health Definition for the Redis Promise could look like this:

resourceName=$(yq '.metadata.name' /kratix/input/object.yaml)
namespace="$(yq '.metadata.namespace // "default"' /kratix/input/object.yaml)"
promiseName="redis"

cat <<EOF > /kratix/output/health-definition.yaml
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: ${promiseName}-${resourceName}-${namespace} # this name needs to be unique
namespace: default # this can be any namespace that exists in the Destination
spec:
resourceRef:
name: ${resourceName}
namespace: ${namespace}
promiseRef:
name: ${promiseName}
schedule: "* * * * *" # runs every minute: https://crontab.guru/#*_*_*_*_*
input: |
name: ${resourceName}
workflow:
apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: healthcheck
spec:
containers:
- image: busybox # this is a placeholder image, we will update this later with our own health check
name: check-redis
EOF

As you can see, the Health Definition is being persistent in /kratix/output as a file called health-definition.yaml. Right now, it's not doing anything useful, but it should be enough to test the end-to-end wiring of the Health Check systems. For simplicity, we made the script above available in the ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0 image as basic-definition.

tip

You can build your own container image if you want. The process is similar to the one described in the Writing a Promise guide.

To include it in the Redis Promise, first download the Promise file to your local machine:

curl -Lo redis-promise.yaml https://raw.githubusercontent.com/syntasso/kratix-marketplace/main/redis/promise.yaml

Next, locate the Resource Configure Workflow and add a new container to it:

apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
name: redis
labels:
kratix.io/promise-version: v0.1.0
spec:
api: # omitted for brevity
workflows:
resource:
configure:
- apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: instance-configure
spec:
containers:
- image: ghcr.io/syntasso/kratix-marketplace/redis-configure-pipeline:v0.1.0
name: redis-configure-pipeline
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: generate-healthdefinition
command: [ "basic-definition" ]
promise: # omitted for brevity
warning

Make sure you add the container to the correct location in the Promise. The container needs to be added to the resource.configure section, so that it is executed when a Resource Request is applied.

You can now apply the updated Promise:

kubectl --context $PLATFORM apply -f redis-promise.yaml

Once the Promise is applied, you can send a Resource Request:

kubectl --context $PLATFORM apply -f https://raw.githubusercontent.com/syntasso/kratix-marketplace/main/redis/resource-request.yaml

At this stage, Kratix should execute the Resource Configure Workflow as expected in the Platform cluster:

$ kubectl --context $PLATFORM get pods
NAME READY STATUS RESTARTS AGE
kratix-redis-example-instance-configure-523d9-blr7b 0/1 Completed 0 10s

If you inspect the State Store, you should see a new Health Definition file there. On the Destination, you should see the new HealthDefinition object getting applied:

$ kubectl --context $WORKER get healthdefinitions
NAME AGE
redis-example-default 10s

The SKE Health Agent running in the Destination will react to the new HealthDefinition and schedule a CronJob to run the Health Check Workflow. You can check the CronJob (it may take a couple of minutes):

$ kubectl --context $WORKER get cronjobs
NAMESPACE NAME SCHEDULE TIMEZONE SUSPEND ACTIVE LAST SCHEDULE AGE
default healthcheck-redis-default-example-78709 * * * * * <none> False 1 1s 1m

The CronJob will eventually execute the Workflow defined in the HealthDefinition (you may need to wait a minute or two for it to appear). Eventually, you should see the pod running the Health Check Workflow:

$ kubectl --context $WORKER get pods
NAME READY STATUS RESTARTS AGE
healthcheck-redis-default-example-78709-28947857-8t6b5 0/1 Completed 0 39s
tip

Since the health check is schedule to run every minute, if you keep watching the pods, you should see new ones starting at the turn of each minute.

Since the workflow completed successfully, the SKE Health Agent running in your Destination should have written a HealthRecord to the State Store. Since you have a GitOps agent listening to that State Store, the Health Record will be applied back to the Platform cluster.

At this stage, you should see the Health check results in your Redis resource:

kubectl --context $PLATFORM get redis.marketplace.kratix.io example -oyaml

Towards the end, under status, you should see the following:

status:
healthRecord:
state: healthy

Great! That proves the Health Check is working end-to-end.

However, the Health Check workflow itself is not quite checking anything. Let's fix that.

Writing a Health Check Workflow

When you define a Health Check, you are basically telling SKE to execute a task on the Destination to validate the health of the resource. Similar to how you define a Resource Workflow to instantiate a Resource, you will define a Kratix Pipeline to determine the resource health.

But what does it mean for a resource of a Promise to be healthy? This is for you to define: different promises will have different definitions. Let's take a closer look at the Redis Promise.

The Redis Resource Configure Workflow outputs a RedisFailover resource. This resource is picked up by the Redis Operator, and a series of other resources are created, including a StatefulSet that will be used to provide the Redis service. To validate the health of the resource, we could check if the StatefulSet has the same number of replicas as ready replicas.

The Redis promise generates deterministic names for the Redis resources, based on the Object metadata.name. The StatefulSet, in particular, is named as rfr-$resourceName, where resourceName is the name of the Redis resource. Similar to the Resource Configure Workflow, your container will have access to a file /kratix/input/object.yaml with the contents of the spec.input you defined in the Health Definition. If you go back, you will see we are already defining a name: ${resourceName} in the input field. All we need to do is read that value and use it to get the StatefulSet.

With that in hand, we can write the following Health Check Workflow:

script to be executed in the destination
resourceName=$(yq '.name' /kratix/input/object.yaml)

# Get the Redis statefulset generated from the Resource Request
replicas=$(kubectl -n default get statefulsets rfr-${resourceName} -o jsonpath='{.status.replicas}')
readyReplicas=$(kubectl -n default get statefulsets rfr-${resourceName} -o jsonpath='{.status.readyReplicas}')

state="unhealthy"

if [[ ${replicas} -eq ${readyReplicas} ]]; then
state="healthy"
fi

cat <<EOF > /kratix/output/health-status.yaml
state: ${state}
details:
replicas: ${replicas}
readyReplicas: ${readyReplicas}
EOF

You can create a Container image and add this script to it. To keep the scope of this tutorial simple, we have packaged it into the ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0 image, under the check-redis command. All we need to do is update the script that's generating the HealthDefinition to use the image and command:

script to be executed in the platform
resourceName=$(yq '.metadata.name' /kratix/input/object.yaml)
namespace="$(yq '.metadata.namespace // "default"' /kratix/input/object.yaml)"
promiseName="redis"

cat <<EOF > /kratix/output/health-definition.yaml
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: ${promiseName}-${resourceName}-${namespace}
namespace: default
spec:
resourceRef:
name: ${resourceName}
namespace: ${namespace}
promiseRef:
name: ${promiseName}
# runs every minute
schedule: "* * * * *"
input: |
name: ${resourceName}
workflow:
apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: healthcheck
spec:
containers:
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: check-redis
command: [ "check-redis" ]
EOF

There's one more change you'll need to make to the Workflow defined in the HealthDefinition: define the RBAC permissions. By default, the Health Check Workflow container will not have access to much of the Kubernetes API; if you try to execute the Health check as defined above, the workflow will fail when it tries to kubectl get the StatefulSet.

To fix this, you need to add the following RBAC permissions to the Workflow in your HealthDefinition:

rbac:
permissions:
- apiGroups: ["apps"]
verbs: ["get"]
resources: ["statefulsets"]

The updated script will look like this:

resourceName=$(yq '.metadata.name' /kratix/input/object.yaml)
namespace="$(yq '.metadata.namespace // "default"' /kratix/input/object.yaml)"
promiseName="redis"

cat <<EOF > /kratix/output/health-definition.yaml
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: ${promiseName}-${resourceName}-${namespace}
namespace: default
spec:
resourceRef:
name: ${resourceName}
namespace: ${namespace}
promiseRef:
name: ${promiseName}
# runs every minute
schedule: "* * * * *"
input: |
name: ${resourceName}
workflow:
apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: healthcheck
spec:
rbac:
permissions:
- apiGroups: ["apps"]
verbs: ["get"]
resources: ["statefulsets"]
containers:
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: check-redis
command: [ "check-redis" ]
EOF

Once again, we packaged the script above in the ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0 image, under the complete-definition command.

Open your Redis Promise and update the container to use the new image and command:

apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
name: redis
labels:
kratix.io/promise-version: v0.1.0
spec:
api: # omitted for brevity
workflows:
resource:
configure:
- apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: instance-configure
spec:
containers:
- image: ghcr.io/syntasso/kratix-marketplace/redis-configure-pipeline:v0.1.0
name: redis-configure-pipeline
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: generate-healthdefinition
command: [ "complete-definition" ]
promise: # omitted for brevity

You can now apply the updated Promise:

kubectl --context $PLATFORM apply -f redis-promise.yaml

This should automatically trigger the Resource Workflow and apply the new HealthDefinition on the Destination.

Verify the results

With the new Health Definition, the CronJob on the Destination will eventually be updated with the new Health Check Workflow. At the turn of the minute, you should observe new pod starting:

$ kubectl --context $WORKER get pods
NAME READY STATUS RESTARTS AGE
healthcheck-redis-default-example-78709-28951821-dgszb 0/1 Completed 0 2m25s
healthcheck-redis-default-example-78709-28951822-nsx9q 0/1 Completed 0 85s
healthcheck-redis-default-example-78709-28951823-2wcls 0/1 Completed 0 25s

Once it completes, you should now see the Health check results in your Redis resource:

kubectl --context $PLATFORM get redis.marketplace.kratix.io example -oyaml
tip

It may take a couple of minutes for the Resource status to update

Towards the end, under status, you should see the following:

status:
healthRecord:
state: healthy
details:
replicas: 1
readyReplicas: 1

That's it! You've successfully implemented a Resource Health Check!