Health checks
Pre-requisites
You will need an installation of SKE for this section. Go to Configuring SKE and follow the appropriate guide if you haven't done so already.
You will also need a configured and deployed Health Agent. You can do that by following the guide on Configuring the Kubernetes Health Agent.
In this tutorial you will
- Generate a Health Definition in the Resource Workflow
- Write a Health Check Workflow
- Verify the results
How do Health Checks work in SKE?
Health Checks are implemented in SKE via HealthDefinition
objects. A Health Definition is the outline of the task that will be performed on a Destination to verify the health of a Resource Request.
The diagram below shows how Health Checks are processed in SKE:
Let's go through the diagram in detail:
- When a new resource is requested from a Promise, Kratix will do the usual and execute the Resource Configure Workflow for that Promise.
- As part of the outputs of that workflow, you can include a
HealthDefinition
object, including all the necessary information to execute the Health Check Workflow on the Destination. - Once that HealthDefinition is picked up and applied at the target Destination, the Health Check agent will execute the Health Check Workflow and produce a
HealthRecord
document. - The
HealthRecord
is then written to a State Store, - The
HealthRecord
is then applied back to the Platform cluster via a GitOps agent listening to the State Store. - The Health Check monitor within SKE will then process the
HealthRecord
and update the resourcestatus
with the results.
You will need to set up the Health Agent (and its CRDs) on all Destinations that can run Health Checks.
With that in mind, let's get started!
Generate a Health Definition
In this tutorial, you will define a Resource Health Check for the Redis Promise available in the Kratix Marketplace.
The HealthDefinition object has the following specification:
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: # unique name for the health definition
namespace: # the healthdefinition namespace
spec:
resourceRef:
name: # the resource name
namespace: # the resource namespace
promiseRef:
name: # the name of the promise
schedule: #cronjob schedule
input: # the inputs the workflow needs
workflow: # a Kratix Pipeline
In order to define a Resource Health Check, we will need to add a new container to the Resource Configure Workflow to generate the appropriate HealthDefinition.
A very basic Resource Health Definition for the Redis Promise could look like this:
resourceName=$(yq '.metadata.name' /kratix/input/object.yaml)
namespace="$(yq '.metadata.namespace // "default"' /kratix/input/object.yaml)"
promiseName="redis"
cat <<EOF > /kratix/output/health-definition.yaml
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: ${promiseName}-${resourceName}-${namespace} # this name needs to be unique
namespace: default # this can be any namespace that exists in the Destination
spec:
resourceRef:
name: ${resourceName}
namespace: ${namespace}
promiseRef:
name: ${promiseName}
schedule: "* * * * *" # runs every minute: https://crontab.guru/#*_*_*_*_*
input: |
name: ${resourceName}
workflow:
apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: healthcheck
spec:
containers:
- image: busybox # this is a placeholder image, we will update this later with our own health check
name: check-redis
EOF
As you can see, the Health Definition is being persistent in /kratix/output
as a file called health-definition.yaml
. Right now, it's not doing anything useful, but it should be enough to test the end-to-end wiring of the Health Check systems. For simplicity, we made the script above available in the ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
image as basic-definition
.
You can build your own container image if you want. The process is similar to the one described in the Writing a Promise guide.
To include it in the Redis Promise, first download the Promise file to your local machine:
curl -Lo redis-promise.yaml https://raw.githubusercontent.com/syntasso/kratix-marketplace/main/redis/promise.yaml
Next, locate the Resource Configure Workflow and add a new container to it:
apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
name: redis
labels:
kratix.io/promise-version: v0.1.0
spec:
api: # omitted for brevity
workflows:
resource:
configure:
- apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: instance-configure
spec:
containers:
- image: ghcr.io/syntasso/kratix-marketplace/redis-configure-pipeline:v0.1.0
name: redis-configure-pipeline
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: generate-healthdefinition
command: [ "basic-definition" ]
promise: # omitted for brevity
Make sure you add the container to the correct location in the Promise. The container needs to be added to the resource.configure
section, so that it is executed when a Resource Request is applied.
You can now apply the updated Promise:
kubectl --context $PLATFORM apply -f redis-promise.yaml
Once the Promise is applied, you can send a Resource Request:
kubectl --context $PLATFORM apply -f https://raw.githubusercontent.com/syntasso/kratix-marketplace/main/redis/resource-request.yaml
At this stage, Kratix should execute the Resource Configure Workflow as expected in the Platform cluster:
$ kubectl --context $PLATFORM get pods
NAME READY STATUS RESTARTS AGE
kratix-redis-example-instance-configure-523d9-blr7b 0/1 Completed 0 10s
If you inspect the State Store, you should see a new Health Definition file there. On the Destination, you should see the new HealthDefinition object getting applied:
$ kubectl --context $WORKER get healthdefinitions
NAME AGE
redis-example-default 10s
The SKE Health Agent running in the Destination will react to the new HealthDefinition and schedule a CronJob to run the Health Check Workflow. You can check the CronJob (it may take a couple of minutes):
$ kubectl --context $WORKER get cronjobs
NAMESPACE NAME SCHEDULE TIMEZONE SUSPEND ACTIVE LAST SCHEDULE AGE
default healthcheck-redis-default-example-78709 * * * * * <none> False 1 1s 1m
The CronJob will eventually execute the Workflow defined in the HealthDefinition (you may need to wait a minute or two for it to appear). Eventually, you should see the pod running the Health Check Workflow:
$ kubectl --context $WORKER get pods
NAME READY STATUS RESTARTS AGE
healthcheck-redis-default-example-78709-28947857-8t6b5 0/1 Completed 0 39s
Since the health check is schedule to run every minute, if you keep watching the pods, you should see new ones starting at the turn of each minute.
Since the workflow completed successfully, the SKE Health Agent running in your Destination should have written a HealthRecord
to the State Store. Since you have a GitOps agent listening to that State Store, the Health Record will be applied back to the Platform cluster.
At this stage, you should see the Health check results in your Redis resource:
kubectl --context $PLATFORM get redis.marketplace.kratix.io example -oyaml
Towards the end, under status
, you should see the following:
status:
healthRecord:
state: healthy
Great! That proves the Health Check is working end-to-end.
However, the Health Check workflow itself is not quite checking anything. Let's fix that.
Writing a Health Check Workflow
When you define a Health Check, you are basically telling SKE to execute a task on the Destination to validate the health of the resource. Similar to how you define a Resource Workflow to instantiate a Resource, you will define a Kratix Pipeline
to determine the resource health.
But what does it mean for a resource of a Promise to be healthy? This is for you to define: different promises will have different definitions. Let's take a closer look at the Redis Promise.
The Redis Resource Configure Workflow outputs a RedisFailover
resource. This resource is picked up by the Redis Operator, and a series of other resources are created, including a StatefulSet that will be used to provide the Redis service. To validate the health of the resource, we could check if the StatefulSet has the same number of replicas as ready replicas.
The Redis promise generates deterministic names for the Redis resources, based on the Object metadata.name
. The StatefulSet, in particular, is named as rfr-$resourceName
, where resourceName
is the name of the Redis resource. Similar to the Resource Configure Workflow, your container will have access to a file /kratix/input/object.yaml
with the contents of the spec.input
you defined in the Health Definition. If you go back, you will see we are already defining a name: ${resourceName}
in the input
field. All we need to do is read that value and use it to get the StatefulSet.
With that in hand, we can write the following Health Check Workflow:
resourceName=$(yq '.name' /kratix/input/object.yaml)
# Get the Redis statefulset generated from the Resource Request
replicas=$(kubectl -n default get statefulsets rfr-${resourceName} -o jsonpath='{.status.replicas}')
readyReplicas=$(kubectl -n default get statefulsets rfr-${resourceName} -o jsonpath='{.status.readyReplicas}')
state="unhealthy"
if [[ ${replicas} -eq ${readyReplicas} ]]; then
state="healthy"
fi
cat <<EOF > /kratix/output/health-status.yaml
state: ${state}
details:
replicas: ${replicas}
readyReplicas: ${readyReplicas}
EOF
You can create a Container image and add this script to it. To keep the scope of this tutorial simple, we have packaged it into the ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
image, under the check-redis
command. All we need to do is update the script that's generating the HealthDefinition to use the image and command:
resourceName=$(yq '.metadata.name' /kratix/input/object.yaml)
namespace="$(yq '.metadata.namespace // "default"' /kratix/input/object.yaml)"
promiseName="redis"
cat <<EOF > /kratix/output/health-definition.yaml
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: ${promiseName}-${resourceName}-${namespace}
namespace: default
spec:
resourceRef:
name: ${resourceName}
namespace: ${namespace}
promiseRef:
name: ${promiseName}
# runs every minute
schedule: "* * * * *"
input: |
name: ${resourceName}
workflow:
apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: healthcheck
spec:
containers:
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: check-redis
command: [ "check-redis" ]
EOF
There's one more change you'll need to make to the Workflow defined in the HealthDefinition: define the RBAC permissions. By default, the Health Check Workflow container will not have access to much of the Kubernetes API; if you try to execute the Health check as defined above, the workflow will fail when it tries to kubectl get
the StatefulSet.
To fix this, you need to add the following RBAC permissions to the Workflow in your HealthDefinition:
rbac:
permissions:
- apiGroups: ["apps"]
verbs: ["get"]
resources: ["statefulsets"]
The updated script will look like this:
resourceName=$(yq '.metadata.name' /kratix/input/object.yaml)
namespace="$(yq '.metadata.namespace // "default"' /kratix/input/object.yaml)"
promiseName="redis"
cat <<EOF > /kratix/output/health-definition.yaml
apiVersion: platform.kratix.io/v1alpha1
kind: HealthDefinition
metadata:
name: ${promiseName}-${resourceName}-${namespace}
namespace: default
spec:
resourceRef:
name: ${resourceName}
namespace: ${namespace}
promiseRef:
name: ${promiseName}
# runs every minute
schedule: "* * * * *"
input: |
name: ${resourceName}
workflow:
apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: healthcheck
spec:
rbac:
permissions:
- apiGroups: ["apps"]
verbs: ["get"]
resources: ["statefulsets"]
containers:
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: check-redis
command: [ "check-redis" ]
EOF
Once again, we packaged the script above in the ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
image, under the complete-definition
command.
Open your Redis Promise and update the container to use the new image and command:
apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
name: redis
labels:
kratix.io/promise-version: v0.1.0
spec:
api: # omitted for brevity
workflows:
resource:
configure:
- apiVersion: platform.kratix.io/v1alpha1
kind: Pipeline
metadata:
name: instance-configure
spec:
containers:
- image: ghcr.io/syntasso/kratix-marketplace/redis-configure-pipeline:v0.1.0
name: redis-configure-pipeline
- image: ghcr.io/syntasso/kratix-docs/redis-health-checks:v0.1.0
name: generate-healthdefinition
command: [ "complete-definition" ]
promise: # omitted for brevity
You can now apply the updated Promise:
kubectl --context $PLATFORM apply -f redis-promise.yaml
This should automatically trigger the Resource Workflow and apply the new HealthDefinition on the Destination.
Verify the results
With the new Health Definition, the CronJob on the Destination will eventually be updated with the new Health Check Workflow. At the turn of the minute, you should observe new pod starting:
$ kubectl --context $WORKER get pods
NAME READY STATUS RESTARTS AGE
healthcheck-redis-default-example-78709-28951821-dgszb 0/1 Completed 0 2m25s
healthcheck-redis-default-example-78709-28951822-nsx9q 0/1 Completed 0 85s
healthcheck-redis-default-example-78709-28951823-2wcls 0/1 Completed 0 25s
Once it completes, you should now see the Health check results in your Redis resource:
kubectl --context $PLATFORM get redis.marketplace.kratix.io example -oyaml
It may take a couple of minutes for the Resource status to update
Towards the end, under status
, you should see the following:
status:
healthRecord:
state: healthy
details:
replicas: 1
readyReplicas: 1
That's it! You've successfully implemented a Resource Health Check!