# Troubleshooting

## Common issues

### Documents are not appearing in my Destination
There are many reasons why a document might not appear in a Destination. A document goes from the Workflow to the Destination in a few steps:

- The Workflow schedules a document by placing it in `/kratix/output`.
- Kratix schedules this document to a Destination, taking into account any scheduling selectors specified in `/kratix/metadata/destination-selectors.yaml` or the Promise's `.spec.destinationSelectors` (see the sketch below). If no matching Destination exists, the document will not be scheduled until one is created. See the scheduling documentation for more information.
- Once scheduled, Kratix writes the document to the Destination using the Destination's auth credentials.
- The system at the Destination must be healthy and able to accept the documents for them to appear (e.g. Kubernetes resources or Terraform files).
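As a concrete illustration of the second step, a Workflow container can influence scheduling by writing a selectors file into its metadata directory. This is a minimal sketch only: the `environment: dev` label is an assumption and must match labels that actually exist on your Destinations, and the exact file format is documented in the scheduling documentation.

```bash
# Minimal sketch, run inside a Workflow container: ask Kratix to schedule the
# documents in /kratix/output to Destinations labelled environment=dev.
# The label key/value is illustrative; use labels that exist on your Destinations.
cat <<EOF > /kratix/metadata/destination-selectors.yaml
- matchLabels:
    environment: dev
EOF
```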
Let's check each of these in turn:

- **The Workflow outputted the documents**

  - Check the Workflow logs for any errors that might not be handled correctly. You can find the name of the Pipeline pod by running:

    ```bash
    kubectl get pods --selector kratix.io/resource-name=<request-name>
    ```

    Then inspect the logs for your container:

    ```bash
    kubectl logs <workflow-pod-name> -c <container-name>
    ```

  - Check the `Work` resource (an internal Kratix resource that is created to contain the outputs of a Workflow) to see if the document is listed inside it:

    ```bash
    kubectl get work --selector kratix.io/resource-name=<request-name>
    ```

    Inspect the Work's `.spec` to see if your documents are listed:

    ```bash
    kubectl get work <work-name> -o yaml
    ```

    Note that the `workload.content` is compressed to reduce the size of the Work. To inspect the raw content of a document, decode it and decompress it with `gzip`:

    ```bash
    # Get the filepath of the first scheduled document
    kubectl get work <work-name> -o=jsonpath='{.spec.workloadGroups[0].workloads[0].filepath}'

    # Get, decode and decompress the content of the first scheduled document
    kubectl get work <work-name> -o=jsonpath='{.spec.workloadGroups[0].workloads[0].content}' | base64 -d | gzip -d
    ```
- **The document is scheduled to the correct Destination**

  - Check the `WorkPlacement` resource (an internal Kratix resource that is created from the `Work` resource, representing the scheduling of a `Work`) to see if the document is listed inside it:

    ```bash
    kubectl get workplacement --selector kratix.io/resource-name=<request-name>
    ```

    As with the `Work`, the `workload.content` in a `WorkPlacement` is compressed to reduce its size. To inspect the raw content of a document, decode it and decompress it with `gzip`:

    ```bash
    # Get the filepath of the first scheduled document
    kubectl get workplacement <workplacement-name> -o=jsonpath='{.spec.workloads[0].filepath}'

    # Get, decode and decompress the content of the first scheduled document
    kubectl get workplacement <workplacement-name> -o=jsonpath='{.spec.workloads[0].content}' | base64 -d | gzip -d
    ```

    If the `WorkPlacement` doesn't exist, no Destination was found that matches the required scheduling selectors and Kratix has been unable to schedule the `Work`. See below for further steps to investigate.

  - Inspect the WorkPlacement's `.spec.targetDestinationName` to verify that it is scheduled to the correct Destination:

    ```bash
    kubectl get workplacement <workplacement-name> -o yaml | grep targetDestinationName
    ```
- **The document was successfully written to the Destination**

  - Check the Destination state store to see if it contains the desired files (see the sketches after this list). By default, the files for a WorkPlacement are written in the following directory structure:

    - Promise: `<destination-name>/dependencies/<promise-name>/<workflow-name>/<files>`
    - Resource: `<destination-name>/resources/<namespace>/<promise-name>/<request-name>/<workflow-name>/<files>`

    If your Destination has `.spec.filepath.mode` set to `none`, the files will appear in the top-level directory of the Destination.

    If the files are not appearing, check the logs of the Kratix operator to see if there are any errors connecting to the Destination:

    ```bash
    kubectl -n kratix-platform-system logs <pod-name> -c manager
    ```

- **The system reconciling the Destination is healthy (e.g. Flux/Argo)**

  - Check the system at the Destination is healthy and able to accept the documents. For example, a Destination that is a Kubernetes cluster should have a healthy GitOps installation that can successfully pull from the Destination's state store (see the sketch after this list).
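How you inspect the state store contents depends on your backend. As one hedged example, for an S3-compatible bucket state store (the bucket name and path below are illustrative; adjust them to your StateStore configuration), you can list what Kratix has written; for a Git state store, clone the repository and look under the same paths.

```bash
# List the files Kratix wrote for a resource request in an S3-compatible bucket.
# Bucket name and path are illustrative; adjust to your StateStore configuration.
aws s3 ls s3://<bucket-name>/<destination-name>/resources/<namespace>/<promise-name>/<request-name>/ --recursive
```

For the last check, if the Destination is a Kubernetes cluster reconciled by Flux, a quick health sketch is to confirm the Flux sources and Kustomizations are ready. The resource kinds below assume a standard Flux installation; names and namespaces vary per setup.

```bash
# On the Destination cluster: confirm Flux can pull from the state store and apply the files.
kubectl get gitrepositories,buckets -A   # sources Flux pulls from (Git or bucket backed)
kubectl get kustomizations -A            # reconciliations that apply the Kratix-written files
```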
### Documents are being scheduled to the wrong Destination

Kratix uses label selectors to match documents to Destinations. If a document is being scheduled to the wrong Destination, or to no Destination at all, it is likely that the label selectors do not match.

- Check the labels are correctly applied to the Destinations:

  ```bash
  kubectl get destinations --show-labels
  ```

- Check the selectors are correctly specified for the documents. Fetch the Work for the Workflow and inspect the `.spec.workloadGroups[*].destinationSelectors` field:

  ```bash
  kubectl get work --selector kratix.io/resource-name=<request-name>
  kubectl get work <work-name> -o jsonpath='{.spec.workloadGroups[*].destinationSelectors}'
  ```

  If this doesn't match the labels on the Destination, the document will not be scheduled to that Destination. See the scheduling documentation for how to specify Destination selectors.
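If the selectors and labels don't line up, one fix is to add the missing label to the intended Destination. A minimal sketch, assuming the Work's destinationSelectors expect `environment: dev` (illustrative; use the keys and values from your own selectors):

```bash
# Add the label that the Work's destinationSelectors expect to the intended Destination.
kubectl label destination <destination-name> environment=dev
```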
### Kratix is not starting

Kratix is a Kubernetes operator and uses cert-manager to generate certificates for its webhooks.

- Check the Kratix operator status:

  ```bash
  kubectl -n kratix-platform-system get pods
  ```

  If the pod is in a `Pending` state, or the pod doesn't exist at all, inspect the deployment for any errors:

  ```bash
  kubectl -n kratix-platform-system describe deployment kratix-platform-controller-manager
  ```

  If the pod is in a `CrashLoopBackOff` or `Error` state, inspect the logs for any errors:

  ```bash
  kubectl -n kratix-platform-system logs <pod-name> -c manager
  ```

  If the pod is in an `ErrImagePull` state, check that the Kubernetes cluster has the correct network settings to pull from the container registry, and, if the image is hosted in a private registry, that `imagePullSecrets` are correctly configured:

  ```bash
  kubectl -n kratix-platform-system get deployment kratix-platform-controller-manager -o yaml | grep imagePullSecrets
  ```

- If cert-manager is not installed or is not working correctly, Kratix will not start. Check cert-manager is installed and working correctly:

  ```bash
  kubectl -n cert-manager get pods
  ```
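If cert-manager itself looks healthy but Kratix still fails to start with webhook certificate errors, it can also help to look at the cert-manager resources and recent events in the Kratix namespace. A sketch, assuming cert-manager's CRDs are installed (resource names vary by installation):

```bash
# Check whether cert-manager has issued the webhook serving certificate for Kratix.
kubectl -n kratix-platform-system get certificates,certificaterequests

# Look at recent events in the namespace for webhook or scheduling errors.
kubectl -n kratix-platform-system get events --sort-by=.lastTimestamp | tail -n 20
```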
### Resource request is not being deleted

When a resource request is deleted, Kratix will run any delete pipelines associated with the request and clean up any documents that were created. Kratix uses a Kubernetes concept called finalizers to ensure that all cleanup is done before the resource request is removed completely from the system. While a resource is being deleted, you will see the resource in a deletion state, indicated by the presence of a `.metadata.deletionTimestamp` field on the resource.

- Check the resource request is in a deletion state:

  ```bash
  kubectl get <resource-kind> <resource-request> -o jsonpath='{.metadata.deletionTimestamp}'
  ```
When deleting, Kratix completes three cleanup actions in order. Once an action has completed, its finalizer no longer appears in the `.metadata.finalizers` field. The cleanup steps Kratix completes are:

- `kratix.io/delete-workflows` for running the delete pipelines associated with the resource request (if any)
- `kratix.io/work-cleanup` for cleaning up the Work/WorkPlacement resources created by the resource request
- `kratix.io/workflows-cleanup` for cleaning up the Workflows created by the resource request

As Kratix completes these actions in order, if a resource request is stuck in deletion it is likely that one of the finalizers is not being removed correctly.

Check which finalizers remain:

```bash
kubectl get <resource-kind> <resource-request> -o jsonpath='{.metadata.finalizers}'
```

Check which finalizer is failing (as they run in order, only one will be failing) and try to resolve the issue. If you are able to resolve the issue, you can manually remove the finalizer from the resource by editing the resource and removing the entry from `.metadata.finalizers`. Kratix will then continue the deletion and work its way through the remaining finalizers.
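As a hedged sketch of that manual step, and only after you have resolved the underlying problem, you can drop a stuck finalizer with `kubectl edit` or a JSON patch. The index `0` below is illustrative; pick the index of the finalizer you actually want to remove.

```bash
# Inspect the remaining finalizers and their order.
kubectl get <resource-kind> <resource-request> -o jsonpath='{.metadata.finalizers}'

# Remove the first finalizer in the list; adjust the index to target the stuck one.
kubectl patch <resource-kind> <resource-request> --type=json \
  -p '[{"op": "remove", "path": "/metadata/finalizers/0"}]'
```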
- If the `kratix.io/delete-workflows` finalizer is not being removed, check whether the delete Workflow is failing:

  ```bash
  kubectl get pods --selector kratix.io/resource-name=<resource-request>
  ```

- If the `kratix.io/work-cleanup` finalizer is not being removed, check whether the `Work`/`WorkPlacement` resources are failing to be deleted:

  ```bash
  kubectl get work --selector kratix.io/resource-name=<resource-request>
  kubectl get workplacement --selector kratix.io/resource-name=<resource-request>
  ```

- If the `kratix.io/workflows-cleanup` finalizer is not being removed, check whether the Workflows are failing to be deleted:

  ```bash
  kubectl get jobs --selector kratix.io/resource-name=<resource-request>
  ```
### Promise is not being deleted

When a Promise is deleted, Kratix will run any associated delete pipelines and clean up any documents that were created, as well as any resource requests. Kratix uses a Kubernetes concept called finalizers to ensure that all cleanup is done before the Promise is removed completely from the system. While a Promise is being deleted, you will see the resource in a deletion state, indicated by the presence of a `.metadata.deletionTimestamp` field on the resource.

- Check the Promise is in a deletion state:

  ```bash
  kubectl get promise <promise-name> -o jsonpath='{.metadata.deletionTimestamp}'
  ```
When deleting, Kratix completes six cleanup actions in order. Once an action has completed, its finalizer no longer appears in the `.metadata.finalizers` field. The cleanup steps Kratix completes are:

- `kratix.io/delete-workflows` for running the delete pipelines associated with the Promise (if any)
- `kratix.io/workflows-cleanup` for cleaning up the Workflows created by the Promise
- `kratix.io/resource-request-cleanup` for cleaning up the resource requests created
- `kratix.io/dynamic-controller-dependant-resources-cleanup` for cleaning up Kubernetes resources created to manage the Promise
- `kratix.io/dependencies-cleanup` for cleaning up the dependencies created by the Promise
- `kratix.io/api-crd-cleanup` for cleaning up the CRDs created by the Promise

Kratix completes these actions in order, so if a Promise is stuck in deletion it is likely that one of the finalizers is not being removed correctly.

Check which finalizers remain:

```bash
kubectl get promise <promise-name> -o jsonpath='{.metadata.finalizers}'
```

Check which finalizer is failing (as they run in order, only one will be failing) and try to resolve the issue. If you are able to resolve the issue, you can manually remove the finalizer from the resource by editing the resource and removing the entry from `.metadata.finalizers`. Kratix will then continue the deletion and work its way through the remaining finalizers.

When investigating a Promise deletion issue, it is beneficial to have the logs of the Kratix operator open to see if any errors are being logged:

```bash
kubectl -n kratix-platform-system logs <pod-name> -c manager | grep "controllers\.promise"
```
- If the `kratix.io/delete-workflows` finalizer is not being removed, check whether the delete Workflow is failing and fix any issues preventing it from completing:

  ```bash
  kubectl -n kratix-platform-system get pods --selector kratix.io/promise-name=<promise-name> --selector kratix.io/work-action=delete
  ```

- If the `kratix.io/workflows-cleanup` finalizer is not being removed, check whether the Workflows are failing to be deleted and manually clean up any that are:

  ```bash
  kubectl -n kratix-platform-system get jobs --selector kratix.io/promise-name=<promise-name>
  ```

- If the `kratix.io/resource-request-cleanup` finalizer is not being removed, check whether any resource requests are failing to be deleted; see "Resource request is not being deleted" above for how to troubleshoot a failing resource request deletion:

  ```bash
  kubectl get <resource-kind> -A
  ```

- If the `kratix.io/dynamic-controller-dependant-resources-cleanup` finalizer is not being removed, check whether any Kubernetes resources are failing to be deleted and try to manually clean up any that remain:

  ```bash
  kubectl get configmaps,roles,rolebindings,clusterroles,clusterrolebindings -A --selector kratix.io/promise-name=<promise-name>
  ```

- If the `kratix.io/api-crd-cleanup` finalizer is not being removed, check whether the CRD is failing to be deleted and try to manually delete it:

  ```bash
  kubectl get crd --selector kratix.io/promise-name=<promise-name>
  ```
### Workflow Pod isn't appearing

When a Workflow is scheduled, Kratix creates a Pod to run the Workflow. If the specification for the Workflow is invalid, Kratix will fail to create the Pod. Check the logs of the Kratix operator for any errors and fix the related issues in the Workflow:

```bash
kubectl -n kratix-platform-system logs <pod-name> -c manager | grep <promise/resource name>
```
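Since Kratix runs Workflows as Kubernetes Jobs (as the cleanup sections above show), it can also help to check whether the Job was created at all; if a Job exists but no Pod appears, its events usually explain why. A sketch using the same label selector as elsewhere in this guide:

```bash
# Check whether Kratix created a Job for the Workflow.
kubectl get jobs --selector kratix.io/resource-name=<request-name>

# If a Job exists but has no Pod, its events often show the admission or scheduling error.
kubectl describe job <job-name>
```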
### Workflow Pod stuck in CrashLoopBackOff

When a Workflow is scheduled, Kratix creates a Pod to run the Workflow. If the Pod fails, Kubernetes will restart it. If the Pod keeps failing, it will eventually go into `CrashLoopBackOff`. In this scenario Kratix will not try to reschedule the Pod. You can force Kratix to schedule a new Pod by triggering a manual reconciliation.
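Before rescheduling, it is usually worth finding out why the Pod is crashing. A sketch using standard kubectl commands (container names come from your Workflow definition):

```bash
# Inspect the logs from the last failed run of the container.
kubectl logs <workflow-pod-name> -c <container-name> --previous

# Check the Pod's events and container exit codes for the crash cause.
kubectl describe pod <workflow-pod-name>
```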
### Workflow Pod doesn't have Kubernetes API access

A Workflow is, by default, given limited access to the Kubernetes API. You will see errors in the logs of the Workflow if it tries to access the API without the correct permissions. To give the Workflow more access, you need to create additional RBAC permissions. Check the documentation for how to configure permissions for Workflows, alongside some examples.
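To confirm it really is an RBAC problem, you can check what the Workflow's service account is allowed to do. A sketch using standard kubectl impersonation; the verb, resource, and namespaces below are illustrative, so substitute whatever operation your Workflow is attempting:

```bash
# Find the service account the Workflow Pod runs as.
kubectl get pod <workflow-pod-name> -o jsonpath='{.spec.serviceAccountName}'

# Check whether that service account is allowed to perform the operation the Workflow needs.
kubectl auth can-i list configmaps \
  --as=system:serviceaccount:<namespace>:<service-account-name> -n <target-namespace>
```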