New in Kratix: Health Checks and Progressive Rollout

May 27, 2025 · 5 min read

Engineer @ Syntasso

Earlier this year, we introduced an exciting new capability in Kratix: health checks for resources. This addition allows platform teams and app developer to easily observe the status of their requested workloads, without the need to switch context and find it in Destinations.

In this blog, we’ll discuss how you can use it to support progressive rollouts when updating Promises.

What’s the Problem?

When a Promise gets updated, say with a new version of a Helm chart, the standard behavior in Kratix is to reconcile and update all resource requests at once. That’s fine in simple dev environments, but for complex workloads, upgrading everything at once is risky. A failed update could disrupt many environments simultaneously, and debugging becomes difficult when failures are scattered.

Platform engineers need a safer approach: progressive rollouts. Instead of deploying changes to your entire fleet at once, progressive rollout allows teams to introduce updates gradually to limit the impact, gather early feedback, and catch potential bugs before releasing broadly. But for that to work, Kratix needs a way to understand the health of each individual resource during and after an upgrade.

Health Checks to the Rescue

Platforms are often described as a black box - you need to integrate dozens of tools into complicated data ingestion and dashboard platforms to understand what is happening with all your resources. For a long time, Platforms orchestrated by Kratix were no different. The new health check feature enables Promise authors to define custom health validation logic for the resources their Promise provisions, which means not only can you report back basic health data (ready, degraded, failed etc.) but also context relevant that can be used to drive other processes, for example connection strings to access your database resource, or the unique ID name of the application you've just deployed.

Here’s how it works:

You can simply update your existing Promise resource configure workflow to output a HealthDefinition, which contains structured instructions on how to perform health checks against requested resources in a Destination or external system. Kratix reads the health status and automatically updates the status of each resource request.

By including a HealthDefinition, you give Kratix insight into the runtime state of each resource, whether it’s ready, degraded, or failed. With better visibility into Resource health, Promise authors now have the foundation for more complex orchestration, such as progress rollout.

High-level diagram of how Health Check works in Kratix — How Health Checks works in SKE

Health Definition includes all the necessary information to execute the Health Check workflow in a Destination. As part of the Syntasso Kratix Enterprise offering, we provide a SKE Health Agent that can reconcile Health Definition in a Kubernetes Destination. If you're not using SKE, you will need to bring your own agent to act on Health Definitions. You can also refer to our Surfacing health information guide on how to run health checks without an agent in the Platform Cluster.

Progressive Rollouts in Action

Let’s say your Promise provisions PostgreSQL as a service. You’re installing a new version of the Promise that updates some of it default configurations.

In your PostgreSQL Promise resource configure workflow, you can include a container that can

output a HealthDefinition to check Postgres health
use Kubernetes Leases to ensure it only continues the workflow when it acquires the lease

With Health Checks enabled and set up in your Platform and Destinations, when you update the Promise:

Kratix begins updating resource requests
One of the Resource Configure workflow acquires the Lease
Lease is only released after Health checks report a healthy status, ensuring Resources are upgraded one by one
When there's a failed upgrade, Lease are never released and rollout is paused and the failed state is visible in the resource request status
Kratix continues upgrading Resources one by one until all of them are upgraded and reported healthy

This progressive delivery strategy has a myriad of benefits:

Mitigate risk and downtime By gradually exposing changes to a small percentage of users, you limit the blast radius of bugs or failures. If something goes wrong, the impact is contained and easier to recover from.
Improve debuggability Smaller rollouts make it significantly easier to isolate and troubleshoot issues. You’re not drowning in signals from a full-scale deployment, so you can focus on specific failure cases as they arise.
Fast feedback loops for experimental features. You can gather real-world data on performance metrics, error rates, and user feedback from a limited rollout, and use them to validate assumptions before proceeding further.
Increased trust in your platform Stakeholders: developers, operators, and other users, will gain confidence in the platform when they see that changes are deployed safely and reliably, with safeguards in place to detect and respond to issues.

What's next

With the introduction of health checks in Kratix, platform engineers and Promise authors now have a powerful tool to drive safer, smarter, and more observable upgrades. By embedding health logic directly into resource workflows, Kratix can track the status of each individual resource and gate rollout progression accordingly.

This unlocks the ability to perform progressive rollouts - updating one resource at a time, validating its health before proceeding. This dramatically reduces the risk of widespread failures and makes it easier to pinpoint and debug issues when they occur. It also brings greater confidence and control to platform teams managing complex workloads across multiple environments.

If you’re looking to improve the reliability and maintainability of your service updates, health checks are a foundational step forward. Check out our step-by-step guide to get started. We are working to add more Kratix native functionalities to make progressive upgrades easier for Promise authors, and are always keen to hear your thoughts and feedback. Let us know how you’re using them—we’d love to hear from you in our Slack or GitHub communities.

What’s the Problem?​

Health Checks to the Rescue​

Progressive Rollouts in Action​

What's next​

What’s the Problem?

Health Checks to the Rescue

Progressive Rollouts in Action

What's next