How your Resources get from Promise to Destination
Ever wondered how Kratix actually gets your documents from the Platform Cluster to the correct Destination?
The Syntasso Team has recently introduced a change to Kratix to reduce the size of the Work object. While this change is mostly internal, we wanted to share how the innards of Kratix work.
So brace yourself to learn:
- how Kratix moves documents from Platform to Destinations
- what works and workplacements are
- how to inspect works to debug your Promises
You are probably already familiar with how Kratix works at a high level and with the diagram below:
As illustrated above:
- The user sends a new App Request to the Platform Cluster.
- The Promise reacts to that request and triggers the Resource Configure Workflows.
- The Workflow completes and outputs a series of documents to be scheduled to a Destination.
- These documents are written to a specific directory in the State Store.
- In the diagram, the documents are scheduled to a Kubernetes Destination. This type of Destination usually has Flux (or ArgoCD, or another GitOps tool) watching the State Store. The tool picks up the new documents.
- The documents are then processed and applied to the Destination.
- The App becomes operational on the Destination.
In this post, I'm going to expand on the third and fourth steps: what happens at the end of the Workflow? How is the document written to the State Store? And how does the change linked above affect this process?
If the diagram is new to you, I recommend checking out Part I of the Kratix Workshop for an overview of Kratix.
A Dive into Kratix Internals
The casual observers among you may have noticed that, when installing Kratix, a couple of CRDs are also created but not prominently mentioned in the guides or workshops: the Work and the WorkPlacement.
The Work CRD contains the definition of, well, a Work. All the documents output by a workflow are captured in the Work Object as workloads. Each document corresponds to a workload entry in the Work object. These workloads are grouped by the destination selectors specified by both the Workflow and the Promise.
In other words, the Work object encapsulates everything needed to schedule the workloads to the Destinations.
Keen observers may have noticed the few extra containers included in the Workflow Job. One of these containers is called work-writer. As the name suggests, it handles creating the Work object at the end of the Workflow. 😉
One of the controllers bundled with Kratix is the Work Controller. This controller is responsible for finding out all the available Destinations and selecting the right one for each workload in a Work. It achieves this by monitoring Work objects and creating a WorkPlacement object for each workload.
When no matching Destination is available, the Work Controller marks the Work as Unscheduled. You can verify this by checking the Scheduled condition in the status field of the Work object.
Once a Destination becomes available, the system will automatically try to schedule any unscheduled Work.
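As a minimal sketch of that check, the helper below reads only the Scheduled condition from a Work using kubectl's JSONPath filter syntax. The function name, the Work name in the usage comment, and the "default" namespace fallback are illustrative assumptions, not part of Kratix; it also assumes kubectl is pointed at the Platform Cluster.

```shell
# Print the status of the Scheduled condition on a Work.
# Kubernetes condition statuses are "True", "False" or "Unknown".
# Hypothetical helper: the name and the "default" namespace fallback are
# assumptions made for illustration only.
work_scheduled_status() {
  work_name="$1"
  namespace="${2:-default}"
  kubectl get work "$work_name" --namespace "$namespace" \
    -o jsonpath='{.status.conditions[?(@.type=="Scheduled")].status}'
}

# Usage (hypothetical Work name and namespace):
#   work_scheduled_status my-request-work default
```

Swapping `.status` at the end of the expression for `.message` or `.reason` should surface more detail, assuming the condition populates those standard fields.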
The WorkPlacement object serves as a link between a Work (or, more specifically, a workload group within the Work) and a Destination. It contains a copy of the workloads and information about the Destination it is scheduled to.
The WorkPlacement controller reacts to WorkPlacements and ensures the workloads are written to the State Store associated with the Destination.
The diagram below illustrates the Work and WorkPlacement objects in detail:
In summary:
- The contents of a Workflow output are combined into a single Work object. Each document has an associated workload entry in the Work.
- Workloads are grouped by the destination selectors specified by the Workflow and the Promise.
- From the Work object, a WorkPlacement is created for each Workload group.
- The WorkPlacement controller writes the Workloads to the State Store associated with the Destination.
- 🎉
That means the Work object can get quite large, since it's combining all the documents into a single object. But how large is too large?
Reaching etcd limits
The answer is about 1.5MB. While the Kubernetes API accepts up to 3MB of data in a single request, etcd only persists keys up to 1.5MB (by default). Although this is configurable, it's fair to assume that most clusters where Kratix is deployed will use the default settings.
So what happens if a Work object exceeds 1.5MB? The Configure Workflow fails at the work-writer container, and the error message isn't particularly helpful:
etcdserver: request is too large
You may see an error message like Request entity too large: limit is 3145728; that means you are hitting the Kubernetes API limit, not the etcd one.
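To get a rough feel for how close a set of documents is to the etcd limit, the sketch below sums the raw size of every file under a directory and compares it with 1572864 bytes (1.5 MiB, etcd's default request size). The function name and the idea of pointing it at a local copy of a Workflow's output are assumptions for illustration. The real Work object also carries metadata, and this measures the documents before any compression, so treat the number as an estimate only.

```shell
# etcd's default request limit: 1.5 MiB, expressed in bytes.
ETCD_LIMIT_BYTES=1572864

# Sum the size of every regular file under a directory and compare it with
# the limit. Hypothetical helper; the directory stands in for a local copy
# of the documents a Workflow would output.
check_output_size() {
  dir="$1"
  total="$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')"
  if [ "$total" -gt "$ETCD_LIMIT_BYTES" ]; then
    echo "over limit: ${total} bytes"
    return 1
  fi
  echo "within limit: ${total} bytes"
}

# Usage (hypothetical path):
#   check_output_size ./workflow-output
```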
While it takes a lot of YAML to be over 1.5MB, you can easily reach such a limit in your Promise. The Prometheus Operator, for example, includes 3.7MB of YAML!
This brings us back to the change introduced by 243. In this update, we introduced gzip compression for the Workload contents before persisting the Work into etcd. This significantly reduces the size of the Work object (the gzip documentation mentions an average of 70% reduction in size). For the Prometheus Operator, for example, the size of the Work object went from 3.7MB to about 490KB, an 87% reduction 🎉!
The downside? If you inspect the Work object, you’ll see base64-encoded binary data instead of nice, readable YAML.
You can still read it though. To inspect a workload’s contents, decode the base64 data, then decompress it using this command:
kubectl get work <work-name> \
-o jsonpath='{.spec.workloadGroups[0].workloads[0].content}' \
| base64 -d \
| gzip -d
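If you don't have a cluster at hand, the same pipeline can be tried locally. The sketch below builds a throwaway sample document, encodes it with gzip followed by base64 (assumed here to be the reverse of the decode pipeline above), compares the sizes, and decodes it again. The sample content and variable names are made up for illustration, and the exact sizes will vary with the input.

```shell
# Build a sample document with plenty of repetition, as Kubernetes YAML tends to have.
sample="$(mktemp)"
i=0
while [ "$i" -lt 200 ]; do
  printf 'apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: example-%s\n---\n' "$i" >> "$sample"
  i=$((i + 1))
done

# Encode: gzip first, then base64 (line breaks stripped so it is a single string).
encoded="$(gzip -c "$sample" | base64 | tr -d '\n')"

# Compare the raw size with the encoded size, in bytes.
raw_bytes="$(wc -c < "$sample" | tr -d ' ')"
encoded_bytes="$(printf '%s' "$encoded" | wc -c | tr -d ' ')"
echo "raw: ${raw_bytes} bytes, gzip+base64: ${encoded_bytes} bytes"

# Decode with the same pipeline used against a real Work and confirm nothing was lost.
decoded="$(mktemp)"
printf '%s' "$encoded" | base64 -d | gzip -d > "$decoded"
cmp -s "$sample" "$decoded" && echo "round trip OK"
```

Note that base64 adds roughly a third on top of the gzipped bytes, so the size you see in the Work is a little larger than the compressed data itself.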
Check our Troubleshooting guide for more information on how to debug Kratix, including inspecting Works and WorkPlacements.
Despite compression, large Work objects may still pose challenges. While this update provides temporary relief, we’ll need to revisit the structure to allow users to have unbounded fun with their Promises. But that’s a story for another day.
Conclusion
In this post, we dived into the internals of Kratix to understand how a document moves from the Platform Cluster to a Destination. We saw how the Work and WorkPlacement objects are used to schedule and write documents to the State Store. We also saw how the recent change to compress the Workload contents has helped reduce the size of the Work object.
We hope this post has given you a better understanding of how Kratix works under the hood. If you have any questions or feedback (or want to see more blog posts like this) please don't hesitate to reach out to us on Slack or GitHub.