Date: 2023-12-26
The source code for this lab exercise is available on GitHub.
In our last article, Investigating a failed VolumeSnapshot with NFS on Kubernetes, we saw that NFS is unsuitable as a storage backend for stateful workloads on Kubernetes in a production context due to fundamental limitations of NFS itself. So how are we to run our stateful applications on Kubernetes, if at all?
A common deployment model for running stateful applications on Kubernetes is the cloud native hybrid architecture, where stateful components of an application such as a database or object storage run on virtual machines (VMs) for optimal stability and performance, while stateless components such as the frontend web UI or REST API server run on Kubernetes for maximal resiliency, elasticity and high availability. While this deployment model combines the best of both worlds, configuration is complex compared to deploying the entire application in-cluster: the stateless components running on Kubernetes must be explicitly configured to point to the out-of-cluster stateful components instead of leveraging the native service discovery mechanisms offered by Kubernetes.
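To make that trade-off concrete, here is a minimal sketch of one common way to wire an in-cluster component to an out-of-cluster database: an ExternalName Service that maps a stable in-cluster DNS name to the VM's hostname. The namespace, Service name and hostname below are illustrative assumptions, not values from any particular deployment.

```sh
# Hypothetical example: expose an out-of-cluster PostgreSQL VM to in-cluster
# workloads under a stable in-cluster DNS name
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: default
spec:
  type: ExternalName
  externalName: db01.example.internal  # FQDN of the database VM (assumed)
EOF
```

Pods can then reach the VM at postgres.default.svc.cluster.local, but this mapping sits outside Kubernetes' usual service discovery and must be created and kept up to date by hand.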
Another option is to leverage a Kubernetes-native distributed storage solution such as Rook Ceph as the storage backend for stateful components running on Kubernetes. This simplifies application configuration while addressing business requirements for data backup and recovery, such as taking volume snapshots at regular intervals and performing application-level data recovery in case of a disaster.
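As a taste of what this enables, here is a minimal sketch of a CSI VolumeSnapshot taken from an existing PVC. The PVC name and the VolumeSnapshotClass name are assumptions: Rook's upstream examples ship a class named csi-rbdplugin-snapclass, but the name in your cluster may differ.

```sh
# Hypothetical example: snapshot an existing PVC backed by Ceph RBD
cat <<EOF | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: minio-data-snapshot
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass  # assumed class name
  source:
    persistentVolumeClaimName: data-minio-0         # assumed PVC name
EOF
```

Note that a VolumeSnapshot object captures a single point in time; taking snapshots at a regular interval means creating these objects on a schedule, for example from a CronJob or a dedicated backup tool.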
Rook Ceph is a CNCF Graduated project. As its name might suggest, Rook Ceph consists of two major components: Ceph and Rook. Ceph is the distributed storage system itself, while Rook is a Kubernetes operator that automates the setup and lifecycle management of Ceph clusters, greatly simplifying their deployment and administration on Kubernetes.
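To illustrate that division of labour, below is a simplified sketch of the CephCluster custom resource that the Rook operator reconciles into a running Ceph cluster. The field values are illustrative assumptions rather than a recommended configuration, and when deploying via the official Helm charts this resource is generated from chart values rather than applied by hand.

```sh
# Simplified CephCluster custom resource watched by the Rook operator
cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18  # Ceph container image (assumed tag)
  dataDirHostPath: /var/lib/rook  # where Ceph stores its state on each node
  mon:
    count: 3                      # e.g. one monitor per node on 3 nodes
  storage:
    useAllNodes: true
    useAllDevices: true           # consume every empty raw device found
EOF
```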
In the lab to follow, we’ll quickly provision a 3-node kubeadm cluster (1 master, 2 workers) on the cloud provider of your choice using an automation stack built on OpenTofu and Ansible, then deploy Rook Ceph using the official Helm charts and confirm that we can now successfully create CSI volume snapshots from PVCs by reusing the MinIO example from our last article.
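For orientation, the Rook Ceph deployment itself boils down to two chart installations, sketched below from Rook's published Helm charts; the release names and namespace follow the conventions in the Rook documentation, and chart values are omitted here for brevity.

```sh
helm repo add rook-release https://charts.rook.io/release
# Install the Rook operator...
helm install rook-ceph rook-release/rook-ceph \
  --namespace rook-ceph --create-namespace
# ...then declare the Ceph cluster itself
helm install rook-ceph-cluster rook-release/rook-ceph-cluster \
  --namespace rook-ceph
```

Once the cluster reports HEALTH_OK, the MinIO example from the previous article can be redeployed against the Ceph-backed storage class to verify that volume snapshots now succeed.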
Continue reading at donaldsebleung.com