Why we decided to build a K8ssandra operator - Part 3 - K8ssandra, Apache Cassandra® on Kubernetes

In the first and second posts in this series, we shared recent conversations with K8ssandra core team members about the initial decision to create K8ssandra as a series of Helm charts for installing and managing an Apache Cassandra cluster and associated tools in Kubernetes. We discussed why we decided to build an operator to help manage K8ssandra clusters, and the ongoing role we see for Helm in the project.

This post continues the story with my follow-up conversation with John Sanda about the K8ssandra operator under construction for the K8ssandra 2.0 release. We discuss some of the tactical decisions in implementing an operator, such as choosing an operator framework, the implications for project structure and testing, and most importantly, how this will enable more developers to contribute to the project. What follows is a lightly edited version of our conversation.

How Kubernetes operators improve developer experience

Jeff Carpenter: What do we think that building an operator is going to enable us to do in the K8ssandra project? What opportunities will that open up?

John Sanda: From an engineering standpoint, working in a full programming language like Go is going to be more appealing than working on templates. I’m curious to see how that carries over with new contributors.

John Sanda: This will also improve testing. There are a lot of test coverage tools out there; for example, we’re using SonarCloud. However, we can’t use SonarCloud with Helm templates, so we don’t really have a good way to measure the level of coverage we have in our tests right now. You also don’t have the same level of IDE support for templates that you would for a statically typed language.

John Sanda: Coding in Go will allow us to use the full arsenal of the language. For example, in Helm, we can’t create helper functions as we can in Go. In Helm, we can only create helper templates.

Extending operators with new features

Jeff Carpenter: You listed several developer experience improvements, which are important, but how will building an operator facilitate adding new features?

John Sanda: The biggest feature this will enable is support for multi-cluster deployments. It would be really difficult to try to do that just via Helm. It’s possible, but we would still be running a bunch of smaller controllers. We’d be missing that uniform orchestration layer on top of it, and we’d be very frustrated at the end of it.

Jeff Carpenter: Are there a lot of interactions between the different components? Do the components have dependencies? Many of the other components seem to have dependencies on Cassandra – is that where the frustration comes from?

John Sanda: From a usability standpoint, one of the goals is to have a “k8ssandra-cluster” CRD. This will have a status field to give you an overview of the state of the object, whether it’s ready, not ready, initializing, and so forth. The K8ssandra cluster status field will summarize the health of all the objects that make up that cluster: the Cassandra cluster, Stargate, Reaper, and anything else deployed as part of it. You can’t do that with Helm.

John Sanda: When users are trying to install things and having trouble, and particularly new things, it’s very difficult to see what’s going on. You’re just grasping at straws to try to get things working: “I finally got my Kubernetes cluster up. Now I’m trying to get K8ssandra installed. Now I got it to deploy. Okay, how do I know it is working? What do I check?” At this point, you have to check the status of the individual components.

Jeff Carpenter: That’s right, I’ve been on the consuming end of this myself: “How are my Cassandra pods doing? Okay, I see three are in the Ready state. Now Stargate is starting up…,” and so on.
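The status summary John describes could be sketched roughly as follows. This is an illustration only: the `ClusterStatus` type, the `Phase` values, and the `Summarize` method are assumptions made for this example and are not the actual K8ssandra operator API. The idea is that one top-level field rolls up the state of Cassandra, Stargate, and Reaper so that a user has a single place to check.

```go
package main

import "fmt"

// Phase is a hypothetical coarse-grained state for a component.
type Phase string

const (
	PhaseInitializing Phase = "Initializing"
	PhaseNotReady     Phase = "NotReady"
	PhaseReady        Phase = "Ready"
)

// ClusterStatus is a hypothetical status block for a cluster-level resource.
// It records one phase per component deployed as part of the cluster.
type ClusterStatus struct {
	Cassandra Phase
	Stargate  Phase
	Reaper    Phase
}

// Summarize rolls the component phases up into a single phase:
// NotReady if any component is NotReady or has an unknown phase,
// Initializing if any component is still initializing,
// and Ready only when every component is Ready.
func (s ClusterStatus) Summarize() Phase {
	summary := PhaseReady
	for _, p := range []Phase{s.Cassandra, s.Stargate, s.Reaper} {
		switch p {
		case PhaseReady:
			// Nothing to do; this component is healthy.
		case PhaseInitializing:
			summary = PhaseInitializing
		default:
			// NotReady, empty, or unrecognized phases all count as not ready.
			return PhaseNotReady
		}
	}
	return summary
}

func main() {
	status := ClusterStatus{
		Cassandra: PhaseReady,
		Stargate:  PhaseInitializing,
		Reaper:    PhaseReady,
	}
	fmt.Println(status.Summarize())
}
```

In a real operator, a controller would populate a structure like this from the observed state of each component and write it to the resource's status, so that one query on the cluster object answers the "how do I know it is working?" question.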

K8ssandra operator implementation decisions

Jeff Carpenter: What choices have you made in building out the operator, such as the implementation language and framework?

John Sanda: I want to emphasize up front that Kubernetes is language-agnostic. The two main constructs in Kubernetes are HTTP and containers. It’s best to think of it this way: “I’m deploying containers that speak HTTP.” Don’t think of it in terms of “I’m deploying a JAR file” or “I’m deploying a Go binary with a gRPC API.”

John Sanda: However, in terms of the actual implementation, the K8ssandra operator is built on the Go ecosystem. Everything builds on top of the Kubernetes Go client library. There are client implementations in other languages, but the fact that Kubernetes itself uses the Go client gives that client a big advantage. That’s one of the reasons you see most operators being written in Go.

John Sanda: There are two popular frameworks that provide the scaffolding and helper libraries for developing operators: Operator SDK and Kubebuilder. More recent versions of Operator SDK are almost like a shim on top of Kubebuilder. At this time last year, Operator SDK had a different set of libraries, scaffolding, and project layout than Kubebuilder. Now, Operator SDK uses the same project layout and underlying scaffolding and libraries as Kubebuilder.

Jeff Carpenter: It sounds as if Operator SDK and Kubebuilder are converging, at least in terms of what the resulting code looks like?

John Sanda: Yes, that’s right. For the K8ssandra operator, we’re working directly with Operator SDK, which is also what cass-operator uses.

Managing the K8ssandra operator roadmap

Jeff Carpenter: How do you see the K8ssandra operator maturing over time? How do we prioritize development?

John Sanda: We’re seeing a big increase in contributions to K8ssandra right now, especially in terms of issue creation. That’s going to help speed up the maturation of the operator. For example, I mentioned private registry support. Well, that’s table stakes for having something that’s mature and really production-ready. Now we’re starting to really pick up momentum with the operator development. It’s a huge bonus to have a growing user community. That’s going to help us recognize the things that we need to do in order to speed up that maturation process.

John Sanda: In terms of priorities, the focus right now is on two things. One is the migration – porting over the existing functionality that we have in the Helm charts, and making sure the operator has feature parity. The second is multi-cluster support.

Jeff Carpenter: Some folks will know that we recently released K8ssandra 1.3 alongside Cassandra 4.0. It sounds like there could be more Helm-based maintenance releases in this 1.X series, but most of the current development work is focused on the K8ssandra operator for a 2.0 release, as opposed to adding new features in the 1.X series?

John Sanda: Correct. We’re trying to address bugs or gaps in the 1.X release stream, but trying to focus any major new feature work towards the operator.

Jeff Carpenter: Will the 2.0 release focus only on feature parity with 1.3 or will it have additional features, like multi-cluster support?

John Sanda: Yes, it will have multi-cluster support. We’re also working on a new controller – an operator for Stargate. This will address some of the bugs we’ve had around deploying and managing Stargate with Helm.

Summary

We’ve got high hopes for the K8ssandra operator, and we expect that our choices of Operator SDK and Go will increase productivity and help expand the K8ssandra developer community. While we’re focused initially on multi-cluster support, there are other great ideas for what could come next on the roadmap.
Next week we’ll conclude this series with a discussion of how to test Kubernetes operators. We look forward to sharing more with you then, and as always please give us your feedback and questions on the Forums or our Discord server.
