Part 1 of 4
As a distribution of Apache Cassandra built from multiple open source components, K8ssandra was originally installed and managed via a collection of Helm charts. This has continued through the most recent 1.3 release. While the project has leveraged operators for components including Cassandra (cass-operator) and Medusa (medusa-operator), there has not been a Kubernetes operator to manage all of these components as a holistic system.
Recently, the K8ssandra core team finalized a decision they had debated for months: to take on the effort of creating an operator for the K8ssandra project. I was able to sit down with K8ssandra core team members John Sanda and Jeff DiNoto and get their thoughts on this move toward building a K8ssandra operator for the 2.0 release. There was so much good detail in our conversation that we decided to share it in a series of four posts, with minimal edits so you can experience the full flow.
In the first post in this series, John and I discuss how the decision to build an operator came about, the origins of K8ssandra as a project based on Helm, and the shortcomings of Helm for managing complex deployments on Kubernetes.
Why We Chose Helm for K8ssandra
Jeff Carpenter: When we first started building K8ssandra, we used cass-operator as the foundation to deploy the Cassandra nodes. Then we started using Helm to help us install the rest of this ecosystem of components around the Cassandra cluster with components like Reaper, Medusa, Stargate, and so on. Can you talk me through how we decided to use Helm?
John Sanda: A big part of it was timing. The plans came together for the K8ssandra project not too long before KubeCon North America 2019. We thought it would be nice if we could have something ready to go that people could discuss, and even use at a workshop we were doing at KubeCon. We knew we needed to iterate quickly, and that a good bit of our audience was coming from a background without Kubernetes expertise and experience. A package management tool and installer like Helm is a lot easier to grasp than an operator and custom resource definitions (CRDs), especially if you're not already familiar with Kubernetes constructs. It makes it easier for folks to get a grasp of what K8ssandra is trying to do from the start.
Jeff Carpenter: There are obvious benefits here in terms of usability and understandability for a developer who’s less Kubernetes savvy to understand what’s happening. But do you also get some testing benefits out of it?
John Sanda: To be clear, I don’t want to say Helm is for “less Kubernetes savvy” people, because a big part of the Kubernetes ecosystem uses Helm. I can still relate to my first experience trying to wrap my head around a lot of Kubernetes concepts. For me, it’s just a lot easier to grasp. People have experience with different platforms, package management, and installation tools, and that makes Helm much more relatable and understandable. It’s standard in that regard, versus the concepts of operators, controllers, and custom resources. If you don’t already have experience with Kubernetes, that’s a lot to digest. But with Helm, you can say “You know apt, or Yum”, or to a Mac user, for example, “You know Homebrew, right? This is something similar for Kubernetes.” I think most people will get it.
Jeff Carpenter: I noticed that when I do a K8ssandra installation, there are several custom resources that are defined. For example, I get a Cassandra datacenter that comes from cass-operator. There’s a CassandraBackup resource that is associated with Medusa, and a Medusa operator that manages that resource. But those custom resources are all part of the projects that K8ssandra includes. Is it correct to say that K8ssandra itself has no custom resources of its own because there’s no K8ssandra operator yet?
John Sanda: Correct. K8ssandra installs a bunch of different things, and there’s no uniform, overarching operator to manage all of it – yet.
Jeff Carpenter: What are some of the challenges that we ran into with Helm?
John Sanda: There’s some difficulty when you start writing complex logic. Helm has good support for control flow, with loops and if statements. But when you start getting more involved than that, it’s harder to read and reason through the code, and indentation becomes an issue. The tooling and support get more challenging. In fact, I remember prior to the K8ssandra 1.0 release, trying to finish up some work to support secrets, and peer-reviewing the changes got really complicated. It would have been much easier to do with a regular programming language, in terms of reading and following the code.
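To make the indentation problem concrete, here is a small, hypothetical template sketch. The resource kind comes from cass-operator, but the values keys and structure are illustrative, not taken from K8ssandra's actual charts. Once loops and conditionals nest, every `toYaml`/`nindent` call has to render at exactly the right depth, and the `{{-` whitespace trimming is easy to get wrong:

```yaml
{{- if .Values.cassandra.enabled }}
{{- range $dc := .Values.cassandra.datacenters }}
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: {{ $dc.name }}
spec:
  size: {{ $dc.size }}
  {{- with $dc.racks }}
  # toYaml must be re-indented to match the surrounding YAML exactly;
  # an off-by-two in nindent produces invalid manifests at render time.
  racks:
    {{- toYaml . | nindent 4 }}
  {{- end }}
---
{{- end }}
{{- end }}
```

Errors like a wrong `nindent` depth only surface when the chart is rendered, not when the template is written, which is part of what makes review harder than in a regular programming language.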
John Sanda: There are some limitations with reuse that you don’t get in a general-purpose programming language. The scope is limited, so when you declare a variable, it’s limited to the scope of the template where you declare it. If I have a variable defined in my Cassandra datacenter template, and I want to reuse that variable in my Stargate template, I can’t. I have to recreate it. I can’t keep my code DRY, and that duplication is ripe for problems.
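A sketch of the scoping limitation being described, with hypothetical file and variable names. A variable declared in one template file is simply not visible in another, so the same expression has to be repeated:

```yaml
# templates/cassandra.yaml (illustrative)
# $jmxSecret is only in scope within this template file.
{{- $jmxSecret := printf "%s-jmx" .Release.Name }}
```

```yaml
# templates/stargate.yaml (illustrative)
# The variable above is out of scope here, so the same logic
# has to be recomputed -- and kept in sync by hand.
{{- $jmxSecret := printf "%s-jmx" .Release.Name }}
```

If the naming convention ever changes, every copy of that expression has to be found and updated, which is exactly the kind of duplication a function or shared constant would avoid in a general-purpose language.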
John Sanda: You run into some repetition with the Helm properties due to the way the inheritance model works. Here’s another example: we just rolled out support for private registries in K8ssandra 1.3. There are places where we could do some things to improve on this, but there’s a lot of repetition, where we have to redefine the same properties over and over for different images. There are tools built on top of Helm to address this. One is called Helmfile, which adds a configuration layer on top of Helm. But in the end, it’s just simpler to do those types of things with an operator.
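The private registry case might look something like the following `values.yaml` sketch (component and image names are illustrative, not K8ssandra's actual values layout). The registry setting is conceptually one piece of configuration, but it has to be restated for every image:

```yaml
# Illustrative values.yaml: the same private registry is repeated
# under each component's image block rather than declared once.
cassandra:
  image:
    registry: registry.example.com
    repository: example/cassandra
    tag: "3.11"
stargate:
  image:
    registry: registry.example.com
    repository: example/stargate
    tag: "1.0"
reaper:
  image:
    registry: registry.example.com
    repository: example/reaper
    tag: "2.2"
```

An operator, by contrast, can accept a single registry field in its custom resource and apply it to every image it manages.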
John Sanda: Similarly, Helm has a nice big library of helper template functions, but it doesn’t cover every use case, and there is no interface to define your own functions. You can define your own custom templates, which allow for a lot of reuse, but that’s not the same as a function. We ran into some problems with that.
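The distinction between a named template and a true function can be sketched like this (the template name and labels are hypothetical):

```yaml
{{/* A named template factors out repeated YAML... */}}
{{- define "k8ssandra.labels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
```

```yaml
# ...and is reused via include, which always returns a rendered
# string that must be re-indented into place:
metadata:
  labels:
    {{- include "k8ssandra.labels" . | nindent 4 }}
```

Because `include` can only produce a string, a named template can't return structured data, take typed parameters, or be composed the way an ordinary function can, which is the gap being described here.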
John Sanda: Even to do a multi-datacenter cluster within the same Kubernetes cluster, trying to implement the changes that we needed within the Helm templates would have been tough. We have control flow logic for going through each of the datacenters and the racks to do things, which would have to be repeated across various templates. Again, this is not a function call that we repeat, it’s copying and pasting code.
John Sanda: Another area that was a big struggle for us: we tried to implement an umbrella chart design pattern, which I think is the right thing to do. We had the top-level K8ssandra chart and various sub-charts, and we wanted to implement additional sub-charts for Reaper and for Stargate. The separation of concerns helps in a number of ways, but we ended up not implementing those sub-charts because of the problems that I described previously, where we would have had to repeat various pieces of logic and variables. This gets into the way scoping and inheritance works.
John Sanda: Let’s say I have a chart A that depends on chart B, and chart B depends on chart C. Chart C declares a set of properties, and they bubble up to chart B and then up to chart A. But it doesn’t work in the reverse direction. To make this concrete, we want to define authentication settings that would apply not just to Cassandra, but also to Stargate and Reaper. I wanted to define those in one place: ideally, the top-level chart, where the user is given a single interface to interact with, and then we push the values down to sub-charts. This is not possible with the Helm inheritance model.
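The repetition this forces on an umbrella chart might look like the following `values.yaml` sketch (the keys are illustrative). A setting shared by several sub-charts has to be restated under each sub-chart's key, since, outside of Helm's limited `global:` block, there is no way to declare it once at the top level and have it flow down:

```yaml
# Illustrative umbrella-chart values.yaml: the same auth setting
# must be repeated under each sub-chart's key to reach that sub-chart.
cassandra:
  auth:
    enabled: true
stargate:
  auth:
    enabled: true
reaper:
  auth:
    enabled: true
```

With an operator, a single `auth` field on one custom resource can drive the configuration of all three components, which is the direction the team ultimately took.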
John Sanda: This also caused problems when trying to implement support for multi-datacenter clusters. At that point, we decided it’d be a lot easier to implement this with an operator, than trying to do it in Helm templates.
In this conversation, we talked about how the K8ssandra team was able to quickly deliver the first few releases of the project using Helm, but also some of the limitations we ran into that became more and more challenging to deal with. In the next post, we’ll discuss the decision to start building a K8ssandra operator and the ongoing role we see for Helm within the K8ssandra project.