The K8ssandra team has been heads down these last few months, improving k8ssandra-operator and all its subprojects, to the point where we dropped the ball on publishing blog posts, including the ones for v1.3.0 and v1.4.0.

As we’re releasing k8ssandra-operator v1.5.0, let’s go through the most interesting changes that shipped since v1.3.0.

Unstructured database YAML configuration

We added flexibility to k8ssandra-operator by making the cassandra.yaml and dse.yaml structures fully dynamic in the K8ssandraCluster CRD. New settings are now supported without any action on our end to update the CRDs, lowering the maintenance effort and shortening the delay between an Apache Cassandra™ release and the availability of its new settings in K8ssandra.
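
For instance, any cassandra.yaml setting can now be passed verbatim under config.cassandraYaml. A minimal sketch (the settings shown are ordinary Cassandra options, picked purely as examples):

spec:
  cassandra:
    config:
      cassandraYaml:
        num_tokens: 16
        concurrent_compactors: 4
        internode_compression: dc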

We also made it possible to add arbitrary additional JVM options to the specific jvm*.options files for more flexibility:

      - config:
          jvmOptions:
            additionalJvm11ServerOptions:
              - '-XX:+UseConcMarkSweepGC'
              - '-XX:+CMSParallelRemarkEnabled'
              - '-XX:+UseCMSInitiatingOccupancyOnly'
            additionalJvm8ServerOptions:
              - '-XX:ThreadPriorityPolicy=42'
            additionalJvmServerOptions:
              - '-Dio.netty.maxDirectMemory=0'

Commit log and data on separate volumes

The above changes, along with the container and volume injection introduced in v1.2.0, made it possible to support JBOD (which we advise against, but which some infrastructures may require) and to place the commit log and data directories on separate volumes.

The following manifest uses additional PVCs (which will be managed by the StatefulSet) for the commit log and a second data directory:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  cassandra:
    serverVersion: 4.0.6
    datacenters:
      - metadata:
          name: dc1
        k8sContext: kind-k8ssandra-0
        size: 3
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
        extraVolumes:
          pvcs:
            - name: sts-extra-vol
              mountPath: "/var/lib/extra"
              pvcSpec:
                storageClassName: standard
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 1Gi
            - name: commitlog-vol
              mountPath: "/var/lib/cassandra/commitlog2"
              pvcSpec:
                storageClassName: standard
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 100Mi
        config:
          cassandraYaml:
            data_file_directories:
              - /var/lib/cassandra/data
              - /var/lib/extra/data
            commitlog_directory: "/var/lib/cassandra/commitlog2"

Safe DC decommissions and improved troubleshooting

Before v1.3.0, you could decommission a DC by removing it from the K8ssandraCluster manifest even if keyspaces were still replicated to it: k8ssandra-operator would remove the replicas and decommission the nodes in that DC.
While convenient, this could have dramatic consequences on multi-tenant clusters, where a miscommunicated datacenter removal could make some teams lose data.
k8ssandra-operator now prevents such removals and reports a comprehensive error message telling you which keyspaces are blocking the DC decommission.

Speaking of error messages, we made it easier to troubleshoot the operator without having to dive into its logs. We added a new error status field which contains any error that failed a reconcile. This field is now shown in the output of the kubectl get command when listing K8ssandraCluster objects.
When everything is working fine, the output should look something like:

% kubectl get k8c          
NAME   ERROR
test   None

In case the reconcile fails, you will get an error message:

% kubectl get k8c
NAME   ERROR
test   admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig.CassandraDataVolumeClaimSpec

This error message will also appear in the Kubernetes events:

% kubectl get events
LAST SEEN   TYPE      REASON                  OBJECT                                                       MESSAGE
55m         Normal    CreatedResource         cassandradatacenter/dc1                                      Created service test-dc1-service
…
38s         Warning   Reconcile Error         k8ssandracluster/test                                        admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig.CassandraDataVolumeClaimSpec

Per-node configuration

It is often necessary to test Cassandra configuration changes on a single node, to check that they provide the expected improvements without impacting the whole cluster.

Since v1.4.0, this can be achieved using per-node configuration. To use this feature, create a ConfigMap named <cluster name>-<dc name>-per-node-config with the following format:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-dc1-per-node-config
data:
  test-dc1-r1-sts-1_cassandra.yaml: |
      concurrent_reads: 64

The above overrides the concurrent_reads setting to a value of 64 in cassandra.yaml for the second pod (ordinal 1) of cluster test, DC dc1, rack r1. The format for ConfigMap entries is: <cluster name>-<dc name>-<rack name>-sts-<ordinal position of the pod>_<file to override>.

The files overridden this way must be in YAML format and located under the Cassandra configuration directory.

The operator will discover the ConfigMap automatically, provided it is named correctly and placed in the same namespace as the K8ssandraCluster object.

Vector integration

Vector is a popular tool for building observability (logs and metrics) pipelines, created by Timber Technologies, which was acquired by DataDog in 2021. Starting with v1.5.0, k8ssandra-operator supports deploying the Vector agent as a sidecar to the Cassandra, Stargate and Reaper pods, preconfigured to scrape metrics from the available endpoints.

The Vector agent is fully configurable through the CRD under .spec.cassandra.telemetry, .spec.reaper.telemetry and .spec.stargate.telemetry.

There you can add sources, transforms and sinks in a semi-structured way:

cassandra:
  telemetry:
    vector:
      enabled: true
      components:
        sinks:
          - name: console_output
            type: console
            inputs:
              - cassandra_metrics
            config: |
              target = "stdout"
              [sinks.console_output.encoding]
              codec = "json"
      scrapeInterval: 30s
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          memory: 2Gi

Source components are preconfigured to scrape the metrics and are named cassandra_metrics, stargate_metrics or reaper_metrics, depending on which pod the Vector agent runs in.
The above configuration should display the scraped Cassandra metrics in JSON format in the output of the vector-agent container:

{"name":"org_apache_cassandra_metrics_table_bloom_filter_off_heap_memory_used","tags":{"cluster":"Weird Cluster Name","datacenter":"dc1","host":"95c50ce3-2c91-46bb-9500-536f0959241b","instance":"10.244.2.8","keyspace":"system","rack":"default","table":"view_builds_in_progress"},"timestamp":"2023-02-02T10:28:08.670070700Z","kind":"absolute","gauge":{"value":0.0}}
{"name":"org_apache_cassandra_metrics_table_bloom_filter_off_heap_memory_used","tags":{"cluster":"Weird Cluster Name","datacenter":"dc1","host":"95c50ce3-2c91-46bb-9500-536f0959241b","instance":"10.244.2.8","keyspace":"system_traces","rack":"default","table":"sessions"},"timestamp":"2023-02-02T10:28:08.670070700Z","kind":"absolute","gauge":{"value":0.0}}
{"name":"org_apache_cassandra_metrics_table_bloom_filter_off_heap_memory_used","tags":{"cluster":"Weird Cluster Name","datacenter":"dc1","host":"95c50ce3-2c91-46bb-9500-536f0959241b","instance":"10.244.2.8","keyspace":"system","rack":"default","table":"local"},"timestamp":"2023-02-02T10:28:08.670070700Z","kind":"absolute","gauge":{"value":8.0}}
{"name":"org_apache_cassandra_metrics_table_bloom_filter_off_heap_memory_used","tags":{"cluster":"Weird Cluster Name","datacenter":"dc1","host":"95c50ce3-2c91-46bb-9500-536f0959241b","instance":"10.244.2.8","keyspace":"system_schema","rack":"default","table":"indexes"},"timestamp":"2023-02-02T10:28:08.670070700Z","kind":"absolute","gauge":{"value":8.0}}
{"name":"org_apache_cassandra_metrics_table_bloom_filter_off_heap_memory_used","tags":{"cluster":"Weird Cluster Name","datacenter":"dc1","host":"95c50ce3-2c91-46bb-9500-536f0959241b","instance":"10.244.2.8","keyspace":"system_auth","rack":"default","table":"network_permissions"},"timestamp":"2023-02-02T10:28:08.670070700Z","kind":"absolute","gauge":{"value":0.0}}

The source/transform/sink config field is a multiline string that can take arbitrary config entries for any component; its content is written as-is to the generated vector.toml file. This is what our manifest generates as TOML:

data_dir = "/var/lib/vector"

[api]
enabled = false
  
[sources.cassandra_metrics]
type = "prometheus_scrape"
endpoints = [ "http://localhost:9000/metrics" ]
scrape_interval_secs = 30


[sinks.console_output]
type = "console"
inputs = ["cassandra_metrics"]
target = "stdout"
[sinks.console_output.encoding]
codec = "json"

This is a major improvement over previous versions, which only supported creating ServiceMonitors when the Prometheus operator was installed. The new feature allows sending metrics to any system that Vector supports as a sink, which also enables remote writes to Prometheus instances outside of the Kubernetes cluster or pushing metrics to DataDog.

Revamped metrics endpoint

Since the very first version, K8ssandra has used MCAC (Metric Collector for Apache Cassandra) to expose metrics in a Prometheus-compliant format.

While MCAC can work in Kubernetes, it wasn't designed for it. Over the past couple of years, we identified inefficiencies and pain points which eventually led to a redesign.

collectd

MCAC's dependency on collectd is convenient in non-containerized environments, where it provides the necessary OS-level metrics (CPU usage, load average, disk usage, and so on), but it is unnecessary in containerized environments: Kubernetes has more standard components for such metrics (node_exporter, Vector's host_metrics source) which are well integrated with Prometheus.

Metric names

The generated metric names caused multiple issues. Filters would be based on the original Cassandra metric names, while dashboards used the different names output by collectd after the agent renamed them internally. These names also prevented the DataDog agent from filtering metrics efficiently, making it impossible to keep their number below the maximum number of series that DataDog accepts.

Dropped metrics

Bug reports also came in early on, when K8ssandra v1 shipped, about metrics being dropped as too old, which we still haven't totally fixed to this day. The cause seems to be that collectd becomes a bottleneck when the number of metrics gets too high: it pulls metrics from the agent and stores them as files, which are consumed when an external system such as Prometheus scrapes the endpoint. It then builds up a backlog that cannot be processed fast enough to catch up, resulting in all metrics being dropped indefinitely. Out-of-order data points were also reported as being rejected, most likely because of duplicate metric names created by the renaming rules.

Our requirements for replacing MCAC

  • Better integration with Kubernetes
  • Avoid using JMX exporters of any kind for performance reasons
  • Reduce the number of dependencies in the K8ssandra project
  • Rely on external solutions for machine level metrics
  • Rework metrics names for simplicity and efficiency
  • Make it easier to reason about for Prometheus savvy users

In v1.5.0, we're introducing a new metrics endpoint embedded into the management-api. It hooks directly into the Cassandra metrics registry and exposes the metrics through a Prometheus-compliant HTTP endpoint at http://localhost:9000/metrics.

The target architecture uses a pull-only model, deferring to the consumer for scraping intervals and only exposing up-to-date metrics.
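
Any Prometheus-compatible collector can then scrape that endpoint directly. A minimal sketch of a Prometheus scrape configuration (the job name and target address are placeholders for your deployment):

scrape_configs:
  - job_name: cassandra
    scrape_interval: 30s
    static_configs:
      - targets: ["<cassandra-pod-ip>:9000"]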

We believe that this solution will generate fewer bugs and production incidents, while being easier to investigate and maintain.

Prometheus-compliant relabeling and filtering rules can be configured directly in the agent through a ConfigMap containing a YAML configuration file.
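
As an illustration, such a file could drop all table-level metrics with a single Prometheus-style relabeling rule. This is a sketch only; check the documentation for the exact ConfigMap name and schema:

filters:
  - source_labels: ["table"]
    separator: "@"
    regex: ".+"
    action: "drop"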

MCAC remains the default endpoint for now, but we aim to remove it completely from k8ssandra-operator in an upcoming release. New dashboards using the metric names exposed by the new endpoint will be published soon.

DSE support

For the past couple of years, we have been focusing on making it easier to run OSS Apache Cassandra™ in Kubernetes. DataStax customers could only use cass-operator to run their workloads in Kubernetes, and there was a growing demand for supporting DSE (DataStax Enterprise) in K8ssandra.
In v1.3.0, we shipped the basic features allowing DSE to be used instead of Apache Cassandra as the underlying database in K8ssandra, and we refined that support in v1.4.0 and v1.5.0.
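
Switching the underlying database to DSE boils down to setting the server type and version in the K8ssandraCluster manifest. A minimal sketch (the DSE version shown is just an example):

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  cassandra:
    serverType: dse
    serverVersion: 6.8.25
    datacenters:
      - metadata:
          name: dc1
        size: 3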

Cluster level tasks

cass-operator exposes the CassandraTask API, which allows performing DC-wide operations such as rolling restarts or rebuilds.
K8ssandraTasks are a thin wrapper around cass-operator's own CassandraTasks.
When a K8ssandraTask is created:

  • k8ssandra-operator spawns a CassandraTask for each datacenter in the cluster;
  • cass-operator picks up and executes the CassandraTasks;
  • k8ssandra-operator monitors the CassandraTask statuses, and aggregates them into a unified status on the K8ssandraTask.
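
For example, triggering a rolling restart of every DC in the cluster could look like the following sketch, which reuses the job format of cass-operator's CassandraTask (check the task documentation for the full list of supported commands):

apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: restart-test
spec:
  cluster:
    name: test
  template:
    jobs:
      - name: restart-test-job
        command: restart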

Apache Cassandra 4.1 support

Starting with v1.4.1, k8ssandra-operator supports Cassandra 4.1.

Note that Stargate (v1 or v2) does not support Cassandra 4.1 yet, so you will need to disable Stargate in order to start K8ssandra clusters using this version.
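
Concretely, targeting 4.1 means setting the new version and leaving the .spec.stargate section out of the manifest. A sketch:

spec:
  cassandra:
    serverVersion: 4.1.0
  # no .spec.stargate section: Stargate doesn't support Cassandra 4.1 yet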

Deep merge of cluster and DC level configuration

Until k8ssandra-operator v1.5.0, defining Cassandra YAML settings at the DC level, under .spec.cassandra.datacenters[?].config.cassandraYaml, would override all the settings defined at the cluster level, under .spec.cassandra.config.cassandraYaml. This was fairly inconvenient, as it is common to define some settings globally and override others only at the DC level.

We changed this behavior to perform a deep merge of the cluster and DC level DatacenterOptions structs: cluster-level settings that aren't overridden at the DC level are retained, while DC-level values win when both are set.

Considering the following manifest:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
spec:
  cassandra:
    serverVersion: 4.0.7
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      cassandraYaml:
        concurrent_reads: 64
        concurrent_writes: 64
        concurrent_counter_writes: 64
      jvmOptions:
        heapSize: 512Mi
        heapNewGenSize: 256Mi
        gc: CMS
    mgmtAPIHeap: 64Mi
    datacenters:
      - metadata:
          name: dc1
        k8sContext: kind-k8ssandra-0
        config:
          cassandraYaml:
            concurrent_reads: 32
        size: 2
      - metadata:
          name: dc2
        k8sContext: kind-k8ssandra-1
        config:
          jvmOptions:
            heapSize: 1Gi
            heapNewGenSize: null
            gc: G1GC
        size: 2

Prior to v1.5.0, dc1 would end up with the following config:

        config:
          cassandraYaml:
            concurrent_reads: 32

And dc2:

        config:
          jvmOptions:
            heapSize: 1Gi
            heapNewGenSize: null
            gc: G1GC

In the process, they would lose all the settings defined at the cluster level.
Starting with v1.5.0, dc1 will instead get the following settings post merge:

    config:
      cassandraYaml:
        concurrent_reads: 32
        concurrent_writes: 64
        concurrent_counter_writes: 64
      jvmOptions:
        heapSize: 512Mi
        heapNewGenSize: 256Mi
        gc: CMS

And dc2:

    config:
      cassandraYaml:
        concurrent_reads: 64
        concurrent_writes: 64
        concurrent_counter_writes: 64
      jvmOptions:
        heapSize: 1Gi
        heapNewGenSize: null
        gc: G1GC

This is done by leveraging Alex Dutra's Goalesce library, which gives an impressive level of control over the merge behavior depending on the type of the objects and the required outcome.

Community contributions

We’re thrilled to see contributions coming from community members. In this batch of releases, we want to highlight a few of them.

Encryption stores secret configurability

In order to enable client/server encryption in K8ssandra, you have to provide the truststore, keystore and certificates through a secret.

Until this contribution, the format of the secret was hardcoded in the operator, which didn't give enough flexibility to use secrets generated by cert-manager, for example.

Thanks to GitHub user @dnugmanov, the names of the secret's entries are now configurable and cert-manager generated secrets are accepted, simplifying workflows that deploy clusters and generate certificates on the fly with popular tools such as cert-manager.
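
For instance, pointing the client encryption stores at a cert-manager generated secret could look like the following. This is an illustrative sketch only; the secret and entry names come from cert-manager's keystore output, and the exact CRD field names should be checked against the reference documentation:

spec:
  cassandra:
    clientEncryptionStores:
      keystoreSecretRef:
        name: cluster-tls        # secret generated by cert-manager
        key: keystore.jks        # entry name inside the secret
      truststoreSecretRef:
        name: cluster-tls
        key: truststore.jks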

Best of all, this was done while keeping backward compatibility!

Modular Secrets Backend

K8ssandra currently relies on Kubernetes as its only secrets provider: secrets are stored in Kubernetes and mounted in the various containers as environment variables. While this is a fairly standard way of managing secrets, some companies' security policies don't consider it compliant with their requirements. To be fair, it's not the most secure system.

DataDog is one such company, and we're incredibly happy to have them contribute a modular secrets backend which will accommodate their requirements.

The work will span multiple months, as it requires deep changes in k8ssandra-operator, but it will make k8ssandra-operator more secure and allow secrets to be injected by external secrets providers such as HashiCorp Vault, using annotations and webhooks.
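
To give an idea of what annotation-driven injection looks like in the Vault ecosystem, here are standard Vault Agent Injector pod annotations, shown purely as an illustration of the approach rather than as the final K8ssandra interface (the role and secret path are placeholders):

metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "k8ssandra"
    vault.hashicorp.com/agent-inject-secret-superuser: "secret/data/cassandra/superuser"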

The internal secrets provider will be refactored to use the same runtime injection system, to avoid complicating the codebase and to keep a single, common way of managing secrets.

If you’re interested in following or even participating in that work, you can track this epic.

Configurable labels and annotations

As part of the Modular Secrets Backend effort, DataDog's Steve Seidman also contributed a long-awaited feature: the ability to place custom labels and annotations on the Kubernetes objects (Deployments, StatefulSets, pods, Services) that k8ssandra-operator creates.

Starting with v1.5.0, you can specify those labels and annotations in the .spec.cassandra.metadata, .spec.reaper.metadata and .spec.stargate.metadata structs.
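
A minimal sketch, assuming plain label and annotation maps (see the CRD reference for the exact structure):

spec:
  cassandra:
    metadata:
      labels:
        environment: production
      annotations:
        team.example.com/owner: data-platform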

Upgrade Now

As you can see, we have been busy building features, fixing bugs and strengthening the community around K8ssandra. We invite all K8ssandra users to check the changelogs and release notes, and to upgrade to v1.5.0 (see our installation documentation).

Let us know what you think of k8ssandra-operator by joining us on the K8ssandra Discord or the K8ssandra Forum today. For exclusive posts on all things data, follow DataStax on Medium.