K8ssandra Architecture - Monitoring
When running applications in Kubernetes, observability is key. K8ssandra includes Prometheus and Grafana for the storage and visualization of metrics associated with the Cassandra cluster.
Cassandra node-level metrics are reported in the Prometheus format, covering everything from operations per second and latency, to compaction throughput and heap usage. Examples of these metrics are shown in the Grafana dashboard below.
Let’s walk through this architecture from left to right. We’ll provide links to the Kubernetes documentation so you can dig into those concepts more if you’d like to.
The Cassandra nodes in a K8ssandra-managed cluster are organized in one or more datacenters, each of which is composed of one or more racks. Each rack represents a failure domain, and replicas are placed across multiple racks when more than one is present. In Kubernetes, racks are represented as StatefulSets. (We’ll focus here on the details of the Cassandra node that relate to monitoring. Other details about Cassandra nodes, such as how storage is managed, are covered on the Cassandra architecture page.)
Each Cassandra node is deployed as its own pod. The pod runs the Cassandra daemon in a Java VM. Each Apache Cassandra pod is configured with the DataStax Metrics Collector for Apache Cassandra, which is implemented as a Java agent running in that same VM. The Metrics Collector is configured to expose metrics on the standard Prometheus port (9103).
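To make the scrape step concrete, here is a minimal Python sketch of reading metrics in the Prometheus text format from one pod. Port 9103 comes from the description above. The `/metrics` path, the host argument, and the metric names in the sample payload are assumptions for illustration, not the names the Metrics Collector actually emits.

```python
import re
from urllib.request import urlopen

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escapes are left as-is).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape(host, port=9103, path="/metrics"):
    """Fetch and parse metrics from one pod (host and path are assumptions)."""
    with urlopen(f"http://{host}:{port}{path}", timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hypothetical payload; real metric names and labels will differ.
EXAMPLE = """\
# HELP example_heap_used_bytes Illustrative metric, not a real collector name
# TYPE example_heap_used_bytes gauge
example_heap_used_bytes{dc="dc1",rack="rack1"} 1.5e+09
example_requests_total 42
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(EXAMPLE):
        print(name, labels, value)
```

In a running cluster, Prometheus performs this scrape itself. A script like this is mainly useful for checking by hand what a single pod exposes.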
One or more Prometheus instances are deployed in another StatefulSet, with the default configuration starting with a single instance. Using a StatefulSet allows each Prometheus instance to connect to a Persistent Volume (PV) for longer-term storage. The default K8ssandra chart configuration does not use PVs. By default, metric data collected in the cluster is retained within Prometheus for 24 hours.
An instance of the Prometheus Operator is deployed using a Replica Set. The kube-prometheus-stack also defines several useful Kubernetes custom resource definitions (CRDs) that the Prometheus Operator uses to manage Prometheus. One of these is the ServiceMonitor. K8ssandra uses ServiceMonitor resources, specifying label selectors to indicate the Cassandra pods to connect to in each datacenter and how to relabel each metric as it is stored in Prometheus. K8ssandra also provides a ServiceMonitor for Stargate when it is enabled. Users may also configure ServiceMonitors to pull metrics from the various operators, but pre-configured instances are not provided at this time.
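As a rough illustration of what such a resource looks like, the Python sketch below builds a ServiceMonitor manifest as a plain dictionary and prints it as JSON. The `apiVersion`, `kind`, and field names follow the Prometheus Operator's ServiceMonitor schema. The label keys, the port name, the scrape interval, and the relabeling rule are assumptions chosen for illustration, not the values the K8ssandra chart renders.

```python
import json


def cassandra_service_monitor(cluster, datacenter, namespace="default"):
    """Build a ServiceMonitor manifest for one Cassandra datacenter.

    Label keys, port name, and the relabeling rule are illustrative
    assumptions, not what the K8ssandra chart actually generates.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {
            "name": f"{cluster}-{datacenter}-metrics",  # hypothetical naming scheme
            "namespace": namespace,
        },
        "spec": {
            # Label selector: picks the Services whose endpoints
            # (the Cassandra pods) should be scraped.
            "selector": {
                "matchLabels": {
                    "cassandra.datastax.com/cluster": cluster,
                    "cassandra.datastax.com/datacenter": datacenter,
                }
            },
            "endpoints": [
                {
                    "port": "prometheus",  # assumed name of the Service port for 9103
                    "interval": "15s",     # assumed scrape interval
                    # Relabeling applied to each sample before it is stored.
                    "metricRelabelings": [
                        {
                            "sourceLabels": ["__name__"],
                            "regex": "collectd_(.+)",  # hypothetical prefix to strip
                            "targetLabel": "__name__",
                            "replacement": "$1",
                            "action": "replace",
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(cassandra_service_monitor("demo", "dc1"), indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be applied to a test cluster once the assumed labels and port name are replaced with the real ones.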
AlertManager is an additional resource provided by kube-prometheus-stack that can be configured to specify thresholds for specific metrics that will trigger alerts. Users may enable and configure AlertManager through the values.yaml file. See the kube-prometheus-stack example for more information.
An instance of Grafana is deployed in a Replica Set. The GrafanaDataSource is yet another resource defined by kube-prometheus-stack, which is used to describe how to connect to the Prometheus service. Kubernetes config maps are used to populate GrafanaDashboard resources. These dashboards can be combined or customized.
Ingress or port forwarding can be used to expose access to the Prometheus and Grafana services external to the Kubernetes cluster.
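Once Prometheus is reachable from outside the cluster, its HTTP API can be queried directly. The Python sketch below assumes a port forward that maps the Prometheus service to `localhost:9090`. The local address and the example query are assumptions. The `/api/v1/query` endpoint and the response shape are those of the Prometheus HTTP API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(promql, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def extract_values(payload):
    """Return (labels, value) pairs from an instant-vector query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    return [
        (series["metric"], float(series["value"][1]))
        for series in payload["data"]["result"]
    ]


def query(promql, base_url="http://localhost:9090"):
    """Run an instant query against a forwarded Prometheus (address assumed)."""
    with urlopen(instant_query_url(promql, base_url), timeout=5) as response:
        return extract_values(json.load(response))


if __name__ == "__main__":
    # Only builds the URL, so this runs without a cluster.
    print(instant_query_url("up"))
```

With a port forward in place, calling `query("up")` would list each scrape target with a value of 1 if the last scrape succeeded and 0 otherwise.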
Check out the monitoring tasks for more detailed instructions.
Next architecture topic: Repairs with Cassandra Reaper