Monitoring Mirror Maker 2.0 and Kafka Connect clusters
The goal of this note is to explain how to monitor Mirror Maker 2.0 metrics using Prometheus and present them in Grafana dashboards.
Prometheus is an open-source systems monitoring and alerting toolkit that, like Kubernetes, is part of the Cloud Native Computing Foundation. It can monitor many kinds of workloads but is most commonly used with container workloads.
The following figure presents the generic Prometheus architecture as described on the product's main website. The Prometheus server runs jobs that poll HTTP endpoints to collect metrics from the components to monitor. It supports queries in the PromQL format, which products like Grafana can use to present dashboards, and it can push alerts to different channels when some metrics behave unexpectedly.
In the context of data replication between Kafka clusters, we want to monitor Mirror Maker 2.0 metrics such as worker task states, source task metrics, and task errors. The following figure illustrates the components involved: the source Kafka cluster, the Mirror Maker 2.0 cluster (which is based on Kafka Connect), the Prometheus server, and Grafana.
As all these components run on Kubernetes, most of them can be deployed via Operators using Custom Resource Definitions.
To support this monitoring we need to do the following steps:
- Add metrics configuration to your Mirror Maker 2.0 cluster
- Package Mirror Maker 2.0 with the JMX Exporter as a Java agent so it exposes JMX MBeans as metrics accessible via HTTP.
- Deploy Prometheus using Operator
- Optionally deploy Prometheus Alertmanager
- Expose MirrorMaker 2 API as external route
- Configure Prometheus to access MirrorMaker metrics
- Deploy Grafana and configure dashboard
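As a sketch of the first two steps above, a Strimzi KafkaMirrorMaker2 resource can carry a metrics stanza that configures the JMX Exporter agent. The resource name, apiVersion, and rule pattern below are illustrative and depend on your Strimzi version:

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: mm2-cluster
spec:
  # ... connect cluster, clusters, and mirrors definitions ...
  # JMX Exporter configuration: MBeans matching the rules below are
  # exposed as Prometheus metrics on the pod's metrics port (9404).
  metrics:
    lowercaseOutputName: true
    rules:
    # expose all Kafka Connect worker metrics
    - pattern: "kafka.connect<type=connect-worker-metrics>([^:]+):"
      name: kafka_connect_worker_metrics_$1
```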
For a broader study of Kafka monitoring, we recommend reading this article from Ana Giordano.
Installation and configuration
Prometheus deployment inside Kubernetes uses the operator as defined in the CoreOS GitHub repository. Its CRDs define a set of resources: ServiceMonitor, PodMonitor, and PrometheusRule.
Inside the Strimzi GitHub repository, we can get a prometheus.yml file to deploy the Prometheus server using the Prometheus operator. This configuration defines a ClusterRole, a ServiceAccount, a ClusterRoleBinding, and the Prometheus resource instance. We have defined our own configuration in this file.
For your own deployment, you have to change the target namespace and the rules.
You need to deploy Prometheus and all the other elements in the same namespace (OpenShift project) as the Kafka cluster or the Mirror Maker 2 cluster.
To monitor your own on-premise Kafka cluster, you need to enable Prometheus metrics. An example of a Strimzi-based Kafka cluster deployment with Prometheus settings can be found in our kafka cluster definition. The declarations are under the metrics stanza and define the rules for exposing the Kafka core features.
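As an illustration of such a metrics stanza, a Kafka resource might carry JMX Exporter rules like the following. This is a heavily abbreviated sketch; the actual Strimzi examples contain many more rules, and the apiVersion depends on your Strimzi release:

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    # ... listeners, storage, config ...
    # JMX Exporter rules mapping broker MBeans to Prometheus metrics
    metrics:
      lowercaseOutputName: true
      rules:
      - pattern: kafka.server<type=(.+), name=(.+)><>Value
        name: kafka_server_$1_$2
```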
Install Prometheus Operator
We recommend reading Prometheus operator product documentation.
At a glance, the Prometheus operator deploys and manages a Prometheus server and watches for new pods to monitor as they are scheduled within Kubernetes.
Source: prometheus-operator architecture
After creating a namespace (or reusing the Kafka cluster namespace), you need to deploy the Prometheus operator and the related service account, cluster role, role binding, and so on. We have reused the monitoring/install/bundle.yaml from the Prometheus operator GitHub repository, updating the namespace for our project (e.g. jb-kafka-strimzi) and renaming the cluster role and binding from `prometheus-operator` to `prometheus-operator-strimzi` to avoid a role conflict with an existing Prometheus deployment on OpenShift, as those roles are at the cluster level. Once done, we deploy all those components:
oc apply -f bundle.yaml
# Authorise the prometheus-operator to do cluster work
oc adm policy add-cluster-role-to-user prometheus-operator --serviceaccount prometheus-operator -n eda-strimzi-kafka24
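The namespace and role renaming described above can be scripted with sed. A minimal sketch, where the here-doc stands in for two lines of the real bundle.yaml (in practice run the same sed against the downloaded file):

```shell
# Rewrite the namespace and rename the cluster role to avoid clashing
# with an existing Prometheus deployment. The $ anchor makes sure an
# already-renamed line is not rewritten twice.
sed -e "s/namespace: default/namespace: jb-kafka-strimzi/" \
    -e "s/name: prometheus-operator$/name: prometheus-operator-strimzi/" <<'EOF'
namespace: default
name: prometheus-operator
EOF
```

This prints the two rewritten lines, with the namespace and role name replaced.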
When you apply those configurations, the following resources are visible:
Resource | Description |
---|---|
ClusterRole | RBAC role for cluster-scoped resources. Grants Prometheus permission to read the health endpoints exposed by the Kafka and ZooKeeper pods, and container metrics from cAdvisor and the kubelet. |
ServiceAccount | For the Prometheus pods to run under. A service account provides an identity for processes that run in a Pod. |
ClusterRoleBinding | To bind the ClusterRole to the ServiceAccount. |
Deployment | To manage the Prometheus Operator pod. |
ServiceMonitor | To define the service to monitor with the Prometheus pod. |
Prometheus | To manage the configuration of the Prometheus pod. |
PrometheusRule | To manage alerting rules for the Prometheus pod. |
Secret | To manage additional Prometheus settings. |
Service | To allow applications running in the cluster to connect to Prometheus (for example, Grafana using Prometheus as datasource) |
- To delete the operator, run:
oc delete -f bundle.yaml
Deploy Prometheus
Note
The following section includes the configuration of a Prometheus server monitoring a full Kafka cluster. For Mirror Maker 2 or Kafka Connect monitoring, the configuration has fewer rules and parameters. See the next section.
Deploy the Prometheus server by first changing the namespace, adapting the original examples/metrics/prometheus-install/prometheus.yaml file:
curl -s https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/master/examples/metrics/prometheus-install/prometheus.yaml | sed -e "s/namespace: myproject/namespace: eda-strimzi-kafka24/" > prometheus.yml
If you are using Alertmanager (see the section below), define the monitoring rules for the Kafka runtime: KafkaRunningOutOfSpace, UnderReplicatedPartitions, AbnormalControllerState, OfflinePartitions, UnderMinIsrPartitionCount, OfflineLogDirectoryCount, ScrapeProblem (a Prometheus-related alert), ClusterOperatorContainerDown, KafkaBrokerContainersDown, KafkaTlsSidecarContainersDown.
curl -s https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/master/examples/metrics/prometheus-install/prometheus-rules.yaml | sed -e "s/namespace: default/namespace: jb-kafka-strimzi/" > prometheus-rules.yaml
oc apply -f prometheus-rules.yaml
oc apply -f prometheus.yaml
# once deployed, get the state of the server with
oc get prometheus
NAME         VERSION   REPLICAS   AGE
prometheus             1          52s
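For reference, one of these alerting rules looks roughly like the following PrometheusRule fragment. The expression, threshold, and labels are an approximation of the Strimzi example, not a verbatim copy:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-rules
  labels:
    role: alert-rules
    app: strimzi
spec:
  groups:
  - name: kafka
    rules:
    # fire when any partition has fewer in-sync replicas than expected
    - alert: UnderReplicatedPartitions
      expr: kafka_server_replicamanager_underreplicatedpartitions > 0
      for: 10s
      labels:
        severity: warning
      annotations:
        summary: 'Kafka under replicated partitions'
```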
The Prometheus server configuration uses service discovery to discover the pods (the Mirror Maker 2.0 pod, or the Kafka and ZooKeeper pods) in the cluster from which it gets metrics. The following configuration is set in the prometheus.yaml file. The approach is to deploy one Prometheus server instance per namespace where multiple applications are running. The app label needs to be set on all components to be monitored.
serviceMonitorSelector:
  matchLabels:
    app: strimzi
or use a monitor all approach:
serviceMonitorSelector: {}
Additional scrape configurations can be added via a secret that is referenced in the prometheus.yaml file under additionalScrapeConfigs:

additionalScrapeConfigs:
  name: additional-scrape-configs
  key: prometheus-additional.yaml
Access the expression browser
To access it from a web browser, we can expose the Prometheus server via a route on the prometheus-operated service defined in the prometheus.yaml file:
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: prometheus
  name: prometheus-operated
  namespace: eda-strimzi-kafka24
spec:
  ports:
  - name: web
    port: 9090
    targetPort: web
  selector:
    app: strimzi
    prometheus: prometheus
  sessionAffinity: ClientIP
Using the OpenShift console, we can add a route to this service and then access it:
http://prometheus-route-eda-strimzi-kafka24.gse-eda-demo-43-f......us-east.containers.appdomain.cloud/graph
Configure monitoring
To start monitoring our Kafka 2.4 cluster, we need to add some Prometheus scrape definitions, named ServiceMonitor resources. An example of such a file can be found here.
oc apply -f strimzi-service-monitor.yaml
oc describe servicemonitor
# the results will look like a list of ServiceMonitor configurations
Name:       kafka-service-monitor
Namespace:  eda-strimzi-kafka24
Labels:     app=strimzi
...
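A minimal ServiceMonitor for the Kafka pods might look like the following sketch. The label selector and port name are assumptions and must match the labels and port actually set on your cluster's metrics service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-service-monitor
  namespace: eda-strimzi-kafka24
  labels:
    app: strimzi
spec:
  selector:
    matchLabels:
      # must match the labels on the Kafka metrics service
      strimzi.io/kind: Kafka
  endpoints:
  # name of the port exposing the JMX Exporter metrics
  - port: metrics
    interval: 30s
```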
MirrorMaker 2 monitoring
To monitor MirrorMaker 2 we need to:
- Expose MirrorMaker with an external route so we can validate that /metrics are available.
- Define the following YAML file, which adds a new Prometheus scrape job with the URL of the exposed MirrorMaker instance on the Prometheus port.
- job_name: 'MirrorMaker2'
  static_configs:
  - targets:
    - mm2-cluster-mirrormaker2-api.eda-strimzi-kafka24.svc.cluster.local:9404
and then configure the secret:
oc create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml
Prometheus should display the mirror maker metrics:
Grafana
Grafana provides visualizations of Prometheus metrics. Again we will use the Strimzi dashboard definitions as a starting point to monitor the Kafka cluster, but also Mirror Maker.
- Deploy Grafana to OpenShift and expose it via a service:
oc apply -f grafana.yaml
In case you want to test Grafana locally, run:
docker run -d -p 3000:3000 grafana/grafana
Configure Grafana dashboard
To access the Grafana portal you can use port forwarding like below or expose a route on top of the grafana service.
- Use port forwarding:
export PODNAME=$(oc get pods -l name=grafana | grep grafana | awk '{print $1}')
kubectl port-forward $PODNAME 3000:3000
Point your browser to http://localhost:3000.
- Expose the route via cli
Add the Prometheus data source with the URL of the exposed route: http://prometheus-operated:9090
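Alternatively, the data source can be provisioned from a file instead of the UI. A sketch of a Grafana provisioning file, following Grafana's datasource provisioning convention (the file path is an assumption based on the default Grafana layout):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  # in-cluster service created by the Prometheus operator
  url: http://prometheus-operated:9090
  access: proxy
  isDefault: true
```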
Alert Manager
As seen in the previous section, when deploying Prometheus we can set some alerting rules on elements of the Kafka cluster. Examples of those rules are in the prometheus-rules.yaml file. Those rules are used by the Alertmanager component.
Prometheus Alertmanager is a component for handling alerts and routing them to a notification service, like Slack. The Prometheus server is a client of Alertmanager.
- Download an example of alert manager configuration file
curl -s https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/master/examples/metrics/prometheus-install/alert-manager.yaml > alert-manager.yaml
- Define a configuration for the channel to use, by starting from the following template
curl -s https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/master/examples/metrics/prometheus-alertmanager-config/alert-manager-config.yaml > alert-manager-config.yaml
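A minimal alert-manager-config.yaml routing everything to a Slack channel could look like the following sketch. The webhook URL and channel name are placeholders to be replaced with your own values:

```yaml
global:
  resolve_timeout: 5m
route:
  # all alerts go to the single Slack receiver below
  receiver: slack-notifications
receivers:
- name: slack-notifications
  slack_configs:
  # placeholder webhook URL: replace with your Slack incoming webhook
  - api_url: https://hooks.slack.com/services/REPLACE/ME
    channel: '#kafka-alerts'
    send_resolved: true
```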
- Modify this file to reflect the remote access credentials and the URL of the channel server.
- Then deploy the secret that matches your config file:
oc create secret generic alertmanager-alertmanager --from-file=alertmanager.yaml=alert-manager-config.yaml
oc create secret generic additional-scrape-configs --from-file=./local-cluster/prometheus-additional.yaml --dry-run -o yaml | kubectl apply -f -