What are Kubernetes Monitoring Tools?

Question

Accepted Answer

Kubernetes monitoring tools help you gain visibility into your containers, pods, and clusters. They help you ensure reliability and troubleshoot issues as they occur, monitor and enforce security, manage costs, fine-tune performance, and attribute resource usage to teams for chargebacks and showbacks.

Monitoring is especially important in a containerized environment, because resources are ephemeral, the environment is complex, and it can be difficult to identify and troubleshoot problems.

Traditional monitoring tools are typically not effective in a containerized environment, where workloads are short-lived and constantly rescheduled across nodes. A new generation of cloud-native monitoring tools has emerged. These tools can be deployed as part of Kubernetes clusters and can gather relevant metrics from across the Kubernetes environment.

In this article, you will learn:
5 Reasons Kubernetes Monitoring is Important
Top 6 Open-Source Kubernetes Monitoring Tools
Kubernetes Dashboard
Prometheus
Jaeger
Elastic Stack (ELK)
kubewatch
cAdvisor
Enterprise-Grade Kubernetes Monitoring and Observability with Calico
5 Reasons Kubernetes Monitoring is Important
Here are several reasons your organization needs a robust monitoring strategy for Kubernetes:

Reliability and troubleshooting – Kubernetes applications, especially those that use cloud-native or microservices architectures, can be particularly complex. If issues occur, it can be difficult to pinpoint the cause of the problem. With proper Kubernetes monitoring you can see where problems occur or are about to occur, and access data that can help you take action to prevent or fix the issue.
Kubernetes performance tuning – Understanding what’s happening in your Kubernetes cluster can help you right-size nodes and container resource requests without compromising application performance.
Cost management – When running Kubernetes on public cloud infrastructure, it is important to track how many nodes (compute instances) you are running, because this number largely determines your hourly cost. Even if you are not running on a public cloud, it is important to know whether your resources are exhausted or underutilized.
Chargebacks – In some cases, you may want to know which groups are using which resources. Kubernetes monitoring can provide insights into usage statistics, which you can leverage to analyze chargebacks and showbacks or perform a general Kubernetes cost analysis.
Security – In today’s threat environment, it is critical to know what is running and where, discover pods, containers or jobs that should not exist, and look for malicious ingress and egress traffic. Kubernetes monitoring is an essential part of a container security strategy.
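To make the cost management and chargeback points concrete, here is a minimal Python sketch of a showback calculation. It assumes several things: you already have each namespace's average CPU requests from your monitoring data, every node costs the same hourly price, and the cluster's cost is split in proportion to requested CPU. The namespace names, prices, and node sizes are hypothetical. Other allocation policies, such as splitting by memory or by actual usage, are equally valid.

```python
from dataclasses import dataclass


@dataclass
class NodePool:
    node_count: int
    hourly_price_per_node: float  # hypothetical price in dollars per node-hour
    cpu_per_node: float           # allocatable CPU cores per node


def hourly_cluster_cost(pool: NodePool) -> float:
    """On a public cloud, compute cost scales with the number of running nodes."""
    return pool.node_count * pool.hourly_price_per_node


def cpu_utilization(pool: NodePool, cpu_requested: dict[str, float]) -> float:
    """Fraction of allocatable CPU that workloads have requested."""
    capacity = pool.node_count * pool.cpu_per_node
    return sum(cpu_requested.values()) / capacity


def showback(pool: NodePool, cpu_requested: dict[str, float]) -> dict[str, float]:
    """Split the hourly cluster cost across namespaces by share of requested CPU.

    Idle (unrequested) capacity is spread using the same proportions.
    """
    total_requested = sum(cpu_requested.values())
    cost = hourly_cluster_cost(pool)
    return {ns: cost * cpu / total_requested for ns, cpu in cpu_requested.items()}


if __name__ == "__main__":
    pool = NodePool(node_count=4, hourly_price_per_node=0.20, cpu_per_node=4.0)
    # Hypothetical average CPU requests (in cores) per namespace.
    requests = {"team-a": 6.0, "team-b": 3.0, "monitoring": 1.0}
    print(f"cluster cost: ${hourly_cluster_cost(pool):.2f}/hour")
    print(f"CPU requested: {cpu_utilization(pool, requests):.1%}")
    for ns, dollars in showback(pool, requests).items():
        print(f"{ns}: ${dollars:.3f}/hour")
```

With four hypothetical nodes at $0.20 per hour, the cluster costs $0.80 per hour. Team-a requests 6 of the 10 requested cores, so it is shown $0.48 per hour. The utilization figure (10 of 16 allocatable cores requested) is the kind of signal that tells you whether capacity is exhausted or underutilized.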
Top 6 Open-Source Kubernetes Monitoring Tools
The following open-source tools are at the forefront of cloud-native monitoring technology. Let’s briefly review their features and capabilities.

Kubernetes Dashboard

Kubernetes Dashboard is the official general-purpose web-based user interface for Kubernetes clusters, and a common starting point for monitoring. It provides a convenient way to visualize important information about the containers and pods running in your clusters. It is an optional add-on that is not deployed by default. Once installed, it lets you view and manage most of the resources in a cluster from one place.

The Kubernetes dashboard provides metrics and visualizations of:

Deployment of applications into pods
Applications running in pods
Issues with applications running in pods
Resource utilization for Kubernetes pods
It also allows you to make changes to the Kubernetes environment:

Scale workloads, changing the amount of resources they use in the cluster
Create, edit, and delete resources in a cluster
GitHub repo: http://github.com/kubernetes/dashboard

Prometheus

Prometheus is a popular open-source monitoring tool for Kubernetes. It was originally developed at SoundCloud. It was the second project, after Kubernetes, to graduate from the Cloud Native Computing Foundation (CNCF), and it has become the de facto standard for monitoring Kubernetes. Prometheus scrapes metrics from HTTP endpoints exposed by the systems it monitors and stores them as time series.

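Prometheus has three main components that perform different tasks: the Prometheus server, Alertmanager, and exporters.

- The Prometheus server discovers scrape targets (including through Kubernetes service discovery) and pulls metrics from them. It stores the data in its local time series database and evaluates alerting rules.
- Alertmanager receives the alerts that fire. It deduplicates and groups them, then routes notifications to channels such as email, Slack, or PagerDuty.
- Exporters are separate processes, often run as their own containers. They collect data from systems that are not natively instrumented and expose it as Prometheus metrics.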
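To illustrate what an exporter does, here is a minimal Python sketch. It renders some readings in the Prometheus text exposition format and serves them over HTTP at /metrics, where the Prometheus server can pull them. It uses only the standard library. In practice you would normally use an official client library such as prometheus_client. The metric name demo_queue_depth, its labels and values, and port 8000 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(name: str, help_text: str, metric_type: str,
                   samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


def collect() -> str:
    """Gather current readings; these values are hypothetical placeholders."""
    samples = [({"queue": "orders"}, 12.0), ({"queue": "emails"}, 3.0)]
    return render_metrics("demo_queue_depth", "Messages waiting in each queue.",
                          "gauge", samples)


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with freshly collected metrics on every scrape."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve metrics; Prometheus pulls from here on each scrape interval."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() blocks and answers GET /metrics. A Prometheus scrape job pointed at that port would then pull the demo_queue_depth series on every scrape interval. The format is written by hand here only for illustration. Client libraries also handle details such as HELP-text escaping and richer metric types like histograms.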

Other features of Prometheus include:

A multi-dimensional data model in which each time series is identified by a metric name and a set of key/value pairs called labels.
A query language, PromQL, for selecting, aggregating, and analyzing this multi-dimensional data.
No reliance on distributed storage; each Prometheus server is autonomous.
Collects time series data using a pull model over HTTP.
Multiple modes of graphing and dashboarding, from the built-in expression browser to integrations such as Grafana.
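The data model and PromQL features listed above can be illustrated with a small Python sketch. Each series is keyed by its metric name plus a set of labels. Two functions mimic, in simplified form, a PromQL equality selector and an aggregation such as sum by (namespace) (container_memory_working_set_bytes). container_memory_working_set_bytes is a metric that cAdvisor exposes, but the pod names and values here are made up. The code reproduces only the grouping idea, not PromQL's full semantics.

```python
from collections import defaultdict

# A series is identified by its metric name plus a set of label key/value pairs.
# frozenset makes the label set hashable so the pair can serve as a dict key.
SeriesKey = tuple[str, frozenset]

# Hypothetical latest samples (bytes) for three pods in two namespaces.
samples: dict[SeriesKey, float] = {
    ("container_memory_working_set_bytes",
     frozenset({("namespace", "shop"), ("pod", "cart-1")})): 200e6,
    ("container_memory_working_set_bytes",
     frozenset({("namespace", "shop"), ("pod", "cart-2")})): 150e6,
    ("container_memory_working_set_bytes",
     frozenset({("namespace", "auth"), ("pod", "login-1")})): 100e6,
}


def select(metric: str, data: dict[SeriesKey, float],
           **matchers: str) -> dict[SeriesKey, float]:
    """Like a PromQL selector metric{label="value", ...}, equality matchers only."""
    return {
        key: value for key, value in data.items()
        if key[0] == metric and all((k, v) in key[1] for k, v in matchers.items())
    }


def sum_by(label: str, series: dict[SeriesKey, float]) -> dict[str, float]:
    """Like PromQL `sum by (label) (...)`: add up values sharing a label value."""
    totals: dict[str, float] = defaultdict(float)
    for (_, labels), value in series.items():
        totals[dict(labels).get(label, "")] += value
    return dict(totals)
```

Here, sum_by("namespace", select("container_memory_working_set_bytes", samples)) returns 350e6 bytes for shop and 100e6 for auth. The labels, not the metric name, carry the dimensions. As a result, the same stored series can be regrouped by pod, namespace, or any other label at query time.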
GitHub repo: http://github.com/prometheus/prometheus

Jaeger

Jaeger is an end-to-end distributed tracing solution that was open sourced by Uber Engineering and is now a graduated CNCF project. It lets you monitor and troubleshoot transactions in complex distributed systems. In modern microservices architectures, many operational issues stem from how services communicate over the network. This makes observability into those interactions essential.

Without tracing, when a service fails it is hard to determine how a request traveled from one service to another over the network while completing a single business transaction. This makes debugging very difficult. Jaeger records distributed traces that help you perform root cause analysis, optimize performance and latency, and monitor distributed transactions. Jaeger also works with Istio, a popular service mesh implementation originally developed by Google, IBM, and Lyft.
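To show how tracing reveals the path a request took between services, here is a minimal Python sketch. It models spans in the general way tracing systems such as Jaeger do, with a trace ID, a span ID, and a parent span ID. It then rebuilds the call tree and finds the failing span where the error originated. The Span class, service names, and timings are hypothetical; this is not Jaeger's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical, simplified span: one unit of work inside one service."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    operation: str
    duration_ms: float
    error: bool = False


def build_children(spans: list[Span]) -> dict[Optional[str], list[Span]]:
    """Index spans by parent span ID so the call tree can be walked from the root."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    return children


def request_path(spans: list[Span]) -> list[str]:
    """Depth-first list of service:operation hops, indented by call depth."""
    children = build_children(spans)
    path: list[str] = []

    def walk(span: Span, depth: int) -> None:
        path.append("  " * depth + f"{span.service}:{span.operation}")
        for child in children.get(span.span_id, []):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return path


def likely_root_cause(spans: list[Span]) -> Optional[Span]:
    """First failing span whose children all succeeded, i.e. where the error began."""
    children = build_children(spans)
    for span in spans:
        if span.error and not any(c.error for c in children.get(span.span_id, [])):
            return span
    return None


# One hypothetical checkout request that failed deep in the call chain.
trace = [
    Span("t1", "a", None, "frontend", "GET /checkout", 120.0, error=True),
    Span("t1", "b", "a", "cart", "get_cart", 15.0),
    Span("t1", "c", "a", "payment", "charge", 95.0, error=True),
    Span("t1", "d", "c", "bank-gateway", "authorize", 90.0, error=True),
]
```

For the sample trace, request_path shows frontend calling cart and payment, with payment in turn calling bank-gateway. likely_root_cause returns the bank-gateway span, because the errors on payment and frontend are propagated failures, not the origin. Real spans also carry start times, tags, and logs, which are what make latency analysis possible.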

GitHub repo: http://github.com/jaegertracing/jaeger-kubernetes

Elastic Stack (ELK)

The ELK stack is a popular open-source solution for enterprise search and log management, which can handle Kubernetes logs as well. You can use it for both monitoring and log management.

ELK consists of three core tools plus a family of data collection agents:

Elasticsearch is a NoSQL database and analytics engine that can store any type of logs, including Kubernetes logs.
Logstash is used to capture and process logs before sending them to Elasticsearch.
Kibana is the dashboard component, which lets you visualize log data.
Beats are lightweight agents, such as Filebeat and Metricbeat, that can be deployed on Kubernetes infrastructure to ship logs and metrics to Logstash or directly to Elasticsearch.
ELK has Beats that support Kubernetes and Docker, with autodiscovery capabilities that detect containers as they start. These Beats help you monitor application and system-level performance by collecting many types of logs and metrics.
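The capture-and-process step that Logstash or a Beat performs can be sketched in Python. The sketch makes two assumptions. First, node log lines use the CRI log format written by containerd and CRI-O: a timestamp, the stream, a P/F tag, then the message. Second, log files follow the /var/log/containers/<pod>_<namespace>_<container>-<container-id>.log naming convention. Under those assumptions, it turns one raw line into a JSON-ready document enriched with Kubernetes metadata, which could then be indexed into Elasticsearch. The output field names are illustrative, not the schema Filebeat or Logstash actually produce.

```python
import json
import re
from pathlib import PurePosixPath

# CRI log line: "<RFC 3339 timestamp> <stdout|stderr> <P|F> <message>"
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<tag>[PF]) (?P<message>.*)$")

# Node log file name: <pod>_<namespace>_<container>-<64-hex container id>.log
LOG_FILE = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$")


def parse_line(path: str, line: str) -> dict:
    """Turn one raw container log line into a structured document."""
    match = CRI_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a CRI log line: {line!r}")
    doc = {
        "@timestamp": match["time"],
        "stream": match["stream"],
        "partial": match["tag"] == "P",  # P lines must be joined with the next line
        "message": match["message"],
    }
    meta = LOG_FILE.match(PurePosixPath(path).name)
    if meta is not None:
        doc["kubernetes"] = {
            "namespace": meta["namespace"],
            "pod": meta["pod"],
            "container": meta["container"],
        }
    return doc


if __name__ == "__main__":
    # Hypothetical log line and file path for an nginx container in namespace "shop".
    line = "2024-05-01T12:00:00.000000000Z stdout F GET /healthz 200"
    path = "/var/log/containers/web-7d4b9_shop_nginx-" + "a" * 64 + ".log"
    print(json.dumps(parse_line(path, line), indent=2))
```

A P tag marks a line that the runtime split, so a real pipeline would join it with the following lines before indexing. Beats and Logstash perform this parsing and enrichment for you. The sketch only shows the shape of the transformation.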

GitHub repo: http://github.com/deviantony/docker-elk


kubewatch
kubewatch watches for changes to specific Kubernetes resources and sends notifications about them to collaboration tools like PagerDuty and Slack. You choose which resource types it watches, such as Pods, DaemonSets, Deployments, ReplicaSets, ReplicationControllers, Secrets, ConfigMaps, and Services. kubewatch is easy to configure and can be deployed manually or automatically via Helm charts.
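The watch, filter, and notify loop that kubewatch performs can be sketched in Python. The sketch assumes events shaped like Kubernetes watch events: a type of ADDED, MODIFIED, or DELETED plus the affected object. A hypothetical notify callback stands in for a real Slack or webhook handler. This is not kubewatch's code or configuration format, and WATCHED_KINDS is an illustrative stand-in for its per-resource settings.

```python
from typing import Callable, Iterable

# Hypothetical set of resource kinds to watch.
WATCHED_KINDS = {"Pod", "Deployment", "Secret", "ConfigMap", "Service"}

VERBS = {"ADDED": "created", "MODIFIED": "updated", "DELETED": "deleted"}


def format_event(event: dict) -> str:
    """Build a short human-readable message for one change event."""
    obj = event["object"]
    meta = obj["metadata"]
    namespace = meta.get("namespace", "<cluster>")  # cluster-scoped objects have none
    return f"{obj['kind']} {namespace}/{meta['name']} {VERBS[event['type']]}"


def dispatch(events: Iterable[dict], notify: Callable[[str], None],
             kinds: set[str] = WATCHED_KINDS) -> int:
    """Forward events for watched kinds to notify(); return how many were sent."""
    sent = 0
    for event in events:
        if event["type"] not in VERBS:
            continue  # e.g. BOOKMARK or ERROR events carry no resource change
        if event["object"].get("kind") not in kinds:
            continue
        notify(format_event(event))
        sent += 1
    return sent
```

Suppose dispatch receives an ADDED Pod event, a MODIFIED Lease event, and a DELETED Secret event, with notify set to a list's append method. The list ends up with two messages, because Lease is not in the watched set. In a real deployment, the events would come from a watch on the Kubernetes API server, and notify would post to a chat or paging tool.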

GitHub repo: http://github.com/bitnami-labs/kubewatch

cAdvisor
The cAdvisor agent can help you collect, process, and export information about containers running in your environment. cAdvisor is deployed on the node level, not per pod. It can auto-discover all containers running on a particular machine and collect system metrics like CPU, network, and memory.

cAdvisor is built into Kubernetes. It is integrated into the kubelet binary, so it runs by default on every Kubernetes node. Another advantage of cAdvisor is that it exposes its metrics in Prometheus format out of the box, which makes them easy to scrape. However, it offers far less than end-to-end monitoring solutions. It keeps only recent data in memory and provides no alerting or long-term storage, so it is usually paired with a tool such as Prometheus.
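cAdvisor reports CPU as a cumulative counter, container_cpu_usage_seconds_total, measured in CPU-seconds. CPU usage in cores is therefore the counter's increase between two scrapes divided by the time between them. A minimal Python sketch of that arithmetic follows. The sample values and CPU limit are made up, and the counter-reset handling is a simplification.

```python
def cpu_cores(prev: tuple[float, float], curr: tuple[float, float]) -> float:
    """Average CPU cores used between two (timestamp_s, cpu_seconds_total) samples.

    If the counter went down, the container probably restarted and its counter
    began again from zero, so the current value is taken as the whole increase.
    """
    (t1, v1), (t2, v2) = prev, curr
    if t2 <= t1:
        raise ValueError("samples must be in increasing time order")
    increase = v2 - v1 if v2 >= v1 else v2
    return increase / (t2 - t1)


def utilization_of_limit(cores: float, limit_cores: float) -> float:
    """Share of the container's CPU limit in use (1.0 means at the limit)."""
    return cores / limit_cores


if __name__ == "__main__":
    # Hypothetical scrapes 30 s apart: 15 CPU-seconds used in 30 s is 0.5 cores.
    cores = cpu_cores((1000.0, 4210.0), (1030.0, 4225.0))
    print(cores, utilization_of_limit(cores, limit_cores=2.0))  # 0.5 0.25
```

This is why cAdvisor metrics pair naturally with Prometheus. A query such as rate(container_cpu_usage_seconds_total[5m]) performs a similar calculation over all samples in the window. It also extrapolates at the window edges, which this sketch omits.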

cAdvisor is not limited to Docker; it also supports other container runtimes, such as containerd and CRI-O. When run standalone, it also provides a web-based user interface that displays container resource usage.

GitHub repo: http://github.com/google/cadvisor

Enterprise-Grade Kubernetes Monitoring and Observability with Calico
Open-source tools are a great way to start your monitoring journey, but they have their limitations. Calico Cloud and Calico Enterprise offer the following advanced features for Kubernetes observability, which go beyond open-source, cloud-native monitoring tools:

Dynamic Service Graph – A point-to-point, topological representation of traffic flow and policy that shows how workloads within the cluster communicate, and across which namespaces. It also includes advanced capabilities to filter resources, save views, and troubleshoot service issues.
DNS Dashboard – Helps accelerate DNS-related troubleshooting and problem resolution in Kubernetes environments by providing an interactive UI with exclusive DNS metrics.
L7 Dashboard – Provides a high-level view of HTTP communication across the cluster, with summaries of top URLs, request duration, response codes, and volumetric data for each service.
Dynamic Packet Capture – Captures packets from a specific pod or collection of pods with specified packet sizes and duration, in order to troubleshoot performance hotspots and connectivity issues faster.
Application-level Observability – Provides a centralized, all-encompassing view of service-to-service traffic in the Kubernetes cluster to detect anomalous behavior, such as attempts to access restricted applications or URLs and scans for particular URLs.
Unified Controls – A single management plane provides a centralized point of control for security and observability across multiple clouds, clusters, and distros. Users can monitor and observe across environments from a single pane of glass.
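At its core, a service graph like the one described above is an aggregation of flow records into weighted edges between workloads. The following Python sketch shows that idea under stated assumptions. The FlowRecord fields are hypothetical and much simpler than Calico's actual flow log schema. This is not how Calico Cloud or Calico Enterprise implement the feature.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical, minimal flow record between two workloads."""
    src_namespace: str
    src_workload: str
    dst_namespace: str
    dst_workload: str
    action: str  # "allow" or "deny": the policy verdict for this flow


def service_graph(flows: list[FlowRecord]) -> Counter:
    """Count flows per (source, destination, action) edge."""
    return Counter(
        (f"{f.src_namespace}/{f.src_workload}",
         f"{f.dst_namespace}/{f.dst_workload}",
         f.action)
        for f in flows
    )


def cross_namespace_denies(graph: Counter) -> list[tuple[str, str, int]]:
    """Denied edges that cross namespaces, a common troubleshooting starting point."""
    return sorted(
        (src, dst, count)
        for (src, dst, action), count in graph.items()
        if action == "deny" and src.split("/")[0] != dst.split("/")[0]
    )
```

Each key in the resulting counter is one edge of the graph, and filtering on namespaces gives the cross-namespace view. A production tool layers time windows, ports, byte counts, and policy names on top of this basic aggregation.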
