The following content showcases ECK's capabilities and how easily a Kubernetes Observability use case can be implemented with the help of the Elastic Stack. The manifests are not intended for production use.
Components:
- GKE Cluster
- kube-state-metrics
- ECK (Elastic Cloud on Kubernetes) operator
- Elasticsearch Cluster with Kibana
- Elastic Agent acting as Fleet Server (Deployment)
- Elastic Agent for Kubernetes node-level metrics and logs (DaemonSet)
- Elastic Agent for Kubernetes cluster-level metrics (Deployment)
- APM Server (for APM demo)
- Example application (for the APM demo)
There's also an interesting guide here which doesn't use ECK: https://www.elastic.co/guide/en/starting-with-the-elasticsearch-platform-and-its-solutions/current/getting-started-kubernetes.html
Before installing ECK on GKE, ensure your user has cluster-admin permissions:
```bash
kubectl create clusterrolebinding cluster-admin-binding-gke \
  --clusterrole=cluster-admin \
  --user=$(gcloud auth list --filter=status:ACTIVE --format="value(account)")
```
- Deploy CRDs:
```bash
kubectl create -f https://download.elastic.co/downloads/eck/2.13.0/crds.yaml
```
- Deploy Operator:
```bash
kubectl apply -f https://download.elastic.co/downloads/eck/2.13.0/operator.yaml
```
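To verify the operator is up (ECK installs into the `elastic-system` namespace):
```bash
kubectl get pods -n elastic-system
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
```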
The following commands should be run from the `resources` directory of the repository:
```bash
cd resources
```
Kube-state-metrics is an external component used by some of the Kubernetes Integration metricsets in charge of cluster monitoring.
Installation instructions can be found in the kube-state-metrics repository (https://github.com/kubernetes/kube-state-metrics). In our case:
```bash
kubectl apply -f kube-state-metrics-v2.12.0/
```
After the installation, KSM will be available via http://kube-state-metrics.kube-system.svc.cluster.local:8080.
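A quick way to confirm it is reachable from inside the cluster (the curl image is an assumption; any image shipping curl and a shell works):
```bash
kubectl run ksm-check --rm -it --restart=Never --image=curlimages/curl --command -- \
  sh -c 'curl -s http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics | head'
```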
Deploy Elasticsearch and Kibana:
```bash
kubectl apply -f 01_es_quickstart.yaml -f 02_kb_quickstart.yaml
```
Review all objects created by ECK (see the commands after this list):
- Services
- Secrets
- ConfigMaps
- StatefulSets for Elasticsearch
- Deployment for Kibana
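For example (`elasticsearch` and `kibana` are resource types added by the ECK CRDs; the label selector follows ECK's labelling conventions):
```bash
kubectl get elasticsearch,kibana
kubectl get services,secrets,configmaps,statefulsets,deployments -l 'common.k8s.elastic.co/type'
```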
ECK creates and maintains a lot of Secrets:
- `elastic` user password
- CA certificates for the HTTPS endpoints (separate secrets for Elasticsearch and Kibana)
- Configuration files
- Passwords for certain users
The `elastic` password can be obtained from one of the created Secrets:
```bash
kubectl get secret elasticsearch-quickstart-es-elastic-user -o=jsonpath='{.data.elastic}' | base64 --decode
```
- Access Kibana via the Kubernetes service `kibana-quickstart-kb-http` (see the port-forward example below).
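Without an ingress or load balancer, a port-forward is enough for a quick look (then browse https://localhost:5601 and log in as `elastic`):
```bash
kubectl port-forward service/kibana-quickstart-kb-http 5601
```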
```bash
kubectl apply -f 10_RBAC_ElasticAgent.yaml
kubectl apply -f 11_FleetServer.yaml
kubectl apply -f 12_ElasticAgent-DaemonSet.yaml
```
Verify all is running properly.
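For example (the `agent` resource type comes from the ECK CRDs; the label value follows ECK's conventions):
```bash
kubectl get agent
kubectl get pods -l 'common.k8s.elastic.co/type=agent'
```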
You will see that the DaemonSet agents are providing system monitoring information, but nothing about Kubernetes. That's because the created policy doesn't include the Kubernetes integration. We will add it manually in the next step.
Open the Fleet policy used by the DaemonSet agents and add the Kubernetes integration with the following inputs configuration (a sketch of the resulting policy excerpt follows this list):
- All Kubelet API related metrics
- All kube-state-metrics related metrics (Leader Election enabled). Make sure to use the right host for all the state metricsets: `kube-state-metrics.kube-system.svc.cluster.local:8080`
- Kubernetes Events (Leader Election enabled)
- Apiserver metrics (Leader Election enabled)
- Container logs
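Once added, the agent policy should contain streams roughly like the following excerpt (a hedged sketch; the exact ids and structure depend on the integration version):
```yaml
inputs:
  - id: kubernetes/metrics-kube-state-metrics
    type: kubernetes/metrics
    streams:
      - data_stream:
          dataset: kubernetes.state_pod
        metricsets:
          - state_pod
        hosts:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
        # leader election keeps only one agent collecting cluster-scoped metrics
        condition: ${kubernetes_leaderelection.leader} == true
```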
After applying the changes, all the data should be flowing. Interesting views:
- Discover for troubleshooting, filtering data or checking available values of different fields.
- Kubernetes Overview Dashboard
- Kubernetes Pods Dashboard
- Fleet UI
- Observability UI
Note: If the cluster-level metrics (with leader election) are not flowing, you might need to delete the lease so it's recreated:
```bash
kubectl delete lease elastic-agent-cluster-leader
```
This could happen if an unexpected agent has acquired the lease (for example, the Fleet Server).
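Before deleting it, you can check which agent currently holds the lease:
```bash
kubectl get lease elastic-agent-cluster-leader -o jsonpath='{.spec.holderIdentity}'
```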
We can add a standalone APM Server as described in the ECK documentation.
Prerequisite: the APM Server, even when run in standalone mode (not managed by Fleet), requires the APM integration to be added in Kibana.
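For reference, a minimal ApmServer resource in ECK looks roughly like this (a sketch; the name and `elasticsearchRef` are assumptions based on the names used in this demo):
```yaml
apiVersion: apm.k8s.elastic.co/v1
kind: ApmServer
metadata:
  name: apm-server-quickstart
spec:
  version: 8.14.1
  count: 1
  elasticsearchRef:
    name: elasticsearch-quickstart  # assumed name of the demo Elasticsearch cluster
```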
- Deploy APM Server:
```bash
kubectl apply -f 40_apm_server-8.14.1.yaml
```
- Obtain the token for client connections:
```bash
kubectl get secret/apm-server-quickstart-apm-token -o go-template='{{index .data "secret-token" | base64decode}}'
```
- Obtain the URL for client connections:
```bash
kubectl get service --selector='common.k8s.elastic.co/type=apm-server'
```
Configure the URL and token in the application (apm-example/app-deployment.yaml), like:
```yaml
- name: ELASTIC_APM_SERVER_URL
  value: "https://apm-server-quickstart-apm-http.default.svc.cluster.local:8200"
- name: ELASTIC_APM_SECRET_TOKEN
  value: "db2KXI7Z011CvE54t5DJ37nQ"
```
Deploy the application:
```bash
kubectl apply -f apm-example/
```
Generate some traffic:
```bash
kubectl port-forward svc/flask-counter-svc 5000:5000
# and in another window
curl localhost:5000
```
- Deploy 2 extra clusters that will be sending monitoring data to the previously created clusters:
```bash
kubectl apply -f 20_es_prod_tiers_example.yaml
kubectl apply -f 21_es_dev_qa_example.yaml
```
- Create a central monitoring cluster and ship logs and metrics from all clusters to this new one instead:
```bash
kubectl apply -f 30_stack_monitoring_dedicated_cluster.yaml
# Then apply the changes in the other manifests to send monitoring data to this new cluster instead
```
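In ECK, this is configured in the Elasticsearch manifests themselves via the stack monitoring fields; a minimal excerpt, assuming the dedicated cluster is named `monitoring`:
```yaml
spec:
  monitoring:
    metrics:
      elasticsearchRefs:
        - name: monitoring  # assumed name of the dedicated monitoring cluster
    logs:
      elasticsearchRefs:
        - name: monitoring
```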
In order to delete all generated resources:
```bash
kubectl delete -f resources/
kubectl delete -f resources/apm-example
kubectl delete -f resources/kube-state-metrics-v2.12.0
```
If we also want to uninstall ECK:
```bash
kubectl delete -f https://download.elastic.co/downloads/eck/2.13.0/operator.yaml
kubectl delete -f https://download.elastic.co/downloads/eck/2.13.0/crds.yaml
```
- Elastic Agent possible issues:
Check for lease acquisition, needed for the data streams that require a leader election:
```bash
kubectl logs elastic-agent-quickstart-agent-8gbgt | grep -i lease
I0703 14:19:14.082880     996 leaderelection.go:248] attempting to acquire leader lease default/elastic-agent-cluster-leader...
I0703 14:19:42.378425     996 leaderelection.go:258] successfully acquired lease default/elastic-agent-cluster-leader
```
Check that all expected metricsets are being provided, for example `state_*`:
{"log.level":"info","@timestamp":"2024-07-03T14:52:09.977Z","message":"Non-zero metrics in the last 30s","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"kubernetes/metrics-default","type":"kubernetes/metrics"},"log":{"source":"kubernetes/metrics-default"},"log.logger":"monitoring","log.origin":{"file.line":187,"file.name":"log/log.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"},"service.name":"metricbeat","monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":0}}}},"cpu":{"system":{"ticks":780,"time":{"ms":140}},"total":{"ticks":5400,"time":{"ms":1060},"value":5400},"user":{"ticks":4620,"time":{"ms":920}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":22},"info":{"ephemeral_id":"6c01ac15-4c51-41a5-8d41-8872aee5ae7d","uptime":{"ms":631602},"version":"8.14.1"},"memstats":{"gc_next":98676648,"memory_alloc":53293520,"memory_sys":26364944,"memory_total":742908704,"rss":228708352},"runtime":{"goroutines":314}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":23}},"output":{"events":{"acked":1872,"active":0,"batches":3,"duplicates":2,"total":1874},"read":{"bytes":49958,"errors":3},"write":{"bytes":578975,"latency":{"histogram":{"count":56,"max":981,"mean":103.03571428571429,"median":67,"min":49,"p75":83.75,"p95":287.5499999999994,"p99":981,"p999":981,"stddev":145.7396117696747}}}},"pipeline":{"clients":23,"events":{"active":211,"published":1854,"total":1854},"queue":{"acked":1874}}},"metricbeat":{"kubernetes":{"apiserver":{"events":993,"success":993},"container":{"events":57,"success":57},"node":{"events":3,"success":3},"pod":{"events":33,"success":33},"proxy":{"events":18,"success":18},"state_container":{"events":201,"success":201},"state_daemonset":{"events":81,"success":81},"state_deployment":{"events":36,"success":36},"state_namespace":{"events":24,"success":24},"state_node":{"events":9,"success":9},"state_persistentvolume":{"events":9,"success":9},"state_persistentvolumeclaim":{"events":9,"success":9},"state_pod":{"events":108,"success":108},"state_replicaset":{"events":45,"success":45},"state_resourcequota":{"events":18,"success":18},"state_service":{"events":42,"success":42},"state_statefulset":{"events":9,"success":9},"state_storageclass":{"events":9,"success":9},"system":{"events":9,"success":9},"volume":{"events":141,"success":141}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.42,"15":0.65,"5":0.44,"norm":{"1":0.105,"15":0.1625,"5":0.11}}}}},"ecs.version":"1.6.0"}
There's an issue where Elastic Agent might fail to update the token information in the state directory (mounted as a hostPath volume).
- Divide the Kubernetes Observability workload into a DaemonSet + a Deployment:
  - DaemonSet for node-level metrics and logs
  - Deployment for cluster-level metrics (kube-state-metrics related metrics, events, apiserver, etc.)
  That would require 2 different policies, both with the Kubernetes integration, but each of them configured with a different set of inputs (see the sketch after this list).
- Show agents with their real hostname in the Fleet UI instead of the pod name (for the DaemonSet).
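A rough sketch of what that split could look like with ECK's Fleet-managed Agent resources (names, policy IDs and refs are hypothetical):
```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-node
spec:
  version: 8.14.1
  mode: fleet
  kibanaRef:
    name: kibana-quickstart        # assumed Kibana name
  fleetServerRef:
    name: fleet-server-quickstart  # assumed Fleet Server name
  policyID: eck-agent-node         # hypothetical policy with node-level inputs
  daemonSet: {}
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-cluster
spec:
  version: 8.14.1
  mode: fleet
  kibanaRef:
    name: kibana-quickstart
  fleetServerRef:
    name: fleet-server-quickstart
  policyID: eck-agent-cluster      # hypothetical policy with cluster-level inputs
  deployment:
    replicas: 1
```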