
Commit 0459075

mrnicegyu11, YuryHrytsuk, and sanderegg authored
✨ Add: Grafana tempo (#910)
* wip
* Add csi-s3 and have portainer use it
* Change request @Hrytsuk 1GB max portainer volume size
* Fix wrong filename
* Fix registry local deploy
* Traefik local deployment fixes
* Fix local deployment graylog provisioning
* Fix j2, double venv
* Add python version
* Idempotency for admin-panels
* Remove faulty command
* Local deploy fixes
* Clean Up Local Minio
* init work
* Remove unused code
* Update Minio
* Arch Linux Certificates Customization
* Add grafana terraform tooling
* Make osparc-config dotenv-precommit pass: Use all caps env-vars
* Refactoring: jinja2 takes .env file path as explicit argument (like in osparc-config)
* Make CI_ENV_FILE available in makefile
* Refactor makefile targets
* Add grafana terraform gitignore
* Rename envvar: TF_STATE_S3_GRAFANAKEY
* Remove old scripts, makefile targets
* Remove unused files
* undue arch style commit
* Remove references to Tempo
* Change request YH: Stop trying to reach grafana eventually
* Change request YH: Move tf scripts to terraform folder
* Change request YH: stricter check
* Add files remove typo
* Add terraform fmt pre-commit hook
* Use ansible.env file in lieu of ci.env if available
* Rename and refactor
* wip
* wip
* remove line
* Makefile repo base dir without git
* Grafana terraform ceph fixes
* Fix indentation
* Add manual to traefik redirect capture all rule (#933)
* Introduce rolling docker config / secret update concept 🎉 🚀 (#952)
* fixes
* update comment
* Update traefik router hardcoded priorities (#953)
* Update traefik router hardcoded priorities
* remove hardcoded priority from adminpanels
* Configure redis replicas via ENV (#957)
* Filestash: remove special docker node label (#959)
* rabbit: configurable replicas (#964)
* rabbit: configurable replicas
* clean up
* 💄 minor: Change DNS Server to Quad9 (#967)
* wip
* Add csi-s3 and have portainer use it
* Change request @Hrytsuk 1GB max portainer volume size
* Arch Linux Certificates Customization
* Change DNS server for aws to swiss privacy focused one
* revert wrong commit
---------
Co-authored-by: Dustin Kaiser <[email protected]>
* single replica (#968)
* Remove docker api proxy from validate simcore settings (#972)
* Add appmotiongateway add dalco
* Add appmotiongateway add dalco - 2
* Add appmotiongateway add dalco - 3
* Separate dalco-staging: disable redis special handling (#976)
* wip
* Add csi-s3 and have portainer use it
* Change request @Hrytsuk 1GB max portainer volume size
* Arch Linux Certificates Customization
* Remove dalco special staging handling
* remove accidental commit
* remove accidental commit
* Remove dalco staging special handling
---------
Co-authored-by: Dustin Kaiser <[email protected]>
* Fix deploy ops failure
* Make curl in ensure_grafana_online_ timeout after 10s
* Timeout in wait_graylog_is_online
* Fix osparc.local pydantic validation failure director-v0
* Move create tempo bucket function to monitoring stack makefile
* wip
* fix faulty commit
* Add tempo as exporter target to otlp collector
---------
Co-authored-by: Dustin Kaiser <[email protected]>
Co-authored-by: Yury Hrytsuk <[email protected]>
Co-authored-by: Sylvain <[email protected]>
1 parent 245882f commit 0459075

10 files changed: +127 -14 lines changed

.gitignore

+1-1
@@ -129,7 +129,7 @@ docs/_build
 /services/monitoring/pgsql_query_exporter_config.yaml
 /services/monitoring/docker-compose.yml
 /services/monitoring/smokeping_prober_config.yaml
-
+services/monitoring/tempo_config.yaml
 
 # Simcore: Contains location of repo.config file on the machine and of the whole config directory
 .config.location

Makefile

-1
@@ -71,7 +71,6 @@ down-maintenance: ## Stop the maintenance mode
 	fi \
 ,)
 
-
 # Misc: info & clean
 .PHONY: info info-vars info-local
 info: ## Displays some important info

services/jaeger/opentelemetry-collector-config.yaml

+5-1
@@ -8,11 +8,15 @@ receivers:
 exporters:
   otlphttp:
     endpoint: ${TRACING_OPENTELEMETRY_COLLECTOR_EXPORTER_ENDPOINT} # Adjust to your Jaeger endpoint
+  otlp:
+    endpoint: http://tempo:4317
+    tls:
+      insecure: true
 service:
   pipelines:
     traces:
       receivers: [otlp]
-      exporters: [otlphttp]
+      exporters: [otlphttp,otlp]
       processors: [batch,probabilistic_sampler,filter/drop_healthcheck]
   telemetry:
     logs:
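
Note: every exporter named in a collector pipeline has to be configured under the top-level `exporters` key, which is why this hunk adds both the `otlp` block and the extra pipeline entry. The following is a minimal Python sketch of that consistency check, assuming the YAML has already been loaded into a plain dict. The function name `undefined_exporters` and the Jaeger endpoint value are hypothetical and not part of the commit.

```python
def undefined_exporters(config: dict) -> dict:
    """Map each pipeline name to the exporters it lists that are not configured."""
    defined = set(config.get("exporters", {}))
    problems = {}
    pipelines = config.get("service", {}).get("pipelines", {})
    for name, pipeline in pipelines.items():
        missing = [e for e in pipeline.get("exporters", []) if e not in defined]
        if missing:
            problems[name] = missing
    return problems


# Dict mirroring the exporter and pipeline sections after this change.
config = {
    "exporters": {
        "otlphttp": {"endpoint": "http://jaeger.example:4318"},  # hypothetical value
        "otlp": {"endpoint": "http://tempo:4317", "tls": {"insecure": True}},
    },
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["otlphttp", "otlp"]},
        }
    },
}

print(undefined_exporters(config))
```

Dropping the `otlp` entry from `exporters` while leaving it in the pipeline would make the function report `{"traces": ["otlp"]}`.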

services/monitoring/Makefile

+22-8
@@ -9,13 +9,24 @@ REPO_BASE_DIR := $(abspath $(dir $(abspath $(lastword $(MAKEFILE_LIST))))../..)
 # TARGETS --------------------------------------------------
 include ${REPO_BASE_DIR}/scripts/common.Makefile
 
+define create-s3-bucket
+# ensure bucket is available in S3...
+@set -o allexport; \
+source .env; \
+echo Creating bucket "$${TEMPO_S3_BUCKET}";\
+${REPO_BASE_DIR}/scripts/create-s3-bucket.bash "$${TEMPO_S3_BUCKET}" && \
+set +o allexport; \
+# bucket is available in S3
+endef
+
 .PHONY: up
 up: .init .env config.prometheus ${TEMP_COMPOSE} ## Deploys or updates current stack "$(STACK_NAME)". If MONITORED_NETWORK is not specified, it will create an attachable network
 	@docker stack deploy --with-registry-auth --prune --compose-file ${TEMP_COMPOSE} $(STACK_NAME)
 	$(MAKE) grafana-import
 
 .PHONY: up-local
 up-local: .init .env config.prometheus.simcore ${TEMP_COMPOSE}-local ## Deploys or updates current stack "$(STACK_NAME)". If MONITORED_NETWORK is not specified, it will create an attachable network
+	@$(create-s3-bucket)
 	@docker stack deploy --with-registry-auth --prune --compose-file ${TEMP_COMPOSE}-local $(STACK_NAME)
 	$(MAKE) grafana-import
 
@@ -49,28 +60,28 @@ up-master: .init .env config.monitoring config.prometheus.ceph.simcore ${TEMP_C
 	@docker stack deploy --with-registry-auth --prune --compose-file ${TEMP_COMPOSE}-master ${STACK_NAME}
 	$(MAKE) grafana-import
 
-${TEMP_COMPOSE}: docker-compose.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}: docker-compose.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< > $@
 
-${TEMP_COMPOSE}-letsencrypt-http: docker-compose.yml docker-compose.letsencrypt.http.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-letsencrypt-http: docker-compose.yml docker-compose.letsencrypt.http.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.letsencrypt.http.yml > $@
 
-${TEMP_COMPOSE}-letsencrypt-dns: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-letsencrypt-dns: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.letsencrypt.dns.yml > $@
 
-${TEMP_COMPOSE}-dalco: docker-compose.yml docker-compose.dalco.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-dalco: docker-compose.yml docker-compose.dalco.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.dalco.yml > $@
 
-${TEMP_COMPOSE}-public: docker-compose.yml docker-compose.public.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-public: docker-compose.yml docker-compose.public.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.public.yml > $@
 
-${TEMP_COMPOSE}-aws: docker-compose.yml docker-compose.aws.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-aws: docker-compose.yml docker-compose.aws.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.aws.yml > $@
 
-${TEMP_COMPOSE}-master: docker-compose.yml docker-compose.master.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-master: docker-compose.yml docker-compose.master.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.master.yml > $@
 
-${TEMP_COMPOSE}-local: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml
+${TEMP_COMPOSE}-local: docker-compose.yml docker-compose.letsencrypt.dns.yml config.monitoring .env pgsql_query_exporter_config.yaml smokeping_prober_config.yaml tempo_config.yaml
 	@${REPO_BASE_DIR}/scripts/docker-stack-config.bash -e .env $< docker-compose.letsencrypt.dns.yml > $@
 
 docker-compose.yml: docker-compose.yml.j2 .env .venv pgsql_query_exporter_config.yaml
@@ -137,6 +148,9 @@ pgsql_query_exporter_config.yaml: pgsql_query_exporter_config.yaml.j2 ${REPO_CON
 smokeping_prober_config.yaml: smokeping_prober_config.yaml.j2 ${REPO_CONFIG_LOCATION} .env .venv
 	$(call jinja, $<, .env, $@);
 
+tempo_config.yaml: tempo_config.yaml.j2 ${REPO_CONFIG_LOCATION} .env .venv
+	$(call jinja, $<, .env, $@);
+
 .PHONY: grafana/assets
 grafana/assets: ${REPO_CONFIG_LOCATION}
 	@$(MAKE_C) grafana assets

services/monitoring/docker-compose.yml.j2

+26
@@ -17,6 +17,8 @@ networks:
 configs:
   alertmanager_config:
     file: ./alertmanager/config.yml
+  tempo_config:
+    file: ./tempo_config.yaml
   node_exporter_entrypoint:
     file: ./node-exporter/docker-entrypoint.sh
   prometheus_config:
@@ -398,3 +400,27 @@ services:
         reservations:
           memory: 32M
           cpus: "0.1"
+  tempo:
+    image: grafana/tempo:2.6.1
+    command: "-target=scalable-single-binary -config.file=/etc/tempo.yaml"
+    configs:
+      - source: tempo_config
+        target: /etc/tempo.yaml
+    networks:
+      - monitored
+    deploy:
+      labels:
+        - traefik.enable=true
+        - traefik.docker.network=${PUBLIC_NETWORK}
+        - traefik.http.services.tempo.loadbalancer.server.port=9095
+        - traefik.http.routers.tempo.rule=Host(`${MONITORING_DOMAIN}`) && PathPrefix(`/tempo`)
+        - traefik.http.routers.tempo.priority=10
+        - traefik.http.routers.tempo.entrypoints=https
+        - traefik.http.routers.tempo.tls=true
+        - traefik.http.middlewares.tempo_replace_regex.replacepathregex.regex=^/tempo/?(.*)$$
+        - traefik.http.middlewares.tempo_replace_regex.replacepathregex.replacement=/$${1}
+        - traefik.http.routers.tempo.middlewares=ops_whitelist_ips@swarm, ops_gzip@swarm, tempo_replace_regex
+      resources:
+        limits:
+          memory: 2000M
+          cpus: "2.0"
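
Note: the `tempo_replace_regex` middleware strips the `/tempo` prefix before the request reaches the service. In the compose file `$$` is an escaped `$`, so the effective pattern is `^/tempo/?(.*)$` with replacement `/${1}`. The sketch below reproduces that rewrite with Python's `re` module to show which path the backend would see. The helper name `strip_tempo_prefix` is hypothetical, and Traefik itself uses Go regular expressions, so treat this as an illustration rather than the middleware's implementation.

```python
import re

# Pattern copied from the middleware label with "$$" unescaped to "$".
# Traefik's "${1}" group reference corresponds to "\1" in Python.
TEMPO_PREFIX = re.compile(r"^/tempo/?(.*)$")


def strip_tempo_prefix(path: str) -> str:
    """Return the path the Tempo service would receive after the rewrite."""
    return TEMPO_PREFIX.sub(r"/\1", path)


for example in ("/tempo/api/search", "/tempo/", "/tempo"):
    print(example, "->", strip_tempo_prefix(example))
```

Because the slash after `/tempo` is optional in the pattern, both `/tempo` and `/tempo/` are rewritten to `/`.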

services/monitoring/grafana/terraform/datasources.tf

+8
@@ -15,3 +15,11 @@ resource "grafana_data_source" "prometheuscatchall" {
   is_default = false
   uid = "RmZEr52nz"
 }
+
+resource "grafana_data_source" "tempo" {
+  type = "tempo"
+  name = "tempo"
+  url = var.TEMPO_URL
+  basic_auth_enabled = false
+  is_default = false
+}

services/monitoring/grafana/terraform/main.tf.j2

+3-3
@@ -17,10 +17,10 @@ terraform {
     skip_credentials_validation = true
     skip_requesting_account_id = true
     skip_metadata_api_check = true
-    skip_region_validation = true
-    skip_s3_checksum = true
+    skip_region_validation = true
+    skip_s3_checksum = true
     use_path_style = true
-    endpoints = {
+    endpoints = {
       s3 = "{{ GRAFANA_TERRAFORM_STATE_BACKEND_S3_ENDPOINT }}"
     }
     {% endif %}

services/monitoring/grafana/terraform/variables.tf

+4
@@ -2,6 +2,10 @@ variable "GRAFANA_URL" {
   description = "grafana_url"
   sensitive = false
 }
+variable "TEMPO_URL" {
+  description = "tempo_url"
+  sensitive = false
+}
 variable "GRAFANA_AUTH" {
   description = "Username:Password"
   sensitive = true

services/monitoring/template.env

+6
@@ -21,3 +21,9 @@ MONITORING_PROMETHEUS_PGSQL_GID_MONITORED=${MONITORING_PROMETHEUS_PGSQL_GID_MONI
 MONITORING_PROMETHEUS_SMOKEPING_TARGETS=${MONITORING_PROMETHEUS_SMOKEPING_TARGETS}
 PUBLIC_NETWORK=${PUBLIC_NETWORK}
 MONITORED_NETWORK=${MONITORED_NETWORK}
+TEMPO_S3_BUCKET=${TEMPO_S3_BUCKET}
+STORAGE_DOMAIN=${STORAGE_DOMAIN}
+S3_REGION=${S3_REGION}
+S3_ACCESS_KEY=${S3_ACCESS_KEY}
+S3_SECRET_KEY=${S3_SECRET_KEY}
+TF_VAR_PROMETHEUS_CATCHALL_URL=${TF_VAR_PROMETHEUS_CATCHALL_URL}
+52
@@ -0,0 +1,52 @@
+server:
+  http_listen_port: 3200
+
+distributor:
+  receivers: # this configuration will listen on all ports and protocols that tempo is capable of.
+    otlp:
+      protocols:
+        http:
+        grpc:
+
+#ingester:
+#  max_block_duration: 5m # cut the headblock when this much time passes. this should probably be left alone normally
+
+compactor:
+  compaction:
+    block_retention: 96h # overall Tempo trace retention.
+
+metrics_generator:
+  registry:
+    external_labels:
+      source: tempo
+      cluster: {{ MACHINE_FQDN }}
+  storage:
+    path: /var/tempo/generator/wal
+    remote_write:
+      - url: {{ TF_VAR_PROMETHEUS_CATCHALL_URL }}/api/v1/write
+
+storage:
+  trace:
+    backend: s3 # backend configuration to use
+    wal:
+      path: /var/tempo/wal # where to store the wal locally
+    s3:
+      bucket: {{ TEMPO_S3_BUCKET }} # how to store data in s3
+      endpoint: {{STORAGE_DOMAIN}}
+      region: {{S3_REGION}}
+      access_key: {{S3_ACCESS_KEY}}
+      secret_key: {{S3_SECRET_KEY}}
+      insecure: false
+      tls_insecure_skip_verify: true
+      # For using AWS, select the appropriate regional endpoint and region
+      # endpoint: s3.dualstack.us-west-2.amazonaws.com
+      # region: us-west-2
+
+querier:
+  frontend_worker:
+    frontend_address: localhost:9095
+
+overrides:
+  defaults:
+    metrics_generator:
+      processors: ['service-graphs', 'span-metrics']
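
Note: this template is rendered from `.env`, so every `{{ ... }}` placeholder needs a matching variable there. The sketch below is one way to list placeholders that have no matching key, using only the Python standard library. It is not the repository's `jinja` make macro, which is not shown in this diff. The helper names and the sample values are hypothetical.

```python
import re

# Matches Jinja2 variable placeholders such as "{{ TEMPO_S3_BUCKET }}" or "{{S3_REGION}}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_variables(template: str, env: dict) -> list:
    """Return the sorted placeholder names used in the template but absent from env."""
    used = set(PLACEHOLDER.findall(template))
    return sorted(used.difference(env))


# Excerpt of the template above and a hypothetical, incomplete .env file.
template = """\
storage:
  trace:
    s3:
      bucket: {{ TEMPO_S3_BUCKET }}
      endpoint: {{STORAGE_DOMAIN}}
      region: {{S3_REGION}}
"""
dotenv = """\
# hypothetical values
TEMPO_S3_BUCKET=tempo-traces
STORAGE_DOMAIN=storage.example.com
"""

print(missing_variables(template, parse_dotenv(dotenv)))
```

With the sample input the only missing name is `S3_REGION`. Running a check like this against the full template would also cover `MACHINE_FQDN`, which the template uses but which is not among the variables added to `template.env` in this commit.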
