Commit d0bed5f

[minor] Automated Job cleanup when auto_delete: false is set (#257)
https://jsw.ibm.com/browse/MASCORE-5637
1 parent a457194 commit d0bed5f

37 files changed, with 1261 additions and 504 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ site
 .venv
 .DS_Store
 build/bin/awktest.sh
+.venv

CONTRIBUTING.md

Lines changed: 21 additions & 0 deletions
@@ -1,6 +1,27 @@
 Contributing to MAS Gitops
 ===============================================================================
 
+
+Documentation
+-------------------------------------------------------------------------------
+
+
+Versioned documentation is published automatically here: [https://ibm-mas.github.io/gitops/](https://ibm-mas.github.io/gitops/).
+Documentation source is located in the `docs` folder.
+
+
+To view your local documentation updates before pushing to git, run the following:
+
+```
+python3.9 -m venv .venv
+source .venv/bin/activate
+pip install --upgrade pip
+pip install mkdocs
+pip install mkdocs-redirects
+pip install mkdocs-macros-plugin
+pip install mkdocs-drawio-file
+mkdocs serve
+```
+
 Pre-Commit Hooks
 -------------------------------------------------------------------------------
 

README.md

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ Documentation
 [https://ibm-mas.github.io/gitops/](https://ibm-mas.github.io/gitops/)
 
 [https://github.com/ibm-mas/gitops-demo/tree/002](https://github.com/ibm-mas/gitops-demo/tree/002)
+

build/bin/verify-job-definitions.sh

Lines changed: 51 additions & 5 deletions
@@ -18,8 +18,9 @@ Job name accordingly:
 - The \$_job_config_values constant is defined
 - The \$_job_version constant is defined
 - The \$_job_hash constant is defined and has the correct value
-- The \$_job_name constant is defined and has the correct value
-- The \$_job_name constant is used as the name of the Job
+- The \$_job_name constant is defined, has the correct value and is used as the name of the Job
+- The \$_job_cleanup_group constant is defined and assigned to the mas.ibm.com/job-cleanup-group Job label
+- Each template file contains only a single Job definition
 
 [PATH]... can be either:
 - A single directory: the script will check all files under this directory (recursive)
@@ -127,7 +128,7 @@ for file in ${files}; do
 done <<< "$(sed -En 's/.*quay\.io\/ibmmas\/cli:(.*)/\1/p' $file)"
 
 
-# Experimental: attempt to dynamically detect if we can relax job naming restrictions for this file
+# Attempt to dynamically detect if we can relax job naming restrictions for this file
 # The following awk commands exits 0 if and only if:
 # - File does not contain a Job resource
 # Jobs are currently the only resource we use where immutability of the image field is a problem.
@@ -219,14 +220,29 @@ for file in ${files}; do
   problems=${problems}' Missing {{- $_job_name := "..." }}\n'
 fi
 
-# Check all jobs actually use $_job_name
+# Check there is exactly one Job resource defined in the file
+awkout=$(awk 'BEGIN { job_count=0; }
+  /^[[:space:]]*kind:[[:space:]]+Job/ { job_count++ }
+  END {
+    if(job_count != 1) {
+      printf "Exactly 1 Job should be defined in each template file, but %s were found", job_count
+      exit 1
+    }
+  }' $file \
+)
+rc=$?
+if [[ $rc != 0 ]]; then
+  problems=${problems}' '${awkout}'\n'
+fi
+
+# Check the job actually uses $_job_name
 awkout=$(awk 'BEGIN { job_count=0; valid_name_count=0; }
   /^[[:space:]]*kind:[[:space:]]+Job/ { inJob=1; job_count++ }
   /^---/ { inJob=0 }
   inJob && /name:[[:space:]]+\{\{[[:space:]]*\$_job_name[[:space:]]*\}\}/ { valid_name_count++ }
   END {
     if(valid_name_count!=job_count) {
-      print "At least one Job does not have name: {{ $_job_name }}"
+      print "The Job does not have name: {{ $_job_name }}"
       exit 1
     }
   }' $file \
@@ -235,6 +251,36 @@ for file in ${files}; do
 if [[ $rc != 0 ]]; then
   problems=${problems}' '${awkout}'\n'
 fi
+
+
+
+# Check $_job_cleanup_group constant is defined
+grep -Eq '^[[:space:]]*\{\{-?[[:space:]]+\$_job_cleanup_group[[:space:]]*:=[^}]+\}' $file
+rc=$?
+if [[ $rc != 0 ]]; then
+  problems=${problems}' Missing {{- $_job_cleanup_group := ... }}\n'
+fi
+
+# Check mas.ibm.com/job-cleanup-group: $_job_cleanup_group label is applied to the Job
+awkout=$(awk 'BEGIN { state=0; found=0 }
+  /^---/ { state=0 }
+  /^[[:space:]]*spec:/ { state=0 }
+  /^[[:space:]]*kind:[[:space:]]+Job/ { state=1; }
+  state==1 && /^[[:space:]]*metadata:/ { state=2; }
+  state==2 && /^[[:space:]]+labels:/ { state=3; }
+  state==3 && /^[[:space:]]+mas\.ibm\.com\/job-cleanup-group[[:space:]]*:[[:space:]]+\{\{[[:space:]]*\$_job_cleanup_group[[:space:]]*\}\}/ { found=1 }
+  END {
+    if(found!=1) {
+      print "The Job does not have the mas.ibm.com/job-cleanup-group: {{ $_job_cleanup_group }} label"
+      exit 1
+    }
+  }' $file \
+)
+rc=$?
+if [[ $rc != 0 ]]; then
+  problems=${problems}' '${awkout}'\n'
+fi
+
 fi
 
 
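
The label check in `verify-job-definitions.sh` is a small line-oriented state machine: it only accepts the `mas.ibm.com/job-cleanup-group` label when it appears under `metadata.labels` of a `kind: Job` document, and it resets at `---` and at `spec:`. The sketch below ports that logic to Python purely as an illustration. The function name `job_has_cleanup_label` is hypothetical and not part of the repository, and Python's `\s` is slightly broader than awk's `[[:space:]]`.

```python
import re

# Patterns mirror the awk expressions used by the shell check.
_DOC_SEP = re.compile(r"^---")
_SPEC = re.compile(r"^\s*spec:")
_KIND_JOB = re.compile(r"^\s*kind:\s+Job")
_METADATA = re.compile(r"^\s*metadata:")
_LABELS = re.compile(r"^\s+labels:")
_CLEANUP_LABEL = re.compile(
    r"^\s+mas\.ibm\.com/job-cleanup-group\s*:\s+\{\{\s*\$_job_cleanup_group\s*\}\}"
)


def job_has_cleanup_label(template_text: str) -> bool:
    """Return True if a Job's metadata.labels carries the cleanup-group label.

    States: 0 = outside a Job, 1 = saw 'kind: Job',
            2 = inside its metadata, 3 = inside metadata.labels.
    """
    state = 0
    for line in template_text.splitlines():
        # A new YAML document or the start of 'spec:' ends the metadata scan.
        if _DOC_SEP.match(line) or _SPEC.match(line):
            state = 0
        if _KIND_JOB.match(line):
            state = 1
        if state == 1 and _METADATA.match(line):
            state = 2
        if state == 2 and _LABELS.match(line):
            state = 3
        if state == 3 and _CLEANUP_LABEL.match(line):
            return True
    return False
```

Because the state resets at `spec:`, a label that only appears on the pod template (`spec.template.metadata.labels`) does not satisfy the check.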
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+apiVersion: v2
+name: job-cleaner
+description: A CronJob to delete old versions of Jobs created by ArgoCD
+type: application
+version: 1.0.0
+
+dependencies:
+- name: junitreporter
+  version: 1.0.0
+  repository: "file://../../sub-charts/junitreporter/"
+  condition: junitreporter.devops_mongo_uri != ""
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+MAS SaaS Job Cleaner
+===============================================================================
+
+Deploys the `mas-saas-job-cleaner-cron` CronJob, responsible for cleaning up orphaned Job resources in the cluster. It works by grouping Jobs in the cluster according to the `mas.ibm.com/job-cleanup-group` label, then deleting all Jobs from each group except for the one with the latest `creationTimestamp`.
+
+For safety, the CronJob is assigned a ServiceAccount that can only list and delete Job resources (so it can never delete any other type of resource). Furthermore, the logic ensures that only Job resources with the `mas.ibm.com/job-cleanup-group` label can be deleted.
+
+The `mas-devops-saas-job-cleaner` command executed by this CronJob is defined in [python-devops](https://github.com/ibm-mas/python-devops/blob/stable/bin/mas-devops-saas-job-cleaner).
+
+
+> In MAS SaaS, Job resources are routinely orphaned (i.e. marked for deletion by ArgoCD) because updating an immutable Job field (e.g. its image tag) requires creating a new version of the Job resource with a different name. When [auto_delete: false](https://ibm-mas.github.io/gitops/main/accountrootmanifest/#auto_delete) is set, ArgoCD will (by design) not perform this cleanup for us, so without this CronJob, Job resources would accumulate over time and put pressure on the K8S API server.
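
The selection rule described in this README (group by label value, keep only the newest Job in each group) can be sketched in a few lines of Python. This is not the code of `mas-devops-saas-job-cleaner`. `JobInfo` and `select_jobs_to_delete` are hypothetical names used only for illustration, and grouping per namespace is an assumption taken from the comments in the Job templates in this commit.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime

CLEANUP_LABEL = "mas.ibm.com/job-cleanup-group"


@dataclass
class JobInfo:
    """Hypothetical, minimal stand-in for the metadata of a Kubernetes Job."""
    namespace: str
    name: str
    creation_timestamp: datetime
    labels: dict = field(default_factory=dict)


def select_jobs_to_delete(jobs, label=CLEANUP_LABEL):
    """Return every labelled Job except the newest one in each cleanup group."""
    groups = defaultdict(list)
    for job in jobs:
        group = job.labels.get(label)
        if group is None:
            continue  # Jobs without the label are never candidates for deletion
        # Assumption: a cleanup group is scoped to a single namespace.
        groups[(job.namespace, group)].append(job)

    to_delete = []
    for members in groups.values():
        members.sort(key=lambda j: j.creation_timestamp)
        to_delete.extend(members[:-1])  # keep the Job with the latest timestamp
    return to_delete
```

Skipping unlabelled Jobs up front mirrors the safety property stated above: only Jobs that opt in via the label can ever be selected.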
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+{{- /*
+Use the build/bin/set-cli-image-tag.sh script to update this value across all charts.
+*/}}
+{{- $_cli_image_tag := "13.17.0" }}
+
+
+{{- $ns := "job-cleaner" }}
+
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: mas-saas-job-cleaner-role
+  annotations:
+    argocd.argoproj.io/sync-wave: "02"
+{{- if .Values.custom_labels }}
+  labels:
+{{ .Values.custom_labels | toYaml | indent 4 }}
+{{- end }}
+rules:
+  - apiGroups:
+      - batch
+    resources:
+      - jobs
+    verbs:
+      - list
+      - delete
+
+---
+# Service account used by the CronJob; it is bound (below) to a role that can only list and delete Job resources
+kind: ServiceAccount
+apiVersion: v1
+metadata:
+  name: "mas-saas-job-cleaner-sa"
+  namespace: "{{ $ns }}"
+  annotations:
+    argocd.argoproj.io/sync-wave: "02"
+{{- if .Values.custom_labels }}
+  labels:
+{{ .Values.custom_labels | toYaml | indent 4 }}
+{{- end }}
+
+
+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: mas-saas-job-cleaner-rolebinding
+  annotations:
+    argocd.argoproj.io/sync-wave: "03"
+{{- if .Values.custom_labels }}
+  labels:
+{{ .Values.custom_labels | toYaml | indent 4 }}
+{{- end }}
+subjects:
+  - kind: ServiceAccount
+    name: mas-saas-job-cleaner-sa
+    namespace: {{ $ns }}
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: mas-saas-job-cleaner-role
+
+
+
+---
+kind: CronJob
+apiVersion: batch/v1
+metadata:
+  name: "mas-saas-job-cleaner-cron"
+  namespace: "{{ $ns }}"
+  annotations:
+    argocd.argoproj.io/sync-wave: "04"
+{{- if .Values.custom_labels }}
+  labels:
+{{ .Values.custom_labels | toYaml | indent 4 }}
+{{- end }}
+spec:
+  schedule: '0 0 * * *'
+  suspend: false
+  concurrencyPolicy: Forbid
+  jobTemplate:
+    spec:
+      template:
+        metadata:
+{{- if .Values.custom_labels }}
+          labels:
+{{ .Values.custom_labels | toYaml | indent 12 }}
+{{- end }}
+        spec:
+          containers:
+            - name: "mas-saas-job-cleaner"
+              image: quay.io/ibmmas/cli:{{ $_cli_image_tag }}
+              imagePullPolicy: IfNotPresent
+              command:
+                - /bin/sh
+                - -c
+                - |
+                  set -e
+                  mas-devops-saas-job-cleaner --label mas.ibm.com/job-cleanup-group --log-level INFO
+          restartPolicy: OnFailure
+          serviceAccountName: "mas-saas-job-cleaner-sa"
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+---

cluster-applications/010-redhat-cert-manager/templates/04-postsync-update-sm_Job.yaml

Lines changed: 24 additions & 2 deletions
@@ -34,7 +34,7 @@ Increment this value whenever you make a change to an immutable field of the Job
 E.g. passing in a new environment variable.
 Included in $_job_hash (see below).
 */}}
-{{- $_job_version := "v2" }}
+{{- $_job_version := "v3" }}
 
 {{- /*
 10 char hash appended to the job name taking into account $_job_config_values, $_job_version and $_cli_image_tag
@@ -45,6 +45,27 @@ immutable field of any existing Job resource.
 
 {{- $_job_name := join "-" (list $_job_name_prefix $_job_hash )}}
 
+{{- /*
+Set as the value for the mas.ibm.com/job-cleanup-group label on the Job resource.
+
+When the auto_delete flag is not set on the root application, a CronJob in the cluster uses this label
+to identify old Job resources that should be pruned on behalf of ArgoCD.
+
+Any Job resources in the same namespace that have the mas.ibm.com/job-cleanup-group with this value
+will be considered to belong to the same cleanup group. All but the most recent (i.e. with the latest "creation_timestamp")
+Jobs will be automatically deleted.
+
+$_job_cleanup_group can usually just be based on $_job_name_prefix. There are some special cases
+where multiple Jobs are created in our templates using a Helm loop. In those cases, additional discriminators
+must be added to $_job_cleanup_group.
+
+By convention, we sha1sum this value to guarantee we never exceed the 63 char limit regardless of which discriminators
+are required here.
+
+*/}}
+{{- $_job_cleanup_group := cat $_job_name_prefix | sha1sum }}
+
+
 {{ $ns := "cert-manager-operator"}}
 {{ $aws_secret := "aws"}}
 {{ $role_name := "postsync-rhcm-update-sm-r" }}
@@ -142,8 +163,9 @@ metadata:
   namespace: {{ $ns }}
   annotations:
     argocd.argoproj.io/sync-wave: "015"
-{{- if .Values.custom_labels }}
   labels:
+    mas.ibm.com/job-cleanup-group: {{ $_job_cleanup_group }}
+{{- if .Values.custom_labels }}
 {{ .Values.custom_labels | toYaml | indent 4 }}
 {{- end }}
 spec:
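
To see why hashing keeps `$_job_cleanup_group` inside the 63-character label-value limit, the `cat ... | sha1sum` expression can be approximated in Python. This is a sketch under two assumptions: Sprig's `cat` joins its arguments with single spaces, and `sha1sum` returns the hex SHA-1 digest. The helper name `cleanup_group` and the sample prefixes and keys are hypothetical.

```python
import hashlib


def cleanup_group(*parts: str) -> str:
    """Approximate `cat <parts...> | sha1sum` from the Helm templates.

    Assumption: the parts are joined with single spaces before hashing.
    """
    joined = " ".join(parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


# A SHA-1 hex digest is always 40 characters, however long the inputs are,
# so it always fits within the 63-character limit on label values.
group = cleanup_group("a-very-long-job-name-prefix-" * 5, "some-discriminator")
print(len(group))

# Adding a discriminator (e.g. a per-loop key) yields a distinct group.
print(cleanup_group("prefix", "key-a") != cleanup_group("prefix", "key-b"))
```

The trade-off is that the label value is no longer human-readable, so tracing a group back to its template means recomputing the hash from the same inputs.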

cluster-applications/020-ibm-dro/templates/08-postsync-update-sm_Job.yaml

Lines changed: 23 additions & 2 deletions
@@ -26,7 +26,7 @@ Increment this value whenever you make a change to an immutable field of the Job
 E.g. passing in a new environment variable.
 Included in $_job_hash (see below).
 */}}
-{{- $_job_version := "v2" }}
+{{- $_job_version := "v3" }}
 
 {{- /*
 10 char hash appended to the job name taking into account $_job_config_values, $_job_version and $_cli_image_tag
@@ -37,6 +37,26 @@ immutable field of any existing Job resource.
 
 {{- $_job_name := join "-" (list $_job_name_prefix $_job_hash )}}
 
+{{- /*
+Set as the value for the mas.ibm.com/job-cleanup-group label on the Job resource.
+
+When the auto_delete flag is not set on the root application, a CronJob in the cluster uses this label
+to identify old Job resources that should be pruned on behalf of ArgoCD.
+
+Any Job resources in the same namespace that have the mas.ibm.com/job-cleanup-group with this value
+will be considered to belong to the same cleanup group. All but the most recent (i.e. with the latest "creation_timestamp")
+Jobs will be automatically deleted.
+
+$_job_cleanup_group can usually just be based on $_job_name_prefix. There are some special cases
+where multiple Jobs are created in our templates using a Helm loop. In those cases, additional discriminators
+must be added to $_job_cleanup_group.
+
+By convention, we sha1sum this value to guarantee we never exceed the 63 char limit regardless of which discriminators
+are required here.
+
+*/}}
+{{- $_job_cleanup_group := cat $_job_name_prefix | sha1sum }}
+
 
 {{ $ns := .Values.dro_namespace}}
 {{ $aws_secret := "aws"}}
@@ -125,8 +145,9 @@ metadata:
   namespace: {{ $ns }}
   annotations:
     argocd.argoproj.io/sync-wave: "028"
-{{- if .Values.custom_labels }}
   labels:
+    mas.ibm.com/job-cleanup-group: {{ $_job_cleanup_group }}
+{{- if .Values.custom_labels }}
 {{ .Values.custom_labels | toYaml | indent 4 }}
 {{- end }}
 spec:

cluster-applications/060-custom-sa/templates/04-postsync-update-sm_Job.yaml

Lines changed: 23 additions & 2 deletions
@@ -38,6 +38,26 @@ immutable field of any existing Job resource.
 {{- $_job_name := join "-" (list $_job_name_prefix $_job_hash )}}
 
 
+{{- /*
+Set as the value for the mas.ibm.com/job-cleanup-group label on the Job resource.
+
+When the auto_delete flag is not set on the root application, a CronJob in the cluster uses this label
+to identify old Job resources that should be pruned on behalf of ArgoCD.
+
+Any Job resources in the same namespace that have the mas.ibm.com/job-cleanup-group with this value
+will be considered to belong to the same cleanup group. All but the most recent (i.e. with the latest "creation_timestamp")
+Jobs will be automatically deleted.
+
+$_job_cleanup_group can usually just be based on $_job_name_prefix. There are some special cases
+where multiple Jobs are created in our templates using a Helm loop. In those cases, additional discriminators
+must be added to $_job_cleanup_group.
+NOTE: this is one of those cases; we need a separate cleanup group for each per-sa-key Job.
+
+By convention, we sha1sum this value to guarantee we never exceed the 63 char limit regardless of which discriminators
+are required here.
+
+*/}}
+{{- $_job_cleanup_group := cat $_job_name_prefix $key | sha1sum }}
 
 ---
 apiVersion: batch/v1
@@ -47,8 +67,9 @@ metadata:
   namespace: {{ $.Values.custom_sa_namespace }}
   annotations:
     argocd.argoproj.io/sync-wave: "064"
-{{- if $.Values.custom_labels }}
   labels:
+    mas.ibm.com/job-cleanup-group: {{ $_job_cleanup_group }}
+{{- if $.Values.custom_labels }}
 {{ $.Values.custom_labels | toYaml | indent 4 }}
 {{- end }}
 spec:
@@ -148,4 +169,4 @@ spec:
 defaultMode: 420
 optional: false
 backoffLimit: 4
-{{- end }}
+{{- end }}
