delete-old-image: support cutoff date #2012
Conversation
force-pushed from 8aa3f6f to 7e5b9c2
Thinking of #2009: a dry run on prod, which has 48k repositories, with ./scripts/delete-old-images.py prod --delete-before=2021-02-01 --dry-run would delete 68,722 images, totalling about 73 TB.
The JupyterLab update happened; did you use this script to clean up the registry for that? Should we merge this as is? (We can always come back to improve it later.)
I never did run this, and it's part of reducing costs related to jupyterhub/team-compass#463. I'm going to have a go at just deleting the very old images today (before 2021-02-01), which are still a huge fraction of our current storage costs ($1600/mo), since operational costs on GCP may soon become tight!
force-pushed from 7e5b9c2 to bf71612
supports cutoff date via `--delete-before`, deleting images older than a certain age. Also begins support for cleaning out other registries. OVH almost works, but is extremely slow. Docker Hub (turing) needs special auth.
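A minimal sketch of the cutoff check behind `--delete-before`, with hypothetical field names; the real script's data model in ./scripts/delete-old-images.py may differ:

```python
from dateutil.parser import parse as parse_date

# Hypothetical illustration: the "uploaded" field and the image dicts
# here are assumptions for the sketch, not the script's actual API.
delete_before = parse_date("2021-02-01T00:00Z")


def should_delete(image):
    """True if the image predates the cutoff date."""
    return parse_date(image["uploaded"]) < delete_before


images = [
    {"name": "r2d-example-old", "uploaded": "2020-06-15T12:00Z"},
    {"name": "r2d-example-new", "uploaded": "2022-01-01T00:00Z"},
]
for image in images:
    if should_delete(image):
        print("would delete", image["name"])
```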
force-pushed from ddcd11b to 505478e
for easier poking around at what will/should be culled
force-pushed from b881d0d to d96247c
instead of deprecated asyncio.get_event_loop
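For reference, the replacement pattern (a generic illustration, not this repo's actual code):

```python
import asyncio


async def main():
    # stand-in for the script's actual async work
    await asyncio.sleep(0)


# deprecated pattern:
#   loop = asyncio.get_event_loop()
#   loop.run_until_complete(main())
# preferred since Python 3.7:
asyncio.run(main())
```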
It would be nice if GCR tracked last pull time, but only Harbor does that.
I tried running this for OVH to deal with #2514, but Harbor doesn't actually support the part of the docker registry API that's needed to do this. Fortunately, Harbor has its own API, which is even easier to use and has more useful metadata for culling images (basically what Harbor's internal GC is meant to be doing, but isn't for some reason). So I ran a much simpler script:

```python
import os

import requests
from dateutil.parser import parse as parse_date

harbor_url = "https://2lmrrh8f.gra7.container-registry.ovh.net/api/v2.0"
project_name = "mybinder-builds"
date_cutoff = parse_date("2023-02-01T00:00Z")
username = "mybinder-admin"
# export HARBOR_PASSWORD=$(terraform output -raw registry_admin_password)
password = os.environ["HARBOR_PASSWORD"]

# s = requests.Session()
# s.auth = (username, password)

done = False
while not done:
    # re-fetch the first page, oldest first; deleted repos fall off
    # the front of the list, so no pagination bookkeeping is needed
    r = requests.get(
        harbor_url + f"/projects/{project_name}/repositories",
        params=dict(sort="update_time", page_size="100"),
        auth=(username, password),
    )
    r.raise_for_status()
    repos = r.json()
    for repo in repos:
        project_name, repo_name = repo["name"].split("/", 1)
        print(repo)
        if parse_date(repo["update_time"]) > date_cutoff:
            # everything from here on is newer than the cutoff
            done = True
            break
        r = requests.delete(
            harbor_url + f"/projects/{project_name}/repositories/{repo_name}",
            auth=(username, password),
        )
        r.raise_for_status()
```

It took a while, but it accomplished mostly the same result, and with far fewer API calls (deleted whole repos not updated in the last 6 weeks, rather than individual artifacts).
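Following up on the pull-time point above: since Harbor does track pull times, a culling pass could in principle key off last pull rather than update time. A hedged sketch against Harbor's v2 artifacts endpoint; the endpoint shape and `pull_time` field follow Harbor's API docs, but this exact code is untested here:

```python
import os

import requests

harbor_url = "https://2lmrrh8f.gra7.container-registry.ovh.net/api/v2.0"
auth = ("mybinder-admin", os.environ["HARBOR_PASSWORD"])
project_name = "mybinder-builds"
repo_name = "example-repo"  # hypothetical; slashes in names need URL-encoding

r = requests.get(
    harbor_url + f"/projects/{project_name}/repositories/{repo_name}/artifacts",
    params=dict(page_size="100"),
    auth=auth,
)
r.raise_for_status()
for artifact in r.json():
    # pull_time is an epoch-zero timestamp for never-pulled artifacts
    print(artifact["digest"], artifact.get("pull_time"))
```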
Should we go for a merge of this @minrk? I figure it doesn't require much review if it's a manually used utility script. So right now, maybe we could do:
for more information, see https://pre-commit.ci
@consideRatio added README
Looks like a very clear improvement without risk of breaking changes as these are manually run files.
Let's go for a merge!
supports cutoff date via `--delete-before`, deleting images older than a certain age, and caching for more reasonable performance across multiple tries. Also begins support for cleaning out other registries. OVH almost works, but is extremely slow. Docker Hub (turing) needs special auth.