
delete-old-image: support cutoff date #2012


Merged: 6 commits merged into jupyterhub:main from the delete-older-images branch on Apr 25, 2023

@minrk (Member) commented Sep 6, 2021

Supports a cutoff date via `--delete-before`, deleting images older than that date, and caching for more reasonable performance across repeated runs.

Also begins support for cleaning out other registries. OVH almost works, but is extremely slow. Docker Hub (turing) needs special auth.
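To illustrate the two ideas in the description (a cutoff date, plus a cache so that repeated runs do not re-query the registry for every image), here is a minimal sketch. It is not the script's implementation: the `fetch` callback, the `repo@digest` cache keys, and the JSON cache file are hypothetical, and the actual registry calls are left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# hypothetical: a callback that asks the registry when an image was created
FetchCreated = Callable[[str, str], datetime]


def cached_created(
    repo: str, digest: str, fetch: FetchCreated, cache: Dict[str, str]
) -> datetime:
    """Return an image's creation time, calling the registry only on a cache miss."""
    key = f"{repo}@{digest}"
    if key not in cache:
        cache[key] = fetch(repo, digest).isoformat()
    return datetime.fromisoformat(cache[key])


def select_old_images(
    images: Iterable[Tuple[str, str]],
    delete_before: datetime,
    fetch: FetchCreated,
    cache: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return the (repo, digest) pairs created strictly before the cutoff."""
    return [
        (repo, digest)
        for repo, digest in images
        if cached_created(repo, digest, fetch, cache) < delete_before
    ]


def load_cache(path: Path) -> Dict[str, str]:
    """Load the cache written by a previous run, or start empty."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_cache(path: Path, cache: Dict[str, str]) -> None:
    """Persist the cache so the next run can skip already-seen images."""
    path.write_text(json.dumps(cache))
```

On a second run with the same cache, no registry lookups are needed for images already seen, which is where the time savings across repeated tries would come from.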

@minrk force-pushed the delete-older-images branch from 8aa3f6f to 7e5b9c2 on September 6, 2021 15:08
@minrk (Member, Author) commented Sep 6, 2021

Thinking of #2009, I ran a dry run on prod, which has 48k repositories:

```bash
./scripts/delete-old-images.py prod --delete-before=2021-02-01 --dry-run
```

It would delete 68,722 images, totalling about 73 TB.
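The command above takes a deployment name, a `--delete-before` date, and `--dry-run`. A hedged sketch of how flags like these can be parsed with `argparse`; it is not copied from `delete-old-images.py`, and the `release` argument name and the "midnight UTC" reading of the date are assumptions.

```python
import argparse
from datetime import datetime, timezone


def parse_cutoff(value: str) -> datetime:
    """Parse a YYYY-MM-DD cutoff, treating it as midnight UTC (an assumption)."""
    return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Delete old images from a registry")
    # hypothetical name for the positional deployment argument, e.g. "prod"
    parser.add_argument("release", help="deployment whose registry to clean")
    parser.add_argument(
        "--delete-before",
        type=parse_cutoff,
        help="delete images created before this date (YYYY-MM-DD)",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="report what would be deleted without deleting anything",
    )
    return parser
```

Usage: `build_parser().parse_args(["prod", "--delete-before=2021-02-01", "--dry-run"])` yields `release="prod"`, `delete_before` set to 2021-02-01 00:00 UTC, and `dry_run=True`. For scale, about 73 TB over 68,722 images works out to roughly 1 GB per image on average.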

@betatim (Member) commented Sep 15, 2021

The JupyterLab update happened; did you use this script to clean up the registry for that? If not, should we merge this as is? (We can always come back to improve it later.)

@minrk (Member, Author) commented Dec 20, 2021

I never did run this. As part of reducing costs related to jupyterhub/team-compass#463, I'm going to have a go at just deleting the very old images today (before 2021-02-01), which are still a huge fraction of our current storage costs ($1600/mo), since operational costs on GCP may soon become tight!

@minrk force-pushed the delete-older-images branch 2 times, most recently from ddcd11b to 505478e, on November 21, 2022 10:03
for easier poking around what will/should be culled
@minrk force-pushed the delete-older-images branch from b881d0d to d96247c on November 21, 2022 11:18
instead of deprecated asyncio.get_event_loop
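The commit note above refers to moving away from `asyncio.get_event_loop()`, which recent Python versions deprecate when no event loop is running. Since Python 3.7, `asyncio.run()` is the usual entry point: it creates a fresh loop, runs the coroutine, and closes the loop. A minimal sketch of that pattern; `delete_image` is a hypothetical stand-in for a registry DELETE request, and the semaphore-based concurrency limit is an illustrative choice, not necessarily what the script does.

```python
import asyncio
from typing import List


async def delete_image(name: str) -> str:
    # hypothetical stand-in for an HTTP DELETE against the registry
    await asyncio.sleep(0)
    return name


async def delete_all(names: List[str], concurrency: int = 4) -> List[str]:
    # bound the number of requests in flight at once
    sem = asyncio.Semaphore(concurrency)

    async def bounded(name: str) -> str:
        async with sem:
            return await delete_image(name)

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(bounded(n) for n in names)))


def main(names: List[str]) -> List[str]:
    # older style, deprecated when no loop is running:
    #   loop = asyncio.get_event_loop()
    #   loop.run_until_complete(delete_all(names))
    return asyncio.run(delete_all(names))
```

Creating the semaphore inside the coroutine keeps it tied to the loop that `asyncio.run()` creates.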
would be nice if GCR tracked last pull time, but only Harbor does that
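Where the registry does record pulls, a cull can be based on "last used" rather than "last pushed". A sketch of such a staleness check, assuming each artifact record carries ISO 8601 `push_time` and `pull_time` strings (field names taken to match Harbor's artifact listing; a never-pulled artifact is assumed to report a very old `pull_time`, which taking the maximum handles naturally). `is_stale` and `parse_time` are hypothetical helpers, not part of the script.

```python
from datetime import datetime
from typing import Mapping


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_stale(artifact: Mapping[str, str], cutoff: datetime) -> bool:
    """True if the artifact was neither pushed nor pulled since `cutoff`."""
    last_used = max(
        parse_time(artifact["push_time"]),
        parse_time(artifact["pull_time"]),
    )
    return last_used < cutoff
```

With only push times (as GCR exposes), an image pushed long ago but still pulled daily would be culled; including the pull time avoids that.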
@minrk marked this pull request as ready for review on November 21, 2022 11:42
@minrk (Member, Author) commented Mar 16, 2023

I tried running this for OVH to deal with #2514, but Harbor doesn't actually support the part of the Docker Registry API that's needed to do this.

Fortunately, Harbor has its own API, which is even easier to use and has more useful metadata for culling images (basically what Harbor's internal GC is meant to be doing, but isn't for some reason). So I ran a much simpler script:

```python
import os

import requests
from dateutil.parser import parse as parse_date

harbor_url = "https://2lmrrh8f.gra7.container-registry.ovh.net/api/v2.0"
project_name = "mybinder-builds"
# delete every repository whose last update is on or before this date
date_cutoff = parse_date("2023-02-01T00:00Z")

username = "mybinder-admin"
#  export HARBOR_PASSWORD=$(terraform output -raw registry_admin_password)
password = os.environ["HARBOR_PASSWORD"]

# reuse one connection and the same credentials for every request
s = requests.Session()
s.auth = (username, password)

done = False
while not done:
    # always request the first page, least recently updated first:
    # deleted repos drop out, so each pass sees the next-oldest batch
    r = s.get(
        harbor_url + f"/projects/{project_name}/repositories",
        params=dict(sort="update_time", page_size="100"),
    )
    r.raise_for_status()
    repos = r.json()
    if not repos:
        break  # project is empty
    for repo in repos:
        # repo["name"] is "<project>/<repository>"
        _, repo_name = repo["name"].split("/", 1)
        print(repo["name"], repo["update_time"])
        if parse_date(repo["update_time"]) > date_cutoff:
            # results are sorted, so every remaining repo is newer too
            done = True
            break
        r = s.delete(harbor_url + f"/projects/{project_name}/repositories/{repo_name}")
        r.raise_for_status()
```

It took a while, but it accomplished mostly the same result with far fewer API calls, because it deleted whole repositories not updated in the last ~6 weeks (since the 2023-02-01 cutoff) rather than individual artifacts.
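The script re-requests the first page on every pass because, once the oldest repositories are deleted, the next-oldest ones move onto page one; the loop only needs to stop at the first repository newer than the cutoff, or at an empty page. A self-contained sketch of that control flow against an in-memory stand-in for the Harbor listing (`FakeRegistry` and its methods are hypothetical; only the sort-and-stop logic mirrors the script):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class FakeRegistry:
    # hypothetical in-memory stand-in: (name, update_time) pairs
    repos: List[Tuple[str, datetime]] = field(default_factory=list)

    def list_oldest_first(self, page_size: int) -> List[Tuple[str, datetime]]:
        # mirrors sort=update_time, first page only
        return sorted(self.repos, key=lambda r: r[1])[:page_size]

    def delete(self, name: str) -> None:
        self.repos = [r for r in self.repos if r[0] != name]


def cull(registry: FakeRegistry, cutoff: datetime, page_size: int = 100) -> List[str]:
    """Delete repositories last updated on or before `cutoff`; return their names."""
    deleted: List[str] = []
    while True:
        page = registry.list_oldest_first(page_size)
        if not page:
            return deleted  # nothing left at all
        for name, updated in page:
            if updated > cutoff:
                return deleted  # sorted, so everything after this is newer too
            registry.delete(name)
            deleted.append(name)
```

Returning from inside the inner loop (or setting a flag, as in the script) is what lets the outer `while` end; a bare `break` would only leave the `for` loop and re-fetch the same page forever.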

@consideRatio (Member) commented Apr 11, 2023

Should we go for a merge of this, @minrk? I figure it doesn't require much review if it's a manually used utility script. So right now, maybe we could do:

@minrk (Member, Author) commented Apr 25, 2023

@consideRatio added a README

@consideRatio (Member) left a comment

Looks like a very clear improvement without risk of breaking changes as these are manually run files.

Let's go for a merge!

@consideRatio merged commit a6a7d8f into jupyterhub:main on Apr 25, 2023
3 participants