This repository contains automation scripts for sourcing and cataloging metadata from the CKAN ecosystem, including extensions and instances worldwide. The collected data powers the CKAN Ecosystem Catalog.
Extension scripts:
- 1get_URL.py - Discovers CKAN extensions on GitHub
- 2refresh.py - Updates extension metadata
- 3update_catalog.py - Synchronizes data with the catalog
- 4uploadDataset.py - Uploads the file to the datasets page
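The discovery step in 1get_URL.py is only described as searching GitHub. As a rough sketch, assuming it uses the GitHub REST API's repository search and matches the conventional `ckanext-` naming of CKAN extensions, discovery could look like the code below. The query string and the helper names `build_search_url`, `parse_repos` and `fetch_page` are hypothetical, not taken from the repository.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str = "ckanext in:name", page: int = 1, per_page: int = 100) -> str:
    """Build a GitHub repository-search URL (the default query is an assumption)."""
    params = {"q": query, "page": page, "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def parse_repos(payload: dict) -> list[dict]:
    """Reduce each search hit to the fields a catalog entry is likely to need."""
    return [
        {
            "name": item["full_name"],
            "url": item["html_url"],
            "description": item.get("description") or "",
            "stars": item.get("stargazers_count", 0),
            "last_push": item.get("pushed_at"),
        }
        for item in payload.get("items", [])
    ]


def fetch_page(page: int, token: str | None = None) -> list[dict]:
    """Fetch one page of results; a token raises GitHub's rate limit."""
    req = urllib.request.Request(build_search_url(page=page))
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_repos(json.load(resp))
```

GitHub's search API returns at most 1,000 results per query and is rate-limited, so a full crawl would page through results with pauses and may need to split the query (for example by creation date).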
Instance data and scripts:
- 0.csv - Base dataset of CKAN instances
- 1-Name-Process.py - Processes site names and converts titles to link-friendly identifiers
- 2-CKANActionAPI copy.py - Fetches instance data via the CKAN Action API
- 3-siteType.py - Categorizes site types
- 4-Description.py - Extracts site descriptions
- 5-Use AI To deduct Location copy.py - Infers geographic locations
- 6-Geocode using OpenStreetMap Nominatim API.py - Geocodes locations
- 7-tstamp.py - Adds timestamps to metadata
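Two of these steps rest on well-defined conventions. The sketch below is not the repository's code; it shows one way to turn a site title into a link-friendly identifier and how a CKAN instance's Action API can be queried, where `status_show` reports the CKAN version, site title, site description and enabled extensions. The helper names are hypothetical, and the fields kept in `summarize_status` are an assumption about what the catalog stores.

```python
from __future__ import annotations

import json
import re
import unicodedata
import urllib.request


def slugify(title: str) -> str:
    """Turn a site title into a lowercase, hyphen-separated identifier."""
    # Decompose accented characters, then drop anything that is not ASCII (é -> e).
    text = unicodedata.normalize("NFKD", title)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")


def action_url(base_url: str, action: str) -> str:
    """CKAN serves its Action API under /api/3/action/<action>."""
    return f"{base_url.rstrip('/')}/api/3/action/{action}"


def call_action(base_url: str, action: str, timeout: float = 30) -> dict:
    """GET an Action API endpoint and return its `result`, raising on failure."""
    with urllib.request.urlopen(action_url(base_url, action), timeout=timeout) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def summarize_status(result: dict) -> dict:
    """Pick the instance fields a catalog row might store (an assumption)."""
    return {
        "ckan_version": result.get("ckan_version"),
        "site_title": result.get("site_title"),
        "site_description": result.get("site_description"),
        "extensions": result.get("extensions", []),
    }
```

For example, `call_action("https://demo.ckan.org", "status_show")` returns the status dictionary, and `call_action(url, "package_search?rows=0")["count"]` gives the number of public datasets without downloading any of them. ASCII folding drops scripts that have no Latin decomposition, so a title written entirely in, say, Chinese would produce an empty slug; the real script may handle that case differently.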
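For the geocoding and timestamp steps, here is a minimal sketch assuming a location string (for example, the output of the location-inference step) is sent to Nominatim's public `/search` endpoint. Nominatim's usage policy limits the public server to one request per second and asks for an identifying User-Agent, which explains the delay and the header. The User-Agent value and all function names are placeholders, not taken from 6-Geocode using OpenStreetMap Nominatim API.py or 7-tstamp.py.

```python
from __future__ import annotations

import json
import time
import urllib.parse
import urllib.request
from datetime import datetime, timezone

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
# Placeholder: Nominatim asks callers to identify their application here.
USER_AGENT = "ckan-ecosystem-catalog-example/0.1"


def nominatim_url(place: str) -> str:
    """Build a search URL that asks for the single best JSON match."""
    params = {"q": place, "format": "json", "limit": 1}
    return f"{NOMINATIM_URL}?{urllib.parse.urlencode(params)}"


def parse_first_hit(results: list[dict]) -> tuple[float, float] | None:
    """Return (lat, lon) of the best match, or None when nothing matched."""
    if not results:
        return None
    top = results[0]
    # Nominatim returns coordinates as strings.
    return float(top["lat"]), float(top["lon"])


def geocode_all(places: list[str], delay: float = 1.0) -> dict[str, tuple[float, float] | None]:
    """Geocode places one at a time, pausing between calls to respect the rate limit."""
    coords = {}
    for place in places:
        req = urllib.request.Request(nominatim_url(place), headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=30) as resp:
            coords[place] = parse_first_hit(json.load(resp))
        time.sleep(delay)
    return coords


def utc_timestamp() -> str:
    """ISO 8601 timestamp in UTC, e.g. for a 'last updated' column."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")
```

Caching results by place string would avoid re-querying locations that many instances share.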
The repository includes a GitHub Actions workflow (.github/workflows/update-ckan-metadata.yml) that automatically fetches and updates extension metadata on a schedule.
Setup:
- Install dependencies:
  pip install -r requirements.txt
- Configure your CKAN API key.
- Run the extension metadata collection:
  python 1get_URL.py
  python 2refresh.py
  python 3update_catalog.py
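If the three commands should always run together and in order, a small wrapper can stop as soon as one step fails, so a failed discovery run does not feed stale data into the catalog update. This is a hypothetical helper (`run_pipeline` is not part of the repository) and assumes the scripts take no command-line arguments and are run from the repository root.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Order taken from the setup steps; 4uploadDataset.py is omitted because the steps do not list it.
EXTENSION_PIPELINE = ["1get_URL.py", "2refresh.py", "3update_catalog.py"]


def run_pipeline(scripts: list[str], workdir: Path = Path(".")) -> None:
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"-> {script}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts never run after a failed step.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
```

Calling `run_pipeline(EXTENSION_PIPELINE)` from the repository root runs the same three steps as above. Using `sys.executable` keeps the scripts on the interpreter (and virtual environment) that installed requirements.txt.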
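How the scripts read the CKAN API key is not specified here. One common pattern, sketched below under the assumption of an environment variable named `CKAN_API_KEY` (a hypothetical name), keeps the key out of the code and attaches it only to write requests. Recent CKAN versions expect API tokens in the `Authorization` header of Action API calls. The function name is illustrative.

```python
from __future__ import annotations

import json
import os
import urllib.request


def authorized_action_request(
    base_url: str, action: str, payload: dict, api_key: str | None = None
) -> urllib.request.Request:
    """Build an authenticated POST request for a CKAN Action API call."""
    # CKAN_API_KEY is a hypothetical variable name; use whatever the scripts expect.
    key = api_key or os.environ.get("CKAN_API_KEY")
    if not key:
        raise RuntimeError("Set CKAN_API_KEY (or pass api_key) before calling write actions")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": key, "Content-Type": "application/json"},
        method="POST",
    )
```

The request is then sent with `urllib.request.urlopen(req)`. Which action and payload to use, for example `package_patch` with a dataset `id`, depends on what the catalog-update or upload script actually changes.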