DataStore for 360Giving data
Example via Debian or Ubuntu packages:
In this example we create a user test with the password test for dev usage.
$ sudo apt-get install postgresql-16 postgresql-server-dev-16
$ sudo -u postgres createuser -P -e test --interactive
$ createdb -U test -W 360givingdatastore
Note: special PostgreSQL extensions are used for indexes. If you don't want these installed in the database, or the database user doesn't have permission to install extensions, set the environment variable SKIP_SPECIAL_DB_INDEX=true before you run migrate. In development you can also set the DATABASE_HOST, DATABASE_NAME, DATABASE_USER and DATABASE_PASSWORD environment variables.
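As an illustration of how these variables might be consumed, a development settings module could read them roughly like this (a sketch with assumed defaults, not the actual contents of settings_dev):

# Sketch only: check settings/settings_dev.py for the real configuration.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": os.environ.get("DATABASE_HOST", "localhost"),
        "NAME": os.environ.get("DATABASE_NAME", "360givingdatastore"),
        "USER": os.environ.get("DATABASE_USER", "test"),
        "PASSWORD": os.environ.get("DATABASE_PASSWORD", "test"),
    }
}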
Example via Docker:
docker run --name datastorepostgres -p 5432:5432 -e POSTGRES_PASSWORD=test -e POSTGRES_USER=test -e POSTGRES_DB=360givingdatastore -d postgres # Will set it up and start it
docker start datastorepostgres # If already set up, will start it
docker stop datastorepostgres # Stop it running
$ virtualenv --python=python3.12 ./.ve/
$ source ./.ve/bin/activate
$ pip install -r requirements.txt
$ export DJANGO_SETTINGS_MODULE=settings.settings_dev
$ manage.py migrate # see note above about Special Postgres extensions
$ manage.py createsuperuser
$ manage.py runserver
Note: before loading grant data you may wish to load additional_data sources
$ manage.py load_datagetter_data ../path/to/data/dir/from/datagetter/
Create/update the Recipient/Funder model entries from grant data.
$ python manage.py manage_entities_data --update
A number of the sources for additional_data have their own local caches which need to be kept up-to-date.
To better understand additional data, refer to 360Giving Datastore - additional data.
For a script which combines all the steps, see datastore/additional_data/sources/update_all_sources.sh
Occasionally we also need to update the upstream URLs where data is fetched from, found in datastore/additional_data/sources/*.py.
Our API docs / schema are based on OpenAPI 3.0 (as generated by drf-spectacular). OpenAPI 3.0 is incompatible with the JSON Schema used by 360G, so we keep a copy of 360G's schema converted into OpenAPI 3.0 format. When 360G updates their standard/schema, we should update this copy too.
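One well-known difference, shown here with an illustrative property rather than one taken from the 360G schema: JSON Schema expresses nullability with a list of types, while OpenAPI 3.0 uses a single type plus a nullable flag.

# JSON Schema style (illustrative property only):
json_schema_property = {"type": ["string", "null"]}
# Equivalent OpenAPI 3.0 representation produced by the converter:
openapi_property = {"type": "string", "nullable": True}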
To do this, first install the CLI tool used to convert JSON Schema to OpenAPI 3.0:
npm install -g --save @openapi-contrib/json-schema-to-openapi-schema
When the schema changes, copy from standard repo to static/, and convert from JSON Schema to OpenAPI 3.0, e.g.:
STANDARD_VERSION=1.3
cd datastore/static/
curl https://raw.githubusercontent.com/ThreeSixtyGiving/standard/${STANDARD_VERSION}/schema/360-giving-schema.json > 360-giving-schema-${STANDARD_VERSION}-jsonschema.json
json-schema-to-openapi-schema convert 360-giving-schema-${STANDARD_VERSION}-jsonschema.json > 360-giving-schema-${STANDARD_VERSION}-openapi.json
Then update the TSG_SCHEMA_STATICFILE setting in settings.py.
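For example, the setting update might look like this (the value shown is illustrative; match whatever filename you placed in datastore/static/ above):

# settings.py (illustrative value; the exact path/format may differ)
TSG_SCHEMA_STATICFILE = "360-giving-schema-1.3-openapi.json"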
Downloads codelists from the ThreeSixtyGiving/standard GitHub repo.
./manage.py load_codelist_codes
Look at the datastore_num_current_grants_with_beneficiary_location_geocode_without_lookup metric of the getter run before and after updating geodata; it should go down.
./manage.py load_geocode_names # CHD Data
./manage.py load_geolookups # from https://github.com/drkane/geo-lookups
./manage.py load_nspl
# Delete the old org data before loading in the new
./manage.py delete_org_data --no-prompt
./additional_data/sources/load_all_org_data.sh
There are many useful management commands; see:
$ manage.py --help
Developers can also use Docker Compose to get a local development environment.
docker-compose -f docker-compose.dev.yml up
The website should be available at http://localhost:8000
Use Ctrl-C to exit.
Whilst leaving the up command running, you should use docker-compose run with the commands from the above sections.
e.g. instead of running:
$ manage.py load_geocode_names
Run:
$ docker-compose -f docker-compose.dev.yml run datastore-web python datastore/manage.py load_geocode_names
To access the PostgreSQL database, run:
$ docker-compose -f docker-compose.dev.yml run -e PGPASSWORD=postgres postgres psql -h postgres -U postgres
$ pip install -r ./requirements_dev.txt
You will also need the chromedriver for your machine's Chromium-based browser; see https://chromedriver.chromium.org/downloads
Alternatively edit the selenium test setup in test_browser to use your preferred selenium setup.
$ ./manage.py test tests
$ flake8
$ black --check ./
Note: you may want to run this with SKIP_SPECIAL_DB_INDEX=true so that the test database user does not need permission to install PostgreSQL extensions when running the tests.
You can run any particular tests individually e.g.:
$ manage.py test tests.test_additional_data_tsgorgtype
see manage.py test --help for more info
Note that the OrgInfoCache entries for the test funders/recipients also need to be included in the test data fixture.
./manage.py dumpdata --output db/fixtures/test_data.json db additional_data.OrgInfoCache
We target python3.12 for our requirements.
Use pip-compile, provided by the pip-tools package, to process the requirements .in files.
This module is the central datastore for 360Giving data. It contains the models which define the database and the ORM for accessing, creating and updating the grant data.
A key function is managing the Latest data, which represents the created datasets built from datagetter grant data. These datasets are used in GrantNav.
Management commands here allow for loading and managing datasets as well as a mechanism for external scripts to update the current status of the system (status is used in the UI and for GrantNav API).
This contains the API endpoints that are used to control the system from the UI, indicate the status and data download url for GrantNav updates as well as an experimental REST API built using django-rest-framework.
Templates and static html/js live here; there is a basic dashboard which shows the current status of the system as well as a mechanism to trigger a full datarun (fetch and load).
During the load of grant data (datagetter data) by the db module command load_datagetter_data, each grant is passed to the create method of the AdditionalDataGenerator, where various sources are used to add to an additional_data object that is available on the Grant model.
additional_data sources come in various forms: static files which are loaded, and caches of data in our local database (for example postcode lookups).
The generator adds additional_data fields in a defined order, which allows one source to depend on fields added by another.
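The shape of this mechanism can be sketched as follows (hypothetical class, method and field names; the real sources live in additional_data/sources/):

# Hypothetical sketch of the ordered-sources pattern described above.
REGION_BY_DISTRICT = {"E07000123": "South East"}  # stand-in for a local cache table

class PostcodeSource:
    def update(self, grant, additional_data):
        # derive a district from the recipient's postcode (stubbed here)
        additional_data["recipient_district"] = "E07000123"

class GeoLookupSource:
    def update(self, grant, additional_data):
        # depends on the field added by PostcodeSource, so must run after it
        district = additional_data.get("recipient_district")
        additional_data["recipient_region"] = REGION_BY_DISTRICT.get(district)

class GeneratorSketch:
    # sources are applied in a fixed order so later ones can use earlier fields
    sources = [PostcodeSource(), GeoLookupSource()]

    def create(self, grant):
        additional_data = {}
        for source in self.sources:
            source.update(grant, additional_data)
        return additional_data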
The monitoring module keeps a timeseries history of snapshots of statistics and metrics about the grant data.
It exports a management command create_monitoring_snapshot which creates a new snapshot of the current state of grants, as well as list_monitoring_snapshots and delete_monitoring_snapshot helper commands.
These snapshots can be used on a daily basis to monitor for changes or power live dashboards, as well as for historical analysis of changes over time.
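For example, a scheduled daily job could create a snapshot through Django's call_command (a sketch; the scheduling itself is left to cron or similar):

# assumes DJANGO_SETTINGS_MODULE is set as in the setup section above
import django
from django.core.management import call_command

django.setup()
call_command("create_monitoring_snapshot")  # record the current grant statistics
call_command("list_monitoring_snapshots")   # confirm the new snapshot is present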
An API is exposed from the api module.
Provides a Prometheus endpoint to monitor vital metrics on the datastore.
An example datarun script. This orchestrates running a datagetter, updating the statuses, and loading the data into the datastore.
Django settings for the datastore. Includes the location for data run logs and the data run script / pid.
Various cross-module tests.