A Python library for tracking and versioning structural changes in Sri Lankan government gazettes, with optional FastAPI backend.
Supports:
- MinDep: Ministries and Departments (structural changes: ADD, TERMINATE, MOVE)
- Person: Person-portfolio assignments (appointments, removals)
- Process extracted details for ministries/departments and person-portfolio assignments (JSON format)
- Detect and classify transactions:
ADD,TERMINATE,MOVE,RENAME - Maintain versioned state snapshots for rollback/history
- Output CSVs for transactions (to seed Neo4j)
- [Optional] Expose a FastAPI backend for interactive review and state management
pip install git+https://github.com/sehansi-9/gztprocessor.git
# or, for development:
pip install -e .
# To enable API features:
pip install "git+https://github.com/sehansi-9/gztprocessor.git#egg=gztprocessor[api]"
If you want to use the interactive API:
-
Install API dependencies:
pip install gztprocessor[api]
-
Run the API server:
uvicorn main:app --reload
-
The API exposes endpoints for loading, reviewing, applying, and resetting state.
See the API Endpoints section below.
or
Fork the repository on GitHub to your own account.
Clone your fork locally and navigate into the project folder:
git clone https://github.com/<your-username>/gztprocessor.git
cd gztprocessor- Install Python Dependencies
Install the required Python packages for both core functionality and API support:
python -m pip install --upgrade pip
python -m pip install nltk rapidfuzz fastapi uvicorn- Run the FastAPI Backend
Start the backend server:
uvicorn main:app --reloadThe API will be available at http://127.0.0.1:8000
- Setup and Run the Frontend
Open a new terminal, navigate to the frontend folder, install dependencies, and start the development server:
cd gztp-frontend
npm install
npm run devYou can now access the frontend application in your browser (usually at http://localhost:5173).
gztprocessor/
├── database_handlers/
├── gazette_processors/
├── db_connections/
├── state_managers/
├── schemas/
├── csv_writer.py
└── __init__.py
main.py
MANIFEST.in
pyproject.toml
readme.md
{
"ministers": [
{
"name": "Minister of X",
"departments": ["Dept A", "Dept B", "Dept C"]
}
]
}{
"ADD": [ ... ],
"OMIT": [ ... ]
}{
"ADD": [ ... ],
"TERMINATE": [ ... ],
"RENAME": [ ... ]
}- See the
request_body/directory for real examples.
{
"ministers": [
{
"name": "Minister of X",
"departments": [
{ "name": "Dept A" },
{ "name": "Dept B", "previous_ministry": "Minister of Y" },
{ "name": "Dept C" , "previous_ministry": "Minister of G"}
]
},
{
"name": "Minister of Z",
"departments": [
{ "name": "Dept D", "previous_ministry": "Minister of X" },
{ "name": "Dept E" }
]
}
]
}
{
"transactions": {
"moves": [ ... ],
"adds": [ ... ],
"terminates": [ ... ]
}
}{
"transactions": {
"moves": [ ... ],
"adds": [ ... ],
"terminates": [ ... ]
}
}- CSVs are generated in
output/, organized by type, date, and gazette number. - Sample MinDep CSV row:
transaction_id,parent,parent_type,child,child_type,rel_type,date 2297-78_tr_01,Minister of Labour,minister,Vocational Training Authority,department,AS_DEPARTMENT,2022-09-16 - Sample Person CSV row:
transaction_id,parent,parent_type,child,child_type,rel_type,date 2067-09_tr_01,"Ministry of Science, Technology & Research",minister,Hon. John Doe,person,AS_APPOINTED,2018-04-12
- Snapshots are saved as JSON in
state/mindep/andstate/person/. - MinDep Example:
{ "ministers": [ { "name": "Minister of Finance", "departments": [ "Department of Treasury", "Inland Revenue", "Customs" ] } ] } - Person Example:
{ "persons": [ { "person_name": "Hon. John Doe", "portfolios": [ { "name": "Ministry of Roads and Highways", "position": "Minister" } ] } ] }
| Endpoint | Method | Description |
|---|---|---|
/mindep/state/latest |
GET | Get latest saved state (gazette number, date, state) |
/mindep/state/{date} |
GET | Get state(s) for a specific date; returns gazette numbers if multiple |
/mindep/state/{date}/{gazette_number} |
GET | Get a specific state by date and gazette number |
/mindep/initial/{date}/{gazette_number} |
GET | Preview contents of initial gazette |
/mindep/initial/{date}/{gazette_number} |
POST | Create initial state in DB & save snapshot (Body: JSON with ministers array) |
/mindep/amendment/{date}/{gazette_number} |
GET | Detect transactions from amendment |
/mindep/amendment/{date}/{gazette_number} |
POST | Apply confirmed transactions to DB & snapshot (Body: JSON with transactions object) |
/mindep/state/reset |
DELETE | Deletes all MinDep state files and DB |
/person/state/latest |
GET | Get latest saved persons and their portfolios |
/person/state/{date} |
GET | Get state(s) for a specific date; returns gazette numbers if multiple |
/person/state/{date}/{gazette_number} |
GET | Get a specific person and portfolio state by date and gazette number |
/person/{date}/{gazette_number} |
GET | Preview predicted transactions from person gazette |
/person/{date}/{gazette_number} |
POST | Apply reviewed transactions to DB & save snapshot (Body: JSON with transactions object) |
/person/state/reset |
DELETE | Deletes all Person state files and DB |
/ |
GET | Health check/status message |
| Endpoint Type | Main Function(s) Called | Purpose |
|---|---|---|
| MinDep Initial | extract_initial_gazette_data, load_initial_state_to_db |
Extract/load initial ministry structure |
| MinDep Amendment | process_amendment_gazette, apply_transactions_to_db |
Detect/apply department changes |
| Person Gazette | process_person_gazette, apply_transactions_to_db |
Detect/apply person-portfolio changes |
| State Management | get_latest_state, get_state_by_date, load_state, clear_all_state_data |
Manage and reset state snapshots |
- The API returns JSON error messages for missing files, invalid requests, or not found resources.
- Always check the response for an
errorkey if your request fails. - The backend prints warnings to the console for missing or duplicate data.
- The system relies on department/person position for parsing and matching (for mindep)
- MOVEs are inferred by matching omitted/added names
- RENAMEs are detected for person gazettes when ministry/portfolio names change
- Input/output file naming conventions are important (see
utils.py) - Stemming, Fuzzy Matching, and Scores:
- For person gazettes, the system uses stemming (via NLTK's PorterStemmer) and fuzzy string matching (via RapidFuzz) to compare ministry/portfolio names.
- Fuzzy matching computes a similarity score (token sort ratio) between new and existing ministry/portfolio names.
- If the score exceeds a threshold (default 70) or there is word overlap, the system suggests possible terminates for adds and moves.
- These suggestions, along with their scores, are included in the API response to help users review and confirm transactions.
- Use
curlor Postman to test endpoints (if using the API) - See
input/for sample gazette files and dates/gazette numbers - See
request_body/for sample request payloads - See https://github.com/sehansi-9/test-gztprocessor for a sample use of gztprocessor
See LICENSE.