An AI-powered tool for enhancing transparency and accountability in Brazilian public procurement.
See the live platform!
Ever feel like public spending is a black box? In Brazil, billions are spent on public contracts, but keeping an eye on all of it is a Herculean task. Mistakes, inefficiencies, and even fraud can hide in mountains of documents.
Public Detective is here to change the game. We're an AI-powered watchdog that sniffs out irregularities in public tenders. Think of it as a digital detective, working 24/7 to help journalists, activists, and you demand transparency.
This isn't just code; it's a mission. Developed at PUCPR with the help of the amazing folks at Transparência Brasil, this project puts cutting-edge tech in the hands of the people.
- 🤖 Automated Data Retrieval: Fetches procurement data directly from the official PNCP APIs.
- 💡 AI-Powered Analysis: Uses a Generative AI model to surface potential red flags and provide a detailed risk score with a rationale.
- 🗂️ Full Traceability: Archives both original and processed documents in Google Cloud Storage for every analysis.
- 🛡️ Idempotent by Design: Avoids re-analyzing unchanged documents by checking a content hash (sketched below).
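The content-hash check is the heart of that idempotency. Here is a minimal sketch of the idea, assuming SHA-256 (the README only says "a content hash") and using an in-memory set where the real application would consult its database:

```python
import hashlib


def content_hash(document_bytes: bytes) -> str:
    """Stable fingerprint of a document's raw bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(document_bytes).hexdigest()


def needs_analysis(document_bytes: bytes, analyzed_hashes: set[str]) -> bool:
    """Return True only if this exact content has never been analyzed before."""
    return content_hash(document_bytes) not in analyzed_hashes
```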
The application operates in a two-stage pipeline: a lightweight Pre-analysis stage to discover and prepare data, followed by an on-demand, AI-powered Analysis stage. This decoupled architecture ensures efficiency and cost-effectiveness.
Here's a simplified look at how it works:
```mermaid
graph LR
    subgraph "Input"
        A[Public Procurement Data]
    end
    subgraph "Public Detective's Magic"
        B(Automated Analysis)
        C(AI-Powered Insights)
        D(Risk Scoring)
    end
    subgraph "Output"
        E[Transparency Reports]
        F[Actionable Insights for Journalists & Activists]
    end
    A --> B;
    B --> C;
    C --> D;
    D --> E;
    D --> F;
```
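Because the two stages are decoupled, the pre-analysis stage can hand finished work to the analysis stage through a queue such as Google Cloud Pub/Sub, which is part of the project's infrastructure stack. A minimal sketch of publishing a task; the project ID, topic name, and message shape are placeholder assumptions, not the project's real values:

```python
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# "my-project" and "analysis-tasks" are placeholders for illustration.
topic_path = publisher.topic_path("my-project", "analysis-tasks")

# The payload schema is a guess; the actual message format is not documented here.
payload = json.dumps({"analysis_id": "a1b2c3d4-..."}).encode("utf-8")

future = publisher.publish(topic_path, data=payload)
print(f"Published message {future.result()}")
```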
The project is built on the following stack:

- Language: Python 3.12+
- AI / NLP: Google Gemini API
- CLI Framework: Click
- Database & Migrations: PostgreSQL, managed with Alembic
- Core Toolkit:
  - SQLAlchemy Core: For writing safe, raw SQL queries.
  - Pydantic: For data validation and settings management.
  - Tenacity: For robust HTTP request retries (see the sketch after this list).
  - LibreOffice Headless: For office document conversion.
- Infrastructure: Docker, Google Cloud Storage, Google Cloud Pub/Sub
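To give a flavor of how Tenacity typically wraps those PNCP calls, here is a minimal sketch; the `requests` client and the endpoint path are assumptions for illustration, not the project's actual fetcher:

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# Placeholder route; consult the PNCP API documentation for real endpoints.
PNCP_URL = "https://pncp.gov.br/api/consulta/..."


@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=30))
def fetch_procurements(url: str = PNCP_URL) -> dict:
    """Fetch procurement data, retrying transient failures with exponential backoff."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()
```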
To get a local copy up and running, follow these simple steps. First, make sure the following prerequisites are installed:
- Python 3.12
- Poetry
- Docker
- LibreOffice Headless
With those in place:

1. Clone the repository:

   ```sh
   git clone https://github.com/hunsche/public-detective.git
   cd public-detective
   ```

2. Install dependencies:

   ```sh
   poetry install
   ```

3. Set up environment variables: Create a `.env` file from the example. This is primarily used to configure local emulators (see the settings sketch after these steps). Authentication with Google Cloud is handled automatically; see the Authentication section for more details.

   ```sh
   cp .env.example .env
   ```

4. Start services:

   ```sh
   docker compose up -d
   ```

5. Apply database migrations:

   ```sh
   poetry run alembic upgrade head
   ```
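For a feel of how those variables are typically consumed, here is a minimal sketch of a Pydantic-based settings loader. The class and defaults are illustrative assumptions, not the project's actual configuration module; the field names mirror variables mentioned elsewhere in this README:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    """Illustrative only: reads values from the environment and .env."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    postgres_user: str = "postgres"
    log_level: str = "INFO"
    gcp_gemini_host: str | None = None  # emulator host; cleared in production


settings = Settings()
print(settings.log_level)
```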
This project uses the Vertex AI backend for the Google Gemini API and authenticates using a standard Google Cloud pattern called Application Default Credentials (ADC). This provides a secure and flexible mechanism that works across different environments.
The application attempts to find credentials in the following order:
1. `GOOGLE_APPLICATION_CREDENTIALS` Environment Variable:
   - Use Case: This is the standard Google Cloud method to force the application to use a specific service account. It's useful for local development or CI/CD.
   - To Use: Set the environment variable to the absolute path of your service account's JSON key file.
   - E2E Test Convention: To make running E2E tests easier, this project uses the `GCP_SERVICE_ACCOUNT_CREDENTIALS` variable (defined in `.env.example`). Paste the full JSON content of your key there; the test suite will automatically create a temporary file and set the `GOOGLE_APPLICATION_CREDENTIALS` path for you during the test run.

2. `gcloud` CLI Credentials (for Local Development):
   - Use Case: The most common method for local development.
   - To Use: If the `GCP_SERVICE_ACCOUNT_CREDENTIALS` variable is not set, the application will use the credentials of the user logged into the `gcloud` CLI. To set this up, run:

     ```sh
     gcloud auth application-default login
     ```

3. Attached Service Account (Recommended for Production on GCP):
   - Use Case: When running the application on Google Cloud infrastructure (e.g., Cloud Run, GKE, Compute Engine).
   - How it Works: The application automatically detects and uses the service account attached to the host resource. This is the most secure method for production, as it eliminates the need to manage and store credential files.
   - To Use: Ensure the `GCP_SERVICE_ACCOUNT_CREDENTIALS` environment variable is unset and that the host's service account has the necessary IAM permissions (e.g., "Vertex AI User"). Also clear any emulator-specific environment variables (like `GCP_GEMINI_HOST`) so the application connects to the live Google Cloud APIs.
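This resolution chain is the standard ADC lookup performed by the google-auth library. A minimal sketch for inspecting which credentials ADC resolves in your environment:

```python
import google.auth

# Application Default Credentials walks the chain described above:
# 1. GOOGLE_APPLICATION_CREDENTIALS, 2. gcloud user credentials,
# 3. the service account attached to the host resource.
credentials, project_id = google.auth.default()
print(f"ADC resolved credentials for project: {project_id}")
```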
The application is controlled via a unified Command-Line Interface (CLI) accessible through the pd alias. This provides a structured and intuitive way to manage the application's lifecycle, from database migrations to data analysis.
The CLI is organized into logical groups:
- `analysis`: Commands for running the different stages of the procurement analysis pipeline.
- `config`: Tools for managing the application's configuration.
- `db`: Utilities for database management, including migrations.
- `worker`: Commands to control the background worker responsible for processing analysis tasks.
To see all available commands, you can run:

```sh
pd --help
```

The `analysis` group contains the core logic for the analysis pipeline.
- `pd analysis prepare`: Scans for new procurements within a given date range and prepares them for analysis.

  ```sh
  # Prepare procurements from a specific date range
  pd analysis prepare --start-date 2025-01-01 --end-date 2025-01-05
  ```

- `pd analysis run`: Triggers a specific analysis by its ID.

  ```sh
  # Run analysis for a specific ID
  pd analysis run --analysis-id "a1b2c3d4-..."
  ```

- `pd analysis rank`: Ranks pending analyses based on a budget and triggers them.

  ```sh
  # Trigger ranked analysis with a manual budget
  pd analysis rank --budget 100.00
  ```

- `pd analysis retry`: Retries failed or stale analyses.

  ```sh
  # Retry analyses that have been stuck for 1 hour
  pd analysis retry --timeout-hours 1
  ```
The `config` group manages your application's environment settings.
- `pd config list`: Lists all configuration key-value pairs.

  ```sh
  # List all configurations
  pd config list

  # Show secret values without masking
  pd config list --show-secrets
  ```

- `pd config get`: Retrieves a specific configuration value.

  ```sh
  # Get the value of a specific key
  pd config get POSTGRES_USER
  ```

- `pd config set`: Sets or unsets a configuration value.

  ```sh
  # Set a new value
  pd config set LOG_LEVEL "DEBUG"

  # Unset a value
  pd config set LOG_LEVEL --unset
  ```
The `db` group handles database operations.
- `pd db migrate`: Applies all pending database migrations.

  ```sh
  pd db migrate
  ```

- `pd db downgrade`: Reverts the last database migration.

  ```sh
  pd db downgrade
  ```

- `pd db reset`: (Destructive) Resets the database to its initial state.

  ```sh
  pd db reset
  ```
The `worker` group controls the background worker.
- `pd worker start`: Starts the worker to listen for and process analysis tasks from the queue.

  ```sh
  # Start the worker
  pd worker start
  ```
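For readers curious how a nested Click layout like this is typically wired, here is a minimal, illustrative sketch; it is not the project's actual source code, just the usual shape of a grouped Click CLI:

```python
import click


@click.group()
def pd() -> None:
    """Public Detective command-line interface."""


@pd.group()
def analysis() -> None:
    """Commands for the procurement analysis pipeline."""


@analysis.command()
@click.option("--start-date", required=True)
@click.option("--end-date", required=True)
def prepare(start_date: str, end_date: str) -> None:
    """Scan for new procurements within a date range."""
    click.echo(f"Preparing procurements from {start_date} to {end_date}")


if __name__ == "__main__":
    pd()
```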
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. Please refer to the CONTRIBUTING.md file for details.
Distributed under the Creative Commons Attribution-NonCommercial 4.0 International License. See LICENSE for more information.
