Public Detective

Open Source Data Investigation

An AI-powered tool for enhancing transparency and accountability in Brazilian public procurement.

🚀 See the live platform! 🚀

🕵️‍♂️ What's This All About?

Ever feel like public spending is a black box? In Brazil, billions are spent on public contracts, but keeping an eye on all of it is a Herculean task. Mistakes, inefficiencies, and even fraud can hide in mountains of documents.

Public Detective is here to change the game. We're an AI-powered watchdog that sniffs out irregularities in public tenders. Think of it as a digital detective, working 24/7 to help journalists, activists, and you demand transparency.

This isn't just code; it's a mission. Developed at PUCPR with the help of the amazing folks at Transparência Brasil, this project puts cutting-edge tech in the hands of the people.

🌟 Core Features

  • 🤖 Automated Data Retrieval: Fetches procurement data directly from the official PNCP APIs.
  • 💡 AI-Powered Analysis: Uses a generative AI model to surface potential irregularities and produce a detailed risk score with a rationale.
  • 🗃️ Full Traceability: Archives both original and processed documents in Google Cloud Storage for every analysis.
  • 🛡️ Idempotent by Design: Avoids re-analyzing unchanged documents by checking a content hash.
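
The content-hash check behind the last feature can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`seen_hashes` stands in for hashes persisted in the project's real database); it is not the actual implementation.

```python
import hashlib

# Stand-in for the content hashes already persisted in the database (hypothetical).
seen_hashes: set = set()

def needs_analysis(document: bytes) -> bool:
    """Return True only if this exact content has not been analyzed before."""
    digest = hashlib.sha256(document).hexdigest()
    if digest in seen_hashes:
        return False  # Unchanged content: skip the expensive AI call.
    seen_hashes.add(digest)
    return True

print(needs_analysis(b"edital.pdf bytes"))  # True: first time seen, analyze it
print(needs_analysis(b"edital.pdf bytes"))  # False: unchanged, skip
```

Because the key is a hash of the content rather than a filename or timestamp, re-uploaded but unmodified documents are skipped automatically.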

βš™οΈ How the Magic Happens

The application operates in a two-stage pipeline: a lightweight Pre-analysis stage to discover and prepare data, followed by an on-demand, AI-powered Analysis stage. This decoupled architecture ensures efficiency and cost-effectiveness.

Here’s a simplified look at how it works:

graph LR
    subgraph "Input"
        A[Public Procurement Data]
    end

    subgraph "Public Detective's Magic"
        B(Automated Analysis)
        C(AI-Powered Insights)
        D(Risk Scoring)
    end

    subgraph "Output"
        E[Transparency Reports]
        F[Actionable Insights for Journalists & Activists]
    end

    A --> B;
    B --> C;
    C --> D;
    D --> E;
    D --> F;

🛠️ Built With

  • Language: Python 3.12+

  • AI / NLP: Google Gemini API

  • CLI Framework: Click

  • Database & Migrations: PostgreSQL, managed with Alembic

  • Core Toolkit:

    • SQLAlchemy Core: For writing safe, raw SQL queries.
    • Pydantic: For data validation and settings management.
    • Tenacity: For robust HTTP request retries.
    • LibreOffice Headless: For office document conversion.
  • Infrastructure: Docker, Google Cloud Storage, Google Cloud Pub/Sub

🏁 Get Started

To get a local copy up and running, follow these simple steps.

Prerequisites

  • Python 3.12
  • Poetry
  • Docker
  • LibreOffice Headless

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/hunsche/public-detective.git
    cd public-detective
  2. Install dependencies:

    poetry install
  3. Set up environment variables: Create a .env file from the example. This is primarily used to configure local emulators.

    cp .env.example .env

    Authentication with Google Cloud is handled automatically. See the Authentication section for more details.

  4. Start services:

    docker compose up -d
  5. Apply database migrations:

    poetry run alembic upgrade head

πŸ” Authentication

This project uses the Vertex AI backend for the Google Gemini API and authenticates using a standard Google Cloud pattern called Application Default Credentials (ADC). This provides a secure and flexible mechanism that works across different environments.

The application attempts to find credentials in the following order:

  1. GOOGLE_APPLICATION_CREDENTIALS Environment Variable:

    • Use Case: This is the standard Google Cloud method to force the application to use a specific service account. It's useful for local development or CI/CD.
    • To Use: Set the environment variable to the absolute path of your service account's JSON key file.
    • ⭐ E2E Test Convention: To make running E2E tests easier, this project uses the GCP_SERVICE_ACCOUNT_CREDENTIALS variable (defined in .env.example). You should paste the full JSON content of your key there. The test suite will automatically handle creating a temporary file and setting the GOOGLE_APPLICATION_CREDENTIALS path for you during the test run.
  2. gcloud CLI Credentials (for Local Development):

    • Use Case: The most common method for local development.
    • To Use: If the GCP_SERVICE_ACCOUNT_CREDENTIALS variable is not set, the application will use the credentials of the user logged into the gcloud CLI. To set this up, run:
      gcloud auth application-default login
  3. Attached Service Account (Recommended for Production on GCP):

    • Use Case: When running the application on Google Cloud infrastructure (e.g., Cloud Run, GKE, Compute Engine).
    • How it Works: The application automatically detects and uses the service account attached to the host resource. This is the most secure method for production as it eliminates the need to manage and store credential files.
    • To Use: Ensure the GCP_SERVICE_ACCOUNT_CREDENTIALS environment variable is unset, and the host's service account has the necessary IAM permissions (e.g., "Vertex AI User"). Also, ensure any emulator-specific environment variables (like GCP_GEMINI_HOST) are cleared so the application connects to the live Google Cloud APIs.
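
The lookup order above can be mirrored in a small sketch. This is documentation-only pseudologic, not how the google-auth library is actually implemented, and the gcloud credentials path shown is the Linux/macOS default.

```python
import os

def resolve_credentials_source() -> str:
    """Mirror the ADC resolution order described above (illustrative only)."""
    # 1. Explicit service-account key file.
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "service account key from GOOGLE_APPLICATION_CREDENTIALS"
    # 2. Credentials created by `gcloud auth application-default login`.
    adc_file = os.path.expanduser(
        "~/.config/gcloud/application_default_credentials.json")
    if os.path.exists(adc_file):
        return "gcloud CLI user credentials"
    # 3. Service account attached to the GCP host (via the metadata server).
    return "attached service account"

print(resolve_credentials_source())
```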

💻 How to Use

The application is controlled via a unified Command-Line Interface (CLI) accessible through the pd alias. This provides a structured and intuitive way to manage the application's lifecycle, from database migrations to data analysis.

Core Commands

The CLI is organized into logical groups:

  • analysis: Commands for running the different stages of the procurement analysis pipeline.
  • config: Tools for managing the application's configuration.
  • db: Utilities for database management, including migrations.
  • worker: Commands to control the background worker responsible for processing analysis tasks.

To see all available commands, you can run:

pd --help
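
The grouped structure above maps naturally onto Click's command groups. The skeleton below is a hypothetical sketch of that shape (only the `analysis prepare` command is fleshed out, with a placeholder body); it is not the project's actual CLI code.

```python
import click

@click.group()
def pd() -> None:
    """Public Detective CLI (illustrative skeleton)."""

@pd.group()
def analysis() -> None:
    """Commands for the procurement analysis pipeline."""

@analysis.command()
@click.option("--start-date", required=True, help="First day to scan (YYYY-MM-DD).")
@click.option("--end-date", required=True, help="Last day to scan (YYYY-MM-DD).")
def prepare(start_date: str, end_date: str) -> None:
    """Scan for new procurements in the given date range."""
    click.echo(f"Preparing procurements from {start_date} to {end_date}")
```

With this skeleton, `pd analysis prepare --start-date 2025-01-01 --end-date 2025-01-05` would echo the chosen range.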

analysis Group

This group contains the core logic for the analysis pipeline.

  • pd analysis prepare: Scans for new procurements within a given date range and prepares them for analysis.

    # Prepare procurements from a specific date range
    pd analysis prepare --start-date 2025-01-01 --end-date 2025-01-05
  • pd analysis run: Triggers a specific analysis by its ID.

    # Run analysis for a specific ID
    pd analysis run --analysis-id "a1b2c3d4-..."
  • pd analysis rank: Ranks pending analyses based on a budget and triggers them.

    # Trigger ranked analysis with a manual budget
    pd analysis rank --budget 100.00
  • pd analysis retry: Retries failed or stale analyses.

    # Retry analyses that have been stuck for 1 hour
    pd analysis retry --timeout-hours 1

config Group

Manage your application's environment settings.

  • pd config list: Lists all configuration key-value pairs.

    # List all configurations
    pd config list
    
    # Show secret values without masking
    pd config list --show-secrets
  • pd config get: Retrieves a specific configuration value.

    # Get the value of a specific key
    pd config get POSTGRES_USER
  • pd config set: Sets or unsets a configuration value.

    # Set a new value
    pd config set LOG_LEVEL "DEBUG"
    
    # Unset a value
    pd config set LOG_LEVEL --unset

db Group

Handle database operations.

  • pd db migrate: Applies all pending database migrations.

    pd db migrate
  • pd db downgrade: Reverts the last database migration.

    pd db downgrade
  • pd db reset: (Destructive) Resets the database to its initial state.

    pd db reset

worker Group

Control the background worker.

  • pd worker start: Starts the worker to listen for and process analysis tasks from the queue.

    # Start the worker
    pd worker start

🙌 Join the Mission!

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. Please refer to the CONTRIBUTING.md file for details.

📄 License

Distributed under the Creative Commons Attribution-NonCommercial 4.0 International License. See LICENSE for more information.

📬 Get In Touch

Matheus Aoki Hunsche
