Weaving the world's political data into a unified tapestry
PoliLoom is a high-performance data pipeline that extracts, enriches, and validates political entity data from Wikipedia and Wikidata at scale. Built with modern Python and TypeScript, it leverages LLMs to transform unstructured web content into structured, verifiable political metadata.
The world's political data is fragmented across thousands of Wikipedia articles in hundreds of languages. PoliLoom solves this by:
- Massive Scale Processing: Handles the entire Wikidata dump (1TB+ uncompressed) with parallel processing
- AI-Powered Extraction: Uses OpenAI's structured output API to extract political positions, dates, and relationships with high accuracy
- Community-Driven Validation: Every piece of extracted data goes through human verification before entering Wikidata
- Real-time Enrichment: Continuously discovers and extracts new political data as it appears on the web
- Tech Stack: Python, FastAPI, PostgreSQL with pgvector, SQLAlchemy
- Parallel Processing: Multi-core Wikidata dump processing with near-linear scaling
- Vector Search: Semantic similarity matching for entity resolution using sentence transformers
- Two-Stage LLM Pipeline: Overcomes API limitations by combining free-form extraction with vector-based mapping
- Tech Stack: Next.js 15+, React 19+, TypeScript, Tailwind CSS
- OAuth Integration: Seamless Wikipedia/MediaWiki authentication
- Optimized UX: Single-task interface for efficient data validation
- Real-time Updates: SWR-powered data synchronization
```sh
# Clone and setup
git clone https://github.com/yourusername/poliloom.git
cd poliloom

# Backend setup
cd poliloom
uv venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
uv pip install -e .
docker-compose up -d  # PostgreSQL with pgvector

# Frontend setup
cd ../poliloom-gui
npm install
npm run dev
```
```sh
# Download and process Wikidata (one-time setup)
make download-wikidata-dump  # ~100GB compressed
make extract-wikidata-dump   # Requires lbzip2, ~1TB uncompressed

# Build entity hierarchies (required first)
poliloom dump build-hierarchy

# Import entities
poliloom dump import-entities
poliloom dump import-politicians

# Generate embeddings for similarity search
poliloom positions embed
poliloom locations embed

# Enrich a politician
poliloom politicians enrich --id Q7747
```
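Conceptually, the `build-hierarchy` step walks Wikidata's subclass-of (P279) edges to collect every descendant of root classes like "public office", so later imports can test membership with a set lookup. A minimal sketch of the idea, with hypothetical names (the real implementation and its data layout are not shown here):

```python
from collections import defaultdict, deque

def build_descendants(edges: list[tuple[str, str]], root: str) -> set[str]:
    """Return the root plus every transitive subclass.

    `edges` are (child, parent) pairs, i.e. "child is a subclass of parent";
    a breadth-first search from the root collects the full descendant tree.
    """
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node] - seen:
            seen.add(child)
            queue.append(child)
    return seen
```

With the full dump this yields the descendant sets behind figures like the 200K+ tracked positions; membership checks during import then cost O(1) per entity.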
We're building the future of open political data, and we need your help! There's a place for you whether your interests lie in:
- 🐍 Python Backend: Optimize dump processing, improve LLM pipelines, add new data sources
- ⚛️ React Frontend: Enhance the validation interface, improve UX, add visualization features
- 🤖 AI/ML: Improve extraction accuracy, experiment with different models, optimize embeddings
- 🗃️ Data Quality: Help validate extracted data, identify edge cases, improve matching algorithms
Check out our active discussion thread where development happens in real-time.
- Performance Optimization: The dump processing pipeline always needs speed improvements
- Language Support: Extend extraction to non-English Wikipedia articles
- Entity Resolution: Improve the vector similarity matching for positions and locations
- Data Sources: Add support for parliamentary websites, news articles, and other sources
- Validation Interface: Make the confirmation process even more efficient and enjoyable
- Chunk-based Parallel Processing: Splits Wikidata dumps into byte ranges for true parallelism
- Hierarchical Entity Resolution: Builds complete descendant trees for 200K+ political positions
- Smart Conflict Detection: Identifies discrepancies between sources for human review
- Production-Ready: Comprehensive error handling, retry logic, and monitoring hooks
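The chunk-based splitting above can be illustrated with a small helper (a hypothetical sketch, not the project's actual code): divide a line-oriented dump into N byte ranges, snapping each boundary forward to the next newline so every worker sees only whole records.

```python
import os

def chunk_ranges(path: str, n_chunks: int) -> list[tuple[int, int]]:
    """Split a line-oriented file into roughly equal byte ranges.

    Each boundary is advanced to the next newline, so ranges tile the file
    exactly and never cut a line in half.
    """
    size = os.path.getsize(path)
    approx = max(1, size // n_chunks)
    ranges, start = [], 0
    with open(path, "rb") as f:
        while start < size:
            end = min(start + approx, size)
            if end < size:
                f.seek(end)
                f.readline()       # advance to the next newline boundary
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges
```

Each `(start, end)` pair can then be handed to a separate process (e.g. via `multiprocessing.Pool`) that seeks to `start` and parses lines until `end`, which is what makes near-linear scaling across cores possible.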
- Processes 100M+ Wikidata entities in hours, not days
- Tracks 200,000+ political positions across all countries
- Handles 78,000+ positions for large countries like France
- Scales near-linearly up to 32+ CPU cores
We're not just building a data pipeline—we're creating a living, breathing repository of the world's political landscape. By making this data accessible and verifiable, we enable:
- Journalists tracking political careers across borders
- Researchers studying political trends and patterns
- Citizens understanding their representatives better
- Developers building the next generation of civic tools
Join us in making political data truly open and accessible. Together, we can weave a complete picture of global governance.
Built with ❤️ by the open data community | Discuss | API Docs