This project explores late-night foot traffic at Pentagon-area pizzerias as a proxy indicator of operational activity. It scrapes busyness data from Google Maps with Playwright every 15 minutes and logs the results persistently in a SQLite database.
"Pizza intelligence" ("Pizzint") relies on the hypothesis that late-night gatherings at government sites correlate with increased local pizza delivery activity. Through the collection and analysis of such indirect behavioral signals, it becomes possible to detect anomalies that might suggest operational movements. This project illustrates how public data, when systematically processed, can yield insight into covert activity.
The project operates through an asynchronous web scraping pipeline that efficiently collects traffic data from multiple Pentagon-area pizzerias:
- **Browser Context Setup**: Creates a Playwright browser instance with Chromium, configured with a French locale and headers optimized for Google Maps scraping.
- **Concurrent Data Collection**: Navigates to the Google Maps business page of each target pizzeria simultaneously, handling cookie consent dialogs and extracting live traffic percentages (see the sketch after this list).
- **Data Processing**: Parses French-language traffic labels ("Taux de fréquentation actuel") to extract both current and historical traffic percentages.
- **Database Logging**: Stores timestamped metrics in a SQLite database, with automatic anomaly calculation to identify unusual activity spikes or drops.
- **Scheduled Execution**: Runs every 15 minutes during pizzeria operating hours (10am-1am), when traffic data is available and meaningful for analysis.
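To make the pipeline concrete, here is a minimal sketch of the whole loop. It is illustrative only: the pizzeria list, the consent-button label, and the aria-label pattern are assumptions, and the real implementation lives in `main.py` and `utils/functions.py`.

```python
import asyncio
import re

from playwright.async_api import async_playwright

# Hypothetical targets for illustration; the real list lives in utils/factory.py.
PIZZERIAS = {
    "Example Pizzeria": "https://www.google.com/maps/place/<place-slug>",
}

# Assumed French aria-label wording; Google's actual phrasing may differ.
LIVE_RE = re.compile(r"Taux de fréquentation actuel\s*:\s*(\d+)\s*%")


async def scrape_pizzeria(context, name, url):
    """Open one Google Maps page and pull the live busyness percentage."""
    page = await context.new_page()
    await page.goto(url, wait_until="domcontentloaded")

    # Dismiss the cookie consent dialog when it appears (French label assumed).
    consent = page.get_by_role("button", name="Tout accepter")
    if await consent.count():
        await consent.first.click()

    # Collect every aria-label on the page and look for the traffic figure.
    labels = await page.locator("[aria-label]").evaluate_all(
        "els => els.map(e => e.getAttribute('aria-label'))"
    )
    await page.close()
    for label in labels:
        if label and (m := LIVE_RE.search(label)):
            return name, int(m.group(1))
    return name, None


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # One shared context with a French locale for all pages.
        context = await browser.new_context(locale="fr-FR")
        results = await asyncio.gather(
            *(scrape_pizzeria(context, n, u) for n, u in PIZZERIAS.items())
        )
        await browser.close()
    print(dict(results))


if __name__ == "__main__":
    asyncio.run(main())
```

Sharing a single browser context across pages keeps resource usage low while `asyncio.gather` still fetches every pizzeria in parallel.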
The repository is organized as follows:

```
inferring-pentagon-activity-from-pizza/
├── main.py              # Async orchestration: manages browser contexts and concurrent scraping
├── initialize_db.py     # Database initialization: creates SQLite schema with anomaly detection
├── utils/               # Core functionality modules
│   ├── functions.py     # Web scraping functions using Playwright
│   └── factory.py       # Pizzeria configuration and Google Maps URLs
├── data/                # SQLite database storage for traffic logs
├── .github/workflows/   # GitHub Actions automation for scheduled execution
└── README.md            # Project documentation
```
- **Main Orchestrator** (`main.py`): Manages the complete scraping workflow, using asyncio for concurrent execution across multiple pizzerias during operating hours.
- **Web Scraping Engine** (`utils/functions.py`):
  - Creates optimized Playwright browser contexts with a French locale
  - Extracts live and historical traffic data from Google Maps aria-labels (a parsing sketch follows this list)
  - Handles cookie consent dialogs and data validation
- **Data Persistence** (`initialize_db.py` & logging): Maintains the SQLite database, with automatic anomaly calculation for traffic pattern analysis.
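The label parsing can be isolated as a small helper. The exact French wording Google uses is an assumption here; `parse_traffic` and the regex are illustrative, not the project's actual `utils/functions.py` code.

```python
import re

# Assumed French label shape; Google's actual wording may differ. Both the
# regex and parse_traffic are illustrative, not the project's real code.
TRAFFIC_RE = re.compile(
    r"Taux de fréquentation actuel\s*:\s*(?P<live>\d+)\s*%"
    r"(?:.*?habituellement\s*(?P<hist>\d+)\s*%)?",
    re.IGNORECASE | re.DOTALL,
)


def parse_traffic(label: str) -> tuple[int, int | None] | None:
    """Return (live, historical) percentages, or None if the label doesn't match."""
    m = TRAFFIC_RE.search(label)
    if not m:
        return None
    hist = int(m.group("hist")) if m.group("hist") else None
    return int(m.group("live")), hist


print(parse_traffic("Taux de fréquentation actuel : 75 % ; habituellement 40 %."))
# -> (75, 40)
```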
The collected data is stored in a SQLite database (`data/traffic_logs.db`) with the following schema:

| Column | Description |
|---|---|
| `pizzeria` | Name of the pizzeria being monitored |
| `timestamp` | Date and time when the data was collected |
| `day_of_week` | Day of the week for the timestamp |
| `hour` | Hour of the day (0-23) |
| `live_traffic` | Current live busyness percentage (0-100) |
| `historical_traffic` | Historical/expected busyness for this time |
| `anomaly` | Automatically calculated difference between live and historical traffic |
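For reference, a schema along these lines would match the table above. It is a sketch, not the actual `initialize_db.py`: the column types are guesses, and the anomaly is modeled as a generated column computing live minus historical traffic.

```python
import sqlite3
from pathlib import Path

# A sketch of a schema matching the table above. Column types and the anomaly
# formula (live minus historical) are assumptions, not the project's actual
# initialize_db.py. Generated columns require SQLite 3.31+.
SCHEMA = """
CREATE TABLE IF NOT EXISTS traffic_logs (
    pizzeria           TEXT NOT NULL,
    timestamp          TEXT NOT NULL,
    day_of_week        TEXT NOT NULL,
    hour               INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
    live_traffic       INTEGER,
    historical_traffic INTEGER,
    -- Derived automatically from the two traffic columns.
    anomaly            INTEGER GENERATED ALWAYS AS
                       (live_traffic - historical_traffic) STORED
);
"""

Path("data").mkdir(exist_ok=True)
with sqlite3.connect("data/traffic_logs.db") as conn:
    conn.executescript(SCHEMA)
```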
To run this project you need:

- Python 3.12 or later
- SQLite (bundled with Python)
- Playwright: modern browser automation framework
- Chromium: installed automatically via Playwright
- Clone the repository:

  ```bash
  git clone https://github.com/gaeldatascience/inferring-pentagon-activity-from-pizza.git
  cd inferring-pentagon-activity-from-pizza
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate    # macOS/Linux
  .\.venv\Scripts\activate     # Windows
  ```

- Install dependencies and Playwright browsers:

  ```bash
  pip install -e .
  playwright install chromium
  ```
- **Execute the async scraper**:

  ```bash
  python main.py
  ```

  The scraper will:
  - Create concurrent pages for each pizzeria
  - Extract traffic data from French Google Maps pages
  - Log results to the SQLite database with anomaly detection
  - Only run during operating hours

- **Inspect data**:
  - The SQLite file is located at `data/traffic_logs.db`.
  - Use any SQLite client (e.g., `sqlite3`, DB Browser) to query and visualize trends; a sample query follows.
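For example, a query like the following surfaces the most recent large deviations between live and expected traffic. The 20-point threshold is an arbitrary choice for illustration, not a value the project uses.

```python
import sqlite3

# A hypothetical inspection query: the 20-point anomaly threshold is an
# arbitrary illustration, not a value the project defines.
QUERY = """
SELECT pizzeria, timestamp, live_traffic, historical_traffic, anomaly
FROM traffic_logs
WHERE ABS(anomaly) >= 20
ORDER BY timestamp DESC
LIMIT 10;
"""

with sqlite3.connect("data/traffic_logs.db") as conn:
    for row in conn.execute(QUERY):
        print(row)
```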
A GitHub Actions workflow (`.github/workflows/scheduler.yml`) runs every 15 minutes:
```yaml
name: Scheduled Traffic Collection

on:
  schedule:
    - cron: '*/15 * * * *'

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install Dependencies
        run: |
          python -m venv .venv
          source .venv/bin/activate
          pip install -e .
          playwright install chromium
      - name: Run Scraper
        run: |
          source .venv/bin/activate
          python main.py
      - name: Commit Database (optional)
        run: |
          git config user.name 'github-actions'
          git config user.email '[email protected]'
          git add data/traffic_logs.db
          git commit -m 'Update traffic logs'
          git push
```
- **Target Restaurants**: Edit the list in `utils/factory.py` to add or remove Google Maps URLs (a hypothetical example follows this list).
- **Scrape Interval**: Modify the `cron` expression in the GitHub Actions file to change the frequency.
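The configuration might look something like this; the structure, names, and URLs are placeholders, not the project's actual entries in `utils/factory.py`.

```python
# Hypothetical shape of the configuration in utils/factory.py; the real module
# may differ. Names and URLs below are placeholders, not actual targets.
PIZZERIAS: dict[str, str] = {
    "Example Pizzeria Arlington": "https://www.google.com/maps/place/<place-slug>",
    "Example Pizzeria Crystal City": "https://www.google.com/maps/place/<place-slug>",
}
```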
Contributions and suggestions are welcome! To contribute:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -m 'Add feature'`).
- Push to your branch (`git push origin feature/your-feature`).
- Open a Pull Request and describe your changes.
Please ensure code quality, include tests or examples where appropriate, and document new behavior.
This project is distributed under the MIT License. See the LICENSE file for details.
This project is for educational and experimental purposes only. It does not claim to provide accurate or actionable intelligence regarding Pentagon activities. The data collected is publicly available and should not be used for any unlawful or unethical purposes. Always respect privacy and data usage policies when scraping web content.