This project explores late-night foot traffic at Pentagon-area pizzerias as a proxy indicator of operational activity. It scrapes busyness data from Google Maps with Playwright every 15 minutes and logs the results persistently in a SQLite database.
"Pizza intelligence" ("Pizzint") relies on the hypothesis that late-night gatherings at government sites correlate with increased local pizza delivery activity. Through the collection and analysis of such indirect behavioral signals, it becomes possible to detect anomalies that might suggest operational movements. This project illustrates how public data, when systematically processed, can yield insight into covert activity.
The project operates through an asynchronous web scraping pipeline that efficiently collects traffic data from multiple Pentagon-area pizzerias:
- **Browser Context Setup**: Creates a Playwright browser instance with Chromium, configured with a French locale and headers optimized for Google Maps scraping.
- **Concurrent Data Collection**: Navigates to the Google Maps business page of each target pizzeria simultaneously, handling cookie consent dialogs and extracting live traffic percentages (see the sketch after this list).
- **Data Processing**: Parses French-language traffic labels ("Taux de fréquentation actuel") to extract both current and historical traffic percentages.
- **Database Logging**: Stores timestamped metrics in a SQLite database, with automatic anomaly calculation to identify unusual activity spikes or drops.
- **Scheduled Execution**: Runs every 15 minutes during pizzeria operating hours (10am-1am), when traffic data is available and meaningful for analysis.
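To make the pipeline concrete, here is a minimal sketch of the whole loop. It is illustrative only: the pizzeria list, the consent-button label, and the aria-label pattern are assumptions, and the real implementation lives in `main.py` and `utils/functions.py`.

```python
import asyncio
import re

from playwright.async_api import async_playwright

# Hypothetical targets for illustration; the real list lives in utils/factory.py.
PIZZERIAS = {
    "Example Pizzeria": "https://www.google.com/maps/place/<place-slug>",
}

# Assumed French aria-label wording; Google's actual phrasing may differ.
LIVE_RE = re.compile(r"Taux de fréquentation actuel\s*:\s*(\d+)\s*%")


async def scrape_pizzeria(context, name, url):
    """Open one Google Maps page and pull the live busyness percentage."""
    page = await context.new_page()
    await page.goto(url, wait_until="domcontentloaded")

    # Dismiss the cookie consent dialog when it appears (French label assumed).
    consent = page.get_by_role("button", name="Tout accepter")
    if await consent.count():
        await consent.first.click()

    # Collect every aria-label on the page and look for the traffic figure.
    labels = await page.locator("[aria-label]").evaluate_all(
        "els => els.map(e => e.getAttribute('aria-label'))"
    )
    await page.close()
    for label in labels:
        if label and (m := LIVE_RE.search(label)):
            return name, int(m.group(1))
    return name, None


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # One shared context with a French locale for all pages.
        context = await browser.new_context(locale="fr-FR")
        results = await asyncio.gather(
            *(scrape_pizzeria(context, n, u) for n, u in PIZZERIAS.items())
        )
        await browser.close()
    print(dict(results))


if __name__ == "__main__":
    asyncio.run(main())
```

Sharing a single browser context across pages keeps resource usage low while `asyncio.gather` still fetches every pizzeria in parallel.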
The repository is organized as follows:

```
inferring-pentagon-activity-from-pizza/
├── main.py              # Async orchestration: manages browser contexts and concurrent scraping
├── initialize_db.py     # Database initialization: creates SQLite schema with anomaly detection
├── utils/               # Core functionality modules
│   ├── functions.py     # Web scraping functions using Playwright
│   └── factory.py       # Pizzeria configuration and Google Maps URLs
├── data/                # SQLite database storage for traffic logs
├── .github/workflows/   # GitHub Actions automation for scheduled execution
└── README.md            # Project documentation
```
- **Main Orchestrator** (`main.py`): Manages the complete scraping workflow, using asyncio for concurrent execution across multiple pizzerias during operating hours.
- **Web Scraping Engine** (`utils/functions.py`):
  - Creates optimized Playwright browser contexts with a French locale
  - Extracts live and historical traffic data from Google Maps aria-labels (a parsing sketch follows this list)
  - Handles cookie consent dialogs and data validation
- **Data Persistence** (`initialize_db.py` & logging): Maintains the SQLite database, with automatic anomaly calculation for traffic pattern analysis.
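The label parsing can be isolated as a small helper. The exact French wording Google uses is an assumption here; `parse_traffic` and the regex are illustrative, not the project's actual `utils/functions.py` code.

```python
import re

# Assumed French label shape; Google's actual wording may differ. Both the
# regex and parse_traffic are illustrative, not the project's real code.
TRAFFIC_RE = re.compile(
    r"Taux de fréquentation actuel\s*:\s*(?P<live>\d+)\s*%"
    r"(?:.*?habituellement\s*(?P<hist>\d+)\s*%)?",
    re.IGNORECASE | re.DOTALL,
)


def parse_traffic(label: str) -> tuple[int, int | None] | None:
    """Return (live, historical) percentages, or None if the label doesn't match."""
    m = TRAFFIC_RE.search(label)
    if not m:
        return None
    hist = int(m.group("hist")) if m.group("hist") else None
    return int(m.group("live")), hist


print(parse_traffic("Taux de fréquentation actuel : 75 % ; habituellement 40 %."))
# -> (75, 40)
```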
The collected data is stored in a SQLite database (`data/traffic_logs.db`) with the following schema:

| Column | Description |
|---|---|
| `pizzeria` | Name of the pizzeria being monitored |
| `timestamp` | Date and time when the data was collected |
| `day_of_week` | Day of the week for the timestamp |
| `hour` | Hour of the day (0-23) |
| `live_traffic` | Current live busyness percentage (0-100) |
| `historical_traffic` | Historical/expected busyness for this time |
| `anomaly` | Automatically calculated difference between live and historical traffic |
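For reference, a schema along these lines would match the table above. It is a sketch, not the actual `initialize_db.py`: the column types are guesses, and the anomaly is modeled as a generated column computing live minus historical traffic.

```python
import sqlite3
from pathlib import Path

# A sketch of a schema matching the table above. Column types and the anomaly
# formula (live minus historical) are assumptions, not the project's actual
# initialize_db.py. Generated columns require SQLite 3.31+.
SCHEMA = """
CREATE TABLE IF NOT EXISTS traffic_logs (
    pizzeria           TEXT NOT NULL,
    timestamp          TEXT NOT NULL,
    day_of_week        TEXT NOT NULL,
    hour               INTEGER NOT NULL CHECK (hour BETWEEN 0 AND 23),
    live_traffic       INTEGER,
    historical_traffic INTEGER,
    -- Derived automatically from the two traffic columns.
    anomaly            INTEGER GENERATED ALWAYS AS
                       (live_traffic - historical_traffic) STORED
);
"""

Path("data").mkdir(exist_ok=True)
with sqlite3.connect("data/traffic_logs.db") as conn:
    conn.executescript(SCHEMA)
```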
To run this project you need:

- Python 3.12 or later
- SQLite (bundled with Python)
- Playwright: modern browser automation framework
- Chromium: installed automatically via Playwright
- Clone the repository:

  ```bash
  git clone https://github.com/gaeldatascience/inferring-pentagon-activity-from-pizza.git
  cd inferring-pentagon-activity-from-pizza
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate    # macOS/Linux
  .\.venv\Scripts\activate     # Windows
  ```

- Install dependencies and Playwright browsers:

  ```bash
  pip install -e .
  playwright install chromium
  ```
- **Execute the async scraper**:

  ```bash
  python main.py
  ```

  The scraper will:
  - Create concurrent pages for each pizzeria
  - Extract traffic data from French Google Maps pages
  - Log results to the SQLite database with anomaly detection
  - Only run during operating hours

- **Inspect data**:
  - The SQLite file is located at `data/traffic_logs.db`.
  - Use any SQLite client (e.g., `sqlite3`, DB Browser) to query and visualize trends; a sample query follows.
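For example, a query like the following surfaces the most recent large deviations between live and expected traffic. The 20-point threshold is an arbitrary choice for illustration, not a value the project uses.

```python
import sqlite3

# A hypothetical inspection query: the 20-point anomaly threshold is an
# arbitrary illustration, not a value the project defines.
QUERY = """
SELECT pizzeria, timestamp, live_traffic, historical_traffic, anomaly
FROM traffic_logs
WHERE ABS(anomaly) >= 20
ORDER BY timestamp DESC
LIMIT 10;
"""

with sqlite3.connect("data/traffic_logs.db") as conn:
    for row in conn.execute(QUERY):
        print(row)
```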
A GitHub Actions workflow (`.github/workflows/scheduler.yml`) runs every 15 minutes:
```yaml
name: Scheduled Traffic Collection

on:
  schedule:
    - cron: '*/15 * * * *'

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install Dependencies
        run: |
          python -m venv .venv
          source .venv/bin/activate
          pip install -e .
          playwright install chromium
      - name: Run Scraper
        run: |
          source .venv/bin/activate
          python main.py
      - name: Commit Database (optional)
        run: |
          git config user.name 'github-actions'
          git config user.email '[email protected]'
          git add data/traffic_logs.db
          git commit -m 'Update traffic logs'
          git push
```
- **Target Restaurants**: Edit the list in `utils/factory.py` to add or remove Google Maps URLs (a hypothetical example follows this list).
- **Scrape Interval**: Modify the `cron` expression in the GitHub Actions file to change the frequency.
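The configuration might look something like this; the structure, names, and URLs are placeholders, not the project's actual entries in `utils/factory.py`.

```python
# Hypothetical shape of the configuration in utils/factory.py; the real module
# may differ. Names and URLs below are placeholders, not actual targets.
PIZZERIAS: dict[str, str] = {
    "Example Pizzeria Arlington": "https://www.google.com/maps/place/<place-slug>",
    "Example Pizzeria Crystal City": "https://www.google.com/maps/place/<place-slug>",
}
```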
Contributions and suggestions are welcome! To contribute:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -m 'Add feature'`).
- Push to your branch (`git push origin feature/your-feature`).
- Open a Pull Request and describe your changes.
Please ensure code quality, include tests or examples where appropriate, and document new behavior.
This project is distributed under the MIT License. See the LICENSE file for details.
This project is for educational and experimental purposes only. It does not claim to provide accurate or actionable intelligence regarding Pentagon activities. The data collected is publicly available and should not be used for any unlawful or unethical purposes. Always respect privacy and data usage policies when scraping web content.