An open-source tool to collect, process, and visualize GitHub Copilot usage data for any organization. Features a web-based dashboard for interactive analytics and supports data privacy by keeping all data local.
- Privacy-First: All data stays local - no cloud dependencies
- Interactive Dashboard: Beautiful Streamlit web interface
- Multi-Organization: Collect data from multiple GitHub organizations
- Rich Analytics: Usage trends, language breakdowns, editor statistics
- Easy Deployment: Deploy dashboard to Streamlit Community Cloud
- Secure: Uses GitHub CLI authentication - no token management
This project consists of three main components that work together in sequence:
- Purpose: Bash script that collects GitHub Copilot usage metrics
- What it does: Uses GitHub CLI to fetch raw metrics data from GitHub's API
- Output: Organizes JSON files in a `data/year=YYYY/month=MM/DD-org.json` structure
- Usage: `./collect_metrics.sh --org <organization>`
- Purpose: Python script that processes collected JSON data
- What it does: Converts JSON files into a consolidated Parquet file for analysis
- Output: Generates a `data.parquet` file ready for visualization
- Usage: `python main.py`
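The gathering step of this component can be sketched with the standard library alone; the actual `main.py` presumably relies on pandas/pyarrow (from `requirements.txt`) for the Parquet write, and the glob below simply mirrors the directory layout described above:

```python
import json
from pathlib import Path

def load_daily_metrics(data_dir: str) -> list:
    """Gather every day's JSON payload under data/year=*/month=*/ into one list."""
    records = []
    for path in sorted(Path(data_dir).glob("year=*/month=*/*.json")):
        with path.open() as f:
            payload = json.load(f)
        # The metrics API returns a list of per-day objects; wrap a lone object.
        days = payload if isinstance(payload, list) else [payload]
        for day in days:
            day["source_file"] = path.name  # keep provenance for debugging
            records.append(day)
    return records
```

From here, something like `pandas.DataFrame(records).to_parquet("data.parquet")` would produce the consolidated file.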
- Purpose: Streamlit web application for interactive analytics
- What it does: Generates charts, graphs and insights from the processed data
- Output: Interactive web dashboard with multiple visualizations
- Usage: `streamlit run dashboard.py`
- GitHub CLI (gh): Must be installed and authenticated
- Python 3.8+: For data processing and dashboard
- jq: For JSON processing in the bash script
```bash
# Install GitHub CLI (if not already installed)
# On macOS
brew install gh
# On Ubuntu/Debian
sudo apt update && sudo apt install gh
# On other systems, see: https://cli.github.com/

# Install jq (if not already installed)
# On macOS
brew install jq
# On Ubuntu/Debian
sudo apt install jq

# Authenticate with GitHub CLI
gh auth login

# Install Python dependencies
pip install -r requirements.txt
```
```bash
# Collect metrics for your organization
./collect_metrics.sh --org your-organization-name

# Optional: specify custom data directory
./collect_metrics.sh --org your-org --data-dir ./custom-data-dir
```
Requirements:
- You must be authenticated with GitHub CLI (`gh auth login`)
- Your account needs appropriate permissions to read Copilot metrics for the organization
- The organization must have GitHub Copilot enabled
```bash
# Process collected JSON files into consolidated parquet
python main.py

# Optional: specify custom directories
python main.py --data-dir ./custom-data-dir --output-dir ./custom-output
```
What happens:
- Reads all JSON files from the data directory
- Cleans and validates the data
- Combines data from multiple organizations and dates
- Outputs a `data.parquet` file ready for analysis
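What "cleans and validates" might look like in miniature (the field names follow the Copilot metrics API's per-day shape; the exact checks in `main.py` may differ):

```python
def validate_day(record: dict) -> list:
    """Return a list of problems for one day's record; empty means it looks usable."""
    problems = []
    # Top-level fields every per-day record is expected to carry.
    for key in ("date", "total_active_users", "total_engaged_users"):
        if key not in record:
            problems.append(f"missing field: {key}")
    # Basic sanity check on counts.
    users = record.get("total_active_users")
    if isinstance(users, int) and users < 0:
        problems.append("total_active_users is negative")
    return problems
```

Records that come back with a non-empty problem list would be logged and skipped rather than merged into the Parquet output.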
```bash
# Run the dashboard locally with your data
streamlit run dashboard.py
```
Dashboard features:
- Interactive charts and graphs
- Date range filtering
- Organization comparison
- Language usage breakdown
- Editor statistics
- User engagement metrics
The processed data includes the following key metrics:
- Code Completions: Suggestions, acceptances, lines of code by language and editor
- Chat Interactions: Usage of Copilot chat features
- User Engagement: Active vs engaged users over time
- Editor Breakdown: Usage statistics by IDE (VS Code, JetBrains, etc.)
- Language Statistics: Most used programming languages
- Pull Request Summaries: GitHub.com integration metrics
- Local Processing: All data stays on your machine
- No Cloud Storage: No data is sent to external services
- Organization Control: You control what data to collect and analyze
- The project automatically anonymizes organization names in public deployments
- Sensitive data files are ignored by git (see `.gitignore`)
- Raw API responses are cleaned up after processing
- ✅ Source code (`*.py`, `*.sh`)
- ✅ Configuration files (`requirements.txt`, etc.)
- ✅ Documentation (`README.md`)
- ❌ Data files (`data/`, `*.parquet`, `*.csv`)
- ❌ Raw API responses (`raw_response_*.json`)
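A common way to implement the organization-name anonymization mentioned above is a salted hash that yields stable pseudonyms; this is an illustrative sketch, not necessarily the scheme this project uses:

```python
import hashlib

def anonymize_org(name: str, salt: str = "copilot-dashboard") -> str:
    """Map an organization name to a stable pseudonym like 'org-3f2a1b'.

    A salted hash keeps chart labels consistent across runs without
    exposing the real name. The salt value here is a placeholder.
    """
    digest = hashlib.sha256((salt + name).encode()).hexdigest()
    return f"org-{digest[:6]}"
```

Because the mapping is deterministic, the same organization always gets the same label, so trend lines remain comparable across deployments.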
- Fork this repository to your GitHub account
- Visit share.streamlit.io
- Create a new app pointing to your fork
- Set main file path to `dashboard.py`
- Deploy!
The deployed dashboard will:
- Show an upload interface for users to upload their `.parquet` files
- Process data securely in the browser session only
- Display interactive analytics without storing any data on the server
- Clear all data when the session ends
- No Data Persistence: Cloud dashboard never stores user data
- Session-Only Processing: Data exists only during the browser session
- In-Memory Analytics: Uploaded files are processed in memory and never written to disk
- Zero Server Storage: No data is saved on deployment servers
```
copilot-dashboard/
├── collect_metrics.sh   # Data collection script
├── main.py              # Data processing script
├── dashboard.py         # Streamlit dashboard
├── requirements.txt     # Python dependencies
├── .gitignore           # Git ignore rules (includes data files)
├── README.md            # This documentation
└── data/                # Local data storage (git-ignored)
    └── year=YYYY/
        └── month=MM/
            └── DD-org.json
```
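The hive-style layout makes the collection date and organization recoverable from the path alone, as this sketch shows (the helper name and return shape are illustrative):

```python
import re
from pathlib import Path
from typing import Optional

# Matches the data/year=YYYY/month=MM/DD-org.json layout.
PATH_RE = re.compile(r"year=(\d{4})/month=(\d{2})/(\d{2})-(.+)\.json$")

def parse_metrics_path(path: str) -> Optional[dict]:
    """Extract year, month, day, and org from a data-file path.

    Returns None when the path does not match the expected layout.
    """
    m = PATH_RE.search(Path(path).as_posix())
    if not m:
        return None
    year, month, day, org = m.groups()
    return {"year": int(year), "month": int(month), "day": int(day), "org": org}
```

This is why the processing step can partition by organization and date without opening any file: the metadata is encoded in the path itself.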
Simply run the collection script with different `--org` parameters:
```bash
./collect_metrics.sh --org organization-1
./collect_metrics.sh --org organization-2
./collect_metrics.sh --org organization-3
```
Then reprocess the data:

```bash
python main.py
```
The dashboard provides insights such as:
- Adoption Trends: How Copilot usage grows over time
- Language Preferences: Which programming languages benefit most from Copilot
- Editor Usage: VS Code vs JetBrains vs other IDEs
- Feature Usage: Code completions vs chat features
- Team Engagement: Active users vs engaged users ratios
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
"GitHub CLI not authenticated"
- Run `gh auth login` and follow the prompts
"Access denied" when collecting metrics
- Ensure your GitHub account has permissions to read Copilot metrics for the organization
- Contact your organization admin to grant appropriate access
"No data found to process"
- Check that the data collection step completed successfully
- Verify that JSON files exist in the `data/` directory structure
"Invalid JSON response"
- This usually indicates API rate limiting or authentication issues
- Wait a few minutes and try again
- Check your GitHub CLI authentication status
If you encounter issues:
- Check the troubleshooting section above
- Review the console output for specific error messages
- Ensure all prerequisites are properly installed
- Open an issue on GitHub with detailed error information
This script uses GitHub CLI to fetch Copilot usage data and organize it automatically.
```bash
# Navigate to the script directory
cd copilot-dashboard

# Install Python dependencies (if not already done)
pip install -r requirements.txt

# Run the collection script
./collect_metrics.sh --org <organization> [--data-dir ./data]
```
- You need to be authenticated with GitHub CLI (`gh auth login`)
- Your account must have appropriate permissions to read Copilot metrics for the organization
- Specify the `--org` to collect data for
- The default data directory is `./data`
- The script will automatically organize data in the `year=YYYY/month=MM/DD-<org>.json` structure
After collecting the `.json` files, run `main.py` to process them.
```bash
# Ensure you are in the copilot-dashboard directory
cd copilot-dashboard

# Run the main processing script
python main.py [--data-dir ./data] [--output-dir .]
```
- This script reads JSON files from the `data` directory.
- It cleans, merges, and enriches the data.
- It generates aggregated `data.parquet` and `data.csv` files.
```bash
# Run the dashboard locally with your generated data
streamlit run dashboard.py
```
- Visit our hosted dashboard: GitHub Copilot Analytics
- Upload your generated `.parquet` file using the sidebar
- Explore your analytics instantly!
Privacy Note: When using the online dashboard, your data is processed in your browser session only and is never stored on our servers.
The dashboard provides:
- Interactive charts showing Copilot usage trends
- User engagement metrics
- Language and editor breakdowns
- Chat usage statistics
- Pull request summary metrics
The data is stored in the following format:
```
data/
└── year=2025/
    ├── month=01/
    │   ├── 01-org.json
    │   ├── 02-org.json
    │   └── ...
    └── month=02/
        ├── 01-org.json
        └── ...
```
Each JSON file contains the raw response from the GitHub Copilot metrics API for a specific day.
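A per-day record can be drilled into directly. This sketch sums suggested and accepted completions per language; the nesting (`editors` → `models` → `languages`) and field names follow the documented metrics response shape but should be treated as illustrative:

```python
def language_acceptance(day: dict) -> dict:
    """Sum suggested/accepted code completions per language for one day's record."""
    totals = {}
    completions = day.get("copilot_ide_code_completions") or {}
    for editor in completions.get("editors", []):
        for model in editor.get("models", []):
            for lang in model.get("languages", []):
                entry = totals.setdefault(lang["name"], {"suggested": 0, "accepted": 0})
                entry["suggested"] += lang.get("total_code_suggestions", 0)
                entry["accepted"] += lang.get("total_code_acceptances", 0)
    return totals
```

Dividing `accepted` by `suggested` per language gives the acceptance-rate breakdown the dashboard charts are built from.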
- No specific environment variables are required
- Authentication is handled through GitHub CLI (`gh auth login`)
This tool uses the GitHub REST API endpoint through GitHub CLI:
```
GET /orgs/{org}/copilot/metrics
```
For more information, see the GitHub API documentation.
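Reproducing the call from Python might look like the sketch below. The `--paginate` flag and `since` query parameter are standard `gh`/API options, but whether `collect_metrics.sh` uses them is an assumption:

```python
from typing import Optional

def metrics_command(org: str, since: Optional[str] = None) -> list:
    """Build the GitHub CLI invocation for the Copilot metrics endpoint.

    `since` (YYYY-MM-DD) is an optional query parameter; --paginate lets
    gh follow pagination automatically.
    """
    endpoint = f"orgs/{org}/copilot/metrics"
    if since:
        endpoint += f"?since={since}"
    return ["gh", "api", "--paginate", endpoint]
```

Running it with `subprocess.run(metrics_command("my-org"), capture_output=True, text=True)` would return the raw JSON on stdout, ready to be saved into the `data/` layout.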
- DEPLOYMENT.md: Complete guide for deploying to Streamlit Community Cloud
- DATA_STRUCTURE.md: Data schema and troubleshooting guide
- README.md: This file - getting started guide
```bash
# 1. Install and authenticate GitHub CLI
gh auth login

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Collect metrics for your organization
./collect_metrics.sh --org your-organization

# 4. Process data into analytics format
python main.py

# 5A. Run dashboard locally
streamlit run dashboard.py

# 5B. Or upload data.parquet to our hosted dashboard
# Visit: https://your-app.streamlit.app
```
- Run `streamlit run dashboard.py` for local development
- All data stays on your machine
- Fork this repository
- Deploy to Streamlit Community Cloud
- Users upload their own data files
- No sensitive data stored in the cloud
This project is designed to be privacy-first and cloud-ready:
- Data Collection: `collect_metrics.sh` uses GitHub CLI to collect raw data via the API
- Data Processing: `main.py` processes the raw JSON files into a Parquet file
- Local Storage: All data files are automatically excluded from git commits
- Local Dashboard: Run `streamlit run dashboard.py` for local analysis
- Cloud Dashboard: Upload your `.parquet` file to our hosted Streamlit app
- Privacy: Your data never leaves your control - upload only for analysis
- ✅ Privacy-First: Data collection and storage happens locally
- ✅ Cloud-Ready: Dashboard can be deployed to Streamlit Community Cloud
- ✅ User Upload: Users upload their own data for analysis
- ✅ No Data Persistence: Cloud dashboard doesn't store any user data
- ✅ GitHub CLI Authentication: No token management needed
- ✅ Multi-Organization Support: Collect from multiple GitHub orgs
```
data/
├── year=2025/
│   ├── month=04/
│   │   ├── 11-stone-payments.json
│   │   ├── 12-stone-payments.json
│   │   ├── 11-other-org.json
│   │   └── ...
│   └── month=05/
│       ├── 01-stone-payments.json
│       └── ...
├── data-stone-payments.parquet   (single org)
├── data-other-org.parquet        (single org)
└── data-combined.parquet         (all orgs)
```
- Collect data for different organizations: `./collect_metrics.sh --org org1` and `./collect_metrics.sh --org org2`
- Files are saved as `DD-<organization>.json` to avoid conflicts
- Processing creates separate Parquet files per organization plus a combined file
- Dashboard allows selecting which organization to analyze
We welcome contributions to make this tool even better! Here's how you can help:
- Report bugs via GitHub Issues
- Suggest features for new analytics or visualizations
- Improve documentation with clearer examples
- Submit code via Pull Requests
- Star the repository to show your support
```bash
# Fork and clone the repository
git clone https://github.com/your-username/copilot-dashboard
cd copilot-dashboard

# Install development dependencies
pip install -r requirements.txt

# Run tests (if available)
python -m pytest

# Run the dashboard locally
streamlit run dashboard.py
```
- Follow existing code style and conventions
- Add documentation for new features
- Test your changes thoroughly
- Ensure data privacy and security best practices
- Documentation: Check DEPLOYMENT.md and DATA_STRUCTURE.md
- Issues: Report problems via GitHub Issues
- Discussions: Join conversations in GitHub Discussions
- Community: Connect with other users and contributors
- GitHub for providing the Copilot API
- Streamlit team for the amazing framework
- Open-source community for contributions and feedback
Ready to analyze your GitHub Copilot usage? Start collecting data and upload to the dashboard!