Estimate how frequently Python packages are imported across public GitHub repositories.
We determine package popularity by:
- Randomly sampling GitHub repositories with Python as the main language
- Analyzing Python import statements in these repositories
- Extrapolating the findings to the estimated total number of Python repositories on GitHub (~18M repositories)
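
The extrapolation step above can be sketched as a simple proportion. This is a minimal illustration, not the project's actual code: the function name `extrapolate_usage` and its parameters are hypothetical, and it assumes a package's count means "number of sampled repositories that import it at least once".

```python
def extrapolate_usage(repos_with_package: int,
                      repos_sampled: int,
                      total_repos: int = 18_000_000) -> float:
    """Scale a sample count up to the estimated population of Python repos.

    total_repos defaults to the ~18M estimate quoted above.
    """
    if repos_sampled <= 0:
        raise ValueError("repos_sampled must be positive")
    if not 0 <= repos_with_package <= repos_sampled:
        raise ValueError("repos_with_package must be between 0 and repos_sampled")
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return repos_with_package * total_repos / repos_sampled


# Made-up numbers for illustration: 250 of 10,000 sampled repos use a package.
estimate = extrapolate_usage(250, 10_000)
print(f"Estimated repositories using the package: {estimate:,.0f}")
```

Because the estimate is a sample proportion scaled by a constant, its relative error shrinks as more repositories are sampled.
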
A GitHub Actions workflow samples additional repositories every 6 hours, so the estimates become more accurate as the sample grows.
Note: Python standard library modules are no longer counted, but some previously collected data for them has not yet been removed.
Script | Purpose |
---|---|
find_repos.py | Queries GitHub API for random Python repositories |
analyze_imports.py | Extracts import statements from repository files |
count_libs.py | Aggregates and calculates package usage statistics |
update_readme.py | Refreshes this README with latest data |
total_python_repos.ipynb | Estimates total Python repository count on GitHub |
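
To illustrate the import-extraction step, here is a minimal sketch that uses the standard `ast` module. It is not the contents of `analyze_imports.py`; the function name `extract_imports` and the choice to skip relative imports and unparsable files are assumptions made for the example.

```python
import ast


def extract_imports(source: str) -> set[str]:
    """Return the top-level names of modules imported by a Python source string."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        # Files that do not parse (for example Python 2 code) are skipped here.
        return set()

    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b as c" records "a"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import such as "from . import x"
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


example = (
    "import numpy as np\n"
    "from sklearn.model_selection import train_test_split\n"
    "from . import helpers\n"
)
print(sorted(extract_imports(example)))
```

An absolute import of a repository-local module (for example `import utils`) still passes through this sketch, since the syntax alone does not distinguish it from an installed package.
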
File | Description | Format |
---|---|---|
repos.jsonl | Details of processed repositories | JSONL |
imports.jsonl | Raw import statements extracted from repos | JSONL |
library_counts.csv | Aggregated package usage statistics | CSV |
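
The aggregation from JSONL to CSV could look roughly like the sketch below. The record layout (a `repo` string and an `imports` list per line) and the CSV column names are assumptions for illustration; the actual schemas of `imports.jsonl` and `library_counts.csv` may differ.

```python
import csv
import io
import json
from collections import Counter


def count_libraries(jsonl_lines) -> Counter:
    """Count how many records import each library (at most once per record)."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # "imports" is a hypothetical field name; de-duplicate within a record.
        counts.update(set(record.get("imports", [])))
    return counts


def write_counts_csv(counts: Counter, out) -> None:
    """Write counts as CSV, most-used library first, ties broken by name."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["library", "count"])
    for library, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([library, count])


# Made-up sample records, not real project data.
sample = [
    '{"repo": "a/b", "imports": ["numpy", "numpy", "torch"]}',
    '{"repo": "c/d", "imports": ["numpy"]}',
]
buffer = io.StringIO()
write_counts_csv(count_libraries(sample), buffer)
print(buffer.getvalue())
```
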
Our GitHub Actions workflow orchestrates the entire process:
Find Random Repos → Analyze Imports → Count Package Usage → Update Statistics → Refresh README
Rank | Library | Count |
---|---|---|
1 | numpy | 5235 |
2 | matplotlib | 1672 |
3 | torch | 1625 |
4 | pandas | 1495 |
5 | django | 1125 |
6 | cv2 | 1033 |
7 | requests | 1021 |
8 | utils | 862 |
9 | sklearn | 860 |
10 | tensorflow | 803 |
Last updated: 2025-04-20 12:46:12 UTC