A TypeScript CLI that fetches a GitHub repository and concatenates all text-like files into one consolidated text document. The tool streams results into the out/
directory.
- Node.js 18+
- A GitHub personal access token with
repo
scope set asGITHUB_TOKEN
in a local.env
file
GITHUB_TOKEN=ghp_your_token_here
npm install
Run the exporter with:
npm run fetch -- https://github.com/owner/repository
Generate a PDF instead of plain text:
npm run fetch -- --pdf https://github.com/owner/repository
Select a specific branch or tag (defaults to the repository's default branch when omitted):
npm run fetch -- --branch release https://github.com/owner/repository
The CLI will:
- Fetch repository metadata and estimate the number of API requests required
- Warn if the upcoming run would exceed your remaining GitHub quota
- Prompt for confirmation before downloading blobs
- Stream progress updates (
current/total
) while downloading and when generating PDFs - Resume from previous attempts via cached download checkpoints, so reruns only fetch missing files
- Export from the requested branch (or the default branch when none is provided) and record both in the output header for traceability
- Write the final merged output into the
out/
directory as<repo>-<branch>.txt
or.pdf
Tip: use npm run build
to emit the compiled ESM bundle into dist/
if you want to run the CLI directly via node dist/main.js
.
npm test
The Jest suite covers core helpers (URL parsing, text/binary detection, planning estimates, and progress reporting).
main.ts
– CLI entry pointsrc/
– modular implementation (GitHub client, exporter, progress reporter, configuration)out/
– generated output files__tests__/
– Jest test suites for reusable modules
- Large files (>5 MB) and likely-binary blobs are skipped automatically.
- Media, archive, and lock files (e.g. png, mp3, zip, gz, yarn.lock) are detected by extension and excluded pre-emptively.
- Downloads exit immediately if the GitHub rate limit is hit and automatically retry on transient network/5xx errors.
- When GitHub truncates the repository tree, the tool surfaces a warning in both the CLI and output footer.
- Concurrency defaults to 8 parallel blob requests; adjust
MAX_CONCURRENCY
insrc/config.ts
if needed.