
escaped


w-w-what? A semantic scanner for thousands of (git-) GitHub repositories, hunting for leaked sensitive information.

flow:

  1. Parsing repositories

  2. Checking deleted/history data (blob data) with filters

    • Codebase languages (contains .html, .py, .rs, .json, .sol)
    • Codebase configuration files (docker-compose.yaml, .env, .k8s, .js, .py)
    • Generic binary files
      • Compiled .exe with patched-in secrets
      • .pyc and other language-specific precompiled/cache files
  3. For each blob filetype, run the appropriate regexp parsers plus trufflehog (see the sketch below).
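
A minimal sketch of step 3 in Python (not the project's actual code: the patterns and helper names are illustrative, and the trufflehog call assumes its filesystem mode with JSON output):

import re
import subprocess

# hypothetical per-filetype patterns; the real rules live in the project's scanners
PATTERNS = {
    ".env": [re.compile(r"PRIVATE_KEY\s*=\s*\S+")],
    ".py":  [re.compile(r"(?i)(?:api|secret)_key\s*=\s*['\"][^'\"]+['\"]")],
    ".sol": [re.compile(r"0x[a-fA-F0-9]{64}")],  # raw 32-byte hex, e.g. a private key
}

def scan_blob(path: str, content: str) -> list[str]:
    """Run the regexes registered for this blob's file extension."""
    ext = "." + path.rsplit(".", 1)[-1] if "." in path else ""
    hits: list[str] = []
    for pattern in PATTERNS.get(ext, []):
        hits += pattern.findall(content)
    return hits

def run_trufflehog(repo_dir: str) -> str:
    """Run trufflehog over a cloned repo and return its JSON-lines output."""
    out = subprocess.run(
        ["trufflehog", "filesystem", repo_dir, "--json"],
        capture_output=True, text=True,
    )
    return out.stdout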

installation (No DOCKER)

prerequisites

Then, with Python 3.12, install the dependencies:

  • pip install -r requirements.txt or
  • pip install -e .

Use the GitHub CLI (gh) to locate the repositories you want to parse:

solidity projects with "hardhat"

  • gh search repos --limit 500 --json fullName --jq '.[].fullName' 'hardhat language:Solidity' > hardhat_solidity_repos.txt

python projects using web3.py

  • gh search repos --limit 500 --json fullName --jq '.[].fullName' 'language:Python web3.py' > python_web3py_repos.txt

files named .env that might contain PRIVATE_KEY

  • gh search code --limit 500 --json repository,path --jq '.[] | .repository.fullName + "/" + .path' 'filename:.env PRIVATE_KEY' > potential_dotenv_leaks.txt

Make sure that Redis is running. To start Redis:

  • docker-compose up redis

usage (Manual)

Start Crawler Worker(s):

  • rq worker -c escaped.config escaped_crawler_queue --url redis://localhost:6379/0

Start Analyzer Worker(s):

  • rq worker -c escaped.config escaped_analyzer_queue --url redis://localhost:6379/1

To empty queues:

  • rq empty escaped_crawler_queue --url redis://localhost:6379/0
  • rq empty escaped_analyzer_queue --url redis://localhost:6379/1

Help menu:

  • python escaped/submit_jobs.py -h (to get latest help menu)

Submit jobs in orgs mode:

  • python3 escaped/submit_jobs.py orgs -f your_orgs_file.txt
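
Under the hood, submitting jobs just enqueues work onto the RQ queues the workers listen on. A hedged sketch (the task path escaped.crawler.crawl_org is illustrative, not necessarily the project's real function):

from redis import Redis
from rq import Queue

# connect to the same Redis DB the crawler worker listens on
redis_conn = Redis.from_url("redis://localhost:6379/0")
crawler_queue = Queue("escaped_crawler_queue", connection=redis_conn)

with open("your_orgs_file.txt") as f:
    for org in (line.strip() for line in f if line.strip()):
        # one crawl job per organization; workers started with
        # `rq worker ... escaped_crawler_queue` will pick these up
        crawler_queue.enqueue("escaped.crawler.crawl_org", org)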

installation (Docker)

Simply make sure you have Docker Engine and docker-compose. You should also log in to GitHub via the GitHub CLI (gh), because docker-compose will mount your gh config folder from your local machine:

volumes:
  - ~/.config/gh:/root/.config/gh:ro

usage (Docker)

  • docker-compose up --build

Check logs:

  • docker-compose logs -f crawler_worker
  • docker-compose logs -f analyzer_worker

process

After starting, the analyzer will clone the specified repositories in batches into ./analysis_output

And the output tree will look like:

analysis_output
    ├── cloned_repos 
    ├── custom_regex_findings
    ├── dangling_blobs
    ├── restored_files
    └── trufflehog_findings
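
There is no built-in post-analysis of the output yet (see constraints), but a quick walk over these folders is enough to eyeball how much was found (directory names as shown above):

from pathlib import Path

OUTPUT = Path("analysis_output")
for category in ("custom_regex_findings", "dangling_blobs",
                 "restored_files", "trufflehog_findings"):
    folder = OUTPUT / category
    files = [p for p in folder.rglob("*") if p.is_file()] if folder.exists() else []
    print(f"{category}: {len(files)} files")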

constraints

  • GitHub API limit: 5,000 requests/hour
  • Limited heuristics
  • No post-analysis of output data
  • GitHub only (for now).

how git works (learn)

basically, each (binary or source) file's content (!) is indexed via a blob in git. next, there's something called a tree. the intuition goes like this: a tree is a snapshot. it's the kind of hierarchy that tracks filenames

you can get a tree by its commit (btw, branches are just pointers to specific commits)

  1. getting my dev branch's commit hash
-> % [arch] cat .git/refs/heads/dev
84274ed015f3b6b69e9236d18dd0ee1db2ceeaa8
  2. pretty-printed info about the commit itself
-> % [arch] git cat-file -p 84274ed015f3b6b69e9236d18dd0ee1db2ceeaa8
tree 39dfdc64ba8df7dc764286a7edf179d7940e9c89
parent 2ff633778ec99710b337f7307f8362b1366a00d9
author Paradox <> 1755767343 +0300
committer Paradox <> 1755767343 +0300
  3. pretty-printed info about the tree
-> % [arch] git cat-file -p  39dfdc64ba8df7dc764286a7edf179d7940e9c89
100644 blob e9af592d3033e0fb0e824fded66d90973927a1c9    .env.sample
100644 blob 6c05d8e19fb3840465a2b2f6f491c612c9ffc404    .gitignore
100644 blob fcc45fc5d0a5ac8026b461f18575efac97cc7eae    README.md
100644 blob 78fbdb5c636fd2407c3c9c2164eff015c2705ec0    base.Dockerfile
100644 blob bf7d9678581724e0d4ea2e5c565c844959d7aab5    docker-compose.yml
040000 tree 700e0e0b58d25fea2b1a4a4ba59aacc800a6f13b    escaped
100644 blob 0ec1a553fba8fb5ac65e4c54367d2ad39493b556    pyproject.toml

nice. we can see a lot of blobs here. now it's time for a diagram. a similar (but slightly changed, for stronger effect) structure is illustrated below


             (parent)             (parent)
  Commit A <---------- Commit B <---------- Commit C [HEAD, main]
     |                    |                    |
     |                    |                    |
  Tree A               Tree B               Tree C
   /    \                 /    \                 |
  /      \               /      \                |
Blob A   Blob B        Blob A    Blob C        Blob C
(README) (config.yml)   (README)  (README v2)   (README v2)
         |
         | -- reading blob 
         |
      "SECRET_KEY=..."
  4. now we can pretty-print the blob itself (by its pointer we get the exact state of the blob, meaning the exact source code in our case)
-> % [arch] git cat-file blob 0ec1a553fba8fb5ac65e4c54367d2ad39493b556
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "escaped"
...
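
the same read can be scripted. a minimal sketch (the helper name and the SECRET_KEY regex are illustrative, not the project's actual code):

import re
import subprocess

def read_blob(blob_hash: str, repo_dir: str = ".") -> str:
    """Return a blob's raw content, exactly like `git cat-file blob <hash>`."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "cat-file", "blob", blob_hash],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

# the hash below is the pyproject.toml blob printed above
content = read_blob("0ec1a553fba8fb5ac65e4c54367d2ad39493b556")
if re.search(r"SECRET_KEY\s*=", content):
    print("potential secret found in blob")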

what else can we do? scan for diffs. going from commit C to commit B (its parent), we can produce

git diff --name-status <Commit B> <Commit C>

which basically returns the status and the file, like modified or deleted (M and D respectively), so the program understands which files have been modified/deleted and starts hunting for them.

it issues

git show <Commit B>:filepath

and then does something similar to what was described above to find the blob. the content (potentially containing a SECRET_KEY) is then restored and saved into the analysis_output folder, as sketched below.
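
a hedged sketch of that hunt (function names are illustrative; only the git commands match what's described above):

import subprocess

def changed_files(repo_dir: str, parent: str, child: str) -> list[tuple[str, str]]:
    """Return (status, path) pairs from `git diff --name-status`, e.g. ('D', '.env')."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-status", parent, child],
        capture_output=True, text=True, check=True,
    )
    pairs = []
    for line in out.stdout.splitlines():
        parts = line.split("\t")
        pairs.append((parts[0][0], parts[-1]))  # status letter, path (new path for renames)
    return pairs

def restore_old_version(repo_dir: str, commit: str, path: str) -> str:
    """Content of the file as it existed in the given commit, via `git show <commit>:<path>`."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "show", f"{commit}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

# usage with the commits printed earlier in this README
parent_commit = "2ff633778ec99710b337f7307f8362b1366a00d9"  # Commit B
child_commit = "84274ed015f3b6b69e9236d18dd0ee1db2ceeaa8"   # Commit C
for status, path in changed_files(".", parent_commit, child_commit):
    if status in ("M", "D"):
        old_content = restore_old_version(".", parent_commit, path)
        # ...feed old_content into the regex scanners / trufflehog step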

credits

Thanks to trufflehog for the great security and reconnaissance tool! You can find it at https://github.com/trufflesecurity/trufflehog

P.S.: Honestly, I just wanted to get a better understanding of git.
