LLMify-Code

LLMify-Code is a lightweight Python tool that transforms your local codebase into a single, well-structured text file, ready to be ingested by large language models such as ChatGPT. It extracts the full directory tree and file contents (handling encoding issues gracefully), can optionally count tokens using tiktoken, and writes its output in plain text or JSON format.

The project was inspired by the personal need for a tool similar to Gitingest, but one designed to work with private repositories and to run entirely locally, keeping your code private and secure without relying on third-party services.

Note: For more detailed documentation (architecture, usage guides, configuration, contribution guidelines, etc.), please see the contents of the docs/ folder.

Features

  • Enhanced CLI Experience:
    Built with Typer for a modern, user-friendly command-line interface.

  • Rich Terminal Output:
    Uses Rich to display attractive, informative log messages.

  • Configurable Ignore Rules:
    Loads ignore rules from a YAML configuration file. The script first checks for a configuration file (llmify_config.yaml) in the target directory; if not found, it falls back to the one in the project root. This design allows external projects to either use their own configuration or rely on a default one.

  • Output Format Options:
    Choose between plain text output (default) and JSON output (including the directory tree, file metadata, and token counts) via the --output-format option.

  • Tokenization Support:
    Optionally count tokens in the extracted output (using tiktoken) to help manage prompt sizes for LLMs.

  • Robust File Reading:
    Files are read with errors="replace" so that encoding issues are handled gracefully (see the sketch after this list).
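
The last two points can be illustrated with a short sketch. This is a minimal example only, assuming UTF-8 source files and the tiktoken package; the function names (read_file_safely, count_tokens) are hypothetical and are not part of LLMify-Code's API.

import tiktoken
from pathlib import Path

def read_file_safely(path: Path) -> str:
    # errors="replace" substitutes undecodable bytes instead of raising UnicodeDecodeError
    return path.read_text(encoding="utf-8", errors="replace")

def count_tokens(text: str, model: str = "gpt-4") -> int:
    # tiktoken maps the model name to its tokenizer and encodes the text into token IDs
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

Counting tokens up front helps you judge whether the extracted file will fit within an LLM's context window before pasting it into a prompt.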

Repository Structure

LLMify-Code/
β”œβ”€β”€ .github/                   # GitHub workflows and related files
β”œβ”€β”€ docs/                      # Detailed documentation (overview, usage, config, etc.)
β”œβ”€β”€ src/
β”‚   └── extractor/
β”‚       β”œβ”€β”€ __init__.py        # Package initializer for extractor
β”‚       └── llm_code_prep.py   # Main extraction script
β”œβ”€β”€ tests/
β”‚   └── test_extractor.py      # Unit tests for the extraction tool
β”œβ”€β”€ CONTRIBUTING.md            # Contribution guidelines
β”œβ”€β”€ llmify_config.yaml         # Default YAML configuration for ignore rules
β”œβ”€β”€ README.md                  # This documentation file
└── requirements.txt           # Required Python packages with version numbers

Installation

  1. Clone the Repository:

    git clone https://github.com/marcoyuuu/LLMify-Code.git
    cd LLMify-Code
  2. (Optional) Create a Virtual Environment:

    python -m venv venv
    source venv/bin/activate   # On Windows: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt

Usage

Before running the script, ensure that Python can locate the extractor module by setting the PYTHONPATH to the src/ directory.

Setting PYTHONPATH

  • On Windows (PowerShell):

    $env:PYTHONPATH = "$PWD\src"

    (This sets PYTHONPATH for the current session.)

  • On Mac/Linux (Bash):

    export PYTHONPATH="$(pwd)/src"

    (To persist, add this to your shell’s startup file such as .bashrc or .zshrc.)

Running LLMify-Code

Run the tool using one of the following commands (from the project root). The script loads ignore rules from a YAML configuration fileβ€”first checking the target directory, then falling back to the project root.

1. Plain Text Extraction (with Tokenization):

python -m extractor.llm_code_prep --directory . --output codebase.txt --tokenize

Sample Output:

[21:42:30] Target directory: LLMify-Code
Configuration loaded from llmify_config.yaml.
Proceed with code extraction? [y/N]: y
[21:42:32] βœ… Code extracted successfully into codebase.txt
[21:42:33] Total token count: 6061

2. JSON Extraction:

python -m extractor.llm_code_prep --directory . --output codebase.json --output-format json

Sample Output:

[21:44:57] Target directory: LLMify-Code
Configuration loaded from llmify_config.yaml.
Proceed with code extraction? [y/N]: y
[21:44:59] βœ… JSON output generated successfully into codebase.json

3. Plain Text Extraction (Without Tokenization):

python -m extractor.llm_code_prep --directory . --output codebase.txt

Sample Output:

[21:48:09] Target directory: LLMify-Code
Configuration loaded from llmify_config.yaml.
Proceed with code extraction? [y/N]: y
[21:48:11] βœ… Code extracted successfully into codebase.txt

4. Using a Custom Configuration File:

If your target project does not include its own llmify_config.yaml, you can explicitly provide a path to one (for example, the default in LLMify-Code):

python -m extractor.llm_code_prep --directory "/path/to/TargetProject" --output codebase.txt --config "/path/to/LLMify-Code/llmify_config.yaml"

This ensures consistent ignore rules across multiple projects without copying the config file into every project.

Configuration

LLMify-Code loads ignore rules from a YAML configuration file. The script first checks for a file named llmify_config.yaml in the target directory. If it’s not found there, it falls back to the llmify_config.yaml in the project root.

Example llmify_config.yaml:

# llmify_config.yaml
# This configuration file allows you to customize ignore rules for LLMify-Code.

ignored_dirs:
  - .github
  - .pytest_cache
  - .pytest_cache/v
  - .pytest_cache/v/cache
  - .venv
  - venv
  - __pycache__

ignored_files:
  - .pylintrc
  - .gitignore
  - codebase.txt
  - codebase.json
  - "*.pyc"
  - "*.pyd"
  - "*.so"
  - "llm_code_prep.py"  # Ignore the extraction script itself if desired

Adjust the ignored_dirs and ignored_files as needed.
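
For illustration, the lookup order described above could be implemented roughly as follows. This is a minimal sketch (assuming PyYAML and a hypothetical resolve_ignore_rules helper), not LLMify-Code's actual code:

from pathlib import Path
import yaml

def resolve_ignore_rules(target_dir: Path, project_root: Path) -> dict:
    # Prefer a config in the target directory, then fall back to the project root
    for candidate in (target_dir / "llmify_config.yaml",
                      project_root / "llmify_config.yaml"):
        if candidate.is_file():
            with candidate.open(encoding="utf-8") as f:
                return yaml.safe_load(f) or {}
    # No config found anywhere: ignore nothing
    return {"ignored_dirs": [], "ignored_files": []}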

Testing

Unit tests are provided in the tests/ directory. To run the tests, execute:

pytest
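
As a rough idea of what such a test can look like, the sketch below drives the documented CLI end to end in a temporary directory. It is hypothetical and may differ from the bundled tests/test_extractor.py; it only assumes the flags and confirmation prompt shown in the Usage section above.

import os
import subprocess
import sys
from pathlib import Path

def test_plain_text_extraction(tmp_path):
    project_root = Path(__file__).resolve().parents[1]
    # Make the extractor package importable for the subprocess
    env = {**os.environ, "PYTHONPATH": str(project_root / "src")}
    output_file = tmp_path / "codebase.txt"
    # Answer the "Proceed with code extraction?" prompt with "y" via stdin
    result = subprocess.run(
        [sys.executable, "-m", "extractor.llm_code_prep",
         "--directory", str(project_root),
         "--output", str(output_file)],
        input="y\n",
        text=True,
        env=env,
        capture_output=True,
    )
    assert result.returncode == 0
    assert output_file.exists()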

Contributing

Contributions are welcome! Please follow these guidelines:

  • Fork the repository and create your feature branch.
  • Write tests for new features.
  • Ensure your code adheres to PEP 8 standards.
  • Open a pull request with a clear description of your changes.

See CONTRIBUTING.md for detailed instructions.

License

This project is licensed under the MIT License. See the LICENSE file for details.


LLMify-Code – Making your code LLM-ready, one file at a time!

Happy Coding!
