🦙 Fine-Tune TinyLlama Locally (LoRA + Offline Inference)

This project shows how to fine-tune TinyLlama on your own machine using LoRA. Once the base model is downloaded, everything runs offline: no cloud, no hosted models, and no GPU required (though GPUs are supported).

It’s based on a real, from-scratch journey of debugging and fine-tuning with a custom dataset.

✅ What You’ll Do

Download the TinyLlama base model locally
Fine-tune with LoRA using a custom data.jsonl
Merge LoRA weights into the base model
(Optionally) Convert to .gguf and run offline with llama.cpp

🧩 0. Requirements

Install dependencies:

 pip install transformers datasets peft accelerate bitsandbytes

🚫 Do NOT install bitsandbytes if you're on Windows or using an AMD GPU; drop it from the command above.

📁 Folder Structure

standard-llama-finetune/
├── data.jsonl                      ← training dataset (editable)
├── step0_download_base_model.py    ← download base model from Hugging Face
├── step1_0_pdf_to_text.py          ← convert PDF to text
├── step1_1_generate_jsonl.py       ← generate JSONL file for fine-tuning data
├── step2_fine_tuning.py            ← fine-tune TinyLlama with LoRA
├── step3_merg.py                   ← merge LoRA adapter into base model
└── step4_test.py                   ← sanity-check the merged model

🔽 0. Download the Base Model

python step0_download_base_model.py

This will save the model to ./tinyllama-base/
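
A minimal sketch of what a download step like this might look like, assuming the `huggingface_hub` package (it is installed alongside `transformers`). The README does not say which TinyLlama checkpoint the script pulls, so the model ID below is an assumption, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the README does not name the exact checkpoint.
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
LOCAL_DIR = Path("./tinyllama-base")


def is_downloaded(model_dir: Path) -> bool:
    """Return True if the folder already holds a model config."""
    return (model_dir / "config.json").is_file()


def download_base_model(model_id: str = MODEL_ID, local_dir: Path = LOCAL_DIR) -> Path:
    """Fetch the model snapshot once; later steps read it from disk."""
    if is_downloaded(local_dir):
        return local_dir  # nothing to do, so no network access is needed
    # Imported lazily so the check above works without the package.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=model_id, local_dir=str(local_dir))
    return local_dir
```

Calling `download_base_model()` is the only step in this sketch that needs a network connection; a second call returns immediately because the config file is already on disk.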

🔽 1.0. Convert PDF to Raw Text

python step1_0_pdf_to_text.py

This will save the extracted .txt file in the project root.
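
As a rough sketch of this kind of conversion, assuming the `pypdf` package (it is not in the install command above, so this is a swapped-in choice rather than what the script necessarily uses). The function names and the whitespace cleanup are illustrative.

```python
import re
from pathlib import Path


def clean_page_text(text: str) -> str:
    """Collapse runs of spaces/tabs and drop blank lines left by PDF extraction."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def pdf_to_text(pdf_path: Path, txt_path: Path) -> int:
    """Write the text of every page to txt_path and return the page count."""
    from pypdf import PdfReader  # assumption: pypdf is installed separately

    reader = PdfReader(str(pdf_path))
    # extract_text() can come back empty for image-only pages, hence the fallback.
    pages = [clean_page_text(page.extract_text() or "") for page in reader.pages]
    txt_path.write_text("\n\n".join(pages), encoding="utf-8")
    return len(pages)
```

Scanned PDFs without a text layer would yield empty pages here; they would need OCR first, which this sketch does not cover.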

🔽 1.1. Generate JSONL Files

python step1_1_generate_jsonl.py

This will save the .jsonl file in the project root.
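
To make the file format concrete, here is a small sketch of writing and re-reading a JSONL dataset with the standard library. The field names `instruction`, `input`, and `output` are an assumption inferred from the prompt layout shown in step 5; the actual script may use different keys.

```python
import json
from pathlib import Path


def make_record(instruction: str, output: str, input_text: str = "") -> dict:
    """Build one training example (field names are assumed, see above)."""
    return {"instruction": instruction, "input": input_text, "output": output}


def write_jsonl(records, path: Path) -> int:
    """Write one JSON object per line and return how many were written."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> list:
    """Load records back, failing loudly on the first malformed line."""
    records = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return records
```

Running `read_jsonl` on `data.jsonl` before training is a cheap way to catch a broken line early, since one bad record can otherwise fail the run partway through loading.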

🧠 2. Fine-Tune with LoRA

python step2_fine_tuning.py

Trains on data.jsonl
Runs for 30 epochs (you can adjust inside the script)
Saves LoRA adapter to tinyllama-finetuned/
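
The training step can be sketched roughly as follows with `transformers`, `peft`, and `datasets`. Only the 30 epochs, `data.jsonl`, and the two directory names come from this README; the LoRA rank, alpha, dropout, target modules, learning rate, batch size, maximum length, record field names, and prompt template are all illustrative assumptions, not the script's confirmed settings.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
)


def format_example(record: dict) -> str:
    """Render one data.jsonl record as a single training string (assumed fields)."""
    return PROMPT_TEMPLATE.format(
        instruction=record["instruction"],
        input=record.get("input", ""),
        output=record["output"],
    )


def train(base_dir="./tinyllama-base", data_file="data.jsonl",
          out_dir="./tinyllama-finetuned", epochs=30):
    # Heavy imports stay inside the function so format_example is usable alone.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(base_dir)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base_dir)

    # Assumed LoRA settings: adapt only the attention query/value projections.
    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)

    def tokenize(record):
        text = format_example(record) + tokenizer.eos_token
        return tokenizer(text, truncation=True, max_length=512)

    dataset = load_dataset("json", data_files=data_file, split="train")
    dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

    args = TrainingArguments(output_dir=out_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=1, learning_rate=2e-4,
                             logging_steps=10, save_strategy="no", report_to="none")
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    trainer = Trainer(model=model, args=args, train_dataset=dataset,
                      data_collator=collator)
    trainer.train()
    model.save_pretrained(out_dir)      # writes only the LoRA adapter weights
    tokenizer.save_pretrained(out_dir)
```

Because `get_peft_model` freezes the base weights and trains only the small adapter matrices, the saved folder stays small, which is why the merge step that follows is needed before conversion.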

🔗 3. Merge LoRA into Base Model

python step3_merg.py

Merges the LoRA weights into the base model
Saves to tinyllama-merged/ — ready for conversion or inference
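
A hedged sketch of the merge, using PEFT's `PeftModel.from_pretrained` and `merge_and_unload`. The directory names come from this README; the function names and the up-front adapter check are illustrative additions.

```python
from pathlib import Path


def has_adapter(adapter_dir: Path) -> bool:
    """PEFT writes adapter_config.json next to the adapter weights."""
    return (adapter_dir / "adapter_config.json").is_file()


def merge_adapter(base_dir="./tinyllama-base", adapter_dir="./tinyllama-finetuned",
                  out_dir="./tinyllama-merged"):
    """Fold the LoRA adapter into the base weights and save a standalone model."""
    if not has_adapter(Path(adapter_dir)):
        raise FileNotFoundError(f"No LoRA adapter found in {adapter_dir}")
    # Heavy imports stay inside the function so the check above runs anywhere.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_dir)
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # adds the low-rank updates into the base weights
    merged.save_pretrained(out_dir)
    # Save the tokenizer too, so the merged folder is self-contained.
    AutoTokenizer.from_pretrained(base_dir).save_pretrained(out_dir)
    return out_dir
```

The merged folder no longer depends on `peft` at load time, which matters for tools such as the GGUF converter that expect a plain Hugging Face model directory.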

🧪 4. Run Sanity Check (Optional)

python step4_test.py

Expected output:

Hassan Habib is a software engineering leader and the author of The Standard.
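
A sanity check along these lines could look like the sketch below. It assumes the merged model is loaded with `transformers` and that prompts use a `### Response:` marker as in step 5; the function names and generation settings are hypothetical.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated: str) -> str:
    """Return only the text after the last response marker, if there is one."""
    _, found, tail = generated.rpartition(RESPONSE_MARKER)
    return tail.strip() if found else generated.strip()


def ask(prompt: str, model_dir: str = "./tinyllama-merged", max_new_tokens: int = 64) -> str:
    """Generate a reply from the merged model without touching the network."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # local_files_only=True makes loading fail rather than reach out to the Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding keeps the check repeatable from run to run.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return extract_response(text)
```

The decoded output of a causal model includes the prompt itself, so `extract_response` trims everything up to the marker before the answer is compared with the expected sentence.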

🦙 5. Convert to `.gguf` for llama.cpp

Before this step, install CMake, then clone and build llama.cpp.

cd llama.cpp/
python3 convert_hf_to_gguf.py ../tinyllama-merged --outfile standard-mini.gguf --outtype f16

Then run with:

./build/bin/llama-cli --model standard-mini.gguf --prompt "Describe Orchestration services"

Paste this prompt:

### Instruction:
Who is Hassan Habib?

### Input:

### Response:
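
Multi-line prompts are awkward to type on a command line, so the sketch below builds the prompt above and a shell-quoted `llama-cli` command from it. It uses only the `--model` and `--prompt` flags shown earlier; the helper names and the exact blank-line layout of the template are assumptions.

```python
import shlex


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the instruction/input/response layout, leaving the response open."""
    return (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        "### Response:\n"
    )


def build_command(model_path: str, prompt: str,
                  binary: str = "./build/bin/llama-cli") -> list:
    """Return the argument list for llama-cli, using the flags from this README."""
    return [binary, "--model", model_path, "--prompt", prompt]


def as_shell(command: list) -> str:
    """Quote the arguments so the multi-line prompt survives copy and paste."""
    return shlex.join(command)
```

For example, `print(as_shell(build_command("standard-mini.gguf", build_prompt("Who is Hassan Habib?"))))` prints a single command that can be pasted into a terminal from inside the `llama.cpp/` folder.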

📽️ Video Step-by-Step

How to Run AI Offline w/ .NET

https://www.youtube.com/watch?v=lc6lVCe0XHI&t=3s

How to Fine-Tune your AI Model

https://www.youtube.com/watch?v=FQr7VrK5RRQ&t=1087s

How to Feed your Llama Model (TXT to JSONL)

https://www.youtube.com/watch?v=YB9cVyjV9Bo

Make Your Offline AI Model Talk to Local SQL — Fully Private RAG with LLaMA + FAISS

https://www.youtube.com/watch?v=3jFpLNglWBc&t=293s

👨‍🏫 Author

Built and tested by Hassan Habib, fine-tuned with ❤️ and terminal grit.

Want to turn this into a video or GitHub tutorial? It’s built to teach.

riteshverma/AI.Llama.Traing.Offline
