Tools: Python, PyTorch, TensorFlow, Keras, NumPy, Pandas, Matplotlib, Sklearn, NLTK, LSI, Gensim, BERT, LogisticRegression, XGBoost, RandomForest, SVC / SVM, MLP, CNN, Hugging Face, Transformers, Falcon-7B, QLoRA, FAISS
- data/Attention is All You Need.pdf (source) <<--- (Mikey @ The Goonies) "it all starts here" ♥
Dependencies:
- Python 3.10
- PyTorch
- CUDA
- Scikit-Learn
- Transformers
- Peft
- FAISS-GPU
Notebook files:
- medical-diagnosis.ipynb (standard CPU + single GPU)
- medical-diagnosis_gcp.ipynb (cloud platform)
- medical-diagnosis-gcp.md (to visualize on GitHub)
Dual-GPU files:
- import_llm.py (updatable for security patches)
- medical-diagnosis_gpu.py (to fine-tune the LLM)
- requirements_llm.txt (worked on PowerShell until I switched to Linux for the FAISS implementation; note that some of these packages are already outdated)
- test_query.py (to build the vector DB embedding map for RAG)
- med_query.py (CLI script for running a user prompt query)
- conda-env.yml (full conda Linux environment)
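To recreate the Linux environment from that file (standard conda usage; the environment name comes from the YAML itself):
conda env create -f conda-env.yml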
About: This began as a learning project to better understand AIOps pipelines: the user provides a query consisting of a patient's symptoms and health status. The pipeline begins by loading Falcon-7B, a 7-billion parameter Large Language Model (LLM), along with its pre-trained weights from Hugging Face. A tokenizer from Hugging Face's Transformers library is used to convert text into model-compatible input embeddings. To enable memory-efficient fine-tuning on local consumer hardware, Quantized Low-Rank Adapters (QLoRA) reduce Falcon-7B’s precision from 16-bit to 4-bit, lowering memory requirements while preserving performance. Fine-tuning is then performed using a medical corpus (BI55/MedText) also from Hugging Face, allowing the system to perform domain-specific diagnostic reasoning. The resulting model is capable of generating medical diagnoses.
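As a rough illustration of that pipeline, the sketch below loads Falcon-7B in 4-bit via a BitsAndBytesConfig and attaches LoRA adapters with PEFT. It is a minimal sketch of the standard Transformers + PEFT flow; the hyperparameter values are assumptions, not necessarily what medical-diagnosis_gpu.py uses.

```python
# Minimal QLoRA setup sketch (illustrative values; see medical-diagnosis_gpu.py for the real run)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_ID = "tiiuae/falcon-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit (NF4)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across the available GPUs
    trust_remote_code=True,
)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed adapter sizes
    target_modules=["query_key_value"],      # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only the low-rank adapters are trainable
```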
Purpose: My aim in building the dual-GPU RAG-LLM model was twofold: first, I wanted to learn how to execute a single project across multiple hardware configurations, and second, I wanted to structure the program so that its core functionality could be updated via a single internet-facing, patchable file.
Performance Evaluation: For fine-tuning the LLM, I experimented with different hardware setups to explore how they affect model performance: a standard laptop with a CPU and a single GPU, a cloud computing platform, and an edge computing setup with two GPUs. Holding all parameters equal, here are my results:
- CPU+GPU: 3 hrs 37 min
- Cloud: 55 min
- GPUx2: 1 hr 9 min
Additionally, I wanted to test the dual-GPU edge computing approach with different training parameters to observe how the metrics respond. Here are my results:
- 80/20 data split, 1/1/4 TES, 1 epoch, API tokenization: 1 hr 1 min, loss: 1.304, final ‖∇‖: 2.845, η: 2.83e-07
- 90/10 data split, 1/1/4 TES, 1 epoch, API tokenization: 1 hr 9 min, loss: 1.290, final ‖∇‖: 3.155, η: 2.52e-07
- 90/10 data split, 1/1/4 TES, 3 epochs, API tokenization: 2 hrs 30 min, loss: 1.179, final ‖∇‖: ?, η: ? <-- didn't record
- 90/10 data split, 1/1/4 TES, 10 epochs, local tokenization: 12 hrs 1 min, loss: 0.826, final ‖∇‖: 22.707, η: 1.89e-08
where 'data split' is the training/testing percentage split and 'TES' denotes TrainingArguments(per_device_train_batch_size=T, per_device_eval_batch_size=E, gradient_accumulation_steps=S). Both ‖∇‖ (the L2 norm of the gradients) and η (the learning rate) are taken from the final training step of each experiment to monitor stability and convergence.
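For concreteness, a 1/1/4 TES run corresponds to a configuration along these lines (a minimal sketch; the output path and logging cadence are assumptions):

```python
from transformers import TrainingArguments

# 1/1/4 TES, 1 epoch (values other than T/E/S and the epoch count are illustrative)
training_args = TrainingArguments(
    output_dir="./falcon-7b-medtext",   # assumed output path
    per_device_train_batch_size=1,      # T
    per_device_eval_batch_size=1,       # E
    gradient_accumulation_steps=4,      # S -> effective batch of 4 per device
    num_train_epochs=1,
    logging_steps=10,                   # surfaces loss, gradient norm, and learning rate during training
)
```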
The final run (10 epochs, fully offline) exhibited the best overall performance, with a final-epoch average training loss of 0.5091 and a total average training loss of 0.8257 in 43,279.64 seconds.
Inference: To build on what I learned, I started integrating Retrieval-Augmented Generation (RAG) to look up data relevant to a user's input query via text embeddings; test_query.py uses a FAISS vector database to do this.
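The sketch below shows the general shape of that indexing-and-lookup step, assuming an off-the-shelf embedding model from Hugging Face; the model name, pooling scheme, and example passages are placeholders, not the choices made in test_query.py.

```python
# Minimal FAISS indexing/retrieval sketch (illustrative only)
import faiss
import torch
from transformers import AutoModel, AutoTokenizer

EMB_MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder embedder (384-dim)
tokenizer = AutoTokenizer.from_pretrained(EMB_MODEL)
model = AutoModel.from_pretrained(EMB_MODEL)

def embed(texts):
    """Masked mean-pooling of the last hidden state: one normalized vector per passage."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    vecs = (hidden * mask).sum(1) / mask.sum(1)
    return torch.nn.functional.normalize(vecs, dim=1).numpy()

passages = ["Patient presents with fever and productive cough...",
            "Chest pain radiating to the left arm..."]     # e.g. MedText snippets
index = faiss.IndexFlatIP(384)      # inner product on normalized vectors = cosine similarity
index.add(embed(passages))

query = "persistent cough, low-grade fever, night sweats"
scores, ids = index.search(embed([query]), 2)
print([passages[i] for i in ids[0]])
```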
My process is outlined below.
Online setup actions:
git lfs install
git clone https://huggingface.co/datasets/BI55/MedText
CUDA_VISIBLE_DEVICES=0,1 python import_llm.py
The rest were done offline:
pip install -r requirements_gpu.txt (before import_llm.py)
CUDA_VISIBLE_DEVICES=0,1 python medical-diagnosis_gpu.py
sudo mount -o remount,size=16G /dev/shm (for offloading weights; sanity-check with df -h /dev/shm; I'm still toying with this)
CUDA_VISIBLE_DEVICES=0,1 python test_query.py
CUDA_VISIBLE_DEVICES=0,1 python med_query.py (for CLI querying)
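For a sense of what happens behind med_query.py, here is a rough sketch of reloading the fine-tuned adapters and generating a diagnosis from a retrieval-augmented prompt; the adapter path, prompt template, and generation settings are assumptions, not the script's actual values.

```python
# Rough generation sketch: 4-bit base model + QLoRA adapters + RAG-style prompt
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "tiiuae/falcon-7b"
ADAPTER = "./falcon-7b-medtext"     # assumed output dir of the fine-tuning run

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, ADAPTER)   # attach the fine-tuned QLoRA adapters

prompt = ("Context: <retrieved MedText passages go here>\n"
          "Patient: persistent cough, low-grade fever, night sweats\n"
          "Diagnosis:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```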
© Rebecca Leigh Hinrichs. All Rights Reserved.