This project is a mini implementation of a GPT (Generative Pre-trained Transformer) model using PyTorch and Ray for distributed training. The codebase includes data preprocessing, model training, and evaluation.
- TRAIN.py: Main script for training the GPT model.
- DATA/Train.csv: Training data in CSV format.
- DATA/Test.csv: Test data in CSV format.
- Test.ipynb: Jupyter notebook for testing and running the training script.
- Python 3.7+
- PyTorch
- Ray
- Transformers (Hugging Face)
- Accelerate
- Pandas
- Evaluate
- Clone the repository:

  git clone https://github.com/GboyeStack-Robotics-ML-Engineer/MINI-GPT.git
  cd MINI-GPT

- Install the required packages:

  pip install -r requirements.txt
To train the model, run the TRAIN.py script with the appropriate arguments:
python TRAIN.py --use_gpu=True --trainer_resources CPU=2 GPU=0 --num_workers=2 --resources_per_worker CPU=1 GPU=1

You can also use the Test.ipynb notebook to run the training script and test the model.
- Imports: The script imports necessary libraries including PyTorch, Ray, and Hugging Face Transformers.
- TrainGpt Function: This function handles the training loop, including data loading, model initialization, and training steps.
- Main Block: Parses command-line arguments and initializes the Ray trainer for distributed training (see the sketch after this list for how the pieces fit together).
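The exact wiring lives in TRAIN.py; as a rough illustration only, a per-worker training function plus a Ray Train TorchTrainer typically look like the sketch below. The toy model, dataset, hyperparameters, and flag parsing here are placeholders, not the real script's code, and the `ray.train` calls assume a recent Ray 2.x API.

```python
# Minimal sketch (not the repo's exact code) of a Ray Train setup like TRAIN.py's.
import argparse
import torch
from torch.utils.data import DataLoader, TensorDataset
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_data_loader, prepare_model


def TrainGpt(config):
    # Runs on each Ray worker. The real TrainGpt builds the T5-based model from
    # GPT.py and a DataGenerator over DATA/Train.csv instead of this toy setup.
    dataset = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    model = torch.nn.Linear(16, 1)

    model = prepare_model(model)          # wraps the model for DDP / device placement
    loader = prepare_data_loader(loader)  # shards and moves batches per worker
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"])
    loss_fn = torch.nn.MSELoss()

    for epoch in range(config["epochs"]):
        for features, targets in loader:
            loss = loss_fn(model(features), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        train.report({"epoch": epoch, "loss": loss.item()})  # Ray 2.x reporting API


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_gpu", default="True")   # the real TRAIN.py defines its own flags
    parser.add_argument("--num_workers", type=int, default=2)
    args = parser.parse_args()
    use_gpu = args.use_gpu.lower() == "true"

    trainer = TorchTrainer(
        train_loop_per_worker=TrainGpt,
        train_loop_config={"lr": 1e-4, "epochs": 2},
        scaling_config=ScalingConfig(
            num_workers=args.num_workers,
            use_gpu=use_gpu,
            trainer_resources={"CPU": 2, "GPU": 0},
            resources_per_worker={"CPU": 1, "GPU": 1} if use_gpu else {"CPU": 1},
        ),
    )
    trainer.fit()
```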
- Setup: Clones the repository and installs dependencies.
- Training: Runs the training script with specified parameters.
- Testing: Contains code for testing the trained model.
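Test.ipynb contains the authoritative testing code; purely as a hedged sketch, evaluating a fine-tuned seq2seq checkpoint with Transformers might look like this. The checkpoint path and prompt below are hypothetical.

```python
# Rough sketch of a "Testing" cell: load a fine-tuned checkpoint and generate text.
# "checkpoints/mini-gpt" is a placeholder; the real path comes from the training run.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint_dir = "checkpoints/mini-gpt"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)

inputs = tokenizer("summarize: Ray makes distributed training simpler.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```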
The training and test data are stored in CSV files located in the DATA directory. The data is loaded and processed using the DataGenerator class.
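The real loading logic lives in the DataGenerator class; as a minimal sketch, a CSV-backed PyTorch Dataset over files like DATA/Train.csv could look as follows. The column names ("input_text", "target_text") and the tokenizer are assumptions, not the repo's actual schema.

```python
# Minimal sketch of a CSV-backed dataset in the spirit of DataGenerator.
# Column names and tokenizer are assumptions, not the repo's schema.
import pandas as pd
from torch.utils.data import Dataset
from transformers import AutoTokenizer


class CsvTextDataset(Dataset):
    def __init__(self, csv_path, tokenizer_name="t5-small", max_length=128):
        self.frame = pd.read_csv(csv_path)
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        self.max_length = max_length

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        enc = self.tokenizer(row["input_text"], truncation=True, padding="max_length",
                             max_length=self.max_length, return_tensors="pt")
        labels = self.tokenizer(row["target_text"], truncation=True, padding="max_length",
                                max_length=self.max_length, return_tensors="pt")["input_ids"]
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "labels": labels.squeeze(0),
        }


# Usage: train_data = CsvTextDataset("DATA/Train.csv")
```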
The model is defined in the GPT.py file and uses the T5 architecture from Hugging Face Transformers.
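For illustration only, instantiating a T5 model and computing a seq2seq training loss with Hugging Face Transformers looks like this; the `t5-small` checkpoint and the example text are assumptions, and GPT.py defines the actual model used here.

```python
# Illustrative only: a T5 seq2seq model returning a training loss when labels are given.
# The "t5-small" checkpoint is an assumption; GPT.py defines the real model.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

batch = tokenizer("translate English to German: Hello world", return_tensors="pt")
labels = tokenizer("Hallo Welt", return_tensors="pt")["input_ids"]
outputs = model(**batch, labels=labels)  # returns loss and logits
print(outputs.loss.item())
```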
- Attention Is All You Need - the original Transformer paper by Vaswani et al.
- An Intuitive Approach to Transformers - "De-coded: Transformers Explained in Plain English" by Chris Hughes, Towards Data Science
- The Illustrated Transformer - Jay Alammar's visual, one-concept-at-a-time walkthrough of the architecture
- Transformers and Multi-Head Attention - Tutorial 6 of the UvA Deep Learning Notebooks (v1.2)
Feel free to open issues or submit pull requests if you have any improvements or bug fixes.
This project is licensed under the MIT License.