This project is based on the IASNLP Summer School (Comparison of Encoder-Decoder and Decoder only models)

Multilingual Neural Machine Translation (NMT) Dataset for In-context Learning, Finetuning, and Baseline Model Development

Problem Statement

Multilingual Neural Machine Translation (NMT) enables training a single model capable of translating between multiple source and target languages. Traditional approaches use encoder-decoder architectures, while recent advancements explore the use of Large Language Models (LLMs) for Multilingual Machine Translation (MMT). This project investigates:

  1. Performance Comparison: Evaluate the performance of encoder-decoder based MT versus smaller LLMs trained on the same data with similar parameters.

  2. Context Role Quantification: Analyze the impact of context (number of tokens) on translation quality for both architectures.
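Both questions come down to scoring translations from the two model families on a common test set. As a minimal, illustrative sketch (not the project's evaluation code), corpus-level BLEU can be computed with the sacrebleu package, given lists of system hypotheses and reference translations:

    # Illustrative only: score any model's output (encoder-decoder or
    # decoder-only) against the same references with corpus-level BLEU.
    import sacrebleu

    hypotheses = ["यह एक उदाहरण वाक्य है।"]   # model output (placeholder)
    references = ["यह एक उदाहरण वाक्य है।"]   # gold translations (placeholder)

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU: {bleu.score:.2f}")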

Dataset

The dataset provided includes:

  • One-to-One translations

  • One-to-Many translations

  • Many-to-One translations

  • MT Dataset: Contains data necessary for training and evaluation across various translation scenarios.

  • Google Drive Link: MT Dataset and Results
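For the languages used in the example configuration below (eng, hin, mar), the three scenarios can be pictured as sets of translation directions. This is a hypothetical illustration, not the dataset's actual file layout:

    # Hypothetical illustration of the three multilingual scenarios
    # using the languages from the example configuration (eng, hin, mar).
    directions = {
        "one_to_one":  [("eng", "hin")],                  # single language pair
        "one_to_many": [("eng", "hin"), ("eng", "mar")],  # one source, many targets
        "many_to_one": [("hin", "eng"), ("mar", "eng")],  # many sources, one target
    }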

Installation

  1. Clone the repository:

    git clone https://github.com/sujaykumarmag/iasnlp.git
    cd iasnlp
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

Configuration

Configuration parameters are saved to runs/args.yaml for each experiment.

Example args.yaml

batchsize: 10
cross_lang: false
direction_order: null
disorder: false
ex_lang: null
experiment: icl
ice_num: 8
lang_order: null
lang_pair: eng-hin
lora_alpha: 16
lora_dropout: 0.05
lr: 0.001
many2one: false
model_name: facebook/xglm-564M
model_type: enc_dec
multi: eng hin mar
numepochs: 5
one2many: false
oracle: false
output_dir: runs/
prompt_template: </E></X>=</Y>
repeat: false
retriever: random
reverse_direction: false
run_all_icl: true
seed: 43
tokenizer_name: google/mt5-base
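The </E>, </X>, and </Y> placeholders in prompt_template presumably stand for the in-context examples, the source sentence, and the target sentence of an ICL prompt. Below is a minimal sketch of reading a saved configuration, assuming the standard PyYAML package (the project's own loading code may differ):

    # Minimal sketch: load an experiment's saved configuration.
    import yaml

    with open("runs/args.yaml") as f:
        args = yaml.safe_load(f)

    print(args["model_name"])   # e.g. facebook/xglm-564M
    print(args["lang_pair"])    # e.g. eng-hin
    print(args["ice_num"])      # number of in-context examples for ICL runs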

File Structure

root_directory/
├── src/
│   ├── baselines.py
│   │
│   ├── dataset.py
│   │
│   ├── decoder_only.py
│   │
│   ├── icl.py
│   │
│   ├── utils.py
│   │
│   ├── mbart/ (from Hugging Face)
│   │   ├── configuration_mbart.py
│   │   ├── modeling_mbart.py 
│   │   └── tokenization_mbart.py
│   │
│   │
│   ├── xlnet/ (from Hugging Face)
│   │   ├── configuration_xlnet.py
│   │   ├── modeling_xlnet.py 
│   │   └── tokenization_xlnet.py
│   │
│   │
│   ├── training/
│       ├── normal_train.py
│       └── training.py
│  
├── finetuning/ (all notebooks run on Kaggle)
│  
├── runs/
│   ├── exp1
│   ├── exp2
│   ├── exp3
│   ├── exp4
│   ├── exp5 
│   ├── exp6
│   ├── exp7
│   ├── exp8
│   └── exp9
│   
│   
├── train.py  # (Entry Point for the Program)
│   
├── trainall.bash
└── notebooks/
    └── train.ipynb
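To make the encoder-decoder vs. decoder-only comparison concrete, the sketch below instantiates one model from each family using the identifiers that appear in the example args.yaml. This is illustrative only: the repository ships its own copies of the mBART/XLNet modeling code under src/, so the actual loading path in train.py may differ.

    # Illustrative sketch: one model from each family, loaded from the
    # Hugging Face hub (the project wraps local mbart/xlnet copies instead).
    from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                              AutoTokenizer)

    # Encoder-decoder (seq2seq) side, e.g. an mT5 checkpoint.
    enc_dec_tok = AutoTokenizer.from_pretrained("google/mt5-base")
    enc_dec_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

    # Decoder-only (causal LM) side, e.g. the XGLM checkpoint from args.yaml.
    dec_only_tok = AutoTokenizer.from_pretrained("facebook/xglm-564M")
    dec_only_model = AutoModelForCausalLM.from_pretrained("facebook/xglm-564M")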

Demo Video

code.mp4

Pending Tasks

  • Include Finetuning Code
  • Enhance documentation with more detailed explanations (report in IASNLP_Project_report.pdf)
  • Add support for GPU training (MPS for macOS and CUDA for NVIDIA)
  • Research the SSA attention method

Cite this Work

@misc{iasnlp_project,
  author = {Abhinav P. M. and SujayKumar Reddy M. and Oswald C.},
  title = {In-context Learning (ICL), Finetuning and Baseline Model Development for Neural Machine Translation},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/sujaykumarmag/iasnlp}},
}
