DistillClassifier is a tool built on top of LLM-VM that makes it easy to generate synthetic data for classification tasks, using LLMs to distill their knowledge of a classification task into much smaller and faster-to-run classification models.
This project was built for the ANARCHY October 2023 Hackathon. Check out ANARCHY on their GitHub and website.
## Installation

```
git clone https://github.com/daspartho/DistillClassifier
cd DistillClassifier
git clone https://github.com/anarchy-ai/LLM-VM.git
cd LLM-VM
pip3 install .
cd ..
pip3 install -r requirements.txt
```

Create a `.env` file and set your OpenAI API key (if you want to use OpenAI models) and Hugging Face Hub token (if you want to push the dataset to the Hugging Face Hub):
```
OPENAI_API_KEY=
HF_HUB_TOKEN=
```

## Usage

```
python3 generation.py <columns> <n_examples> [-m <model>] [-f <filename>] [-r <repo>]
```

- `<columns>`: Column information as a dictionary.
- `<n_examples>`: Number of examples to generate.
- `-m, --model`: (Optional) Model name. Defaults to `"chat_gpt"`.
- `-f, --filename`: (Optional) Dataset filename. Defaults to `"dataset.json"`.
- `-r, --repo`: (Optional) Hugging Face repo ID. Defaults to `None`.
For example:

```
python3 generation.py '{"text": "either spoiler or not spoiler text", "label": "if text is spoiler or not"}' 25 -m 'chat_gpt' -f 'dataset.json' -r 'spoiler_or_not'
```

To try the demo:

```
python3 demo.py
```

## License

MIT
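Once generation finishes, the dataset file can be inspected with plain Python. Below is a minimal sketch that assumes the output is a JSON list of records keyed by the column names you supplied (the exact on-disk format is determined by `generation.py`, so treat this as an illustration; the sample records here are hypothetical):

```python
import json

# Hypothetical records mimicking what a spoiler/not-spoiler run might
# produce; written out only so the loading code below is self-contained.
sample = [
    {"text": "The hero dies in the final scene.", "label": "spoiler"},
    {"text": "The cinematography is gorgeous.", "label": "not spoiler"},
]
with open("dataset.json", "w") as f:
    json.dump(sample, f)

# Load the generated dataset and take a quick look at its label set.
with open("dataset.json") as f:
    dataset = json.load(f)

labels = {record["label"] for record in dataset}
print(len(dataset), sorted(labels))
```

From here the records can be fed into any small classifier training pipeline (e.g. a scikit-learn or Hugging Face `datasets` workflow).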
