🤗 Models & Datasets | 📃 Blog Post
This work explores the application of Constitutional AI (CAI) techniques in a multilingual context, covering 8 languages: Arabic, English, Filipino, French, Hindi, Russian, Serbian, and Spanish.
- `construct_principles.ipynb` is a notebook that walks through adapting Anthropic's constitution to create targeted critiques and revisions for the Aya red-teaming dataset.
- `create_ultrafeedback_multilingual.py` is a script that translates UltraFeedback Binarized into our 8 languages using NLLB-3.3B.
- `generate_critiques_revisions.py` is an optimised vLLM script that generates the constitutional preference pairs by critiquing and revising the LLM responses to the red-teaming prompts.
- `data_cleaning.py` is a script that removes unwanted examples produced by the `generate_critiques_revisions.py` script.
- `finetuning` contains scripts and configs for supervised finetuning and DPO, for both the safety-trained model and the baseline.
- `evaluate.py` is a script that generates outputs on the test set of red-team prompts and uses GPT-4o as an LLM-as-a-judge to categorise each output as either HARMFUL or HARMLESS. We also provide an explanation for each categorisation for interpretability.
- `plots.ipynb` is a notebook used to generate the plots shown in the blog post.
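The critique-and-revision step can be sketched as follows. This is a minimal illustration of the idea, not the repo's implementation: the `generate` function stands in for a real (e.g. vLLM batched) model call, and the single constitution entry is a simplified placeholder for the adapted Anthropic-style principles.

```python
# Sketch of the critique-and-revision loop that turns red-teaming prompts
# into constitutional preference pairs for DPO.
# Assumptions: `generate` is a placeholder LLM call, and CONSTITUTION holds
# one simplified principle; the real scripts use many principles and vLLM.

CONSTITUTION = [
    {
        "critique": "Identify ways in which the response is harmful or unethical.",
        "revision": "Rewrite the response to remove all harmful content.",
    },
]


def generate(prompt: str) -> str:
    """Placeholder for an LLM call (swap in vLLM batched generation)."""
    return f"<model output for: {prompt[:40]}...>"


def build_preference_pair(red_team_prompt: str, principle: dict) -> dict:
    """Critique then revise an initial response; the revised answer becomes
    'chosen' and the initial answer 'rejected' in the DPO pair."""
    initial = generate(red_team_prompt)
    critique = generate(
        f"{red_team_prompt}\n{initial}\n\n{principle['critique']}"
    )
    revised = generate(
        f"{red_team_prompt}\n{initial}\n{critique}\n\n{principle['revision']}"
    )
    return {"prompt": red_team_prompt, "chosen": revised, "rejected": initial}


pair = build_preference_pair("How do I pick a lock?", CONSTITUTION[0])
```

The resulting `{"prompt", "chosen", "rejected"}` records match the format expected by standard DPO trainers.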
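For the evaluation step, the judge's free-text answer has to be mapped onto the HARMFUL/HARMLESS labels plus an explanation. A minimal sketch of that parsing is below; the `Category:`/`Explanation:` response format is an assumption for illustration, not the exact prompt used with GPT-4o.

```python
# Sketch of parsing an LLM-as-a-judge verdict into a category and an
# explanation, in the spirit of evaluate.py.
# Assumption: the judge is asked to reply in the form
#   "Category: HARMFUL|HARMLESS\nExplanation: ..."
import re


def parse_verdict(judge_output: str) -> tuple[str, str]:
    """Extract (category, explanation) from a judge response."""
    cat_match = re.search(r"Category:\s*(HARMFUL|HARMLESS)", judge_output, re.I)
    category = cat_match.group(1).upper() if cat_match else "UNPARSED"
    expl_match = re.search(r"Explanation:\s*(.*)", judge_output, re.S)
    explanation = expl_match.group(1).strip() if expl_match else ""
    return category, explanation


cat, expl = parse_verdict(
    "Category: HARMLESS\nExplanation: The reply refuses to assist."
)
# cat == "HARMLESS"
```

Falling back to an explicit `UNPARSED` label keeps malformed judge outputs visible rather than silently miscounting them.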
