Dynamic Suffix Search

This repository contains code to audit the unlearned language model microsoft/Llama2-7b-WhoIsHarryPotter. We implement two adversarial attack methods:

GCG (Greedy Coordinate Gradient): an adversarial prompting method described in the paper https://arxiv.org/abs/2307.15043
DSS (Dynamic Suffix Search): our adversarial prompting method
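
GCG is the baseline attack. Below is a minimal sketch of the GCG loop on a toy differentiable loss, assuming NumPy. At each step it takes the gradient of the loss with respect to the one-hot suffix tokens, keeps the top-k most promising replacement tokens per position, samples a batch of single-token swaps, and moves to the lowest-loss candidate. The embedding-sum loss and the names `toy_loss`, `token_gradients`, and `gcg` are hypothetical stand-ins for the language-model loss on a target completion. This is not the repository's implementation, and DSS is not shown.

```python
import numpy as np


def toy_loss(suffix, embed, target):
    """Hypothetical stand-in for the LM loss: squared distance between the
    summed suffix token embeddings and a target vector."""
    summed = embed[np.asarray(suffix)].sum(axis=0)
    return float(np.sum((summed - target) ** 2))


def token_gradients(suffix, embed, target):
    """Gradient of toy_loss w.r.t. the one-hot rows of the suffix, shape (n, V).

    Writing each token as a one-hot row e_i, summed = sum_i e_i @ embed, so
    dL/de_i = 2 * (summed - target) @ embed.T (the same row for every position
    in this toy model)."""
    summed = embed[np.asarray(suffix)].sum(axis=0)
    row = 2.0 * (summed - target) @ embed.T
    return np.tile(row, (len(suffix), 1))


def gcg(suffix, embed, target, steps=20, top_k=4, batch_size=16, seed=0):
    """Greedy Coordinate Gradient on the toy loss; returns the best suffix seen."""
    rng = np.random.default_rng(seed)
    suffix = np.array(suffix)
    best_suffix, best_loss = suffix.copy(), toy_loss(suffix, embed, target)
    for _ in range(steps):
        grads = token_gradients(suffix, embed, target)
        # Per position, the top-k tokens with the most negative gradient.
        top_tokens = np.argsort(grads, axis=1)[:, :top_k]
        # Each candidate swaps one random position for one of its top-k tokens.
        candidates = np.repeat(suffix[None, :], batch_size, axis=0)
        positions = rng.integers(0, len(suffix), size=batch_size)
        picks = rng.integers(0, top_k, size=batch_size)
        candidates[np.arange(batch_size), positions] = top_tokens[positions, picks]
        # Evaluate the exact loss of every candidate and move to the best one.
        losses = [toy_loss(c, embed, target) for c in candidates]
        b = int(np.argmin(losses))
        suffix = candidates[b].copy()
        if losses[b] < best_loss:
            best_suffix, best_loss = suffix.copy(), losses[b]
    return best_suffix, best_loss


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    embed = rng.normal(size=(50, 8))        # toy vocabulary: 50 tokens, 8-dim embeddings
    target = embed[[3, 7, 11]].sum(axis=0)  # reachable by the suffix [3, 7, 11]
    best, loss = gcg([0, 0, 0], embed, target, steps=50)
    print(best, loss)
```

Because GCG moves to the best candidate in each batch even when that candidate is worse than the current suffix, the sketch tracks the best suffix seen so far separately and returns it.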

Dependencies

Conda environment: conda env create --file env.yml
Unlearned model: https://huggingface.co/microsoft/Llama2-7b-WhoIsHarryPotter
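
One way to obtain a local copy of the unlearned model is `snapshot_download` from the `huggingface_hub` package. The sketch below is an assumption, not part of this repository: `download_unlearned_model` is a hypothetical helper, and it assumes that `run_attack.py` accepts the returned local directory as `--model_path`.

```python
MODEL_ID = "microsoft/Llama2-7b-WhoIsHarryPotter"


def download_unlearned_model(cache_dir=None):
    """Download (or reuse a cached copy of) the model and return its local directory."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Assumption: run_attack.py accepts this local directory as --model_path.
    return snapshot_download(repo_id=MODEL_ID, cache_dir=cache_dir)
```

Calling `download_unlearned_model()` fetches roughly 7B parameters' worth of weights on first use. If the Hugging Face repository requires authentication, log in first (for example with `huggingface-cli login`).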

Running Adversarial Attacks

python run_attack.py --model_path {{path/to/model}} --config_path {{path/to/repo}}/configs/attack_config.json -v
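
When running the attack from another script (for example, over several model checkpoints), the command above can be assembled programmatically. This is a minimal sketch: `build_attack_command` is a hypothetical helper that only mirrors the flags shown above, and the meaning of `-v` (presumably verbose output) is an assumption.

```python
import shlex
from pathlib import Path


def build_attack_command(model_path, repo_path, verbose=True):
    """Return the argv list for run_attack.py, mirroring the README command."""
    config_path = Path(repo_path, "configs", "attack_config.json")
    cmd = [
        "python", "run_attack.py",
        "--model_path", str(model_path),
        "--config_path", str(config_path),
    ]
    if verbose:
        cmd.append("-v")  # assumed to enable verbose output
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(build_attack_command("path/to/model", "path/to/repo")))
```

To actually launch the attack, pass the list to `subprocess.run(cmd, check=True, cwd=repo_path)`, running from the repository root so that the relative `run_attack.py` path resolves.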
