This tool evaluates the accuracy of instruct models on datasets containing hypotheses with and without negation.
```bash
# create conda env:
conda create -n nllm python=3.10
# activate the environment
conda activate nllm
# install poetry
pip install poetry
# install the project dependencies
poetry install --no-root
```

First, launch the vllm server with the desired model:
```bash
model_name=Qwen/Qwen2.5-0.5B-Instruct
port=8000
apikey=makesomethingup
gpu=7

CUDA_VISIBLE_DEVICES=$gpu \
HF_CACHE=.cache/ \
vllm serve $model_name \
--port $port \
--api-key $apikey \
--dtype auto \
--task generate \
--max-model-len 1600 \
--enable-prefix-caching
```

Some parameters might need tweaking, depending on your hardware or the model used.
Optionally, use the following arguments for quantization:

```bash
--quantization bitsandbytes --load-format bitsandbytes
```
Mistral models need these arguments:

```bash
--tokenizer-mode mistral --config-format mistral --load-format mistral
```
If you need a quantized Mistral model, you are out of luck, because you cannot pass `--load-format bitsandbytes` and `--load-format mistral` at the same time.
In that case, you have to quantize the model yourself with quantize.py into a local file. Then run the vllm server with the local path to the quantized model, without any Mistral-specific arguments (see the sketch below).
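A minimal sketch of that workaround, assuming quantize.py takes the source model and an output path as positional arguments (the actual interface may differ; check `python quantize.py --help`) and reusing the `port`/`apikey` variables from above:

```bash
# Hypothetical quantize.py invocation -- its real arguments may differ
mistral_model=mistralai/Mistral-7B-Instruct-v0.3
python quantize.py $mistral_model ./quantized-mistral

# Serve the locally quantized model; no Mistral-specific arguments needed
vllm serve ./quantized-mistral \
--port $port \
--api-key $apikey \
--dtype auto \
--task generate \
--max-model-len 1600 \
--enable-prefix-caching
```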
After the inference server is running, you can launch the script for generating predictions:
```bash
python run.py http://localhost:${port}/v1 ${apikey} ${model_name} nofever-ces.csv ces_prompt.txt --output_dir ./output ; \
python run.py http://localhost:${port}/v1 ${apikey} ${model_name} nofever-eng.csv eng_prompt.txt --output_dir ./output ; \
python run.py http://localhost:${port}/v1 ${apikey} ${model_name} nofever-ukr.csv ukr_prompt.txt --output_dir ./output ; \
python run.py http://localhost:${port}/v1 ${apikey} ${model_name} nofever-deu.csv deu_prompt.txt --output_dir ./output
```

You can see more options with `python run.py --help`.
This script creates 3 output files:

- `./output/Qwen_Qwen2.5-0.5B-Instruct_<timestamp>_P.csv`: CSV with the results on the positive hypotheses, containing `dataset_id`, `predict_token` (True or False), `predicted_polarity` (the polarity of the hypothesis if `predict_token` is True, the opposite polarity if False), and `correct_polarity` (the polarity of the actual correct hypothesis)
- `./output/Qwen_Qwen2.5-0.5B-Instruct_<timestamp>_N.csv`: CSV with the results on the negative hypotheses, same structure as the `*_P.csv`
- `./output/Qwen_Qwen2.5-0.5B-Instruct_<timestamp>_res.json`: JSON object containing the accuracy and other information about the run

`<timestamp>` is the timestamp of the start of the run.
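As a rough sanity check, you can recompute the per-file accuracy directly from a CSV. The sketch below assumes the columns appear in the order listed above and that the file has a header row; the value reported in the `_res.json` may be computed differently, so treat this only as an approximation:

```bash
# Fraction of rows where predicted_polarity ($3) matches correct_polarity ($4);
# assumes a header row and the column order dataset_id,predict_token,predicted_polarity,correct_polarity
awk -F, 'NR > 1 { total++; if ($3 == $4) correct++ }
         END { printf "accuracy: %.3f\n", correct / total }' \
  ./output/Qwen_Qwen2.5-0.5B-Instruct_<timestamp>_P.csv
```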