Here we describe Python programs for:
- Generating and evaluating MCQs
- Fine-tuning models based on supplied data
- Other useful things
Please email {foster|stevens|catlett}@anl.gov if you see things that are unclear or missing.
Before you start: We recommend you follow the instructions for
ALCF Inference Service Prerequisites
to set up your ALCF auth token, which is required to access models via the inference service.
(You need to download and run inference_auth_token.py.)
Clone this repository.
git clone [email protected]:auroraGPT-ANL/MCQ-and-SFT-code.git
cd MCQ-and-SFT-code
This pipeline converts scientific papers in PDF format into JSON and then uses AI models of your choice to generate multiple-choice questions (MCQs), answers, and scores of those answers.
Preparation Steps:
- Set up your working directory
- Set up and activate your Conda environment
Workflow Steps:
- Convert PDFs (papers) to JSON representations.
- Generate MCQs from JSON representations.
- Combine multiple MCQ JSON files into a single file.
- Select a subset of MCQs.
- Generate additional answers for MCQs (using a different model than the one used to generate the initial MCQs and answers).
- Score AI-generated answers using another AI model.
- Review the status of MCQ generation and scoring.
Ensure your working directory has subdirectories for storing input and output files. The names
of the files and folders don't matter, but these are the names specified in config.yml. If you want
to place data elsewhere, update the directories section in config.yml.
(If you are just starting out, use these names so you can copy/paste the steps below.)
- _PAPERS/ → original PDF papers.
- _JSON/ → parsed text in JSON format.
- _MCQ/ → generated MCQs in JSON format.
- _RESULTS/ → AI-generated answers and scores.
If you're just starting (and don't already have these or equivalent directories),
create them manually. If yours are named differently, substitute your
directory names in config.yml:
mkdir _PAPERS _JSON _MCQ _RESULTS
(Note: Some of the scripts below create their output directories automatically if they don't already exist, but we create them here just to be sure.)
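If you prefer to script this, here is a minimal sketch that reads the directories from config.yml and creates any that are missing. The directories section and its key names below are assumptions about config.yml's layout, so adjust them to match your copy:

```python
# Create whatever directories config.yml names, if they don't already exist.
import os
import yaml  # PyYAML

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

# Key names here are hypothetical; check the directories section of config.yml.
for key, default in [("papers", "_PAPERS"), ("json", "_JSON"),
                     ("mcq", "_MCQ"), ("results", "_RESULTS")]:
    path = (cfg.get("directories") or {}).get(key, default)
    os.makedirs(path, exist_ok=True)
```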
Put your papers (in PDF form) in _PAPERS.
If you already have a Conda environment you want to keep using, update it with any missing dependencies needed for this workflow:
conda env update --name <your_conda_env> --file environment.yml
Otherwise, create a new Conda environment:
conda env create -f environment.yml
conda activate globus_env
(Note: If you get CondaValueError: prefix already exists, edit environment.yml, change the name: entry, then create and activate that environment.)
Extract text from PDFs using a simple parser:
python src/simple_parse.py
Note: You can specify input and output with, e.g., -i _PAPERS -o _JSON; otherwise the code will default to the directories specified in config.yml.
Alternatively, you can use AdaParse (higher-quality parser, still in testing). More details
To generate MCQs from parsed JSON files:
- Authenticate with ALCF inference service (if not already done):
python src/inference_auth_token.py authenticate
- (Optional) Check which models are running
You may wish to check which models are currently running, since waiting for a model to load can take 10-15 minutes (see ALCF Inference service). Get the list of running and queued models as follows (a Python equivalent is sketched after the notes below):
access_token=$(python src/inference_auth_token.py get_access_token)
curl -X GET "https://data-portal-dev.cels.anl.gov/resource_server/sophia/jobs" \
     -H "Authorization: Bearer ${access_token}" | jq
Piping the output to jq (a command-line JSON processor) makes it much easier to read.
Notes
- If you are not connected via VPN or to Argonne-auth at the lab, you'll get an error such as curl: (6) Could not resolve host: data-portal-dev.cels.anl.gov.
- If it's been a while since you authenticated, you'll get a "Permission denied" error. In this case, you'll need to re-authenticate:
python src/inference_auth_token.py authenticate --force
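For those who prefer Python to curl, here is a minimal sketch of the same status query. It shells out to inference_auth_token.py for the token, exactly as the shell example above does, and assumes the requests package is available:

```python
# Query the ALCF inference service job list and pretty-print it
# (same information as the curl | jq pipeline above).
import json
import subprocess

import requests

token = subprocess.run(
    ["python", "src/inference_auth_token.py", "get_access_token"],
    capture_output=True, text=True, check=True,
).stdout.strip()

resp = requests.get(
    "https://data-portal-dev.cels.anl.gov/resource_server/sophia/jobs",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```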
- Run MCQ generation: This step uses generate_mcqs.py to divide text into chunks, generate MCQs, and include reference answers.
(As noted above, you may want to check which models are currently running, since waiting for a model to load can take 10-15 minutes; see ALCF Inference service.)
For this example we use Mistral-7B-Instruct-v0.3. Omitting the -m option defaults to openai:gpt-4o.
python src/generate_mcqs.py -m 'alcf:mistralai/Mistral-7B-Instruct-v0.3'
Note: You can specify input and output with, e.g., -i _JSON -o _MCQ, and the model with -m as shown here; otherwise the code defaults to the model and directories specified in config.yml.
By default the code displays a progress bar. In -v / --verbose mode informational messages are displayed, and in -q / --quiet mode no output is displayed.
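For orientation, a single chunk-to-MCQ call inside generate_mcqs.py looks roughly like the sketch below, built on the Model class described later in this README. The prompt text and chunk are placeholders of ours; the script's real prompts, chunking, and JSON handling differ:

```python
# Illustrative only: one chunk in, one model response out, using the documented
# Model interface (Model(modelname) / model.run(user_prompt=...)).
from model_access import Model  # model_access.py; adjust the import path to your setup

model = Model("alcf:mistralai/Mistral-7B-Instruct-v0.3")

chunk = "Text extracted from one section of a paper..."  # placeholder chunk
prompt = (
    "Write one multiple-choice question with four options about the text below, "
    "and give the correct answer as a reference answer.\n\n" + chunk
)
print(model.run(user_prompt=prompt))
```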
To combine the generated MCQ JSON files into a single file:
python src/combine_json_files.py -o MCQ-combined.json
Here you can override the settings in config.yml by specifying -i on the command line, but you must specify the filename for your combined output file with -o, as shown here.
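Conceptually, the combine step amounts to concatenating the per-paper MCQ lists, as in this rough sketch (it assumes each file in _MCQ holds a JSON list, which is an assumption on our part; use combine_json_files.py in practice):

```python
# Concatenate all per-paper MCQ files into one JSON file (sketch only).
import glob
import json

combined = []
for path in sorted(glob.glob("_MCQ/*.json")):
    with open(path) as f:
        combined.extend(json.load(f))  # assumes each file is a JSON list of MCQs

with open("MCQ-combined.json", "w") as f:
    json.dump(combined, f, indent=2)
print(f"Wrote {len(combined)} MCQs to MCQ-combined.json")
```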
If you want to randomly select a subset of MCQs from the generated JSON files, use select_mcqs_at_random.py, specifying the number of MCQs to select. For example, to select 17 MCQs:
python src/select_mcqs_at_random.py -i MCQ-combined.json -o MCQ-subset.json -n 17
You must specify the filenames for your combined and subset files, as shown here.
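The selection itself is just a uniform random sample, as in this sketch (again assuming the combined file is a JSON list; select_mcqs_at_random.py is the supported tool):

```python
# Pick 17 MCQs at random, without replacement (sketch only).
import json
import random

with open("MCQ-combined.json") as f:
    mcqs = json.load(f)

subset = random.sample(mcqs, 17)
with open("MCQ-subset.json", "w") as f:
    json.dump(subset, f, indent=2)
```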
This step uses an AI model to generate new answers for the selected MCQs. We will use a different model than the one used above. Note that the form for specifying a model is <locn>:<model>; in this example we use meta-llama/Meta-Llama-3-70B-Instruct, whose endpoint location (<locn>) is alcf.
python src/generate_answers.py -i MCQ-subset.json \
-m 'alcf:meta-llama/Meta-Llama-3-70B-Instruct'
Shown here is MCQ-subset.json, assuming you performed step 4; otherwise use MCQ-combined.json (or whatever filename you used for output in step 3).
By default the code displays a progress bar. In -v / --verbose mode informational messages are displayed, and in -q / --quiet mode no output is displayed.
An AI model evaluates and scores the generated answers against reference answers. Here we will use alcf:mistralai/Mistral-7B-Instruct-v0.3 to evaluate the answers we created in the previous step with alcf:meta-llama/Meta-Llama-3-70B-Instruct:
python src/score_answers.py \
-a 'alcf:meta-llama/Meta-Llama-3-70B-Instruct' \
-b 'alcf:mistralai/Mistral-7B-Instruct-v0.3'
As with previous steps, input and output directories default to the directories specified in config.yml but can be overridden with -i and/or -o on the command line.
- Input: _RESULTS/answers_<model-A>.json
- Output: _RESULTS/scores_<locn-A>:<model-A>_<locn-B>:<model-B>.json
- Note: Any / in model names is replaced with + in filenames.
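For example, the substitution that produces the answers filename seen later in this README works like this (illustrative; the scripts assemble filenames themselves):

```python
# "/" in a model name becomes "+" when the name is embedded in a filename.
model_a = "pb:argonne-private/AuroraGPT-IT-v4-0125"
print("answers_" + model_a.replace("/", "+") + ".json")
# -> answers_pb:argonne-private+AuroraGPT-IT-v4-0125.json
```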
To check progress and see which MCQs are answered/scored:
python src/review_status.py -i MCQ-combined.json
- This script identifies missing or incomplete processing steps.
- As earlier, output defaults to the directory specified in config.yml (_RESULTS) but can be overridden on the command line with -o directory-name.
- This pipeline ensures high-quality multiple-choice questions are generated and scored using AI.
- The steps allow for comparison of AI-generated answers against reference answers.
- The scoring step provides a numerical evaluation (1-10) of answer accuracy.
Note:
- You need a file openai_access_token.txt that contains your OpenAI access token if you are to use an OpenAI model like gpt-4o.
Examples of running generate_answers.py:
python src/generate_answers.py -o ../_RESULTS -i ../_MCQ -m openai:o1-mini
- Uses the OpenAI model o1-mini to generate answers for MCQs in MCQs.json and stores results in the _RESULTS directory, in a file named answers_openai:o1-mini.json.
python src/generate_answers.py -o ../_RESULTS -i MCQs.json -m "pb:argonne-private/AuroraGPT-IT-v4-0125"
- Uses the Huggingface model argonne-private/AuroraGPT-IT-v4-0125, running on a Polaris compute node started via PBS, to generate answers for the same MCQs. Results are placed in _RESULTS/answers_pb:argonne-private+AuroraGPT-IT-v4-0125.json.
Examples of running score_answers.py:
python score_answers.py -o _RESULTS -i MCQs.json -a openai:o1-mini -b openai:gpt-4o
- Uses the OpenAI model gpt-4o to score answers for MCQs in MCQs.json and stores results in the _RESULTS directory, in a file named scores_openai:o1-mini_openai:gpt-4o.json (per the Output pattern above).
python score_answers.py -o _RESULTS -a pb:argonne-private/AuroraGPT-IT-v4-0125 -b openai:gpt-4o
- Uses the OpenAI model gpt-4o to score answers previously generated for model pb:argonne-private/AuroraGPT-IT-v4-0125, assumed to be located in a file _RESULTS/answers_pb:argonne-private+AuroraGPT-IT-v4-0125.json, as above. Places results in file _RESULTS/scores_pb:argonne-private+AuroraGPT-IT-v4-0125:openai:gpt-4o.json.
The class Model (in model_access.py) implements init and run methods that allow for use of different models:
from model_access import Model  # model_access.py provides the Model class

model = Model(modelname)
response = model.run(user_prompt='Tell me something interesting')
where modelname has a prefix indicating the model type/location:
- alcf: Model served by the ALCF Inference Service. You need an ALCF project to charge to.
- hf: Huggingface model downloaded and run on Polaris login node (not normally a good thing).
- pb: Huggingface model downloaded and run on a Polaris compute node. You need an ALCF project to charge to.
- vllm: Huggingface model downloaded and run via VLLM on Polaris compute node. Not sure that works at present.
- openai: An OpenAI model, like gpt-4o or o1-mini. You need an OpenAI account to charge to.
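For instance, switching between an ALCF-hosted and an OpenAI-hosted model only changes the prefix (a hedged example using model names that appear elsewhere in this README):

```python
from model_access import Model  # adjust the import path to where model_access.py lives

alcf_model = Model("alcf:mistralai/Mistral-7B-Instruct-v0.3")  # ALCF Inference Service
openai_model = Model("openai:gpt-4o")                          # OpenAI API

print(alcf_model.run(user_prompt="In one sentence, what is a multiple-choice question?"))
```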
# LORA fine-tuning
python lora_fine_tune.py -i <json-file> -o <model-directory>
# Full fine tune
python full_fine_tune.py -i <json-file> -o <model-directory>
Note:
- You need a file hf_access_token.txt if you want to publish models to HuggingFace.
- You need to edit the file to specify where to publish models in HuggingFace.
- We are still debugging how to download and run published models.
Determine what models are currently running on the ALCF inference service (see below for more info):
python check_alcf_service_status.py
Determine what answers have been generated and scored, and what additional runs could be performed, given running models, to generate and score additional answers. (You may want to submit runs to start models. Use the -m flag to see what could be useful to submit.)
python review_status.py -o <result-directory>
Perform runs of generate_answers.py and grade_answers.py to generate missing outputs. (See below for more info.)
python run_missing_generates.py -o <result-directory>
The program check_alcf_service_status.py retrieves and processes status information from the
ALCF Inference service and lists models currently running or queued to run, e.g., as in the output below,
which shows three models running, one starting, and none queued. Models that are not accessed for
some period are shut down, and queued models are started. A request to a model that is not running adds it to the queue.
% python check_alcf_service_status.py
Running: ['meta-llama/Meta-Llama-3-70B-Instruct', 'meta-llama/Meta-Llama-3-8B-Instruct', 'mistralai/Mistral-7B-Instruct-v0.3']
Starting: ['N/A']
Queued : []
Note:
- You need a valid ALCF access token stored in a file alcf_access_token.txt. See how to generate an ALCF access token.
- Here is a list of models supported by the ALCF inference service.
- "N/A" is a test model used by ALCF; it can be ignored.
The ALCF inference service hosts many models, as listed here. At any one time, zero or more are running, zero or more are queued, and the rest are neither running nor queued. (See below for how to use check_alcf_service_status.py to determine which.)
You may want to run against all available models. To do so, you can specify -a all, which works out what commands are needed to process the specified MCQs with all running models. Adding -q also considers queued models, and -s non-running models. For example, when I ran the following command I was informed of the commands to run three models for which results were not found:
% python run_missing_generates.py -i 100-papers-qa.json -o output_files -a all -m 100 -s
python generate_and_grade_answers.py -i 100-papers-qa.json -o outputs -a 'Qwen/Qwen2-VL-72B-Instruct' -b 'gpt-4o' -c -q -s 0 -e 100
python generate_and_grade_answers.py -i 100-papers-qa.json -o outputs -a 'deepseek-ai/DeepSeek-V3' -b 'gpt-4o' -c -q -s 0 -e 100
python generate_and_grade_answers.py -i 100-papers-qa.json -o outputs -a 'mgoin/Nemotron-4-340B-Instruct-hf' -b 'gpt-4o' -c -q -s 0 -e 100
run_missing_generates.py has options as follows:
-h, --help show this help message and exit
-a MODELA, --modelA MODELA
modelA
-o OUTPUTDIR, --outputdir OUTPUTDIR
Directory to look for run results
-i INPUTFILE, --inputfile INPUTFILE
File to look for inputs
-x, --execute Run program
-q, --queued Process queued models
-m MAX, --max MAX Max to process
-s, --start Request to non-running models