This is a modular and extensible Python toolkit designed to facilitate the creation of high-quality multilingual fact-checking datasets using LLMs.
pip install -r dev-requirements.txt
The dataset will be released soon.
Download a wikidump to be processed from HERE.
For example, to parse the English wikidump (enwiki-20240820-pages-articles-multistream.xml) saved at multisynfact/data/wikipedia_dumps/enwiki-20240820-pages-articles-multistream.xml:
# set corresponding variables
$ python -m multisynfact.src.wikiparser.wiki_parser --output_dir "multisynfact/data/example_parsed_wiki" --lang "en" --max_articles 100 --wikidump_path "multisynfact/data/wikipedia_dumps/enwiki-20240820-pages-articles-multistream.xml"
$ python -m multisynfact.src.wikiparser.filter --input_dir "multisynfact/data/example_parsed_wiki" --lang "en"
This will create parsed and filtered wiki data in multisynfact/data/example_parsed_wiki.
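To sanity-check the parsed output before running generation, you can load the filtered file directly (a minimal sketch; it assumes the filter step writes enwiki_intros_filtered.json as a standard JSON file, and the exact schema may differ):
import json
# Load the filtered intros produced by the parsing and filtering steps above
with open("multisynfact/data/example_parsed_wiki/enwiki_intros_filtered.json", encoding="utf-8") as f:
    parsed = json.load(f)
print(f"Loaded {len(parsed)} filtered entries")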
With the parsed wikidumps, see the examples below to:
- generate claims: write a prompt for claim generation with an LLM.
- evaluate claims: based on the metrics BLEU, ROUGE, METEOR, readability (supports English, German and Spanish), BERTScore, MNLI classification and named-entity overlap.
- filter claims: filter out undesired generated claims.
Check test.py for more details.
from multisynfact import Config, InputConfig, ModelConfig, get_generator
supporting_prompt = 'Act as an expert in generating claims in {language}. I will give you a sentence in {language} about the topic: "{topic}". Generate a single short and objective claim in {language}, with factual information about the sentence I will provide, using the information available in it. Do not add any explanation, refrain from adding extra information nor your opinion on whether the claim is true or not, as long as it is supported by the sentence. The claim should have less than 30 words. The sentence is: "{sources}". Do not make any reference to the sentence in your answer, make it self-contained. Write your answer in {language} only. Evaluate at the end how good the generated claim is. Your response must be in the same format as the JSON in the examples below. {{\"CLAIM\": \"Write a single short and objective claim in {language}, with factual information about the sentence.\", \"OVERALL QUALITY\": \"Assess the overall quality of the claim on a scale of 1 to 5\", \"SELF-CONTAINED\": \"Assess how self-contained the generated claim is on a scale of 1 to 5.\", \"CATEGORY\": \"categorise whether the given sentence not supported (C0), supported (C1) the generated claim (independently of actual veracity of the claim), or not verifiable (C2)\", \"SUPPORTED BY ORIGINAL SENTENCE\": \"Assess how supported the claim is by the original sentence on a scale of 1 to 5.\", \"FACTUAL\": \"Assess how factual the claim is based on the original sentence [real/non-fiction/non-fantastic]\", \"OBJECTIVE\": \"Assess how objective the claim is on a scale of 1 to 5\"}}.'
refuting_prompt = 'Act as an expert in generating claims in {language}. I will give you one or multiple sentences in {language}, about the topic: "{topic}". Generate a single short and falsified claim in {language} based on the information about the sentence I will provide. Do not improvise. Use only the information I will provide. Do not add any explanation, refrain from adding extra information nor your opinion on whether the claim is true or not, as long as it is not supported by the sentence. The claim should have less than 30 words. The sentence is: "{sources}". Do not make any reference to the sentence in your answer, make it self-contained. Write your answer in {language} only. Evaluate at the end how good the generated claim is. Your response must be in the same format as the JSON in the examples below. {{\"CLAIM\": \"Write a single short and falsified claim in {language}.\", \"OVERALL QUALITY\": \"Assess the overall quality of the claim on a scale of 1 to 5\", \"SELF-CONTAINED\": \"Assess how self-contained the generated claim is on a scale of 1 to 5.\", \"CATEGORY\": \"categorise whether the given sentence not supported (C0), supported (C1) the generated claim (independently of actual veracity of the claim), or not verifiable (C2)\", \"SUPPORTED BY ORIGINAL SENTENCE\": \"Assess how supported the claim is by the original sentence on a scale of 1 to 5.\", \"FACTUAL\": \"Assess how factual the claim is based on the original sentence [real/non-fiction/non-fantastic]\", \"OBJECTIVE\": \"Assess how objective the claim is on a scale of 1 to 5\"}}.'
notinfo_prompt = 'Act as an expert in generating claims in {language}. I will give you one piece of evidence in {language}, about the topic: "{topic}". Generate a single short and specific claim in {language}, relevant to the information about the evidence I will provide. The claim should not be verifiable based on the evidence provided. Do not add any explanation, refrain from adding extra information nor your opinion on whether the claim is true or not. The claim should have less than 30 words. The evidence is: "{sources}". Do not make any reference to the sentence in your answer, make it self-contained. Write your answer in {language} only. Evaluate at the end how good the generated claim is. Your response must be in the same format as the JSON in the examples below. {{\"CLAIM\": \"Write a single short and specific claim in {language}, relevant to but not verifiable by the information about the evidence I will provide.\", \"OVERALL QUALITY\": \"Assess the overall quality of the claim on a scale of 1 to 5\", \"SELF-CONTAINED\": \"Assess how self-contained the generated claim is on a scale of 1 to 5.\", \"CATEGORY\": \"categorise whether the given sentence not supported (C0), supported (C1) the generated claim (independently of actual veracity of the claim), or not verifiable (C2)\", \"SUPPORTED BY ORIGINAL SENTENCE\": \"Assess how supported the claim is by the original sentence on a scale of 1 to 5.\", \"FACTUAL\": \"Assess how factual the claim is based on the original sentence [real/non-fiction/non-fantastic]\", \"OBJECTIVE\": \"Assess how objective the claim is on a scale of 1 to 5\"}}'
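# Note (illustrative assumption, not part of the toolkit's documented API): the
# {language}, {topic} and {sources} placeholders behave like plain str.format
# fields, and the doubled braces {{ }} keep the JSON braces literal once the
# template is filled, e.g.:
#   refuting_prompt.format(language="English", topic="Berlin",
#                          sources="Berlin is the capital of Germany.")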
input_config = InputConfig(
language="en",
quantity=10,
dataset="./multisynfact/data/example_parsed_wiki/enwiki_intros_filtered.json",
template=refuting_prompt, # supporting_prompt, notinfo_prompt or refuting_prompt
extractor="wiki_sentences",
generator="verification",
target_class="refuting", # supporting, notinfo or refuting
max_input_tokens=1024,
extractor_args={"wiki_sentences": {"n_source": 1}},
)
model_config = ModelConfig(
provider="hf_local",
model_name="mistralai/Mistral-7B-Instruct-v0.3",
threads=2,
device="cuda:0",
quantization="int4_nf4",
access_token="ADD YOUR ACCESS TOKEN on HUGGINGFACE HERE"
)
generation_config = {"temperature": 0.7, "max_new_tokens": 100}
config = Config(
input=input_config,
model=model_config,
generation=generation_config,
)
generator = get_generator(config)
dataset = generator.generate()
# Show the first generation if there is at least one above acceptable threshold
if dataset:
    # save generated dataset to disk
    dataset = dataset.shuffle(seed=42)
    dataset.save_to_disk("multisynfact/data/llm_filtered_dataset")
    print(f"Overview of dataset generated after filtering: {dataset}")
    print(f"The first claim generated: {dataset['claim'][0]}")
    print(f"The evaluation of the first generated claim: {dataset['label'][0]}")
    print(f"The sources used for generating the first claim: {dataset['source'][0]}")
else:
    print("There is no good generation after filtering. Please generate more examples or try other models.")
# Now let's evaluate the generated dataset
from multisynfact.src.metrics import run_metrics
from pathlib import Path
# For METEOR, make sure NLTK's 'punkt_tab' resource is downloaded:
# import nltk; nltk.download('punkt_tab')
metrics=["bleu", "meteor", "rouge", "readability", "named_entity_overlap", "bertscore", "mnli"]
metric_args = {
"bleu": {"max_order": 3, "smooth": True},
"rouge": {"rouge_types": ["rouge1", "rouge2", "rougeL"]},
"readability": {"language": "en"},
"named_entity_overlap": {"language": "en"},
"bertscore": {
"language": "en",
"device": "cuda:0",
"model_type": "bert-base-multilingual-cased",
},
"mnli": {
"device": "cuda:0",
"model_type": "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
},
}
if dataset:
    scores = run_metrics(dataset, metrics, metric_args, save_dir=Path("./logs"), save_results=True)
    print(f"Evaluation results: {scores}")
For more details on the data partitioning procedure, please see our paper.
@article{chung-etal-2025-beyond,
title={Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking},
author={Yi-Ling Chung and Aurora Cobo and Pablo Serna},
year={2025},
eprint={2502.15419},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.15419},
}
Please feel free to contribute to MultiSynFact by raising an issue.