This repository contains the code supporting the paper

**Unifying Post-hoc Explanations of Knowledge Graph Completions**

Lonardi A., Badreddine S., Besold T. R., Sánchez-Martín P.
We evaluate four explainability algorithms that extract necessary post-hoc explanations for Knowledge Graph Completion (KGC):

- Kelpie
- CRIAGE
- Data Poisoning
- AnyBURL Explainer

The first three algorithms are implemented in the Kelpie repository. The AnyBURL Explainer is available by downloading AnyBURL v22 from the AnyBURL repository. KGC is performed by ComplEx on FB15k-237, using the code implemented in Kelpie.
Below, we provide instructions to replicate all paper results.
## Installation

To install the necessary dependencies, follow these steps.

1. Install Poetry on your system.
2. Clone this repository to your local machine.
3. Install the dependencies with Poetry using

   ```bash
   poetry install --no-root
   ```

## Training ComplEx

To train ComplEx, it is possible to follow the instructions detailed in Kelpie. In particular,
1. Clone the Kelpie repository with

   ```bash
   git clone https://github.com/AndRossi/Kelpie
   ```

2. Train ComplEx using

   ```bash
   python scripts/complex/train.py --dataset FB15k-237 --optimizer Adagrad --dimension 1000 --batch_size 1000 --max_epochs 100 --learning_rate 0.1 --reg 5e-2
   ```

**Note**: As discussed in Appendix D of the paper, the hyperparameters suggested in Kelpie (in the command above) lead to an increase of the validation loss during training. We address this by proposing a new set of hyperparameters. To train the model with them, substitute the command in step 2 with the one below.
2. Train ComplEx using

   ```bash
   python scripts/complex/train.py --dataset FB15k-237 --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0
   ```
**Note**: In these instructions, we always refer to the new set of hyperparameters. All commands can also be executed with the original hyperparameters suggested in Kelpie by substituting them into each command.
## Extracting Explanations

After training, it is possible to extract necessary explanations.

First, the triples to explain are needed. We randomly drew 50 of the inferred triples top-ranked by ComplEx. To enable reproducibility, these triples are stored in the `triples-to-explain/` folder in two formats:

- `triples_to_explain_kelpie_format.csv`, in the Kelpie format;
- `triples_to_explain_anyburl_format.txt`, in the AnyBURL format.
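For a quick sanity check before moving the files, the triples can be inspected directly. A minimal sketch, assuming only that both files are plain text with one triple per line (the exact field separators follow the conventions of the Kelpie and AnyBURL repositories, respectively):

```python
# Peek at the first few triples stored in each format.
for path in (
    "triples-to-explain/triples_to_explain_kelpie_format.csv",
    "triples-to-explain/triples_to_explain_anyburl_format.txt",
):
    print(f"--- {path} ---")
    with open(path) as f:
        for line in f.readlines()[:3]:
            print(line.rstrip())
```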
In this way,

- to run Kelpie, CRIAGE, and Data Poisoning, it is sufficient to move `triples_to_explain_kelpie_format.csv` into the `input_facts/` folder of the Kelpie repository, as per Kelpie's documentation;
- to run the AnyBURL Explainer, it is sufficient to move `triples_to_explain_anyburl_format.txt` into the `triple_to_explain/` folder of the AnyBURL repository, as explained in [Running the AnyBURL Explainer](#running-the-anyburl-explainer).
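For illustration, assuming (hypothetically) that this repository, Kelpie, and AnyBURL are cloned side by side in the same parent directory, the two moves can be scripted as follows; adapt the destination paths to your actual layout:

```python
import shutil

# Hypothetical layout: this repo, Kelpie, and AnyBURL live in sibling folders.
shutil.copy(
    "triples-to-explain/triples_to_explain_kelpie_format.csv",
    "../Kelpie/input_facts/triples_to_explain_kelpie_format.csv",
)
# For AnyBURL, the file is renamed to target.txt
# (see Running the AnyBURL Explainer below).
shutil.copy(
    "triples-to-explain/triples_to_explain_anyburl_format.txt",
    "../AnyBURL/triple_to_explain/target.txt",
)
```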
## Running Kelpie, CRIAGE, and Data Poisoning

It is now possible to run the explainability algorithms. Kelpie, CRIAGE, and Data Poisoning are all implemented within the Kelpie repository. The commands to extract necessary explanations with these algorithms are as follows.
- Kelpie

  ```bash
  python scripts/complex/explain.py --dataset FB15k-237 --model_path stored_models/ComplEx_FB15k-237.pt --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0 --facts_to_explain_path input_facts/triples_to_explain_kelpie_format.csv --mode necessary
  ```

- Kelpie - K1 (explanations of length 1)

  ```bash
  python scripts/complex/explain.py --dataset FB15k-237 --model_path stored_models/ComplEx_FB15k-237.pt --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0 --facts_to_explain_path input_facts/triples_to_explain_kelpie_format.csv --mode necessary --baseline k1
  ```

- Data Poisoning

  ```bash
  python scripts/complex/explain.py --dataset FB15k-237 --model_path stored_models/ComplEx_FB15k-237.pt --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0 --facts_to_explain_path input_facts/triples_to_explain_kelpie_format.csv --mode necessary --baseline data_poisoning
  ```

- CRIAGE

  ```bash
  python scripts/complex/explain.py --dataset FB15k-237 --model_path stored_models/ComplEx_FB15k-237.pt --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0 --facts_to_explain_path input_facts/triples_to_explain_kelpie_format.csv --mode necessary --baseline criage
  ```

These commands return the explanations extracted for the triples in `triples_to_explain_kelpie_format.csv`. To compute the rank changes of these triples, it is necessary to retrain ComplEx once after each run above. The command is

```bash
python scripts/complex/verify_explanations.py --dataset FB15k-237 --model_path stored_models/ComplEx_FB15k-237.pt --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0 --mode necessary
```

This procedure outputs an `output.txt` file containing the explanations of each inferred triple, which can be used to compute the lengths of the explanations, and an `output_end_to_end.csv` file with the rank changes of the predictions.
**Note**: All three explainability algorithms described here are run with their default configurations set in the Kelpie repository.
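As an example of downstream processing, the rank changes in `output_end_to_end.csv` can be summarized with a few lines of Python. A minimal sketch, assuming (hypothetically) that the CSV has `original_rank` and `new_rank` columns; check the actual header and adapt the column names:

```python
import csv

# Summarize the rank changes produced by verify_explanations.py.
# NOTE: the column names "original_rank" and "new_rank" are hypothetical;
# inspect the header of output_end_to_end.csv and adapt accordingly.
with open("output_end_to_end.csv") as f:
    rows = list(csv.DictReader(f))

deltas = [int(r["new_rank"]) - int(r["original_rank"]) for r in rows]
print(f"Mean rank change over {len(deltas)} triples: {sum(deltas) / len(deltas):.2f}")
```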
## Running the AnyBURL Explainer

To run the AnyBURL Explainer, it is possible to follow the steps detailed in the AnyBURL repository. The important steps, made compatible with our setup, are as follows.
1. Download and unzip AnyBURL v22 as per AnyBURL's documentation. The repository default level is `src/`.
2. At default level, create a `build/` folder, compile the Java source code, and package the compiled files into a JAR file. This can be done by running the commands

   ```bash
   mkdir build
   javac de/unima/ki/anyburl/*.java -d build
   jar cfv AnyBURL-22.jar -C build .
   rm -r build
   ```

3. Download the FB15k-237 dataset in a format compatible with AnyBURL. The dataset can be found as a zipped file in the AnyBURL repository. Unzip it and paste the `data/FB15-237/` folder at default level.
4. Create a `triple_to_explain/` folder at default level. Here, copy the `triples_to_explain_anyburl_format.txt` file and rename it to `target.txt` to make it compatible with the AnyBURL Explainer.
5. Run the AnyBURL Explainer using

   ```bash
   java -Xmx3G -cp AnyBURL-22.jar de.unima.ki.anyburl.Explain ../triple_to_explain/ ../data/FB15-237/
   ```

This procedure outputs necessary explanations in `triple_to_explain/delete.txt`. The file can be reformatted as the `output.txt` file of Kelpie, to then retrain ComplEx upon removing AnyBURL's explanations with
```bash
python scripts/complex/verify_explanations.py --dataset FB15k-237 --model_path stored_models/ComplEx_FB15k-237.pt --optimizer Adagrad --dimension 256 --batch_size 2048 --max_epochs 25 --learning_rate 0.01 --reg 0 --mode necessary
```
**Note**: The AnyBURL Explainer is run with its default configuration set in the AnyBURL repository.
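The reformatting of `delete.txt` mentioned above is not automated in this repository. As a starting point, a minimal sketch under the hypothetical assumption that `delete.txt` lists one tab-separated triple per line; the actual layouts of `delete.txt` and Kelpie's `output.txt` should be checked in the respective repositories before adapting the conversion:

```python
# Read AnyBURL's deletions in preparation for conversion to Kelpie's format.
# NOTE: the one-tab-separated-triple-per-line layout assumed here is
# hypothetical; verify the real formats of delete.txt and output.txt.
with open("triple_to_explain/delete.txt") as f:
    triples = [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

print(f"Read {len(triples)} explanation triples, e.g. {triples[:1]}")
```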
## Results

For convenience, we stored all experiment results in the `data/` folder.

- In `data/complex_training/`, we pickled all data concerning ComplEx training.
- In `data/explainers-output/`, we pickled all data concerning the post-hoc explanations found by each algorithm.

These data can be conveniently processed with the Jupyter notebooks in `notebooks/`. Figures are saved in `figures/`.
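As an illustration, the pickled results can also be loaded directly outside the notebooks. The file name below is hypothetical; list the folder contents to find the actual files:

```python
import pickle
from pathlib import Path

# List the pickled result files shipped with the repository.
for path in sorted(Path("data/explainers-output").iterdir()):
    print(path)

# Load one of them (hypothetical file name; replace with an actual file).
with open("data/explainers-output/kelpie.pkl", "rb") as f:
    results = pickle.load(f)
print(type(results))
```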
## License

The code in this repository is released under the MIT License. Please refer to the LICENSE file for detailed licensing information.
## Contact

If you have any questions, feedback, or inquiries about the code, please reach out to the authors. If you encounter any issues or have suggestions, please open an issue on this repository.