I salute you for your efforts on this repository; it was quite hard to wrap my head around it.
Here are some notes from my attempt at reproducing your results:
Dataset creation
- Consider using fewer threads/processes in `img2dataset` for the RAISE dataset download, as the current setup does not fetch all images. I ended up downloading the originals and modifying the code locally to load the TIFs instead of `webdataset`'s PNGs (see the `img2dataset` sketch after the table below).
- Is it intentional that the final dataset does not contain any ImageNet images? The article mentions them, but neither dataset_creation/resources/train_real_file_ids.txt nor dataset_creation/resources/validation_real_file_ids.txt includes any.
- As the whole dataset-building procedure is quite lengthy, consider adding more checks for the presence of files/directories. I had to build the dataset twice because the real-data builder did not complain about the incorrect `RAISE` dataset root directory path I had provided (see the path-check sketch after the table below).
- Eventually I generated the resulting dataset (more precisely its real part, as the fake part is already provided):
| origin_dataset | split | count |
| --- | --- | --- |
| COCO2017_train | train | 76960 |
| COCO2017_train | validation | 11679 |
| COCO2017_val | train | 3683 |
| COCO2017_val | validation | 66 |
| LAION-400M | train | 58217 |
| LAION-400M | validation | 24198 |
| RAISE | train | 5140 |
| RAISE | validation | 57 |
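For reference, a minimal sketch of how the RAISE download could be re-run with lower parallelism via the `img2dataset` Python API; the URL list and output paths are placeholders, and the exact arguments the repository passes may differ:

```python
from img2dataset import download

# Lower processes_count/thread_count so the RAISE server is not hammered
# and fewer downloads are silently dropped. Paths below are placeholders.
download(
    url_list="raise_urls.txt",          # hypothetical file with RAISE image URLs
    output_folder="raise_webdataset",   # hypothetical output directory
    output_format="webdataset",
    processes_count=2,                  # much lower than the defaults
    thread_count=8,
    retries=3,
)
```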
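And a hedged example of the kind of early sanity check that would have saved me a rebuild; the function and argument names are hypothetical, not the repository's actual API:

```python
from pathlib import Path

def validate_dataset_roots(raise_root: str, coco_root: str, laion_root: str) -> None:
    """Fail fast if any dataset root directory is missing (argument names are illustrative)."""
    roots = {"RAISE": raise_root, "COCO2017": coco_root, "LAION-400M": laion_root}
    for name, root in roots.items():
        if not Path(root).is_dir():
            raise FileNotFoundError(f"{name} root directory does not exist: {root}")
```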
Training
- I've run the training with the following settings, as suggested (the smallest model):
BENCHMARK_PIPELINE_CFG="training_configurations/benchmark_pipelines/base_benchmark_sliding_windows.yaml"
MODEL_CFG="training_configurations/RN50_clip/RN50_clip_tune_resize.yaml"
- The config is not compatible with the `openai_clip` factory methods. Fixed in Small fixes for easier reproducibility #1.
- Unbound local variable errors in training_and_evaluation/lightning_data_modules/deepfake_detection_datamodule.py; the dataset/label balancing yielded an empty validation set. Fixed in Small fixes for easier reproducibility #1.
- Consider creating the wandb log directory training_and_evaluation/experiments_logs/wandb_logs in order to avoid the `/tmp` fallback, which is not an option in some cluster environments (a sketch follows below).
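A minimal sketch of what I mean, assuming the standard `WANDB_DIR` environment variable is honoured by the logger the repository uses:

```python
import os

# Create the log directory up front so wandb does not silently fall back to /tmp.
wandb_dir = "training_and_evaluation/experiments_logs/wandb_logs"
os.makedirs(wandb_dir, exist_ok=True)
os.environ.setdefault("WANDB_DIR", wandb_dir)
```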
Evaluation
- The readme does not mention how to run the evaluation, though it is easy to find the corresponding script, training_and_evaluation/evaluation_scripts/evaluate_predictions_multiwindow.py.
- Attached below are stats computed with the above configuration. Unless I made a mistake during the evaluation process, I got better results than those reported in the Leaderboard: average-over-time Area Under the ROC Curve (AUC) = 86.48 (see the AUC sketch below).
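For transparency, this is roughly how I aggregated the metric; a sketch assuming per-window predictions grouped by a timestamp bucket (the file name and column names are placeholders, not the actual output format of the evaluation script):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical prediction dump: one row per window, with a timestamp bucket,
# a ground-truth label (0 = real, 1 = fake) and a predicted fake probability.
preds = pd.read_csv("predictions.csv")  # columns: timestamp, label, score (placeholders)

# AUC per time bucket, then averaged over time.
per_time_auc = preds.groupby("timestamp").apply(
    lambda g: roc_auc_score(g["label"], g["score"])
)
print(f"average-over-time AUC = {100 * per_time_auc.mean():.2f}")
```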
General notes
- Consider supporting an older Python stack (3.10), or at least mention the 3.12 requirement in the readme.