I salute you for your efforts on this repository; it was quite hard to wrap my head around it.
Here are some notes from my attempt at reproducing your results:
Dataset creation
- Consider using fewer threads/processes in `img2dataset` for the RAISE dataset download, as the current setup does not fetch all images. I ended up downloading the originals and modifying the code locally to load the TIFs instead of `webdataset`'s PNGs (see the `img2dataset` sketch after the table below).
- Is it intentional that the final dataset does not contain any ImageNet images? The article mentions them, but neither dataset_creation/resources/train_real_file_ids.txt nor dataset_creation/resources/validation_real_file_ids.txt includes any.
- As the whole dataset-building procedure is quite lengthy, consider adding more checks for the presence of files/directories. I had to build the dataset twice because the real-data builder did not complain about the incorrect `RAISE` dataset root directory path I had provided (see the path-check sketch after the table below).
- Eventually I generated the resulting dataset (more precisely its real part, as the fake part is already provided):
| origin_dataset | split | count |
| --- | --- | --- |
| COCO2017_train | train | 76960 |
| COCO2017_train | validation | 11679 |
| COCO2017_val | train | 3683 |
| COCO2017_val | validation | 66 |
| LAION-400M | train | 58217 |
| LAION-400M | validation | 24198 |
| RAISE | train | 5140 |
| RAISE | validation | 57 |
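For reference, a minimal sketch of how the RAISE download could be re-run with lower parallelism via the `img2dataset` Python API; the URL list and output paths are placeholders, and the exact arguments the repository passes may differ:

```python
from img2dataset import download

# Lower processes_count/thread_count so the RAISE server is not hammered
# and fewer downloads are silently dropped. Paths below are placeholders.
download(
    url_list="raise_urls.txt",          # hypothetical file with RAISE image URLs
    output_folder="raise_webdataset",   # hypothetical output directory
    output_format="webdataset",
    processes_count=2,                  # much lower than the defaults
    thread_count=8,
    retries=3,
)
```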
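And a hedged example of the kind of early sanity check that would have saved me a rebuild; the function and argument names are hypothetical, not the repository's actual API:

```python
from pathlib import Path

def validate_dataset_roots(raise_root: str, coco_root: str, laion_root: str) -> None:
    """Fail fast if any dataset root directory is missing (argument names are illustrative)."""
    roots = {"RAISE": raise_root, "COCO2017": coco_root, "LAION-400M": laion_root}
    for name, root in roots.items():
        if not Path(root).is_dir():
            raise FileNotFoundError(f"{name} root directory does not exist: {root}")
```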
Training
- I've run the training with the following settings, as suggested (the smallest model):
BENCHMARK_PIPELINE_CFG="training_configurations/benchmark_pipelines/base_benchmark_sliding_windows.yaml"
MODEL_CFG="training_configurations/RN50_clip/RN50_clip_tune_resize.yaml"
- The config is not compatible with the `openai_clip` factory methods. Fixed in Small fixes for easier reproducibility #1.
- Unbound local variable errors in training_and_evaluation/lightning_data_modules/deepfake_detection_datamodule.py; the dataset/label balancing yielded an empty validation set. Fixed in Small fixes for easier reproducibility #1.
- Consider creating the wandb log directory training_and_evaluation/experiments_logs/wandb_logs in order to avoid the `/tmp` fallback, which is not an option in some cluster environments (a sketch follows below).
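A minimal sketch of what I mean, assuming the standard `WANDB_DIR` environment variable is honoured by the logger the repository uses:

```python
import os

# Create the log directory up front so wandb does not silently fall back to /tmp.
wandb_dir = "training_and_evaluation/experiments_logs/wandb_logs"
os.makedirs(wandb_dir, exist_ok=True)
os.environ.setdefault("WANDB_DIR", wandb_dir)
```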
Evaluation
- The readme does not mention how to run the evaluation, though it is easy to find the corresponding script, training_and_evaluation/evaluation_scripts/evaluate_predictions_multiwindow.py.
- Attached below are stats computed with the above configuration. Unless I made a mistake during the evaluation process, I got better results than those reported in the Leaderboard: average-over-time Area Under the ROC Curve (AUC) = 86.48 (see the AUC sketch below).
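For transparency, this is roughly how I aggregated the metric; a sketch assuming per-window predictions grouped by a timestamp bucket (the file name and column names are placeholders, not the actual output format of the evaluation script):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical prediction dump: one row per window, with a timestamp bucket,
# a ground-truth label (0 = real, 1 = fake) and a predicted fake probability.
preds = pd.read_csv("predictions.csv")  # columns: timestamp, label, score (placeholders)

# AUC per time bucket, then averaged over time.
per_time_auc = preds.groupby("timestamp").apply(
    lambda g: roc_auc_score(g["label"], g["score"])
)
print(f"average-over-time AUC = {100 * per_time_auc.mean():.2f}")
```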
General notes
- Consider supporting an older Python stack (3.10), or at least mention the 3.12 requirement in the readme.