Attempt at reproducing your results #2

@janfrancu

I salute you for your efforts on this repository; it was quite hard to wrap my head around it.
Here are some notes regarding my attempt at reproduction of your results:

Dataset creation

  • Consider using fewer threads/processes in img2dataset for the RAISE dataset download, as the current setup does not fetch all images. I ended up downloading the originals and modifying the code locally to load TIFs instead of the webdataset PNGs.
  • Is it intentional that the final dataset does not contain any ImageNet images? The article mentions ImageNet, but neither dataset_creation/resources/train_real_file_ids.txt nor dataset_creation/resources/validation_real_file_ids.txt contains any ImageNet file IDs.
  • As the whole dataset-building procedure is quite lengthy, consider adding more checks for the presence of files/directories. I built the dataset twice because the real-image builder did not complain about the incorrect RAISE dataset root dir path that I had provided.
  • Eventually I generated the resulting dataset (more precisely, its real part, since the fake part was already provided). Image counts per origin dataset and split:
    origin_dataset  split     
    COCO2017_train  train         76960
                    validation    11679
    COCO2017_val    train          3683
                    validation       66
    LAION-400M      train         58217
                    validation    24198
    RAISE           train          5140
                    validation       57
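
To illustrate the presence check suggested above, here is a minimal sketch of a fail-fast guard that could run before the lengthy build starts. The helper name `require_dir` and the example path are hypothetical and not taken from this repository; I am only assuming that the builder receives the dataset roots as plain paths.

```python
from pathlib import Path


def require_dir(path, description):
    """Return `path` as a Path, or raise if it is not an existing, non-empty directory.

    Hypothetical helper: the name and signature are illustrative only.
    """
    p = Path(path).expanduser()
    if not p.is_dir():
        raise FileNotFoundError(f"{description} not found or not a directory: {p}")
    # An empty directory usually means a wrong root or a failed download.
    if not any(p.iterdir()):
        raise FileNotFoundError(f"{description} is empty: {p}")
    return p


if __name__ == "__main__":
    # Placeholder path; a wrong RAISE root should stop the build immediately.
    try:
        require_dir("/path/to/RAISE", "RAISE dataset root")
    except FileNotFoundError as err:
        print(f"Aborting dataset build: {err}")
```

Calling something like this once per input root at the start of the real-image builder would have stopped my first run within seconds instead of silently producing a dataset without the RAISE images.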

Training

  • I ran the training with the following settings, as suggested (the smallest model):
BENCHMARK_PIPELINE_CFG="training_configurations/benchmark_pipelines/base_benchmark_sliding_windows.yaml"
MODEL_CFG="training_configurations/RN50_clip/RN50_clip_tune_resize.yaml"

Evaluation

General notes

  • Consider supporting an older Python stack (3.10), or at least mention the Python 3.12 requirement in the README.
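
Besides the README note, a small guard at the entry point could make the version requirement explicit at runtime. This is only a sketch: `MIN_PYTHON` and `check_python` are hypothetical names, and the 3.12 minimum is my assumption based on what I ran into, not something the repository states.

```python
import sys

# Assumption: 3.12 is the minimum supported version.
MIN_PYTHON = (3, 12)


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Raise a readable error when the interpreter is older than `minimum`."""
    current = tuple(version_info[:2])
    if current < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ is required, "
            f"found {current[0]}.{current[1]}"
        )


if __name__ == "__main__":
    try:
        check_python()
    except RuntimeError as err:
        print(err)
```

Calling `check_python()` at the top of the training and dataset-creation scripts would turn an obscure failure deep in the run into a one-line message.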
