The code in this repository is a template for creating a Codabench bundle for a machine learning competition in Python. It sets up a dummy classification task, evaluated with the accuracy metric on a public and a private test set.
- competition.yaml: configuration file for the Codabench competition, specifying phases, tasks, and evaluation metrics.
- ingestion_program/: contains the ingestion program that will be run on participants' submissions. It is responsible for loading the code from the submission, passing the training data to train the model, and generating predictions on the test datasets. It contains:
  - metadata.yaml: a file describing how to run the ingestion program for Codabench. For a single-script ingestion program in ingestion.py, there is no need to edit it.
  - ingestion.py: a script to run the ingestion. Its role is to load the submission code and produce predictions that can be evaluated with the scoring_program. In our example, submission.py defines a get_model function that returns a scikit-learn compatible model. This model is then fitted on the training data by calling fit, and its predict method is used to generate predictions on the test data. These predictions are stored as a CSV file, to be loaded by the scoring_program.
- scoring_program/: contains the scoring program that will be run to evaluate the predictions generated by the ingestion program. It loads the predictions and the ground truth labels, computes the evaluation metric (accuracy in this case), and outputs the score. It contains:
  - metadata.yaml: a file describing how to run the scoring program for Codabench. For a single-script scoring program in scoring.py, there is no need to edit it.
  - scoring.py: a script to run the scoring. This script loads the predictions dumped by the ingestion program and produces a single JSON file containing the scores associated with the submission. In our example, we compute accuracy on two test sets (public and private) as well as the runtime.
- solution/: contains a sample solution submission that participants can use as a reference. Here, this is a simple Random Forest classifier. This file gives the user the structure of the code they need to submit. In our example, the user needs to submit a submission.py file with a get_model function that returns a scikit-learn compatible model.
- *_phase/: contains the data for a given phase, including input data and reference labels. Running tools/setup_data.py will generate dummy data for a development phase. For a real competition, this data should be replaced with the actual data.
- pages/: contains markdown files that will be rendered as web pages in the Codabench competition.
- requirements.txt: contains the Python dependencies required to run the challenge.
- tools/setup_data.py: script to generate dummy data for the competition. This should be changed to load and preprocess real data for a given competition.
- tools/create_bundle.py: script to create the Codabench bundle archive from the repository structure.
- tools/Dockerfile: Dockerfile to build the Docker image that will be used to run the ingestion and scoring programs.
- tools/run_docker.py: convenience script to build and test the Docker image locally without knowing Docker commands. See here for more details.
Make sure that the setup_data.py script has been run to generate the data for
the competition.
Then, run the create_bundle.py script to create the Codabench bundle archive:
python tools/create_bundle.py

You can then upload the generated bundle.zip file to Codabench to create the competition on this page.
To test the ingestion program, run:
python ingestion_program/ingestion.py --data-dir dev_phase/input_data/ --output-dir ingestion_res/ --submission-dir solution/

To test the scoring program, run:
python scoring_program/scoring.py --reference-dir dev_phase/reference_data/ --output-dir scoring_res --prediction-dir ingestion_res/

For convenience, a Python script, tools/run_docker.py, is provided to build
the Docker image and run the ingestion and scoring programs inside the Docker
container.
This script requires the docker Python package, which can be installed via pip before running the script:
pip install docker
python tools/run_docker.py

You can also perform these steps manually.
You first need to build the Docker image locally from the Dockerfile with:
docker build -t docker-image tools

To test the Docker image locally, run the ingestion program and then the scoring program:
docker run --rm -u root \
  -v "./ingestion_program":/app/ingestion_program \
  -v "./dev_phase/input_data":/app/input_data \
  -v "./ingestion_res":/app/output \
  -v "./solution":/app/ingested_program \
  --name ingestion docker-image \
  python /app/ingestion_program/ingestion.py

docker run --rm -u root \
  -v "./scoring_program":/app/scoring_program \
  -v "./dev_phase/reference_data":/app/input/ref \
  -v "./ingestion_res":/app/input/res \
  -v "./scoring_res":/app/output \
  --name scoring docker-image \
  python /app/scoring_program/scoring.py
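
The two commands above can also be assembled from Python, which makes it easy to resolve the host directories to absolute paths before mounting them. The sketch below only builds the argument lists and runs them with subprocess. It is an illustration, not how tools/run_docker.py is implemented (that script relies on the docker Python package); the function names `docker_cmd`, `ingestion_cmd`, `scoring_cmd`, and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

IMAGE = "docker-image"  # image tag used in the build command above


def docker_cmd(name, mounts, script, image=IMAGE):
    """Build a `docker run` argument list with absolute host paths."""
    cmd = ["docker", "run", "--rm", "-u", "root"]
    for host, container in mounts.items():
        cmd += ["-v", f"{Path(host).resolve()}:{container}"]
    cmd += ["--name", name, image, "python", script]
    return cmd


def ingestion_cmd():
    mounts = {
        "ingestion_program": "/app/ingestion_program",
        "dev_phase/input_data": "/app/input_data",
        "ingestion_res": "/app/output",
        "solution": "/app/ingested_program",
    }
    return docker_cmd("ingestion", mounts, "/app/ingestion_program/ingestion.py")


def scoring_cmd():
    mounts = {
        "scoring_program": "/app/scoring_program",
        "dev_phase/reference_data": "/app/input/ref",
        "ingestion_res": "/app/input/res",
        "scoring_res": "/app/output",
    }
    return docker_cmd("scoring", mounts, "/app/scoring_program/scoring.py")


def run_all():
    """Run ingestion, then scoring; raise if either container fails."""
    for cmd in (ingestion_cmd(), scoring_cmd()):
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root, after the image has been built, would run both containers in sequence.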