
x-datascience-datacamp/template_codabench


Template to create a codabench bundle for an ML competition in Python

The code in this repository is a template to create a codabench bundle for a machine learning competition in Python. It sets up a dummy classification task, evaluated with the accuracy metric on public and private test sets.

Structure of the bundle

  • competition.yaml: configuration file for the codabench competition, specifying phases, tasks, and evaluation metrics.
  • ingestion_program/: contains the ingestion program that will be run on participants' submissions. It is responsible for loading the code from the submission, training the model on the training data, and generating predictions on the test datasets. It contains:
    • metadata.yaml: A file describing how to run the ingestion program on codabench. For a single-script ingestion program in ingestion.py, there is no need to edit it.
    • ingestion.py: A script to run the ingestion. The role of this script is to load the submission code and produce predictions that can be evaluated with the scoring_program. In our example, submission.py defines a get_model function that returns a scikit-learn-compatible model. This model is then fitted on the training data by calling fit, and its predict method is used to generate predictions on the test data. These predictions are stored as a CSV file, to be loaded by the scoring_program.
  • scoring_program/: contains the scoring program that will be run to evaluate the predictions generated by the ingestion program. It loads the predictions and the ground truth labels, computes the evaluation metric (accuracy in this case), and outputs the score. It contains:
    • metadata.yaml: A file describing how to run the scoring program on codabench. For a single-script scoring program in scoring.py, there is no need to edit it.
    • scoring.py: A script to run the scoring. This script loads the predictions dumped by the ingestion program and produces a single JSON file containing the scores associated with the submission. In our example, we compute the accuracy on two test sets (public and private), as well as the runtime.
  • solution/: contains a sample solution submission that participants can use as a reference; here, a simple Random Forest classifier. This file gives participants the structure of the code they need to submit. In our example, participants must submit a submission.py file with a get_model function that returns a scikit-learn-compatible model.
  • *_phase/: contains the data for a given phase, including input data and reference labels. Running tools/setup_data.py will generate dummy data for a development phase. For a real competition, this data should be replaced with the actual data.
  • pages/: contains markdown files that will be rendered as web pages in the codabench competition.
  • requirements.txt: contains the required python dependencies to run the challenge.
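As an illustration of the submission format described above, a minimal submission.py could look like the following sketch (the RandomForestClassifier is just a placeholder; any scikit-learn-compatible estimator can be returned):

```python
# submission.py -- minimal example submission (a sketch; any
# scikit-learn-compatible estimator can be returned instead).
from sklearn.ensemble import RandomForestClassifier


def get_model():
    """Return a scikit-learn-compatible model.

    The ingestion program calls fit(X_train, y_train) on this object,
    then predict(X_test) to generate the predictions to be scored.
    """
    return RandomForestClassifier(n_estimators=100, random_state=42)
```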

Extra scripts in the tools/ folder

  • tools/setup_data.py: script to generate dummy data for the competition. This should be changed to load and preprocess real data for a given competition.
  • tools/create_bundle.py: script to create the codabench bundle archive from the repository structure.
  • tools/Dockerfile: Dockerfile to build the docker image that will be used to run the ingestion and scoring programs.
  • tools/run_docker.py: convenience script to build and test the docker image locally without knowing docker commands. See the "Setting up and testing the docker image" section below for more details.

Instructions to create the codabench bundle

Make sure that the tools/setup_data.py script has been run to generate the data for the competition.

Then, run the tools/create_bundle.py script to create the codabench bundle archive:

python tools/create_bundle.py

You can then upload the generated bundle.zip file to codabench to create the competition.

Instructions to test the bundle locally

To test the ingestion program, run:

python ingestion_program/ingestion.py --data-dir dev_phase/input_data/ --output-dir ingestion_res/  --submission-dir solution/

To test the scoring program, run:

python scoring_program/scoring.py --reference-dir dev_phase/reference_data/ --output-dir scoring_res  --prediction-dir ingestion_res/

Setting up and testing the docker image

For convenience, a Python script, tools/run_docker.py, is provided to build the docker image and run the ingestion and scoring programs inside the docker container. This script requires the docker Python package, which can be installed via pip:

pip install docker
python tools/run_docker.py

You can also perform these steps manually. You first need to build the docker image locally from the Dockerfile with:

docker build -t docker-image tools

To test the docker image locally, run:

docker run --rm -u root \
    -v "./ingestion_program":"/app/ingestion_program" \
    -v "./dev_phase/input_data":/app/input_data \
    -v "./ingestion_res":/app/output \
    -v "./solution":/app/ingested_program \
    --name ingestion docker-image \
        python /app/ingestion_program/ingestion.py

docker run --rm -u root \
    -v "./scoring_program":"/app/scoring_program" \
    -v "./dev_phase/reference_data":/app/input/ref \
    -v "./ingestion_res":/app/input/res \
    -v "./scoring_res":/app/output \
    --name scoring docker-image \
        python /app/scoring_program/scoring.py
