Demo: Train a small neural network in a Docker container

Docker enables portability and reproducibility of a codebase. This repository serves as an example of how to train neural networks in Docker containers and how to run GUI applications from inside them.

Notes:

  • Rather than developing the ultimate neural network, the focus of this repository is to show how one can train a neural network in Docker containers. In other words, the contents of train.py and utils.py are out of scope.

  • Although this repository only uses Python, one can extend the idea to arbitrary languages/tools.

  • In addition to Docker Hub, NGC is another popular registry for pulling prebuilt images. One can also host private registries using Red Hat Quay or Google Cloud Container Registry.

  • Any feedback is welcome! Feel free to start a discussion.

Prerequisites

Hardware

  • amd64 CPU(s)
  • NVIDIA GPU(s) of the Kepler architecture or newer (compute capability >= 3.0)

Software

  • NVIDIA GPU driver
  • Docker Engine
  • NVIDIA Container Toolkit
  • x11-xserver-utils on the host (only needed for GUI applications)

Instructions

  1. Clone the repository:

    $ git clone https://github.com/tn00364361/demo-cnn-docker.git
    $ cd demo-cnn-docker
  2. Build the Docker image:

    $ ./docker/build.sh

    This step can take a couple of minutes, depending on your internet connection. Once it finishes, run docker image ls to verify that the image exists.

  3. Launch a container:

    $ ./docker/run.sh           # use GPU0 only (default)
    $ ./docker/run.sh -d all    # use all GPU(s)
    $ ./docker/run.sh -d 2,3    # use GPU2 and GPU3

    See the Docker documentation on the --gpus option for valid GPU ID values.

    The username and hostname in the prompt should turn yellow and the working directory should be /my_workspace, indicating that you are now inside a container. Run ll to verify that you can see the files from the host.

    In addition, run nvidia-smi in the container to verify that the GPU(s) are visible.

Details

  • Differences between images and containers

    An image can be seen as a snapshot of an environment. It encapsulates all dependencies, including software packages and environment variables, in a static state. One can pull prebuilt images from Docker Hub. For example:

    $ docker pull ubuntu:22.04

    By contrast, a container is (usually) interactive and can be launched given the name of an image. For example:

    $ docker run -it --rm ubuntu:22.04

    A more sophisticated example can be found in the run script, which is explained later.

  • Running as the root user or using sudo in containers

    TL;DR: Don't.

    Once the image is built, one should launch containers with the -u flag, as shown in the run script. This ensures that files written from inside the container have the correct ownership on the host, instead of being owned by the root user.

    If you need to install packages or change system configuration files, do so in the Dockerfile.
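
    For example, a package that would otherwise require sudo inside a running container can be baked into the image at build time. A minimal sketch of the corresponding Dockerfile lines (the package names are placeholders, not actual dependencies of this repository):

      # Install system packages at build time; no sudo is needed inside the container.
      # git and htop are placeholder examples.
      RUN apt-get update && \
          apt-get install -y --no-install-recommends git htop && \
          rm -rf /var/lib/apt/lists/*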

  • docker/home

    This is the home directory to be mounted into a container. Mounting it allows the user in the container to keep files such as Bash/Python history and common configuration files across runs.

  • docker/Dockerfile

    This file contains the instructions to build the desired image. Starting from a base image, environment variables are set and packages are installed.

    When developing custom/new algorithms, one almost always wants to write a custom Dockerfile instead of using a prebuilt image from Docker Hub as-is. A typical pipeline may look like this:

    1. Figure out dependencies of the project.
    2. Write a Dockerfile with all dependencies.
    3. Build the image.
    4. Launch a container.
    5. Develop and test your algorithm in the container.
    6. If a new dependency appears during development, go back to step 1 and iterate.

    In addition, it is good practice to explicitly specify software versions. For example, when providing a requirements.txt for pip, one should prefer == or >=X,<Y over a bare >= or no version constraint at all.

    Another thing to pay attention to: when cloning a third-party repository as a dependency, one should check out a particular tag or commit in the Dockerfile.
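
    The sketch below ties these points together: a minimal, hypothetical Dockerfile that sets an environment variable, installs pinned dependencies, and checks out a third-party repository at a fixed tag. It is not the actual docker/Dockerfile of this repository; the base image, package versions, repository URL, and tag are illustrative placeholders.

      # Hypothetical sketch, not the actual docker/Dockerfile.
      FROM ubuntu:22.04

      # Environment variables are set once in the image.
      ENV DEBIAN_FRONTEND=noninteractive

      # Install system-level dependencies.
      RUN apt-get update && \
          apt-get install -y --no-install-recommends python3-pip git && \
          rm -rf /var/lib/apt/lists/*

      # Pin Python dependencies explicitly (versions are placeholders).
      RUN pip3 install --no-cache-dir 'numpy>=1.23,<2' 'matplotlib>=3.6,<4'

      # Clone a third-party dependency and check out a fixed tag
      # (URL and tag are placeholders).
      RUN git clone https://github.com/example-org/example-dependency.git /opt/example-dependency && \
          git -C /opt/example-dependency checkout v1.2.3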

  • docker/build.sh

    This script builds the image with a predefined tag (i.e. the name of the image). See the official documentation for more details.
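
    At its core this is a single docker build command. A minimal sketch, assuming the image tag demo-cnn used by run.sh and the Dockerfile under docker/ (the actual script may pass additional flags):

      #!/usr/bin/env bash
      set -e

      # Build the image from docker/Dockerfile, tag it as demo-cnn,
      # and use the repository root as the build context.
      docker build -t demo-cnn -f docker/Dockerfile .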

  • docker/run.sh

    This script launches a container from the built image. In short, it handles the GUI- and user-related setup, in addition to exposing the GPU(s) needed for training neural networks.

    • GUI (requires x11-xserver-utils installed on the host)

      #!/usr/bin/env bash
      ...
      xhost +local:
      
      docker run -it --rm \
          ...
          -e DISPLAY \
          -e QT_X11_NO_MITSHM=1 \
          -e XDG_RUNTIME_DIR=/run/user/$USER_ID \
          -v /run/user/$USER_ID:/run/user/$USER_ID \
          -v /tmp/.X11-unix:/tmp/.X11-unix \
          ...
          demo-cnn
      
      xhost -local:
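
      With these flags, GUI programs launched inside the container are displayed on the host's X server. As a quick check, opening any plot window from inside the container should bring it up on the host, e.g. (assuming matplotlib is installed in the image, which is an assumption rather than something stated here):

        $ python3 -c 'import matplotlib.pyplot as plt; plt.plot([0, 1]); plt.show()'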
    • User & group passthrough

      #!/usr/bin/env bash
      ...
      USER_ID=$(id -u)
      GROUP_ID=$(id -g)
      # Create minimal passwd/group files containing only the current user and group,
      # so the UID/GID passed via -u resolve to a proper username inside the container.
      PASSWD_FILE=$(mktemp) && echo $(getent passwd $USER_ID) > $PASSWD_FILE
      GROUP_FILE=$(mktemp) && echo $(getent group $GROUP_ID) > $GROUP_FILE
      ...
      docker run -it --rm \
          ...
          -e HOME \
          -u $USER_ID:$GROUP_ID \
          -v $PASSWD_FILE:/etc/passwd:ro \
          -v $GROUP_FILE:/etc/group:ro \
          -v $ROOTFS/home:$HOME \
          ...
          demo-cnn
      ...
    • GPU access

      #!/usr/bin/env bash
      ...
      # GPU ID(s) to expose, e.g. 0 or 2,3 (set via the -d flag).
      DEVICES=0
      ...
      # The quoting expands to --gpus '"device=0"'; the inner double quotes are
      # needed when listing multiple GPU IDs, e.g. device=2,3.
      docker run -it --rm \
          --gpus '"device='$DEVICES'"' \
          ...
          demo-cnn
      ...
