This project is a set of examples for action recognition with deep learning. It is based on PyTorch and aims to classify still images of human actions.
The dataset this project is built on is Stanford 40 Actions, which contains more than 9,500 images capturing human actions. To download the dataset, check here.
The project is based on two main packages, `models` and `sfd40`:
- `models`: Covers all the functionality around the neural networks.
- `sfd40`: Covers all the functionality around loading the Stanford 40 data.

Both packages have a manager class that serves as the main class of its package. These two classes play an important role in the `main.py` file.
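For orientation, here is a minimal sketch of how `main.py` might wire the two manager classes together. The class and method names (`ModelsManager`, `Stanford40Manager`, `load`, `train`, `evaluate`) are assumptions made for illustration, not the packages' actual API.

```python
# Hypothetical sketch of how main.py could wire the two managers together.
# The class and method names below are assumptions made for illustration.
from models import ModelsManager        # hypothetical manager of the models package
from sfd40 import Stanford40Manager     # hypothetical manager of the sfd40 package


def main() -> None:
    data_manager = Stanford40Manager()                      # reads IMAGE_FILES_PATH / XML_FILES_PATH
    train_loader, val_loader, test_loader = data_manager.load()

    model_manager = ModelsManager()                         # builds the custom and/or pretrained network
    model_manager.train(train_loader, val_loader)
    model_manager.evaluate(test_loader)


if __name__ == "__main__":
    main()
```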
The default approach shown in this README is based on `uv`; however, all dependencies can be installed with other tools as well. To install `uv`, check here.
First, we export the XML annotation and image directories so the script can locate the two directories:
export IMAGE_FILES_PATH="absolute-path-stanford40-jpeg-images-dir"
export XML_FILES_PATH="absolute-path-stanford40-xml-annotations-dir"
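For reference, a generic sketch of how a script could read and validate these two variables; this is not the project's actual loading code.

```python
import os
from pathlib import Path

# os.environ raises a KeyError when a variable is not exported, which is a
# reasonable failure mode for required paths.
image_dir = Path(os.environ["IMAGE_FILES_PATH"])
xml_dir = Path(os.environ["XML_FILES_PATH"])

for directory in (image_dir, xml_dir):
    if not directory.is_dir():
        raise FileNotFoundError(f"Expected directory does not exist: {directory}")
```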
To run the neural network example, simply run:
make run
The script can be configured through environment variables. An example run with a custom configuration:
# Increased number of epochs
NN_NUM_EPOCHS=500 make run
# Increased number of epochs and only the pretrained model selected
NN_NUM_EPOCHS=500 MODEL="pretrained" make run
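Under the hood the overrides are plain environment variables, so a typed lookup with a default is enough to pick them up. The helper below is a hypothetical sketch, not the project's configuration code.

```python
import os


def env(name: str, default, cast=str):
    """Read an environment variable, falling back to a default and casting the raw string."""
    raw = os.getenv(name)
    return default if raw is None else cast(raw)


num_epochs = env("NN_NUM_EPOCHS", 25, int)   # e.g. NN_NUM_EPOCHS=500 make run
model_choice = env("MODEL", "both")          # "pretrained", "custom" or the default "both"
print(num_epochs, model_choice)
```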
The env vars used are:
Name | Description | Type | Default |
---|---|---|---|
VALIDATION_RATIO | The percentage of the training data used for validation | float | 0.05 |
TEST_RATIO | The percentage of the full dataset used for testing | float | 0.15 |
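As a sketch of how such ratios are typically applied with PyTorch, the test split can be carved from the full dataset first and the validation split from the remaining training data. The use of `torch.utils.data.random_split` and a dummy dataset below is an assumption for illustration, not necessarily how `sfd40` splits the data.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy dataset standing in for the Stanford 40 images/labels.
full_dataset = TensorDataset(torch.randn(200, 3, 8, 8), torch.randint(0, 40, (200,)))

TEST_RATIO, VALIDATION_RATIO = 0.15, 0.05

# Carve the test split out of the full data first ...
test_size = int(len(full_dataset) * TEST_RATIO)
train_val_set, test_set = random_split(full_dataset, [len(full_dataset) - test_size, test_size])

# ... then take the validation split from the remaining training data.
val_size = int(len(train_val_set) * VALIDATION_RATIO)
train_set, val_set = random_split(train_val_set, [len(train_val_set) - val_size, val_size])

print(len(train_set), len(val_set), len(test_set))  # 162 8 30
```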
Name | Description | Type | Default |
---|---|---|---|
IMAGE_FILES_PATH | The path to the JPEG images directory | string | "JPEGImages" |
XML_FILES_PATH | The path to the XML annotations directory | string | "XMLAnnotations" |
Name | Description | Type | Default |
---|---|---|---|
NN_IMAGE_READ_MODE | Image read mode ("GRAY" or "RGB") | str | "RGB" |
NN_LEARNING_RATE | The learning rate used for training | float | 1e-4 |
NN_TRANSFORM_RESIZE | The target size of the image resize transform | int | 224 |
NN_TRAIN_BATCH_SIZE | The batch size used for training | int | 128 |
NN_TEST_BATCH_SIZE | The batch size used for testing | int | 50 |
NN_VAL_BATCH_SIZE | The batch size used for validation | int | 15 |
NN_NUM_EPOCHS | The number of training epochs | int | 25 |
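To make the transform-related variables concrete, here is a minimal torchvision sketch of how `NN_IMAGE_READ_MODE` and `NN_TRANSFORM_RESIZE` could translate into an image transform; the exact composition is an assumption, not the project's pipeline.

```python
from PIL import Image
from torchvision import transforms

READ_MODE = "RGB"   # NN_IMAGE_READ_MODE: "RGB" -> 3 channels, "GRAY" -> 1 channel
RESIZE = 224        # NN_TRANSFORM_RESIZE

steps = [transforms.Resize((RESIZE, RESIZE))]
if READ_MODE == "GRAY":
    steps.append(transforms.Grayscale(num_output_channels=1))
steps.append(transforms.ToTensor())
transform = transforms.Compose(steps)

# Dummy image standing in for a Stanford 40 JPEG.
image = Image.new("RGB", (640, 480))
tensor = transform(image)
print(tensor.shape)  # torch.Size([3, 224, 224]) for RGB, torch.Size([1, 224, 224]) for GRAY
```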
Name | Description | Type | Default |
---|---|---|---|
MODEL | Which model to use ("pretrained" or "custom"). If unset, the script iterates over both models (first custom, then pretrained) | str | "both" |
SAVE_AS_YAML | Whether to save the hyperparameters and results to a YAML file | bool | True |
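The `MODEL` and `SAVE_AS_YAML` options imply a simple selection loop. The sketch below shows one way that loop could look, with hypothetical builder functions (`build_custom_model`, `build_pretrained_model`) and PyYAML standing in for the real implementation.

```python
import os

import torch.nn as nn
import yaml
from torchvision import models as tv_models


def build_custom_model() -> nn.Module:
    # Hypothetical stand-in for the project's custom network.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 40))


def build_pretrained_model() -> nn.Module:
    # Hypothetical stand-in: a pretrained backbone with a 40-class head.
    backbone = tv_models.resnet18(weights=tv_models.ResNet18_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, 40)
    return backbone


choice = os.getenv("MODEL", "both")  # "custom", "pretrained", or unset -> both
builders = {"custom": build_custom_model, "pretrained": build_pretrained_model}
selected = [choice] if choice in builders else ["custom", "pretrained"]

results = {}
for name in selected:
    model = builders[name]()
    results[name] = {"parameters": sum(p.numel() for p in model.parameters())}

if os.getenv("SAVE_AS_YAML", "True") == "True":
    with open("results.yaml", "w") as fh:
        yaml.safe_dump(results, fh)
```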
The test resources are images fetched directly from the public Stanford 40 dataset.