This repository contains the code for the transfer learning experiments from the manuscript "Universal Embedding Function for Traffic Classification via QUIC Domain Recognition Pretraining: A Transfer Learning Success".
To summarize, this project:
- Uses a pretrained model 30pktTCNET_256 from CESNET Models.
- Evaluates how well the model generalizes by transferring it to seven traffic classification (TC) datasets:
  - ISCXVPN2016
  - MIRAGE19
  - MIRAGE22
  - UTMOBILENET21
  - UCDAVIS19
  - CESNET-TLS22
  - AppClassNet
- Implements three transfer learning methods: k-NN, linear probing, and model fine-tuning, along with training from scratch and an input-space baseline for comparison.
- The datasets cover ten downstream tasks in total. Our fine-tuning approach surpasses SOTA performance on nine of them. The hyperparameters used to achieve the best results are listed in conf/best.
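Of the three transfer methods listed above, k-NN is the simplest: the pretrained model is frozen, each flow is mapped to an embedding, and a test flow receives the majority label of its nearest training embeddings. The following is a minimal sketch of that idea only, not the repository's implementation. The toy 2-D vectors stand in for embeddings produced by the pretrained model, and both the cosine similarity metric and the "vpn"/"non-vpn" labels are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_embeddings, train_labels, query_embedding, k=3):
    """Return the majority label among the k most similar training embeddings."""
    scored = sorted(
        zip(train_embeddings, train_labels),
        key=lambda pair: cosine_similarity(pair[0], query_embedding),
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Toy stand-ins for embeddings from a frozen pretrained model (hypothetical data).
train_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_labels = ["vpn", "vpn", "non-vpn", "non-vpn"]

prediction = knn_predict(train_embeddings, train_labels, [0.95, 0.15], k=3)
print(prediction)
```

Linear probing replaces the neighbor vote with a single trainable linear layer on top of the same frozen embeddings, while fine-tuning also updates the weights of the pretrained model.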
- Prepare an environment using pip or conda, installing the dependencies listed in ./requirements.
- Download all datasets:
  - MIRAGE19, MIRAGE22, UTMOBILENET21, and UCDAVIS19 are obtained from the tcbench framework. Follow its instructions for preparing the datasets.
  - CESNET-TLS22 is accessed through CESNET DataZoo. It is downloaded automatically on first use.
  - Download AppClassNet from figshare.
  - For ISCXVPN2016, we use a version provided by Alfredo Nascita. You can contact him at alfredo[dot]nascita[at]unina[dot]it. Before using this version, process it with scripts/preprocess_iscx_dataset.py.
- Update conf/local-vars.yaml. Specify a local folder containing the AppClassNet and ISCXVPN2016 datasets, a temp folder, and a wandb project name (integration with wandb is currently mandatory).
- Experiments are configured with Hydra. Multiple configs are prepared in ./conf. To run an experiment with a given config, use the following command: python -m experiment_wrapper.do_experiment --config-name local-config.yaml
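When several configs need to be run one after another, the command above can be wrapped in a small launcher. The sketch below is a hypothetical helper, not part of the repository: the function names and the dry_run flag are assumptions, and only the module name and the --config-name option come from the command shown above. It uses sys.executable so that the experiments run in the same Python environment as the launcher.

```python
import subprocess
import sys


def build_command(config_name):
    """Build the experiment command for one Hydra config name."""
    return [
        sys.executable,
        "-m",
        "experiment_wrapper.do_experiment",
        "--config-name",
        config_name,
    ]


def run_experiments(config_names, dry_run=True):
    """Run (or, with dry_run=True, only print) one command per config."""
    commands = [build_command(name) for name in config_names]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop as soon as one experiment fails.
            subprocess.run(command, check=True)
    return commands


# Dry run: prints the command without launching the experiment.
commands = run_experiments(["local-config.yaml"], dry_run=True)
```

Setting dry_run=False launches the experiments sequentially from the repository root.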
The results of experiments are saved in $temp_dir/results. The final results presented in the manuscript are available in scripts/final-results. See scripts/explore_results.ipynb for a tabular summary of the results. The best hyperparameters found for each dataset are in conf/best.