This SpeechBrain recipe includes scripts to train end-to-end transducer-based target-speaker automatic speech recognition (TS-ASR) systems as proposed in *Streaming Target-Speaker ASR with Neural Transducer*.
Generate the LibriSpeechMix data in `<path-to-data-folder>` following the official readme.
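Once the data are generated, it can be useful to sanity-check the resulting manifests before training. Below is a minimal sketch that reads the first entry of a JSONL list file; the manifest path and the field names (`id`, `texts`, `delays`) are assumptions for illustration, so verify them against the files the official scripts actually produce:

```python
import json
from pathlib import Path

# Hypothetical manifest path: adjust to wherever the official
# LibriSpeechMix scripts wrote their JSONL list files.
manifest = Path("datasets/LibriSpeechMix/list/dev-clean-2mix.jsonl")

with manifest.open() as f:
    for line in f:
        entry = json.loads(line)
        # Each line describes one mixture; the exact keys depend on the
        # official generation scripts (the names below are assumptions).
        print(entry.get("id"))      # mixture identifier
        print(entry.get("texts"))   # transcription for each speaker
        print(entry.get("delays"))  # start offset of each utterance
        break  # inspect only the first entry
```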
Clone the repository, navigate to `<path-to-repository>`, open a terminal and run:

```bash
pip install -e vendor/speechbrain
pip install -r requirements.txt
```
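Since the recipe relies on the patched SpeechBrain (see the note below), it is worth confirming that Python resolves the vendored copy rather than a previously installed PyPI release. A minimal check:

```python
import speechbrain

# After `pip install -e vendor/speechbrain`, the package should resolve
# to the vendored copy, so the printed path is expected to point inside
# vendor/speechbrain rather than site-packages.
print(speechbrain.__file__)
```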
Navigate to `<path-to-repository>`, open a terminal and run:

```bash
python train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder>
```

To use multiple GPUs on the same node, run:
```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus> \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```

To use multiple GPUs on multiple nodes, for each node with rank 0, ..., `<num-nodes>` - 1, run:
```bash
python -m torch.distributed.launch --nproc_per_node=<num-gpus-per-node> \
--nnodes=<num-nodes> --node_rank=<node-rank> --master_addr <rank-0-ip-addr> --master_port 5555 \
train_<dataset>_<variant>.py hparams/<dataset>/<config>.yaml --data_folder <path-to-data-folder> --distributed_launch
```

Helper functions and scripts for plotting and analyzing the results can be found in `utils.py` and `tools`.
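As a quick alternative to those helpers, a validation WER curve can be extracted directly from the training log. The sketch below assumes the standard SpeechBrain `train_log.txt` format and an illustrative results path, both of which may differ in your setup; `utils.py` and `tools` contain the recipe's own, more complete utilities:

```python
import re

import matplotlib.pyplot as plt

# Hypothetical log location: adjust to your experiment's output folder.
log_file = "results/conformer-t_scratch/train_log.txt"

# Pull the epoch number and valid WER out of each per-epoch log line.
epochs, wers = [], []
with open(log_file) as f:
    for line in f:
        epoch = re.search(r"epoch: (\d+)", line)
        wer = re.search(r"valid WER: ([\d.]+)", line)
        if epoch and wer:
            epochs.append(int(epoch.group(1)))
            wers.append(float(wer.group(1)))

plt.plot(epochs, wers, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Valid WER (%)")
plt.title("Validation WER across epochs")
plt.savefig("wer_curve.png")
```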
**NOTE**: the vendored version of SpeechBrain included in this repository contains several hotfixes (e.g. for distributed training, gradient clipping, gradient accumulation, and causality) and additional features (e.g. distributed evaluation).
For example, to train from scratch on LibriSpeechMix with the conformer-t configuration for 100 epochs on 8 GPUs of a single node, run:

```bash
nohup python -m torch.distributed.launch --nproc_per_node=8 \
train_librispeechmix_scratch.py hparams/LibriSpeechMix/conformer-t_scratch.yaml \
--data_folder datasets/LibriSpeechMix --num_epochs 100 \
--distributed_launch &
```

The `nohup ... &` wrapper keeps the job running after the terminal is closed; its output is appended to `nohup.out`.