This repository provides the Python and MATLAB scripts accompanying our paper accepted at Interspeech 2025:
Chang, A., Li, Y., Roman, I.R., & Poeppel, D. (2025). Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds. Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Rotterdam, The Netherlands.
The method and results are described in detail in the paper:
🔗 [Pending publisher DOI link]
This project introduces Spectrotemporal Modulation (STM), a signal-processing feature representation inspired by how the human auditory cortex encodes sound. STM is designed as an efficient and interpretable framework for classifying diverse audio types, including speech, music, and environmental sounds.
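For orientation, the sketch below illustrates one common way to approximate an STM representation: taking the 2D Fourier transform of a log-mel spectrogram, which yields energy as a function of temporal modulation (rate) and spectral modulation (scale). This is only a minimal illustration assuming `librosa` and NumPy are available; the function name `stm_features` and all parameter values are hypothetical and do not necessarily match the feature-extraction pipeline used in the paper or in the scripts of this repository.

```python
import numpy as np
import librosa  # assumed available; not necessarily what this repository uses


def stm_features(audio_path, sr=16000, n_mels=64, hop_length=160):
    """Approximate an STM representation as the 2D modulation spectrum
    of a log-mel spectrogram (illustrative sketch only).

    The two axes of the output correspond to temporal modulation (rate)
    and spectral modulation (scale).
    """
    y, _ = librosa.load(audio_path, sr=sr)

    # Time-frequency decomposition: mel spectrogram (frequency x time)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels, hop_length=hop_length
    )
    log_mel = np.log(mel + 1e-10)

    # 2D FFT over the spectral and temporal axes gives the modulation
    # spectrum; keep the magnitude and center the zero-modulation bin.
    modulation = np.fft.fftshift(np.abs(np.fft.fft2(log_mel)))
    return modulation
```

A flattened or downsampled version of such a rate-scale map can then serve as input to a standard classifier, which is the general sense in which STM features are compact and interpretable: each dimension corresponds to a specific combination of temporal and spectral modulation.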
The results presented in our paper are fully reproducible using the provided scripts:
- Scripts are numbered sequentially to reflect the execution order.
- Python environments and dependencies are specified in the `./conda_env` directory.
Note: Due to file size constraints and copyright considerations, some audio data and output directories are excluded from this repository.
If you use this code or build on this work, please cite:

@inproceedings{chang2025spectrotemporal,
  author    = {Chang, Andrew and Li, Y. and Roman, I. R. and Poeppel, David},
  title     = {Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds},
  booktitle = {Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech)},
  year      = {2025},
  address   = {Rotterdam, The Netherlands},
  publisher = {ISCA}
}