curlsloth/MusicSpeech-STM

Spectrotemporal Modulation (STM): Efficient and Interpretable Audio Feature Representation

This repository provides the Python and MATLAB scripts accompanying our paper accepted at Interspeech 2025:

Chang, A., Li, Y., Roman, I.R., & Poeppel, D. (2025). Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds. Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Rotterdam, The Netherlands.

The paper describes the method and results in full:

📄 ArXiv link

🔗 [Pending publisher DOI link]

Project Overview

This project introduces Spectrotemporal Modulation (STM), a signal processing feature representation inspired by the neurophysiological encoding in the human auditory cortex. It is designed to provide an efficient and interpretable framework for classifying diverse audio types, including speech, music, and environmental sounds.
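To make the idea concrete, here is a minimal sketch of one common way to obtain a spectrotemporal modulation representation: take the 2D Fourier transform of a log-magnitude spectrogram. It is an illustration only and is not taken from the repository's scripts. The function name `stm_sketch`, the parameters `n_fft` and `hop`, the linear frequency axis, and the test signal are all assumptions; the pipeline used in the paper may differ (for example, in its frequency scale or windowing).

```python
import numpy as np


def stm_sketch(signal, sr, n_fft=512, hop=128):
    """Illustrative STM: 2D FFT magnitude of a log-magnitude spectrogram.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    # Short-time Fourier transform with a Hann window (frames x frequency bins).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_fft // 2 + 1)
    log_spec = np.log(spec + 1e-10)  # small offset avoids log(0)

    # 2D Fourier transform over (time, frequency) yields modulation content.
    stm = np.abs(np.fft.fftshift(np.fft.fft2(log_spec)))

    # Temporal modulation axis in Hz (frames are spaced hop / sr seconds apart).
    frame_rate = sr / hop
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(n_frames, d=1.0 / frame_rate))

    # Spectral modulation axis in cycles per Hz (bins are sr / n_fft Hz apart).
    bin_width = sr / n_fft
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(spec.shape[1], d=bin_width))
    return stm, temporal_mod, spectral_mod


# Usage: a 440 Hz tone whose amplitude fluctuates at 4 Hz (assumed test signal).
sr = 16000
t = np.arange(sr) / sr
tone = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)
stm, temporal_mod, spectral_mod = stm_sketch(tone, sr)
print(stm.shape)
```

The resulting 2D array has one axis for temporal modulation (how fast energy fluctuates over time) and one for spectral modulation (how fast it fluctuates across frequency), which is what makes the representation straightforward to inspect.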

Scripts and Reproducibility

The results presented in our paper are fully reproducible using the provided scripts:

  • Scripts are numbered sequentially to reflect the execution order.
  • Python environments and dependencies are specified in the ./conda_env directory.

Note: Due to file size constraints and copyright considerations, some audio data and output directories are excluded from this repository.

Citation

@inproceedings{chang2025spectrotemporal,
  author    = {Chang, Andrew and Li, Y. and Roman, I. R. and Poeppel, David},
  title     = {Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds},
  booktitle = {Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech)},
  year      = {2025},
  address   = {Rotterdam, The Netherlands},
  publisher = {ISCA}
}