A list of audio/speech datasets for speech recognition, speech synthesis, noise, audio tagging/sound event detection, speaker diarization, speaker recognition, (inverse) text normalization, speech translation, multilingual speech, and more (continuously updated).
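- Task
  - ASR (automatic speech recognition)
  - TTS (text-to-speech / speech synthesis)
  - Noise
  - Audio/Sound (audio tagging / sound event detection)
  - SD (speaker diarization)
  - SR (speaker recognition)
  - TN/ITN (text normalization / inverse text normalization)
  - ST (speech translation)
- Language
  - Chinese
  - English
  - Other

| Name | Duration (hours) | Links | Comments |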
|---|---|---|---|
| THCHS-30 | 30 | [SLR18] | train: 30 speakers, 10893 utterances; test: 10 speakers, 2496 utterances |
| Aishell | 179 | [SLR33] | 400 speakers |
| Aishell2 | 1000 | [Website] | if available, 1991 speakers |
| Free ST Chinese Mandarin (ST-CMDS) | 110 | [SLR38] | 855 speakers, 102600 utterances |
| Primewords Chinese Corpus Set 1 | 99 | [SLR47] | 296 native Chinese speakers |
| aidatatang_200zh | 200 | [SLR62] | 600 speakers |
| aidatatang_1505zh | 1505 | [Github] | if available |
| MAGICDATA Mandarin Read | 755 | [SLR68] | 1080 speakers |
| MAGICDATA Mandarin Conversational (RAMC) | 180 | [SLR123] | 663 speakers |
| AliMeeting (M2MeT) | 118.75 (train/dev/test 104.75/4/10) | [SLR119] | ASR, SD |
| WenetSpeech | 10000+ | [SLR121] [Github] [Website] | |
| TAL-ASR | 100 | [Website] | 80+ speakers |
| TAL-CSASR | 587 | [Website] | code-switching, 200+ speakers |
| didispeech | | | if available |

| Name | Duration (hours) | Links | Comments |
|---|---|---|---|
| LibriSpeech | 1000 | [SLR12] [LM] | |
| GigaSpeech | 33,000+ (unsupervised), 10,000 (supervised) | [Github] | |
| Multilingual LibriSpeech (MLS) | | [SLR94] | multilingual |
| libri-light | 60,000 (unlabelled) | [Github] | pretraining, unsupervised, semi-supervised |
| libriheavy | 50,000 | [Github] | casing, punctuation, context |
| SPGISpeech | | | |
| People's Speech | | | |

| Name | Duration (hours) | Links | Comments |
|---|---|---|---|
| AISHELL-3 | 85 | [Website] | 44.1 kHz, 218 native Chinese speakers, 88035 utterances |
| LibriTTS | | | |

| Name | Duration (hours) | Links | Comments |
|---|---|---|---|
| MUSAN | | [SLR17] | |
| Aachen Impulse Response Database (AIR) | | [SLR20] | |
| Simulated Room Impulse Response Database | | [SLR26] | |
| Room Impulse Response and Noise Database | | [SLR28] | |

| Name | Duration (hours) | Links | Comments |
|---|---|---|---|
| AliMeeting (M2MeT) | 118.75 (train/dev/test 104.75/4/10) | [SLR119] | ASR, SD |

- GigaST
- GigaS2S