@Coeur Coeur commented Jan 17, 2019

@lenzo-ka lenzo-ka left a comment

This is actually a modified ARPABET, without AX, AXR, IX, UX, EL, EM, EN, NX, Q, WH.
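
To make the difference concrete, here is a minimal sketch that flags any of the symbols listed above in a space-separated transcription. The names `EXCLUDED`, `strip_stress`, and `excluded_symbols` are hypothetical helpers for illustration, not part of CMUdict or this PR; the sketch assumes vowels may carry a trailing stress digit (0, 1, or 2).

```python
# Symbols the comment above lists as absent from CMUdict's modified ARPABET.
EXCLUDED = {"AX", "AXR", "IX", "UX", "EL", "EM", "EN", "NX", "Q", "WH"}


def strip_stress(symbol):
    """Drop a trailing stress digit, e.g. 'AH0' -> 'AH' (assumed convention)."""
    return symbol.rstrip("012")


def excluded_symbols(transcription):
    """Return the set of excluded ARPABET symbols found in a transcription."""
    return {strip_stress(s) for s in transcription.split()} & EXCLUDED


if __name__ == "__main__":
    # A transcription using only retained symbols yields an empty set.
    print(excluded_symbols("HH AH0 L OW1"))
    # A made-up transcription containing symbols from the excluded list.
    print(excluded_symbols("B AX T EN"))
```

A check like this could back up a wording such as "modified ARPABET" by confirming that no entry relies on the dropped symbols.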

Alexir commented Jun 1, 2023

CMUdict was developed primarily for use in speech recognition. At some point it had ~50 symbols (e.g. aspirated stops like TH, DH; flaps, DX; AX/AH; and other variants). It was believed that maintaining phonetic distinctions was important. Turns out it wasn't (acoustic modeling got better).

danmartinez commented Jul 17, 2023

@lenzo-ka @Coeur I agree. At minimum, the comment should say something like, "CMUdict transcriptions use a modified version of ARPABET encodings."

@Coeur Coeur requested a review from lenzo-ka July 22, 2023 16:29

Coeur commented Jul 22, 2023

Updated the PR, taking into account the review.

Note: please feel free to apply your desired improvements directly; there's no need to wait years for the original author, ha ha.

4 participants