
C.18 Readability

PriyaDCosta edited this page Mar 7, 2023 · 7 revisions

1. Feature Name

C.18 Readability

The Dale–Chall readability formula is a readability test that gives a numeric measure of how difficult a text is for readers to comprehend. It uses a list of 3000 words that groups of fourth-grade American students could reliably understand, and treats any word not on that list as difficult.

The raw score of the Dale–Chall readability formula (1948) is calculated as follows:

Raw score = 0.1579 × (difficult words / words × 100) + 0.0496 × (words / sentences)

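If more than 5% of the words are difficult, 3.6365 is added to the raw score to obtain the adjusted score. Scores typically range from 0 to 10; details can be found in the link below.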
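As a worked illustration of the raw-score formula, here is a minimal Python sketch. The function name `dale_chall_raw_score` and its parameters are hypothetical and are not taken from readability.py; the sketch assumes the three counts have already been computed.

```python
def dale_chall_raw_score(n_difficult: int, n_words: int, n_sentences: int) -> float:
    """Raw Dale-Chall score from pre-computed counts (hypothetical helper)."""
    # Guard against empty input so we never divide by zero.
    if n_words == 0 or n_sentences == 0:
        return 0.0
    pct_difficult = 100.0 * n_difficult / n_words   # percentage of difficult words
    avg_sentence_length = n_words / n_sentences     # words per sentence
    return 0.1579 * pct_difficult + 0.0496 * avg_sentence_length


# 10 difficult words out of 100 words in 5 sentences:
# 0.1579 * 10 + 0.0496 * 20 = 1.579 + 0.992 = 2.571
print(round(dale_chall_raw_score(10, 100, 5), 3))
```

For a text of 100 words in 5 sentences with 10 difficult words, the raw score works out to about 2.571.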

Details: https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula

Credits: Wikipedia

2. Literature Source (Serial Number, link)

C.18

3. Description of how the feature is computed (In Layman’s terms)

  1. The dale_chall_helper function takes a text as input and calculates the raw score based on the percentage of difficult words and the average sentence length. It then adjusts the raw score based on a threshold for the percentage of difficult words and returns the adjusted score.

  2. The count_syllables function counts the number of syllables in a given word using a set of rules based on vowel sounds.

  3. The count_words function counts the number of words in a given text.

  4. The count_difficult_words function counts the words in a given text that are not on the Dale-Chall list of common words and that have three or more syllables.

  5. The classify_text function takes a score as input and returns the corresponding classification based on the score thresholds for easy, medium, and difficult. Classification by grade level is also possible (see the Wikipedia link above).

Dale-Chall 3000 word list: original source
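The five functions described above could fit together roughly as follows. This is a sketch under stated assumptions, not the code in readability.py: the function names come from the list above, but `_tokens`, `count_sentences`, the tiny `EASY_WORDS` set (a stand-in for the full 3000-word list), the vowel-group syllable heuristic, and the easy/medium/difficult cut-offs are all assumptions made for illustration. The 5% threshold and the 3.6365 adjustment follow the linked Wikipedia article.

```python
import re

# Stand-in for the Dale-Chall list of about 3000 familiar words.
# Assumption: real code would load the full list; this set is illustrative only.
EASY_WORDS = {"the", "cat", "sat", "on", "a", "mat", "it", "was", "happy"}


def _tokens(text):
    """Hypothetical helper: lowercase alphabetic tokens (apostrophes allowed inside)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_words(text):
    return len(_tokens(text))


def count_sentences(text):
    """Hypothetical helper: count runs of '.', '!' or '?', with a minimum of 1."""
    return max(1, len(re.findall(r"[.!?]+", text)))


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per vowel group, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def count_difficult_words(text):
    # Difficult = not on the easy-word list AND three or more syllables.
    return sum(
        1 for w in _tokens(text)
        if w not in EASY_WORDS and count_syllables(w) >= 3
    )


def dale_chall_helper(text):
    n_words = count_words(text)
    if n_words == 0:
        return 0.0
    pct_difficult = 100.0 * count_difficult_words(text) / n_words
    raw = 0.1579 * pct_difficult + 0.0496 * (n_words / count_sentences(text))
    # Adjustment when more than 5% of the words are difficult.
    return raw + 3.6365 if pct_difficult > 5 else raw


def classify_text(score):
    # Assumed cut-offs for illustration; the actual thresholds are not given here.
    if score < 5.0:
        return "easy"
    if score < 8.0:
        return "medium"
    return "difficult"


sample = "The cat sat on a mat. It was happy."
print(classify_text(dale_chall_helper(sample)))
```

Because the syllable counter is only a heuristic and the word list here is a stub, scores from this sketch will differ from those of a full implementation.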

4. Algorithms used (KNN, Logistic Regression etc.)

None

5. ML Inputs/Features

None

6. Statistical concepts used

None

7. Pages of the literature to be referred to for details

PDF pages 9 and 10

8. Any tweaks/changes/adaptions made from the original source

None

9. Testing

Tests are included within the readability.py file. The results were also checked against, and approximately match, the Dale-Chall readability scores from this website.
