@houston-d houston-d commented Jun 17, 2025

Closes #17

When n >= 4, a negative keyword score can be returned. This occurs when a candidate contains two or more stopwords in a row: sum_h then falls below -1, so the denominator stays negative when calculating self.h for the composed word.

This PR fixes this by treating a chain of consecutive stopwords as a single entity when calculating h. When a stopword is found, prob_t1 is calculated as before, but prob_t2 is now calculated using the final stopword in the chain. The loop then skips to the word following the chain.
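For reviewers, here is a minimal standalone sketch of the idea, not the code in this PR. The `Term` dataclass, the `cooc` dictionary of bigram counts, and the `composed_h` function are hypothetical stand-ins for the library's internals, and the term-frequency factor in the real denominator is left out for brevity. It only illustrates how merging a stopword chain keeps `sum_h + 1` from going negative in a case like the one described above.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Hypothetical stand-in for a single term and its statistics."""
    word: str
    tf: int                 # term frequency
    h: float = 0.0          # single-term score (unused for stopwords)
    stopword: bool = False


def composed_h(terms, cooc, merge_chains=True):
    """Sketch of the composed-word score.

    cooc maps (left_word, right_word) to a bigram count (assumed layout).
    With merge_chains=False every stopword is penalised on its own;
    with merge_chains=True a run of stopwords is penalised once.
    """
    sum_h, prod_h = 0.0, 1.0
    i = 0
    while i < len(terms):
        term = terms[i]
        if not term.stopword:
            sum_h += term.h
            prod_h *= term.h
            i += 1
            continue

        # j is the index of the last stopword in the chain starting at i.
        j = i
        if merge_chains:
            while j + 1 < len(terms) and terms[j + 1].stopword:
                j += 1

        # prob_t1: transition from the preceding word into the chain.
        prob_t1 = 0.0
        if i > 0:
            prev = terms[i - 1]
            prob_t1 = cooc.get((prev.word, term.word), 0) / prev.tf

        # prob_t2: transition from the final stopword to the next word.
        prob_t2 = 0.0
        if j + 1 < len(terms):
            nxt = terms[j + 1]
            prob_t2 = cooc.get((terms[j].word, nxt.word), 0) / nxt.tf

        prob = prob_t1 * prob_t2
        prod_h *= 1 + (1 - prob)
        sum_h -= 1 - prob
        i = j + 1  # skip to the word following the chain

    return prod_h / (sum_h + 1)


# Illustrative 4-gram with two consecutive stopwords (made-up numbers).
example = [
    Term("processing", tf=2, h=0.1),
    Term("of", tf=5, stopword=True),
    Term("the", tf=6, stopword=True),
    Term("language", tf=3, h=0.2),
]
old_score = composed_h(example, cooc={}, merge_chains=False)
new_score = composed_h(example, cooc={}, merge_chains=True)
```

With these made-up numbers, penalising each stopword separately pushes `sum_h` below -1 and `old_score` comes out negative, while merging the chain applies the penalty once and `new_score` stays positive.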

As all existing tests use n <= 3, their results remain unchanged. An additional English test for n=4 is added, using a portion of text from https://en.wikipedia.org/wiki/Natural_language_processing.

Furthermore, CONTRIBUTING.rst is updated to reflect the use of poetry as the package manager.

Copilot

This comment was marked as outdated.


Development

Successfully merging this pull request may close these issues.

Negative or Zero YAKE score
