Implemented Jaro-Winkler distance algorithm #38

Open
wants to merge 4 commits into
base: master

Conversation

Alokzh commented Apr 11, 2025

Fixes: #1

This PR adds the Jaro-Winkler edit distance algorithm to the existing edit distance implementations in the repository. It returns a value from 0.0 (no similarity) to 1.0 (exact match).

Changes:

  • Added AIJaroWinklerDistance class inheriting from AIAbstractEditDistance
  • Implemented the standard Jaro similarity calculation first with proper handling of matching characters and transpositions
  • Added Winkler's prefix adjustment
  • Configurable prefix scale (default 0.1) and maximum prefix length (default 4)
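
For readers who want to follow the steps listed above, here is a minimal Python sketch of the textbook Jaro-Winkler similarity. It is not the Pharo code in this PR: the function names `jaro_similarity` and `jaro_winkler_similarity` and the parameter names `prefix_scale` and `max_prefix` are illustrative assumptions, and only the defaults (0.1 and 4) come from the description.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0.0, 1.0] (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each position where
    # they differ is half a transposition.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler_similarity(s1: str, s2: str,
                            prefix_scale: float = 0.1,
                            max_prefix: int = 4) -> float:
    """Jaro similarity boosted by Winkler's common-prefix adjustment."""
    jaro = jaro_similarity(s1, s2)
    # Length of the common prefix, capped at max_prefix.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)
```

With the defaults, `jaro_winkler_similarity("MARTHA", "MARHTA")` gives roughly 0.961: six matches, one transposition, and a common prefix of three characters. The sketch compares characters exactly, so it is case-sensitive and does not treat look-alike characters such as 'O' and '0' as similar.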

Screenshot:

Screenshot from 2025-04-11 10-17-47

Alokzh commented Apr 11, 2025

Hi @hernanmd @jordanmontt, I have been working on this since last week. I have added the implementation and verified that it works.

I tried to replicate the original C implementation you provided, but I could not understand a few parts, such as the handling of similar characters (like 'O' and '0'), so I did not implement that part (I think it may not be required). Also, for now I am not converting the strings to uppercase before matching.

Please review it whenever you have time and let me know if any improvements are needed. Meanwhile, I am writing tests for this implementation.

Alokzh commented Apr 12, 2025

I have also added tests for the implementation.

@jordanmontt (Member)

Hello, nice! The code looks OK. The problem is that you modified a lot of files, so it's difficult to review.

@jordanmontt (Member)

Screenshot 2025-04-16 at 10 16 22

It also says that you deleted several extension methods

@jordanmontt (Member)

Screenshot 2025-04-16 at 10 17 02

And the CI is failing because of this PR. It can be because of the deleted tests

Alokzh commented Apr 16, 2025

> It also says that you deleted several extension methods

Hey Jordan, I did not delete them manually. I asked about the same thing in the Discord channel: why are they being deleted automatically?
As soon as I wrote the tests and tried to run them, some of the protocols were automatically changed to "as yet unclassified", so I renamed them to the "tests" protocol and pushed my changes. I then saw that those extension files had been automatically deleted from the repo.
I also noticed that the files whose protocol is "tests" did not have an extension.st file in the first place.

Alokzh commented Apr 16, 2025

> And the CI is failing because of this PR. It can be because of the deleted tests

Jordan, correct me if I am wrong, but I think the CI failed because of the last commit you made to the repo, in which you refactored the code to use CTArray2D instead of Array2D.

Successfully merging this pull request may close these issues.

Implement Jaro-Winkler
2 participants