A JavaScript implementation of the Sørensen–Dice coefficient.
The dice object has two main methods:
- The `coefficient` method returns an index of similarity for two strings, between 0 and 1.

  `dice.coefficient('rabbit season', 'duck season') = 0.545`

- The `overlap` method returns an array indicating which strings are similar in two sets of strings.

  `dice.overlap(['kiss', 'the', 'bride'], ['there', 'is', 'no', 'try']) = [1, 0, null]`

Another example:

    strings input 1 -> ['AB', 'CD', 'EF', 'GH', 'IJ']
                         0     1     2     3     4
                         |     |     |     |     |
                       [ 0,    2,   null, null,  3 ] -> overlap output
                         |      \               /
                         |       `-.         .-'
                         |          \       /
                         0     1     2     3
    strings input 2 -> ['AA', 'XY', 'CD', 'IJ']
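
For readers who want to see where a value such as `0.545` comes from, here is a minimal sketch of the Sørensen–Dice coefficient computed over bigrams. It is not this library's source code: the function names `multigrams` and `diceCoefficient` are hypothetical, and the sketch assumes the standard formula (twice the number of shared bigrams divided by the total number of bigrams in both strings), with spaces counted as ordinary characters.

```javascript
// Illustrative sketch only; names are hypothetical, not the library's API.

// Split a string into overlapping multigrams of the given length
// ("rabbit" with length 2 gives ra, ab, bb, bi, it).
function multigrams(text, length = 2) {
  const grams = [];
  for (let i = 0; i + length <= text.length; i++) {
    grams.push(text.slice(i, i + length));
  }
  return grams;
}

// Sørensen–Dice coefficient: 2 * shared / (count in a + count in b).
function diceCoefficient(a, b, length = 2) {
  const gramsA = multigrams(a, length);
  const gramsB = multigrams(b, length);
  if (gramsA.length === 0 || gramsB.length === 0) return 0;

  // Count how often each multigram occurs in the first string.
  const counts = new Map();
  for (const gram of gramsA) {
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }

  // Pair each multigram of the second string with an unused identical
  // multigram of the first string, so repeats are not counted twice.
  let shared = 0;
  for (const gram of gramsB) {
    const available = counts.get(gram) || 0;
    if (available > 0) {
      shared++;
      counts.set(gram, available - 1);
    }
  }

  return (2 * shared) / (gramsA.length + gramsB.length);
}

console.log(diceCoefficient('rabbit season', 'duck season').toFixed(3)); // 0.545
```

In this sketch, 'rabbit season' yields 12 bigrams, 'duck season' yields 10, and 6 are shared, which gives 12 / 22.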
The dice object also has two main properties:
- The `multigramLength` property, set to `2` by default.

  To compare two strings, the algorithm splits them into sets of multigrams (sequences of `2` characters by default, called bigrams). Each multigram that the two strings have in common raises the Sørensen–Dice coefficient. You can set `dice.multigramLength` to a higher integer to increase the coefficient's accuracy.

- The `matchMinimum` property, set to `0.5` by default.

  If two strings have a Sørensen–Dice coefficient higher than `0.5`, they are considered similar. You can set `dice.matchMinimum` to a higher value to make the algorithm less tolerant.
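
To illustrate how the two properties interact, here is a rough sketch built around a hypothetical `sketch` object that stands in for `dice`. It is not the library's implementation and rests on two labeled assumptions: each string in the first set maps to the index of its best-scoring match in the second set, and a score exactly equal to `matchMinimum` counts as a match. The second assumption was chosen only so that the `['kiss', 'the', 'bride']` example reproduces under the standard bigram formula; the library itself may behave differently.

```javascript
// Hypothetical stand-in for the `dice` object; not the library's code.
const sketch = {
  multigramLength: 2, // characters per multigram
  matchMinimum: 0.5,  // threshold for treating two strings as similar

  // Split a string into overlapping multigrams of the configured length.
  multigrams(text) {
    const grams = [];
    for (let i = 0; i + this.multigramLength <= text.length; i++) {
      grams.push(text.slice(i, i + this.multigramLength));
    }
    return grams;
  },

  // Standard formula: 2 * shared / (count in a + count in b).
  coefficient(a, b) {
    const gramsA = this.multigrams(a);
    const remaining = this.multigrams(b);
    if (gramsA.length === 0 || remaining.length === 0) return 0;
    const total = gramsA.length + remaining.length;
    let shared = 0;
    for (const gram of gramsA) {
      const at = remaining.indexOf(gram);
      if (at !== -1) {
        shared++;
        remaining.splice(at, 1); // each multigram of b pairs up at most once
      }
    }
    return (2 * shared) / total;
  },

  // Assumption: report the index of the best-scoring match, or null.
  // Assumption: a score equal to matchMinimum counts as a match.
  overlap(first, second) {
    return first.map((a) => {
      let best = null;
      let bestScore = 0;
      second.forEach((b, index) => {
        const score = this.coefficient(a, b);
        if (score >= this.matchMinimum && score > bestScore) {
          best = index;
          bestScore = score;
        }
      });
      return best;
    });
  },
};

const first = ['kiss', 'the', 'bride'];
const second = ['there', 'is', 'no', 'try'];

console.log(sketch.overlap(first, second)); // [ 1, 0, null ]

// A higher threshold drops the weaker 'kiss' / 'is' pair (score 0.5).
sketch.matchMinimum = 0.6;
console.log(sketch.overlap(first, second)); // [ null, 0, null ]

// With trigrams, strings shorter than three characters produce no multigrams.
sketch.matchMinimum = 0.5;
sketch.multigramLength = 3;
console.log(sketch.overlap(first, second)); // [ null, 0, null ]
```

In this sketch, raising either property makes matching stricter: a higher `matchMinimum` rejects borderline pairs, while a longer multigram requires longer runs of identical characters before anything counts as shared.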