Version 2.0 introduces two major changes:
- A new method to traverse the composition graph, which dramatically improves overall speed, especially when the sequences are long or contain many errors. Some of our files that previously took 25 minutes to align now take about 7 seconds. The improvement is especially noticeable with the adapted composition (the default).
- Smarter alignment when --use-case and --use-punctuation are enabled. By default, punctuation symbols can now only be substituted by other punctuation symbols (or deleted/inserted). In addition, words that differ only in the case of their first letter are preferred for substitution.
These behaviors, as well as the beam size (which defaults to 50.0), can be controlled with the following new parameters:
- --disable-strict-punctuation: Disable strict punctuation alignment (which prevents punctuation from aligning with words).
- --disable-favored-subs: Disable favored substitutions (which make the alignment favor substitutions between words that differ only by case).
- --favored-sub-cost FLOAT: Cost for favored substitutions (e.g., case difference). Default: 0.1
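
The two substitution rules above can be pictured as a cost function that the aligner consults for each candidate word pair. The following is only a minimal Python sketch of that idea, not the tool's actual implementation: the function names, the standard substitution cost of 1.0, the use of an infinite cost to forbid a pairing, and the ASCII-only punctuation test are all assumptions made for illustration. Only the 0.1 default for favored substitutions comes from the notes above.

```python
import string

# Assumed values for illustration; only the 0.1 default is taken from the release notes.
DEFAULT_SUB_COST = 1.0
FAVORED_SUB_COST = 0.1
FORBIDDEN = float("inf")  # hypothetical way to rule out a substitution entirely


def is_punctuation(token: str) -> bool:
    """Return True if the token is non-empty and contains only ASCII punctuation."""
    return bool(token) and all(ch in string.punctuation for ch in token)


def differs_only_by_first_letter_case(a: str, b: str) -> bool:
    """Return True for pairs such as 'Hello' / 'hello', but not 'hello' / 'hellO'."""
    return (
        a != b
        and len(a) == len(b)
        and a[:1].lower() == b[:1].lower()
        and a[1:] == b[1:]
    )


def substitution_cost(
    ref: str,
    hyp: str,
    strict_punctuation: bool = True,
    favored_subs: bool = True,
    favored_sub_cost: float = FAVORED_SUB_COST,
) -> float:
    """Cost of aligning reference token `ref` with hypothesis token `hyp`."""
    if ref == hyp:
        return 0.0  # exact match, nothing to substitute
    # Strict punctuation: a punctuation symbol may not be swapped for a word.
    if strict_punctuation and is_punctuation(ref) != is_punctuation(hyp):
        return FORBIDDEN
    # Favored substitution: cheaper when only the first letter's case differs.
    if favored_subs and differs_only_by_first_letter_case(ref, hyp):
        return favored_sub_cost
    return DEFAULT_SUB_COST
```

In this sketch, passing `strict_punctuation=False` or `favored_subs=False` plays the role of --disable-strict-punctuation and --disable-favored-subs, and `favored_sub_cost` plays the role of --favored-sub-cost. A low favored cost makes the aligner pair "Hello" with "hello" as a substitution instead of treating them as an unrelated deletion plus insertion.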
See the README.md for more details.