feat: support roman numerals #1851
elijah-potter left a comment:
I like this. Good stuff.
In a separate PR, it might be a good idea to skip spellchecking on Roman numerals whose length is two or more.

Or perhaps skip them if the Roman numeral property is set but none of the POS properties are.

Ooh yes, I like that too.
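The thread proposes two skip rules. The following is a minimal Rust sketch of both as predicates. `WordInfo`, `skip_by_length` and `skip_by_pos` are hypothetical names, and the two booleans are assumptions standing in for whatever Harper's `WordMetadata` actually exposes for the Roman-numeral flag and part-of-speech data.

```rust
/// Hypothetical, simplified view of a word's metadata. Harper's real
/// `WordMetadata` is richer; these two booleans are assumptions.
#[derive(Debug, Default, Clone, Copy)]
struct WordInfo {
    /// Set when the word looks like a Roman numeral.
    is_roman_numeral: bool,
    /// Set when the dictionary gives the word any part-of-speech data.
    has_pos_tags: bool,
}

/// First suggestion: skip spellchecking Roman numerals of two or more letters.
/// Single letters such as `I` or `V` fall through to the normal checks.
fn skip_by_length(word: &str, info: WordInfo) -> bool {
    info.is_roman_numeral && word.chars().count() >= 2
}

/// Second suggestion: skip when the Roman numeral flag is set but no POS
/// data is present, so nothing marks it as a regular dictionary word.
fn skip_by_pos(info: WordInfo) -> bool {
    info.is_roman_numeral && !info.has_pos_tags
}
```

The length rule needs only the word itself, while the POS rule relies on the dictionary to tell numeral-only entries apart from ambiguous ones such as `I`.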
A Renovate Bot MR updating Automattic/harper/harper-ls from `v0.61.0` to `v0.62.0` lists this PR in the v0.62.0 release notes: "feat: support roman numerals by @hippietrail in #1851".
Issues
Not filed as an issue, but Roman numerals came up in the discussion in #1190.
Description
Adds support for Roman numerals. They are not added as a `TokenKind`, since words (especially `I`) can be an English word and a Roman numeral at the same time, and only analysing the context can reveal which. Instead they are added as part of the orthography flags in the `WordMetadata`, which also track whether dictionary entries are capitalized, hyphenated, apostrophized, use camel case, etc.

The code first checks that only the letters MDCLXVI are used and that they are not mixed case. If that test passes, a more complex check makes sure the letters are in a sensible order. We could probably go with just the second test, but my hunch was that it would be slower or have a higher potential for edge cases. I can remove the preliminary test if that way is preferred.
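The description does not include the implementation, so the following is a minimal Rust sketch of a two-stage check, assuming standard subtractive notation for values 1 to 3999. Stage 1 is the letter-and-case screen described above. For stage 2 it swaps in a round-trip technique (compute the value, regenerate the canonical numeral, compare), which may differ from the PR's actual ordering check. `is_roman_numeral` and its helpers are hypothetical names, not Harper's API.

```rust
/// Stage 1: cheap screen. Every character must be one of M, D, C, L, X, V, I,
/// and the word must be entirely upper case or entirely lower case.
fn uses_only_roman_letters(word: &str) -> bool {
    if word.is_empty() {
        return false;
    }
    let all_upper = word.chars().all(|c| "MDCLXVI".contains(c));
    let all_lower = word.chars().all(|c| "mdclxvi".contains(c));
    all_upper || all_lower
}

/// Value of a single numeral letter, case-insensitively.
fn letter_value(c: char) -> Option<i32> {
    match c.to_ascii_uppercase() {
        'I' => Some(1),
        'V' => Some(5),
        'X' => Some(10),
        'L' => Some(50),
        'C' => Some(100),
        'D' => Some(500),
        'M' => Some(1000),
        _ => None,
    }
}

/// Canonical upper-case numeral for 1..=3999 using subtractive pairs.
fn to_canonical_roman(mut n: i32) -> String {
    const TABLE: [(i32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, numeral) in TABLE.iter() {
        while n >= value {
            out.push_str(numeral);
            n -= value;
        }
    }
    out
}

/// Stage 2 (assumed round-trip technique, not necessarily the PR's):
/// a letter is subtracted when a larger one follows it, otherwise added.
/// The word is accepted only if it equals the canonical form of that value,
/// which rejects `IIII`, `IM`, `VV` and anything above 3999.
fn is_well_ordered(word: &str) -> bool {
    let values: Vec<i32> = match word.chars().map(letter_value).collect::<Option<Vec<_>>>() {
        Some(v) => v,
        None => return false,
    };
    let mut total = 0;
    for (i, &v) in values.iter().enumerate() {
        if values.get(i + 1).map_or(false, |&next| v < next) {
            total -= v;
        } else {
            total += v;
        }
    }
    (1..=3999).contains(&total) && to_canonical_roman(total) == word.to_ascii_uppercase()
}

/// Runs the cheap screen first and the ordering check only if it passes.
fn is_roman_numeral(word: &str) -> bool {
    uses_only_roman_letters(word) && is_well_ordered(word)
}
```

Under these rules `MIX`, `DIV` and `I` are accepted even though they are also English words. That ambiguity is why a metadata flag suits the result better than a token kind. `MIMIC` passes the letter screen but fails the ordering check.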
In the POS tags snapshots they are marked with `#r`, in line with `#` for regular numbers and `#d` for decades.

Demo
How Has This Been Tested?
Unit tests have been added to enforce all the above constraints.
Checklist