@hippietrail
Collaborator

Issues

Not filed as an issue, but Roman numerals came up in the discussion in #1190.

Description

Adds support for Roman numerals. They are not added as a TokenKind, because a word (especially "I") can be an English word and a Roman numeral at the same time, and only the surrounding context can reveal which. Instead, they are added to the orthography flags in the WordMetadata, which also track whether dictionary entries are capitalized, hyphenated, apostrophized, use camel case, etc.
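
A rough sketch of the idea, assuming a plain bit-flag representation. The `OrthFlags` type, its flag names, and the trimmed-down `WordMetadata` below are hypothetical and are not Harper's actual definitions. The point is that the Roman numeral flag sits alongside the word's other metadata rather than replacing it, so "I" can remain a pronoun while also being marked as a numeral.

```rust
/// Hypothetical stand-in for the orthography flags kept in `WordMetadata`;
/// Harper's real type and flag names may differ.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct OrthFlags(u16);

#[allow(dead_code)]
impl OrthFlags {
    const CAPITALIZED: OrthFlags = OrthFlags(1 << 0);
    const HYPHENATED: OrthFlags = OrthFlags(1 << 1);
    const APOSTROPHIZED: OrthFlags = OrthFlags(1 << 2);
    const CAMEL_CASE: OrthFlags = OrthFlags(1 << 3);
    const ROMAN_NUMERALS: OrthFlags = OrthFlags(1 << 4);

    /// True if every bit of `other` is set in `self`.
    fn contains(self, other: OrthFlags) -> bool {
        self.0 & other.0 == other.0
    }

    /// Set every bit of `other` in `self`.
    fn insert(&mut self, other: OrthFlags) {
        self.0 |= other.0;
    }
}

/// Hypothetical, trimmed-down word metadata: ordinary part-of-speech data
/// plus the orthography flags.
#[derive(Debug, Default)]
struct WordMetadata {
    is_pronoun: bool,
    orth: OrthFlags,
}

fn main() {
    // "I" keeps its pronoun data and additionally gains the Roman numeral flag.
    let mut word_i = WordMetadata { is_pronoun: true, ..Default::default() };
    word_i.orth.insert(OrthFlags::CAPITALIZED);
    word_i.orth.insert(OrthFlags::ROMAN_NUMERALS);

    assert!(word_i.is_pronoun);
    assert!(word_i.orth.contains(OrthFlags::ROMAN_NUMERALS));
    assert!(!word_i.orth.contains(OrthFlags::HYPHENATED));
    println!("{:?}", word_i);
}
```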

The code first checks that only the letters MDCLXVI are used and that upper and lower case are not mixed. If that test passes, a more complex check makes sure the letters are in a sensible order. We could probably get by with the second test alone, but my hunch was that it would be slower or more prone to edge cases. I can remove the preliminary test if that is preferred.
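
A minimal sketch of the two-stage check. The function names are hypothetical, and the second stage uses a parse-and-re-encode round trip, a swapped-in technique that is not necessarily what the PR does: the word is evaluated with the subtractive rule and accepted only if converting the value back produces the same letters. This accepts only canonical forms from 1 to 3999, so a clock-face IIII is rejected.

```rust
/// Stage 1 (cheap filter): only M, D, C, L, X, V, I,
/// either all uppercase or all lowercase.
fn letters_ok(word: &str) -> bool {
    if word.is_empty() {
        return false;
    }
    let all_upper = word.chars().all(|c| "MDCLXVI".contains(c));
    let all_lower = word.chars().all(|c| "mdclxvi".contains(c));
    all_upper || all_lower
}

/// Value of one uppercase Roman digit (0 for anything else).
fn digit_value(c: char) -> i64 {
    match c {
        'M' => 1000,
        'D' => 500,
        'C' => 100,
        'L' => 50,
        'X' => 10,
        'V' => 5,
        'I' => 1,
        _ => 0,
    }
}

/// Canonical uppercase spelling of `n`, intended for 1..=3999.
fn to_roman(mut n: i64) -> String {
    const TABLE: [(i64, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Stage 2 (ordering): evaluate with the subtractive rule, then require
/// that re-encoding the value reproduces exactly the same letters.
fn order_ok(word: &str) -> bool {
    let upper = word.to_ascii_uppercase();
    let values: Vec<i64> = upper.chars().map(digit_value).collect();
    let mut total = 0i64;
    for (i, &v) in values.iter().enumerate() {
        // A digit followed by a larger one is subtracted (IV = 4, CM = 900).
        let next_is_larger = values.get(i + 1).map_or(false, |&next| next > v);
        if next_is_larger {
            total -= v;
        } else {
            total += v;
        }
    }
    (1..=3999_i64).contains(&total) && to_roman(total) == upper
}

/// Run the cheap filter first, and the ordering check only if it passes.
fn is_roman_numeral(word: &str) -> bool {
    letters_ok(word) && order_ok(word)
}

fn main() {
    for word in ["XIV", "xiv", "XiV", "MCMXCIV", "IIII", "VX", "mix", "civil"] {
        println!("{word}: {}", is_roman_numeral(word));
    }
}
```

Note that "mix" passes both stages while "civil" fails the ordering check; words like "mix" are why the numeral reading is recorded as a flag rather than decided at tokenization time.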

In the POS tag snapshots, they are marked with #r, in line with # for regular numbers and #d for decades.

Demo

[Screenshot not preserved]

How Has This Been Tested?

Unit tests have been added to enforce all the above constraints.

Checklist

  • I have performed a self-review of my own code
  • I have added tests to cover my changes

elijah-potter
elijah-potter previously approved these changes Sep 8, 2025
Collaborator

@elijah-potter elijah-potter left a comment

I like this. Good stuff.

In a separate PR, it might be a good idea to skip spellchecking on Roman numbers whose length is greater than or equal to two.

@hippietrail
Collaborator Author

Or perhaps skip it when the Roman numeral property is set but none of the POS properties are.
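
The two skip rules proposed in this exchange could look roughly like this. `WordInfo` and both functions are hypothetical, simplified stand-ins, not Harper's spellchecker API.

```rust
/// Hypothetical, simplified view of what the spellchecker knows about a word.
struct WordInfo<'a> {
    text: &'a str,
    /// Set when the word is a well-formed Roman numeral.
    roman_numeral: bool,
    /// Set when the word has any part-of-speech data (noun, verb, pronoun, ...).
    has_pos: bool,
}

/// First proposal: skip Roman numerals that are at least two letters long.
fn skip_by_length(w: &WordInfo<'_>) -> bool {
    w.roman_numeral && w.text.chars().count() >= 2
}

/// Second proposal: skip words flagged as Roman numerals that carry no
/// part-of-speech data, i.e. that are not also ordinary dictionary words.
fn skip_by_pos(w: &WordInfo<'_>) -> bool {
    w.roman_numeral && !w.has_pos
}

fn main() {
    let words = [
        WordInfo { text: "XIV", roman_numeral: true, has_pos: false },
        WordInfo { text: "I", roman_numeral: true, has_pos: true },
        WordInfo { text: "mix", roman_numeral: true, has_pos: true },
        WordInfo { text: "teh", roman_numeral: false, has_pos: false },
    ];
    for w in &words {
        println!(
            "{}: skip by length = {}, skip by POS = {}",
            w.text,
            skip_by_length(w),
            skip_by_pos(w)
        );
    }
}
```

The two rules agree on "XIV" (skipped) and "I" (checked normally), but differ on multi-letter numerals that are also dictionary words, such as "mix", which only the length rule skips, and on single-letter numerals with no POS data, which only the POS rule skips.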

@elijah-potter
Collaborator

Ooh yes, I like that too.

@elijah-potter elijah-potter added this pull request to the merge queue Sep 8, 2025
Merged via the queue into Automattic:master with commit fbab988 Sep 8, 2025
23 checks passed
@hippietrail hippietrail deleted the roman-numerals branch September 9, 2025 02:39
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Sep 12, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [Automattic/harper/harper-ls](https://github.com/Automattic/harper) | minor | `v0.61.0` -> `v0.62.0` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>Automattic/harper (Automattic/harper/harper-ls)</summary>

### [`v0.62.0`](https://github.com/Automattic/harper/releases/tag/v0.62.0)

[Compare Source](Automattic/harper@v0.61.0...v0.62.0)

#### What's Changed

- Implement full verb form annotations by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1730](Automattic/harper#1730)
- feat(core): create rule to detect duplicate punctuation by [@&#8203;elijah-potter](https://github.com/elijah-potter) in [#&#8203;1694](Automattic/harper#1694)
- feat: add correction of degrees kelvin to -> kelvin by [@&#8203;lukasmwerner](https://github.com/lukasmwerner) in [#&#8203;1829](Automattic/harper#1829)
- docs: add path to `stats.txt` for Linux and macOS by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1854](Automattic/harper#1854)
- fix: missing `LintKind`s from `LintKind::new_from_str` by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1850](Automattic/harper#1850)
- chore: various improvements to phrasal verb linter by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1824](Automattic/harper#1824)
- feat:interested at/into/on/with→interested in by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1809](Automattic/harper#1809)
- Add core.trac.wordpress.org support to Chrome extension by [@&#8203;sirreal](https://github.com/sirreal) in [#&#8203;1865](Automattic/harper#1865)
- feat: quiet⇔quite by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1781](Automattic/harper#1781)
- feat:digestive track→digestive tract by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1837](Automattic/harper#1837)
- Dictionary curation 2025 08 27 by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1853](Automattic/harper#1853)
- fix: compound nouns that shouldn't be phrasal verbs by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1870](Automattic/harper#1870)
- feat: would've never/would never have→never would have by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1794](Automattic/harper#1794)
- fix(chrome-ext): flaky tests by [@&#8203;elijah-potter](https://github.com/elijah-potter) in [#&#8203;1868](Automattic/harper#1868)
- build(deps): bump uuid from 1.18.0 to 1.18.1 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in [#&#8203;1883](Automattic/harper#1883)
- build(deps): bump clap from 4.5.45 to 4.5.47 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in [#&#8203;1879](Automattic/harper#1879)
- build(deps): bump tree-sitter from 0.25.8 to 0.25.9 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in [#&#8203;1880](Automattic/harper#1880)
- chore: mostly annotating verbs and nouns by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1874](Automattic/harper#1874)
- build(deps): bump tree-sitter-javascript from 0.23.1 to 0.25.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in [#&#8203;1882](Automattic/harper#1882)
- build(deps): bump foldhash from 0.1.5 to 0.2.0 by [@&#8203;dependabot](https://github.com/dependabot)\[bot] in [#&#8203;1881](Automattic/harper#1881)
- feat: addicting→addictive by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1886](Automattic/harper#1886)
- Update place names 978 by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1013](Automattic/harper#1013)
- feat: windscreen vs windshield regionalism by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1888](Automattic/harper#1888)
- feat: support roman numerals by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1851](Automattic/harper#1851)
- feat: add colour to `harper-cli nominal-phrases` and `just getnps-colour` by [@&#8203;hippietrail](https://github.com/hippietrail) in [#&#8203;1869](Automattic/harper#1869)

#### New Contributors

- [@&#8203;sirreal](https://github.com/sirreal) made their first contribution in [#&#8203;1865](Automattic/harper#1865)

**Full Changelog**: <Automattic/harper@v0.61.0...v0.62.0>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).