-
Notifications
You must be signed in to change notification settings - Fork 203
feat: to/too mixups in set phrases #1839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly ready, I just have the one minor nitpick.
@@ -353,6 +353,49 @@ pub fn lint_group() -> LintGroup { | |||
"Corrects `rise the question` to `raise the question`.", | |||
LintKind::Grammar | |||
), | |||
"ToToo" => ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried this would get confused with the ToTwoToo
rule. Maybe IdiomaticToToo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried this would get confused with the
ToTwoToo
rule. MaybeIdiomaticToToo
?
I think there's a tension between names which will be useful for users since they're used in the options, and names which don't conflict for the programmers.
I think we were working on these at the same time. All uses of "to" and "too" are idiomatic. The proper solution is surely to merge them by way of a LintGroup
. I think but am not sure that there's not a way to group lints together so that only the name of the group is exposed to user? I looked into that a week ago but forget.
(But see my other comment. I forget this PR was a phrase_corrections
one.)
As a sidenote, while "two" is a third homophone for these, I've never seen it mixed up for the others in either direction. People seem to find one difference a lot more salient than the other.
(&["life is to short"], &["life is too short"]), | ||
|
||
// "one to many" has many false positives | ||
(&["put to fine a point"], &["put too fine a point"], ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we can take a more data driven approach to this file as a whole. Is there a public dataset of these kinds of corrections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah sorry. In my previous comment I forget the way I made this PR! For now I think ToTooIdioms
and/or TooToIdioms
or maybe ...InIdioms
, but I still think putting them behind a single setting is best, even when some are phrase corrections and some are a dedicated linter. (Not Idiomatic
since that term has a broader meaning than idiom
.)
Since these ones are all in idioms I used the Wiktionary article for "too", which links all the idioms it's a part of. I checked them one by one since some are too marginal to include.
You could consider that a kind of "data driven". Idioms are special so finding lists of them already assembled is probably the way. For fixing "to" vs "too" in the general case outside of idioms a more "big data" approach analysing neighbouring word properties would be the best. As for all general grammar linters, especially syntax. It would require a good corpus of good English and a good corpus of bad English as well, ideally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly ready, I just have the one minor nitpick.
Issues
Help out with #1832
Description
Looks for idioms and set phrases that should use "too" where instead "to" has been used.
Also one case of the opposite.
Also moved and expanded "spoke to soon" from
phrase_corrections
tophrase_set_corrections
.I only included those that got more than a few search hits.
For others which got plenty of search hits but also lots of false positives, I included comments so people won't hastily try to add them.
How Has This Been Tested?
I added one unit test for each pattern that's flagged, all or most sourced from GitHub.
Checklist