7 changes: 0 additions & 7 deletions harper-core/src/linting/phrase_corrections/mod.rs
@@ -973,13 +973,6 @@ pub fn lint_group() -> LintGroup {
"Changes `spacial attention` to `special attention`.",
LintKind::Typo
),
"SpokeTooSoon" => (
["spoke to soon"],
["spoke too soon"],
"Use the adverb `too` instead.",
"Identifies common misuse of the preposition `to` in the phrase `spoke too soon`.",
LintKind::Grammar
),
"Starving" => (
["very hungry", "really hungry", "extremely hungry"],
["starving"],
3 changes: 0 additions & 3 deletions harper-core/src/linting/phrase_corrections/tests.rs
@@ -1182,9 +1182,6 @@ fn spacial_attention() {
assert_suggestion_result("spacial attention", lint_group(), "special attention");
}

// SpokeTooSoon
// -none-

// Starving
// -none-

43 changes: 43 additions & 0 deletions harper-core/src/linting/phrase_set_corrections/mod.rs
@@ -353,6 +353,49 @@ pub fn lint_group() -> LintGroup {
"Corrects `rise the question` to `raise the question`.",
LintKind::Grammar
),
"ToToo" => (
Collaborator

I'm worried this would get confused with the ToTwoToo rule. Maybe IdiomaticToToo?

Collaborator Author

@hippietrail, Sep 8, 2025

> I'm worried this would get confused with the ToTwoToo rule. Maybe IdiomaticToToo?

I think there's a tension between names that are useful to users, since they appear in the options, and names that don't conflict for programmers.

I think we were working on these at the same time. All uses of "to" and "too" are idiomatic. The proper solution is surely to merge them by way of a LintGroup. I think, but am not sure, that there isn't currently a way to group lints together so that only the name of the group is exposed to the user. I looked into that a week ago but have forgotten what I found.

(But see my other comment. I forgot this PR was a phrase_corrections one.)

As a side note, while "two" is a third homophone of these, I've never seen it mixed up with the others in either direction. People seem to find one difference a lot more salient than the other.
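
To make the "single setting" idea concrete, here is a minimal sketch of several internal rule names sharing one user-facing setting. It is an illustration only: `RuleGroups`, `is_rule_enabled`, and the setting name `ToTooIdioms` are hypothetical and are not part of Harper's API, and whether Harper's `LintGroup` can be extended this way is exactly the open question above.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not Harper's API): maps each internal rule name
/// to the single setting name that is shown to users.
struct RuleGroups {
    setting_for_rule: HashMap<&'static str, &'static str>,
}

impl RuleGroups {
    /// Builds the mapping from `(setting name, rule names)` pairs.
    fn new(groups: &[(&'static str, &[&'static str])]) -> Self {
        let mut setting_for_rule = HashMap::new();
        for (setting, rules) in groups {
            for rule in rules.iter() {
                setting_for_rule.insert(*rule, *setting);
            }
        }
        Self { setting_for_rule }
    }

    /// A rule runs only if the setting it belongs to is enabled.
    /// A rule that belongs to no group is treated as its own setting,
    /// and a setting the user never touched defaults to enabled.
    fn is_rule_enabled(&self, rule: &str, enabled: &HashMap<&str, bool>) -> bool {
        let setting = self.setting_for_rule.get(rule).copied().unwrap_or(rule);
        enabled.get(setting).copied().unwrap_or(true)
    }
}
```

With a group such as `("ToTooIdioms", &["ToToo", "TooTo"])`, turning off `ToTooIdioms` would disable both rules while the code keeps two distinct, non-conflicting names internally.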

&[
(&["a bridge to far"], &["a bridge too far"]),
(&["cake and eat it to"], &["cake and eat it too"]),
// "a few to many" has many false positives

(&["go to far"], &["go too far"]),
(&["goes to far"], &["goes too far"]),
(&["going to far"], &["going too far"]),
(&["gone to far"], &["gone too far"]),
(&["went to far"], &["went too far"]),

// "in to deep" has many false positives
(&["life's to short", "lifes to short"], &["life's too short"]),
(&["life is to short"], &["life is too short"]),

// "one to many" has many false positives
(&["put to fine a point"], &["put too fine a point"]),
Collaborator

I'm wondering if we can take a more data-driven approach to this file as a whole. Is there a public dataset of these kinds of corrections?

Collaborator Author

Oh yeah, sorry. In my previous comment I forgot how I had made this PR! For now I'd suggest ToTooIdioms and/or TooToIdioms, or maybe ...InIdioms, but I still think putting them behind a single setting is best, even when some are phrase corrections and some are a dedicated linter. (Not Idiomatic, since that term has a broader meaning than idiom.)

Since these are all idioms, I used the Wiktionary entry for "too", which links to all the idioms it's a part of. I checked them one by one, since some are too marginal to include.

You could consider that a kind of "data-driven" approach. Idioms are special, so finding already-assembled lists of them is probably the way to go. For fixing "to" vs "too" in the general case, outside of idioms, a more "big data" approach that analyses the properties of neighbouring words would be best, as it would for all general grammar linters, especially syntactic ones. Ideally, it would require a good corpus of good English and a good corpus of bad English as well.
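
As a rough illustration of why each candidate phrase has to be checked by hand, the sketch below does a naive word-window lookup against a phrase table. It is a standalone simplification, not Harper's matcher: `words`, `find_correction`, and the table shape are assumptions made for this example, and the "one-to-many relationship" sentence is an invented test input.

```rust
/// Splits text into lowercase words, keeping apostrophes inside words.
/// A deliberately simple tokenizer, assumed for this sketch only.
fn words(text: &str) -> Vec<String> {
    text.split(|c: char| !(c.is_alphanumeric() || c == '\''))
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Returns the correction for the first table entry whose wrong phrase
/// appears in `text` as a contiguous run of words, ignoring case.
fn find_correction<'a>(text: &str, table: &[(&'a str, &'a str)]) -> Option<&'a str> {
    let haystack = words(text);
    for (wrong, right) in table {
        let needle = words(wrong);
        // `windows` panics on a zero length, so skip empty phrases first.
        if !needle.is_empty()
            && haystack.windows(needle.len()).any(|w| w == needle.as_slice())
        {
            return Some(*right);
        }
    }
    None
}
```

Given the entry `("spoke to soon", "spoke too soon")`, the text "I spoke to soon. Ignore the previous post." yields that correction. If `("one to many", "one too many")` were added, a correct sentence such as "This is a one-to-many relationship." would also be flagged, which is the kind of false positive that keeps "one to many" out of the list.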


(&["speak to soon"], &["speak too soon"]),
(&["speaking to soon"], &["speaking too soon"]),
// "speaks to soon" is very rare
(&["spoke to soon"], &["spoke too soon"]),
(&["spoken to soon"], &["spoken too soon"]),

(&["think to much"], &["think too much"]),
(&["to big for"], &["too big for"]),
(&["to big to fail"], &["too big to fail"]),
(&["to good to be true", "too good too be true"], &["too good to be true"]),
(&["to much information"], &["too much information"]),
],
"Use `too` rather than `to` in this expression.",
"Corrects `to` used instead of `too`.",
LintKind::Grammar
),
"TooTo" => (
&[
(&["too big too fail"], &["too big to fail"])
],
"Use `to` rather than `too` in this expression.",
"Corrects `too` used instead of `to`.",
LintKind::Grammar
),
"WholeEntire" => (
&[
(&["whole entire"], &["whole", "entire"]),
222 changes: 222 additions & 0 deletions harper-core/src/linting/phrase_set_corrections/tests.rs
@@ -1127,6 +1127,228 @@ fn detect_risen_the_question() {
);
}

// ToToo

// -a bridge too far-
#[test]
fn fix_a_bridge_too_far() {
assert_suggestion_result(
"If Winforms can ever be conquered by the Mono developers may be a bridge to far.",
lint_group(),
"If Winforms can ever be conquered by the Mono developers may be a bridge too far.",
);
}

// -cake and eat it too-
#[test]
fn fix_cake_and_eat_it_too() {
assert_suggestion_result(
"The solution: wouldn't it be great if I could have my cake and eat it to?",
lint_group(),
"The solution: wouldn't it be great if I could have my cake and eat it too?",
);
}

// -go to far-
#[test]
fn fix_go_to_far() {
assert_suggestion_result(
"It's difficult to be sure when we go to far sometime when you don't exactly how the beast works in the background .",
lint_group(),
"It's difficult to be sure when we go too far sometime when you don't exactly how the beast works in the background .",
);
}

// -goes to far-
#[test]
fn fix_goes_to_far() {
assert_suggestion_result(
"Memory consumption and cpu consumption goes to far like 900% and more than this",
lint_group(),
"Memory consumption and cpu consumption goes too far like 900% and more than this",
);
}

// -going to far-
#[test]
fn fix_going_to_far() {
assert_suggestion_result(
"wsrun is going to far on this because debug 's devDependency shouldn't be considered in the cycle detection, should it?",
lint_group(),
"wsrun is going too far on this because debug 's devDependency shouldn't be considered in the cycle detection, should it?",
);
}

// -gone to far-
#[test]
fn fix_gone_to_far() {
assert_suggestion_result(
"I might have gone to far with opening issues for small things.",
lint_group(),
"I might have gone too far with opening issues for small things.",
);
}

// -went to far-
#[test]
fn fix_went_to_far() {
assert_suggestion_result(
"But I went to far compared to the initial request that seems talk about ...",
lint_group(),
"But I went too far compared to the initial request that seems talk about ...",
);
}

// -life's too short-
#[test]
fn fix_life_s_too_short() {
assert_suggestion_result(
"Life's to short for messing around with git add , writing commit message.",
lint_group(),
"Life's too short for messing around with git add , writing commit message.",
);
}

#[test]
fn fix_lifes_to_short() {
assert_suggestion_result(
"I wouldn't go back after the 3rd interview lifes to short.",
lint_group(),
"I wouldn't go back after the 3rd interview life's too short.",
);
}

// -life is too short-
#[test]
fn fix_life_is_too_short() {
assert_suggestion_result(
"[Life is to short to use dated cli tools that suck]",
lint_group(),
"[Life is too short to use dated cli tools that suck]",
);
}

// -put too fine a point-
#[test]
fn fix_put_too_fine_a_point() {
assert_suggestion_result(
"Not to put to fine a point on it... that's not the kind of team I think we want to be.",
lint_group(),
"Not to put too fine a point on it... that's not the kind of team I think we want to be.",
);
}

// -speak too soon-
#[test]
fn fix_speak_too_soon() {
assert_suggestion_result(
"I don't want to speak to soon but I kept everything as I had before but included: http = httplib2.Http()",
lint_group(),
"I don't want to speak too soon but I kept everything as I had before but included: http = httplib2.Http()",
);
}

// -speaking too soon-
#[test]
fn fix_speaking_too_soon() {
assert_suggestion_result(
"EDIT: Thats what I get for speaking to soon...",
lint_group(),
"EDIT: Thats what I get for speaking too soon...",
);
}

// -spoke too soon-
#[test]
fn fix_spoke_too_soon() {
assert_suggestion_result(
"I spoke to soon. Ignore the previous post.",
lint_group(),
"I spoke too soon. Ignore the previous post.",
);
}

// -spoken too soon-
#[test]
fn fix_spoken_too_soon() {
assert_suggestion_result(
"EDIT: I might have spoken to soon...",
lint_group(),
"EDIT: I might have spoken too soon...",
);
}

// -think to much-
#[test]
fn fix_think_too_much() {
assert_suggestion_result(
"I don't think to much about it, but I don't think it's a big deal.",
lint_group(),
"I don't think too much about it, but I don't think it's a big deal.",
);
}

// -too big for-
#[test]
fn fix_too_big_for() {
assert_suggestion_result(
"ng-relations form to big for small screens",
lint_group(),
"ng-relations form too big for small screens",
);
}

// -too big to fail-
#[test]
fn fix_too_big_to_fail() {
assert_suggestion_result(
"The core alone has 50k LOC. Reminds me of \"to big to fail\".",
lint_group(),
"The core alone has 50k LOC. Reminds me of \"too big to fail\".",
);
}

// -too good to be true-
#[test]
fn fix_too_good_to_be_true() {
assert_suggestion_result(
"This seemed to good to be true, but local to scene resources will not work when they are not contained in a node.",
lint_group(),
"This seemed too good to be true, but local to scene resources will not work when they are not contained in a node.",
);
}

#[test]
fn fix_too_good_too_be_true() {
assert_suggestion_result(
"The normalization of rewards is making the plot in tensorboard look too good too be true, because they are not the actual reward ...",
lint_group(),
"The normalization of rewards is making the plot in tensorboard look too good to be true, because they are not the actual reward ...",
);
}

// -too much information-
#[test]
fn fix_too_much_information() {
assert_suggestion_result(
"Live test are printing way to much information and is polluting our test output",
lint_group(),
"Live test are printing way too much information and is polluting our test output",
);
}

// TooTo

// -too big too fail-
#[test]
fn fix_too_big_too_fail() {
assert_suggestion_result(
"In other words, pointer arithmetic is, at this point, too big too fail, regardless of the clever and sophisticated way C++ lawyercats worded it.",
lint_group(),
"In other words, pointer arithmetic is, at this point, too big to fail, regardless of the clever and sophisticated way C++ lawyercats worded it.",
);
}

// WholeEntire

#[test]