121 commits
60e0372
feat: handle multiple derivations for words in the metadata
hippietrail Apr 6, 2025
4413bb4
Merge commit 'bcc3df7718e4d4e766228a9e9f074cc14fd366aa' into multiple…
hippietrail Jul 30, 2025
2099ce9
Merge commit 'f4ce421bb496fd3b0b5c7bb7be3f363ab2a0d900' into multiple…
hippietrail Jul 30, 2025
06916c6
Merge commit 'cc1f15bacac35bb86c5560c770235cb387aec9d2' into multiple…
hippietrail Jul 30, 2025
b98c589
Merge commit 'f7b6bc18cbb49e4ac3ca40075e4b64e676e68707' into multiple…
hippietrail Jul 30, 2025
80f63fe
Merge commit '3144b95385aac58ba961c6c6c73d97628f919348' into multiple…
hippietrail Jul 30, 2025
da7b544
Merge commit '00ea7330d5b078d14ebf6e05d5fc2d0fbc372eab' into multiple…
hippietrail Jul 30, 2025
fb3ad6f
Merge commit 'e0c1acbbdbdc679214a6f02add24d7b9c3c069a6' into multiple…
hippietrail Jul 30, 2025
535872d
Merge commit 'c0b4dbbabbe679e0e733e5f20912850b866528cd' into multiple…
hippietrail Jul 30, 2025
02e9d0f
Merge commit '32385de6abe68d548bac66e4bdd563b1ab323ef2' into multiple…
hippietrail Jul 30, 2025
e538a37
Merge commit '84fc9d8746b8ddf2f9e2ad26acb8abf45f3b7e78' into multiple…
hippietrail Jul 30, 2025
123fd7f
chore: merging old branch bit by bit
hippietrail Jul 30, 2025
abac832
Merge commit '8878c2c94a9edb307e33bba4d742b0b4d03b01c4' into multiple…
hippietrail Jul 30, 2025
b520d77
Merge commit '6f5da14ca5c3c80187d0a3e132455022511efbad' into multiple…
hippietrail Jul 30, 2025
50afbe0
Merge commit '09c826fe2fa5ea051aa059124dad886c5f0df47e' into multiple…
hippietrail Jul 30, 2025
5756660
Merge commit '5a70b9b69f9f5277a25eb03e679f0b1cd1e74ff8' into multiple…
hippietrail Jul 30, 2025
3b26a8e
Merge commit '74608430b581cab91c8ef941301f5e2f8b7343ba' into multiple…
hippietrail Jul 30, 2025
4782a86
Merge commit '47aa8495f0949661e1fd2c5cde2ff9c1e9c1f404' into multiple…
hippietrail Jul 30, 2025
ef1f9c6
Merge commit '1dc6a185a985fcb2ca462b1b7cdd08cf9a199b3e' into multiple…
hippietrail Jul 30, 2025
a08fafc
Merge commit 'a2e0da7a841fba9436c2fa1bbfbabf4bee4e314d' into multiple…
hippietrail Jul 30, 2025
065d637
Merge commit '1c87e4f63c1f3debd3d42876eb90a55608cd5d9b' into multiple…
hippietrail Jul 30, 2025
cca8c0d
Merge commit 'cf5c58b8c3d41e2ad030b7401b38b4261a41e5a3' into multiple…
hippietrail Jul 30, 2025
bc50097
Merge commit '1ed112e24f10a45739fc3f71a47151450dc938ce' into multiple…
hippietrail Jul 30, 2025
d852ccf
chore: merge old pr branch bit by bit
hippietrail Jul 30, 2025
927d03c
Merge commit '13155e20bf6a351902af83583f179cd17ae05f96' into multiple…
hippietrail Jul 30, 2025
ce90fd7
Merge commit '19ebb9915745a6c783c50e4e19a17dd379fc3150' into multiple…
hippietrail Jul 30, 2025
061ac48
Merge commit '5ccbc9a93c47a0dd0ceab84d5af63e040fc945e6' into multiple…
hippietrail Jul 30, 2025
88c4ee0
Merge commit 'fabe9452a7329c6d6a97a38c51c3ba0d84103a85' into multiple…
hippietrail Jul 30, 2025
a333a32
Merge commit '1e2c6c332a966bdbbe3d1398dc9d06db4906af31' into multiple…
hippietrail Jul 30, 2025
23612a4
Merge commit '8edba54788fe0ad9a816d0e777f06eb1b6577822' into multiple…
hippietrail Jul 30, 2025
04fdf1d
Merge commit '0b4bf707f2276f1c6710bbce6b3e847869e43610' into multiple…
hippietrail Jul 30, 2025
1c5bade
Merge commit '8689146c558e4bc79ca3f3e3fc849273d7dc06f8' into multiple…
hippietrail Jul 30, 2025
7ee6b63
Merge commit '601f16a2a73bff81016ed9d70b89db0a34ce38a7' into multiple…
hippietrail Jul 30, 2025
84440fb
Merge commit '36e4c7d2e3c2f4cbbd19d5bb5174cb01d7c35f2d' into multiple…
hippietrail Jul 30, 2025
1d604b2
Merge commit '71447728518d333d879169b1feb01e2a9c2fa090' into multiple…
hippietrail Jul 30, 2025
46bb627
Merge commit '6b5d18adba946435f7e60204808f6116129bce14' into multiple…
hippietrail Jul 30, 2025
e72808b
Merge commit 'ec475af7ea1d540b5e0986994e6ce0b177e5051d' into multiple…
hippietrail Jul 30, 2025
a4efc14
Merge commit '9194abb45a2375e00d4b115609cef350586780a6' into multiple…
hippietrail Jul 30, 2025
624705e
Merge commit '219abb7992b8f781992d9b3e494f05ec2eac2db1' into multiple…
hippietrail Jul 30, 2025
365644e
Merge commit 'e9f5947349595359c943221d0dace174321ca760' into multiple…
hippietrail Jul 30, 2025
58dabc3
Merge commit '770c9b30f24e1250268620aa16b076a708bd485b' into multiple…
hippietrail Jul 30, 2025
8bbae83
Merge commit '2fcaa32c00c8da4c5296aaf78aff522bc291f0da' into multiple…
hippietrail Jul 30, 2025
5b5cd8b
Merge commit '54db7f3ebfcc95c60a63b9b5ae0cab6fe32c48d2' into multiple…
hippietrail Jul 30, 2025
a97ec85
Merge commit '5c14363613013c3e2f6e6cc0717b38623ab699ae' into multiple…
hippietrail Jul 30, 2025
7da4ade
Merge commit '0143398cdefb7423787b9c30aa7d2aaeb86208b3' into multiple…
hippietrail Jul 30, 2025
5f9e263
Merge commit '6a8e1e9695892ecc7d5f8789ed4856f00e595c04' into multiple…
hippietrail Jul 30, 2025
2debc6e
Merge commit '12e41304e047d64a23f9718424a6f81228368f61' into multiple…
hippietrail Jul 30, 2025
6370a27
Merge commit 'f27bd61c260b61d340626041bf948c2e828c409e' into multiple…
hippietrail Jul 30, 2025
dfcf05e
Merge commit '93175e43c7ae47e2ad8ff031db753830bb9e1f16' into multiple…
hippietrail Jul 30, 2025
cf5447d
Merge commit '5c2e6d035c7b1862d569de052bbb6c5bc8e08338' into multiple…
hippietrail Jul 30, 2025
f033355
Merge commit 'cd63a5ec41772da75fab060d2fc05e2ad0fe2fc1' into multiple…
hippietrail Jul 30, 2025
a8564af
Merge commit '292720388f020ea3f963af3acc9fd9ff92ba1d96' into multiple…
hippietrail Jul 30, 2025
1b2cdae
Merge commit 'e3e573520eac82ad575573de8da97e4ab36aa4a6' into multiple…
hippietrail Jul 30, 2025
cfde7e6
chore: merge old pr branch bit by bit
hippietrail Jul 30, 2025
ddd8be4
Merge commit 'db89187c3f9dd06d3e6ffeb980ea068ec3c1f1d6' into multiple…
hippietrail Jul 30, 2025
7a4ed87
Merge commit '3ec4daeb71421da45b0546893cd41d5f45e34067' into multiple…
hippietrail Jul 30, 2025
9ae88ac
Merge commit '57de1b03704dffca274fbff195c5d1045c7e1728' into multiple…
hippietrail Jul 30, 2025
f3d8af0
Merge commit 'c05f5342b0258902e67c3c4b2243ec81410feead' into multiple…
hippietrail Jul 30, 2025
5afb765
Merge commit 'ead504e9525f071fad04abe2297faf51f2a166af' into multiple…
hippietrail Jul 30, 2025
ec7b748
Merge commit '86b03d67a45735fad2bcf48d7d39a3bcdb8ebf3e' into multiple…
hippietrail Jul 30, 2025
b648e9e
Merge commit '973b7b82353b7297331d5230c83f5791b0f247e1' into multiple…
hippietrail Jul 30, 2025
8da5e9b
Merge commit '0c04291bfec25d0e934eaeb057d0f54af8e14a78' into multiple…
hippietrail Jul 30, 2025
e858ad0
Merge commit '9fee646e9d1333fb9722bc385f8bdeabe730a24f' into multiple…
hippietrail Jul 30, 2025
ef89aec
Merge commit '5230a9d082fc4f9a023ecf4f7f6b599a5aa24d00' into multiple…
hippietrail Jul 30, 2025
4af9ee2
Merge commit 'e9ff4c977e882121294fc7d405f5fe57cd30939e' into multiple…
hippietrail Jul 30, 2025
8795944
Merge commit 'f27a6748bde71120942362252034525c5548636a' into multiple…
hippietrail Jul 30, 2025
5be4f5d
Merge commit 'a8983e3f8af82291bb906eacb71af7f35d502ab2' into multiple…
hippietrail Jul 30, 2025
e5d2012
Merge commit 'cd534b6cc4e4248cc356693bd6b581a6f391cb63' into multiple…
hippietrail Jul 30, 2025
3ce2a99
Merge commit '92d004796eec41ed2f133aa787b8743b251dd38d' into multiple…
hippietrail Jul 30, 2025
5a6605d
Merge commit 'bb84be8310c5e92ab5650b0888e682202d586a3d' into multiple…
hippietrail Jul 30, 2025
008d44f
Merge commit 'c87adcdc1a13a1cfb485f4339f48d4ca5f63ba77' into multiple…
hippietrail Jul 30, 2025
df13444
chore: merge old pr branch bit by bit
hippietrail Jul 30, 2025
7e48780
Merge commit '92b964d0381ce9e26e23b1293c6cabf06feaa351' into multiple…
hippietrail Jul 30, 2025
5faeae4
Merge commit '21888aab558f8fc1188a81210a9b5e9599829060' into multiple…
hippietrail Jul 30, 2025
e414c92
Merge commit 'e97f5975ad8a058f4d5d481124ff48a0fd93b52b' into multiple…
hippietrail Jul 30, 2025
eae984c
Merge commit '5c9d8df2fb4768337ca0772571595e88cf8010e2' into multiple…
hippietrail Jul 30, 2025
1caa25d
Merge commit '29971de8c54e605d1250218b6ca1f4814fc4c55f' into multiple…
hippietrail Jul 30, 2025
7b791fa
Merge commit '73cd8c38305087b68fecbe468e2211c77288151c' into multiple…
hippietrail Jul 30, 2025
fbbeb38
Merge commit '710a1ff6cf389e46e381a0282dc5a2b2cf9a82c3' into multiple…
hippietrail Jul 30, 2025
96d7c66
Merge commit 'b325d5dbe24ff4b79482dcb651673690b44ae6ad' into multiple…
hippietrail Jul 30, 2025
cbd3ff9
Merge commit '705331f878f61974730b5ea4fa0a470bb21b7800' into multiple…
hippietrail Jul 30, 2025
67b9696
Merge commit '26daaa516a5470811c18f504c4f272e4820280e8' into multiple…
hippietrail Jul 30, 2025
074632b
Merge commit 'a965aaa086a5a17335a1b384a039afe264012f3b' into multiple…
hippietrail Jul 30, 2025
d3d7514
Merge commit '5238f7c1a7948bc04af25da5c674a1f5b94f9d95' into multiple…
hippietrail Jul 30, 2025
f0a4ebe
Merge commit '82f20e9d838cfdeed64d3ee4e17c56f8178027d9' into multiple…
hippietrail Jul 30, 2025
f553e04
chore: merge old pr branch bit by bit
hippietrail Jul 30, 2025
30c7f71
Merge commit 'b6f66f5b93a91da4d981c0686dd6a7b5035a855a' into multiple…
hippietrail Jul 30, 2025
b6f545a
Merge commit '59465d54bec0a977756d4130a740b90b8acf612e' into multiple…
hippietrail Jul 30, 2025
ae1d34c
Merge commit '55a475eface67d0cb509b56eeb42ff57ee07898a' into multiple…
hippietrail Jul 30, 2025
c04c6da
Merge commit 'f30d08478d75715d2c2c9fc9da551c3a1884f6ea' into multiple…
hippietrail Jul 30, 2025
88efd87
Merge commit '9bbe9b7051d03beb91b0c626174915b6314ffcb0' into multiple…
hippietrail Jul 30, 2025
567fc48
Merge commit 'a1fb3d4f4ba7185cf6d41028fdca3e58c97a7393' into multiple…
hippietrail Jul 30, 2025
623d62c
Merge commit '64b20a843008af5a94a6ebe85668d56c4d9082e6' into multiple…
hippietrail Jul 30, 2025
0bb3a66
Merge commit '1214bd8e1c65d1196bd583aea470914cd5441b4f' into multiple…
hippietrail Jul 30, 2025
a504e63
Merge commit '1cef35cb66cf2d8ad6a0a9c4fb2554f9df65540a' into multiple…
hippietrail Jul 30, 2025
3c648fc
Merge commit '65b0292760a125f8ebf0b8a098002a79fefb9412' into multiple…
hippietrail Jul 30, 2025
637523c
Merge commit '2d358c24d0b7605a4406030d297208dec4255748' into multiple…
hippietrail Jul 30, 2025
e97226e
Merge commit '569d6162f01b4755f874ee5d1730cd0422300229' into multiple…
hippietrail Jul 30, 2025
5d9bb1f
Merge commit 'a604ec448ea85b9965d8ef93e2385ed5803b5cbb' into multiple…
hippietrail Jul 30, 2025
2635db9
Merge commit 'a2bc3743a0b8cd7250a0411f290664dd45f6b040' into multiple…
hippietrail Jul 30, 2025
f396684
Merge commit '4f09cecfc08d02b552d52836f9b3a6cd51b19497' into multiple…
hippietrail Jul 30, 2025
06886ae
Merge commit '71521f2a1bbd2cd6631225e951c187df933d69be' into multiple…
hippietrail Jul 30, 2025
7ac10bf
Merge commit 'f79548fd2ed3e7cb63e1093f91487efc191b1512' into multiple…
hippietrail Jul 30, 2025
923647c
Merge commit '37b0ac5675baad870cec1776038d9c8e09d1bc8e' into multiple…
hippietrail Jul 30, 2025
44588ee
Merge commit '6849aad2b331a9b4efd9b5fc3a13e8f7c626eb40' into multiple…
hippietrail Jul 30, 2025
159a320
Merge commit '90a66a9c8fc7f6308986b117ab2a623c9909a3dd' into multiple…
hippietrail Jul 30, 2025
0591673
Merge commit 'df118218f59a2411694e403b14313385de6ed730' into multiple…
hippietrail Jul 30, 2025
abd55f7
Merge commit '88244550f829afae8b0cd86fd42b972863c56ca7' into multiple…
hippietrail Jul 30, 2025
e58c3f2
Merge branch 'master' of https://github.com/Automattic/harper into mu…
hippietrail Aug 2, 2025
aa2916e
Merge branch 'master' of https://github.com/Automattic/harper into mu…
hippietrail Aug 12, 2025
9403b1d
fix: appease precommit
hippietrail Aug 12, 2025
d1b7d5c
Merge branch 'master' of https://github.com/Automattic/harper into mu…
hippietrail Aug 15, 2025
7def98a
Merge branch 'master' into multiple-derivations
hippietrail Aug 17, 2025
1a1e0bc
Merge branch 'master' into multiple-derivations
hippietrail Aug 18, 2025
8b7627f
Merge branch 'master' of https://github.com/Automattic/harper into mu…
hippietrail Aug 31, 2025
2f0084b
Merge branch 'master' of https://github.com/Automattic/harper into mu…
hippietrail Sep 2, 2025
f8f5c55
chore: merge with upstream
hippietrail Sep 2, 2025
1c5e616
Merge branch 'master' into multiple-derivations
hippietrail Sep 4, 2025
7a957cf
Merge branch 'master' into multiple-derivations
hippietrail Sep 6, 2025
c78f13e
Merge branch 'multiple-derivations' of https://github.com/hippietrail…
hippietrail Sep 10, 2025
61dc9e6
Merge branch 'master' of https://github.com/Automattic/harper into mu…
hippietrail Sep 10, 2025
42 changes: 38 additions & 4 deletions harper-cli/src/main.rs
@@ -103,6 +103,8 @@ enum Args {
/// The document to mine words from.
file: PathBuf,
},
+/// Get the word associated with a particular word id.
+WordFromId { hash: u64 },
#[cfg(feature = "training")]
TrainBrillTagger {
#[arg(short, long, default_value = "1.0")]
@@ -368,7 +370,29 @@ fn main() -> anyhow::Result<()> {
let mut results = BTreeMap::new();
for word in words {
let metadata = dictionary.get_word_metadata_str(&word);
-results.insert(word, metadata);
+let mut metadata_value = serde_json::to_value(metadata).unwrap_or_default();
+
+// If there are derived words, add them to the metadata
+if let Some(metadata) = dictionary.get_word_metadata_str(&word)
+    && let Some(derived_from) = &metadata.derived_from
+{
+    let derived_words: Vec<String> = derived_from
+        .iter()
+        .filter_map(|wordid| dictionary.get_word_from_id(wordid))
+        .map(|word| word.iter().collect())
+        .collect();
+
+    if !derived_words.is_empty()
+        && let Some(obj) = metadata_value.as_object_mut()
+    {
+        obj.insert(
+            "derived_from_words".to_string(),
+            serde_json::json!(derived_words),
+        );
+    }
+}
+
+results.insert(word, metadata_value);
}
let json = serde_json::to_string_pretty(&results).unwrap();
println!("{json}");
@@ -502,6 +526,11 @@ fn main() -> anyhow::Result<()> {

Ok(())
}
+Args::WordFromId { hash } => {
+    let id = WordId::from_hash(hash);
+    println!("{:?}", dictionary.get_word_from_id(&id));
+    Ok(())
+}
Args::CoreVersion => {
println!("harper-core v{}", harper_core::core_version());
Ok(())
@@ -853,9 +882,14 @@ fn print_word_derivations(word: &str, annot: &str, dictionary: &impl Dictionary)

let id = WordId::from_word_str(word);

-let children = dictionary
-    .words_iter()
-    .filter(|e| dictionary.get_word_metadata(e).unwrap().derived_from == Some(id));
+let children = dictionary.words_iter().filter(|e| {
+    dictionary
+        .get_word_metadata(e)
+        .unwrap()
+        .derived_from
+        .as_ref()
+        .is_some_and(|derived| derived.contains(&id))
+});

println!(" - {word}");
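
Once `derived_from` holds a set of ids rather than a single id, the equality test in `print_word_derivations` becomes a membership test. The following is a minimal std-only sketch of that check. `Meta`, `is_child_of`, and the plain `u64` ids are hypothetical stand-ins for Harper's `WordMetadata` and `WordId`, not code from this PR.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `WordMetadata`; ids are plain `u64`s here.
struct Meta {
    derived_from: Option<HashSet<u64>>,
}

/// Returns true when `parent` is one of the words `meta` was derived from.
fn is_child_of(meta: &Meta, parent: u64) -> bool {
    meta.derived_from
        .as_ref()
        .is_some_and(|derived| derived.contains(&parent))
}

fn main() {
    let both = Meta { derived_from: Some(HashSet::from([1, 2])) };
    let none = Meta { derived_from: None };
    assert!(is_child_of(&both, 1));
    assert!(is_child_of(&both, 2));
    assert!(!is_child_of(&both, 3));
    assert!(!is_child_of(&none, 1));
}
```

A word with no recorded derivation (`None`) is never reported as a child, which mirrors how the old `== Some(id)` comparison treated it.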

4 changes: 2 additions & 2 deletions harper-core/src/fat_token.rs
@@ -4,7 +4,7 @@ use crate::{CharStringExt, TokenKind};

/// A [`Token`](crate::Token) that holds its content as a fat [`Vec<char>`] rather than as a
/// [`Span`](crate::Span).
-#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, PartialOrd, Hash, Eq)]
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Hash, Eq)]
pub struct FatToken {
pub content: Vec<char>,
pub kind: TokenKind,
@@ -20,7 +20,7 @@ impl From<FatStringToken> for FatToken {
}

/// Similar to a [`FatToken`], but uses a [`String`] as the underlying store.
-#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, PartialOrd, Hash, Eq)]
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct FatStringToken {
pub content: String,
pub kind: TokenKind,
12 changes: 11 additions & 1 deletion harper-core/src/ignored_lints/lint_context.rs
@@ -9,7 +9,7 @@ use crate::{

/// A location-agnostic structure that attempts to captures the context and content that a [`Lint`]
/// occurred.
-#[derive(Debug, Hash, Serialize, Deserialize)]
+#[derive(Debug, Serialize, Deserialize)]
pub struct LintContext {
pub lint_kind: LintKind,
pub suggestions: Vec<Suggestion>,
@@ -18,6 +18,16 @@ pub struct LintContext {
pub tokens: Vec<FatToken>,
}

+impl Hash for LintContext {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.lint_kind.hash(state);
+        self.suggestions.hash(state);
+        self.message.hash(state);
+        self.priority.hash(state);
+        self.tokens.hash(state);
+    }
+}
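
The diff does not show how `WordMetadata` hashes its new `derived_from` set. The standard library's `HashSet` does not implement `Hash`, so a type holding one needs a hand-written impl. One possible approach, offered purely as an assumption, is to sort the ids before hashing so that iteration order cannot change the result. `Meta` and `hash_of` below are hypothetical names for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a metadata type holding a set of parent ids.
#[derive(PartialEq, Eq)]
struct Meta {
    derived_from: Option<HashSet<u64>>,
}

impl Hash for Meta {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // `HashSet` iterates in arbitrary order, so sort the ids first:
        // equal sets must feed the hasher identical input.
        let sorted: Option<Vec<u64>> = self.derived_from.as_ref().map(|set| {
            let mut ids: Vec<u64> = set.iter().copied().collect();
            ids.sort_unstable();
            ids
        });
        sorted.hash(state);
    }
}

/// Helper that hashes any value with a freshly constructed `DefaultHasher`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = Meta { derived_from: Some(HashSet::from([1, 2, 3])) };
    let b = Meta { derived_from: Some(HashSet::from([3, 2, 1])) };
    assert!(a == b);
    assert_eq!(hash_of(&a), hash_of(&b));
}
```

Sorting costs an allocation per hash call; whether that matters depends on how often these values are hashed.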

impl LintContext {
pub fn from_lint(lint: &Lint, document: &Document) -> Self {
let Lint {
31 changes: 17 additions & 14 deletions harper-core/src/spell/fst_dictionary.rs
@@ -306,52 +306,55 @@ mod tests {
#[test]
fn plural_llamas_derived_from_llama() {
let dict = FstDictionary::curated();

-assert_eq!(
+assert!(
dict.get_word_metadata_str("llamas")
.unwrap()
.derived_from
-.unwrap(),
-WordId::from_word_str("llama")
-)
+.as_ref()
+.unwrap()
+.contains(&WordId::from_word_str("llama"))
+);
}

#[test]
fn plural_cats_derived_from_cat() {
let dict = FstDictionary::curated();

-assert_eq!(
+assert!(
dict.get_word_metadata_str("cats")
.unwrap()
.derived_from
-.unwrap(),
-WordId::from_word_str("cat")
+.as_ref()
+.unwrap()
+.contains(&WordId::from_word_str("cat"))
);
}

#[test]
fn unhappy_derived_from_happy() {
let dict = FstDictionary::curated();

-assert_eq!(
+assert!(
dict.get_word_metadata_str("unhappy")
.unwrap()
.derived_from
-.unwrap(),
-WordId::from_word_str("happy")
+.as_ref()
+.unwrap()
+.contains(&WordId::from_word_str("happy"))
);
}

#[test]
fn quickly_derived_from_quick() {
let dict = FstDictionary::curated();

-assert_eq!(
+assert!(
dict.get_word_metadata_str("quickly")
.unwrap()
.derived_from
-.unwrap(),
-WordId::from_word_str("quick")
+.as_ref()
+.unwrap()
+.contains(&WordId::from_word_str("quick"))
);
}
}
2 changes: 1 addition & 1 deletion harper-core/src/spell/mod.rs
@@ -17,7 +17,7 @@ mod rune;
mod word_id;
mod word_map;

-#[derive(PartialEq, Debug, Hash, Eq)]
+#[derive(PartialEq, Debug, Eq)]
pub struct FuzzyMatchResult<'a> {
pub word: &'a [char],
pub edit_distance: u8,
12 changes: 9 additions & 3 deletions harper-core/src/spell/rune/attribute_list.rs
@@ -1,4 +1,4 @@
-use hashbrown::HashMap;
+use hashbrown::{HashMap, HashSet};
use serde::{Deserialize, Serialize};
use smallvec::ToSmallVec;

@@ -124,11 +124,17 @@ impl AttributeList {
);
let t_metadata = dest.get_metadata_mut_chars(&new_word).unwrap();
t_metadata.append(&metadata);
-t_metadata.derived_from = Some(WordId::from_word_chars(&word.letters))
+t_metadata
+    .derived_from
+    .get_or_insert_with(HashSet::new)
+    .insert(WordId::from_word_chars(&word.letters));
}
} else {
for (key, mut value) in new_words.into_iter() {
-value.derived_from = Some(WordId::from_word_chars(&word.letters));
+value
+    .derived_from
+    .get_or_insert_with(HashSet::new)
+    .insert(WordId::from_word_chars(&word.letters));

if let Some(val) = dest.get_metadata_mut_chars(&key) {
val.append(&value);
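
The core change in this hunk is that each new derivation is added to a set instead of overwriting the previous value. A minimal std-only sketch of the `get_or_insert_with(HashSet::new).insert(...)` pattern follows. `Meta`, `add_derivation`, and the `u64` ids are hypothetical stand-ins, and `std::collections::HashSet` is used here in place of the `hashbrown` type the PR imports.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `WordMetadata`; ids are plain `u64`s here.
#[derive(Default)]
struct Meta {
    derived_from: Option<HashSet<u64>>,
}

impl Meta {
    /// Records `parent` as one more origin instead of replacing earlier ones.
    fn add_derivation(&mut self, parent: u64) {
        self.derived_from
            .get_or_insert_with(HashSet::new) // create the set on first use
            .insert(parent);
    }
}

fn main() {
    let mut meta = Meta::default();
    meta.add_derivation(10);
    meta.add_derivation(20);
    meta.add_derivation(10); // duplicates are ignored by the set
    assert_eq!(meta.derived_from.as_ref().map(|s| s.len()), Some(2));
}
```

Keeping the field as `Option<HashSet<_>>` preserves the distinction between "no derivation recorded" (`None`) and an explicitly populated set.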
4 changes: 4 additions & 0 deletions harper-core/src/spell/word_id.rs
@@ -31,4 +31,8 @@ impl WordId {
let chars: CharString = text.as_ref().chars().collect();
Self::from_word_chars(chars)
}

+pub fn from_hash(hash: u64) -> Self {
+    Self { hash }
+}
}
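
`from_hash` lets a caller rebuild a `WordId` from a raw `u64`, which is what the new `WordFromId` CLI subcommand relies on. The round trip can be sketched as below. This `WordId` is a simplified stand-in: the diff only shows that the real type wraps a `hash` field, so the use of `DefaultHasher` to compute it is an assumption made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for Harper's `WordId`. How the real type computes
/// its hash is not shown in the diff; `DefaultHasher` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WordId {
    hash: u64,
}

impl WordId {
    fn from_word_str(text: &str) -> Self {
        let mut hasher = DefaultHasher::new();
        for c in text.chars() {
            c.hash(&mut hasher);
        }
        Self { hash: hasher.finish() }
    }

    /// Rebuilds an id from a raw hash, e.g. one copied from JSON output.
    fn from_hash(hash: u64) -> Self {
        Self { hash }
    }
}

fn main() {
    let id = WordId::from_word_str("llama");
    let raw: u64 = id.hash;
    // An id rebuilt from the raw value compares equal to the original.
    assert_eq!(WordId::from_hash(raw), id);
}
```

Note that `from_hash` performs no validation: any `u64` yields an id, and only a dictionary lookup can tell whether it corresponds to a known word.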
49 changes: 47 additions & 2 deletions harper-core/src/token_kind.rs
@@ -2,7 +2,9 @@ use harper_brill::UPOS;
use is_macro::Is;
use serde::{Deserialize, Serialize};

-use crate::{Number, Punctuation, Quote, TokenKind::Word, WordMetadata};
+use crate::TokenKind::Word;
+use crate::{Number, Punctuation, Quote, WordMetadata};
+use std::hash::{Hash, Hasher};

/// Generate wrapper code to pass a function call to the inner [`WordMetadata`],
/// if the token is indeed a word, while also emitting method-level documentation.
@@ -29,7 +31,7 @@ macro_rules! delegate_to_metadata {
/// Has a variety of queries available.
/// If there is a query missing, it may be easy to implement by just calling the
/// `delegate_to_metadata` macro.
-#[derive(Debug, Is, Clone, Serialize, Deserialize, Default, PartialOrd, Hash, Eq, PartialEq)]
+#[derive(Debug, Is, Clone, Serialize, Deserialize, Default, PartialOrd, Eq, PartialEq)]
#[serde(tag = "kind", content = "value")]
pub enum TokenKind {
/// `None` if the word does not exist in the dictionary.
@@ -52,6 +54,49 @@ pub enum TokenKind {
Regexish,
}

+impl Hash for TokenKind {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        match self {
+            TokenKind::Word(metadata) => {
+                metadata.hash(state);
+            }
+            TokenKind::Punctuation(punct) => {
+                punct.hash(state);
+            }
+            TokenKind::Decade => {
+                0.hash(state);
+            }
+            TokenKind::Number(number) => {
+                number.hash(state);
+            }
+            TokenKind::Space(space) => {
+                space.hash(state);
+            }
+            TokenKind::Newline(newline) => {
+                newline.hash(state);
+            }
+            TokenKind::EmailAddress => {
+                0.hash(state);
+            }
+            TokenKind::Url => {
+                0.hash(state);
+            }
+            TokenKind::Hostname => {
+                0.hash(state);
+            }
+            TokenKind::Unlintable => {
+                0.hash(state);
+            }
+            TokenKind::ParagraphBreak => {
+                0.hash(state);
+            }
+            TokenKind::Regexish => {
+                0.hash(state);
+            }
Comment on lines +66 to +95

Contributor:

These hashes seem highly dubious to me. Still work in progress?

Collaborator, Author:

> These hashes seem highly dubious to me. Still work in progress?

I believe @elijah-potter added that with a plan in mind but so far those are not really used by anything.

Also worth noting, the field the hashes are written into is not checked to see if something is already there and the new one stomps the old one. I wrote a patch that's in a PR to use a set instead.

Contributor:

To be clear: my comment was in reference to the fact that most variants have the same hash, which defeats the purpose of hashing.

> Also worth noting, the field the hashes are written into is not checked to see if something is already there and the new one stomps the old one. I wrote a patch that's in a PR to use a set instead.

Wdym by "the field"? This function is generic over the hasher, so we don't know anything about how hashing is done.

Collaborator, Author:

Unless I'm on the wrong track and there are two kinds/places with hashing, the hash gets stored in a field in the metadata, derived_from.

Contributor:

Ah, so you were talking about WordMetadata::derived_from and not about this hashing function. I thought you were talking about the Hasher implementation used in this function.

And the hash in WordMetadata::derived_from is created by WordId, which is completely unrelated to this hashing function.

+        }
+    }
+}
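
The reviewer's concern is that every payload-free variant feeds the same constant `0` to the hasher, so `Url`, `Hostname`, `Decade`, and the rest all collide. One conventional alternative, shown here as a sketch and not as the change adopted in this PR, is to hash `std::mem::discriminant(self)` before any payload. `Kind` is a hypothetical cut-down enum standing in for `TokenKind`, and `hash_of` is a helper introduced only for the demonstration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

/// Hypothetical, cut-down version of `TokenKind` for illustration only.
#[derive(PartialEq, Eq)]
enum Kind {
    Word(Option<u32>),
    Space(usize),
    Url,
    Hostname,
}

impl Hash for Kind {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the variant tag first so that payload-free variants such as
        // `Url` and `Hostname` no longer feed identical input to the hasher.
        mem::discriminant(self).hash(state);
        match self {
            Kind::Word(meta) => meta.hash(state),
            Kind::Space(n) => n.hash(state),
            Kind::Url | Kind::Hostname => {}
        }
    }
}

/// Helper that hashes any value with a freshly constructed `DefaultHasher`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    assert_eq!(hash_of(&Kind::Url), hash_of(&Kind::Url));
    assert_ne!(hash_of(&Kind::Url), hash_of(&Kind::Hostname));
}
```

This is the same shape of code that `#[derive(Hash)]` generates for an enum, which is why a manual impl is usually needed only when some field cannot derive `Hash` itself.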

impl TokenKind {
// Word metadata delegation methods grouped by part of speech
delegate_to_metadata! {