This repository was archived by the owner on Oct 10, 2025. It is now read-only.

Conversation

@acquamarin
Contributor

This PR improves FTS wildcard pattern matching by adding a new optional parameter, advanced_pattern_match, to the create_fts_index function.
In the current design, wildcard pattern matching in Kuzu runs only against the stemmed terms.
If advanced_pattern_match is enabled, we create an additional table that stores the original (unstemmed) terms from the docs, so pattern matching can run against the original terms as they appear in the docs.
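
To illustrate why matching only on stemmed terms can miss wildcard patterns, here is a minimal Python sketch. It is not Kuzu's implementation: `toy_stem`, `build_tables`, and `wildcard_search` are hypothetical names, the stemmer is a crude stand-in for a real one, and plain dicts stand in for the term tables.

```python
from fnmatch import fnmatchcase


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def build_tables(docs: dict[int, str], keep_original_terms: bool):
    """Build a stemmed-term table and, optionally, an original-term table.

    Each table maps a term to the set of doc ids that contain it.
    keep_original_terms plays the role of advanced_pattern_match (assumption).
    """
    stemmed: dict[str, set[int]] = {}
    original: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            stemmed.setdefault(toy_stem(term), set()).add(doc_id)
            if keep_original_terms:
                original.setdefault(term, set()).add(doc_id)
    return stemmed, original


def wildcard_search(pattern: str, table: dict[str, set[int]]) -> set[int]:
    """Return ids of docs that have at least one term matching the wildcard pattern."""
    hits: set[int] = set()
    for term, doc_ids in table.items():
        if fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits


docs = {1: "running shoes", 2: "runner"}
stemmed, original = build_tables(docs, keep_original_terms=True)

# "running" is stored as "runn" in the stemmed table, so a pattern that
# targets the original spelling finds nothing there.
stemmed_hits = wildcard_search("runn*g", stemmed)
# The original-term table still holds "running", so the same pattern matches doc 1.
original_hits = wildcard_search("runn*g", original)
```

In this sketch the extra table costs additional storage and has to be kept in sync with the docs, which is the trade-off an opt-in parameter would let users choose.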

@codecov

codecov bot commented Sep 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.32%. Comparing base (3ebd904) to head (a75db71).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5998      +/-   ##
==========================================
+ Coverage   85.77%   86.32%   +0.54%     
==========================================
  Files        1642     1456     -186     
  Lines       75584    65790    -9794     
  Branches     9003     8028     -975     
==========================================
- Hits        64833    56793    -8040     
+ Misses      10493     8738    -1755     
- Partials      258      259       +1     
| Flag | Coverage Δ |
| --- | --- |
| extension | ? |
| in-mem | 81.02% <ø> (+0.03%) ⬆️ |
| on-disk | 86.39% <ø> (+0.06%) ⬆️ |
| recovery | 86.40% <ø> (+0.06%) ⬆️ |


Contributor

@ray6080 ray6080 left a comment

I don't follow how we handle updating the index when advanced_pattern_match is turned on. Let's discuss that.

@acquamarin acquamarin force-pushed the fts-wild-card-pattern branch from a75db71 to 1ef39d5 Compare September 11, 2025 04:50
@github-actions

Benchmark Result

Master commit hash: 2f527957b029428dbd17a16884256c7ff0f79f8e
Branch commit hash: 0c6c2a67fd95e5142bae9c5d28ff55e01d31ead6

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| join | q31 | 7.03 | 4.74 | 2.28 (48.16%) |
| ldbc_snb_is | q32 | 2.27 | 3.94 | -1.67 (-42.38%) |
| multi-rel | multi-rel-lookup | 8.28 | 11.31 | -3.02 (-26.73%) |
| recursive_join | recursive-join-sparse | 9.48 | 5.28 | 4.20 (79.52%) |
| var_size_seq_scan | q22 | 146.24 | 110.39 | 35.85 (32.47%) |

Other queries

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
| --- | --- | --- | --- | --- |
| aggregation | q24 | 702.29 | 703.55 | -1.27 (-0.18%) |
| aggregation | q28 | 7682.37 | 7709.11 | -26.75 (-0.35%) |
| filter | q14 | 61.82 | 60.80 | 1.02 (1.68%) |
| filter | q15 | 60.00 | 59.09 | 0.91 (1.54%) |
| filter | q16 | 276.08 | 277.39 | -1.31 (-0.47%) |
| filter | q17 | 382.82 | 383.08 | -0.26 (-0.07%) |
| filter | q18 | 1854.51 | 1869.48 | -14.97 (-0.80%) |
| filter | zonemap-node | 24.54 | 25.42 | -0.88 (-3.47%) |
| filter | zonemap-node-lhs-cast | 24.68 | 25.88 | -1.20 (-4.63%) |
| filter | zonemap-node-null | 24.34 | 26.28 | -1.94 (-7.37%) |
| filter | zonemap-rel | 5715.51 | 5641.73 | 73.77 (1.31%) |
| fixed_size_expr_evaluator | q07 | 619.91 | 619.57 | 0.34 (0.06%) |
| fixed_size_expr_evaluator | q08 | 904.46 | 904.85 | -0.38 (-0.04%) |
| fixed_size_expr_evaluator | q09 | 904.28 | 905.63 | -1.35 (-0.15%) |
| fixed_size_expr_evaluator | q10 | 191.48 | 191.64 | -0.16 (-0.09%) |
| fixed_size_expr_evaluator | q11 | 190.42 | 191.51 | -1.09 (-0.57%) |
| fixed_size_expr_evaluator | q12 | 168.38 | 168.35 | 0.03 (0.02%) |
| fixed_size_expr_evaluator | q13 | 1543.14 | 1496.15 | 47.00 (3.14%) |
| fixed_size_seq_scan | q23 | 49.42 | 50.29 | -0.87 (-1.73%) |
| join | q29 | 811.34 | 753.07 | 58.27 (7.74%) |
| join | q30 | 1629.61 | 1763.50 | -133.88 (-7.59%) |
| join | SelectiveTwoHopJoin | 52.89 | 57.00 | -4.11 (-7.21%) |
| ldbc_snb_ic | q35 | 9.43 | 9.55 | -0.12 (-1.26%) |
| ldbc_snb_ic | q36 | 95.69 | 108.46 | -12.78 (-11.78%) |
| ldbc_snb_is | q33 | 16.40 | 16.08 | 0.32 (2.01%) |
| ldbc_snb_is | q34 | 1.22 | 1.22 | 0.00 (0.00%) |
| limit | push-down-limit-into-distinct | 1951.91 | 1952.02 | -0.11 (-0.01%) |
| multi-rel | multi-rel-large-scan | 1480.90 | 1459.16 | 21.74 (1.49%) |
| multi-rel | multi-rel-small-scan | 199.07 | 204.47 | -5.40 (-2.64%) |
| order_by | q25 | 64.58 | 66.38 | -1.81 (-2.72%) |
| order_by | q26 | 392.62 | 383.82 | 8.80 (2.29%) |
| order_by | q27 | 1314.14 | 1299.33 | 14.81 (1.14%) |
| recursive_join | recursive-join-bidirection | 319.59 | 339.37 | -19.78 (-5.83%) |
| recursive_join | recursive-join-dense | 7028.81 | 6979.54 | 49.28 (0.71%) |
| recursive_join | recursive-join-path | 23492.64 | 23361.90 | 130.74 (0.56%) |
| recursive_join | recursive-join-trail | 7032.35 | 6913.28 | 119.07 (1.72%) |
| scan_after_filter | q01 | 107.34 | 107.91 | -0.57 (-0.53%) |
| scan_after_filter | q02 | 94.76 | 92.86 | 1.90 (2.04%) |
| shortest_path_ldbc100 | q37 | 152.23 | 151.80 | 0.44 (0.29%) |
| shortest_path_ldbc100 | q38 | 367.07 | 317.57 | 49.51 (15.59%) |
| shortest_path_ldbc100 | q39 | 88.77 | 85.16 | 3.61 (4.23%) |
| shortest_path_ldbc100 | q40 | 413.93 | 509.64 | -95.71 (-18.78%) |
| var_size_expr_evaluator | q03 | 2080.70 | 2109.49 | -28.79 (-1.36%) |
| var_size_expr_evaluator | q04 | 2115.45 | 2153.58 | -38.14 (-1.77%) |
| var_size_expr_evaluator | q05 | 2629.84 | 2658.84 | -28.99 (-1.09%) |
| var_size_expr_evaluator | q06 | 1257.51 | 1261.50 | -3.99 (-0.32%) |
| var_size_seq_scan | q19 | 1615.62 | 1352.67 | 262.95 (19.44%) |
| var_size_seq_scan | q20 | 3029.76 | 2672.46 | 357.30 (13.37%) |
| var_size_seq_scan | q21 | 2460.35 | 2171.10 | 289.25 (13.32%) |
