Conversation

ml-evs
Member

@ml-evs ml-evs commented Aug 28, 2025

Closes #1336 by:

  • sanitizing user input for search so that special characters are matched literally
  • inserting implicit word boundaries around whitespace
  • chaining a whitespace-delimited search query into multiple queries
  • not adding word boundaries for queries longer than 5 chars
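
As a rough illustration of the rules listed above (not the code in this PR), here is a minimal Python sketch. The names `build_patterns`, `matches` and `MAX_BOUNDARY_LEN` are hypothetical. The sketch assumes that every whitespace-separated term must match (AND), that matching is case-insensitive, and that the length threshold applies per term; the actual implementation in pydatalab/src/pydatalab/routes/v0_1/items.py may differ on each of these points.

```python
import re

# Assumption: threshold read off the PR description ("longer than 5 chars").
MAX_BOUNDARY_LEN = 5


def build_patterns(query: str) -> list:
    """Turn a raw search string into one compiled regex per term (hypothetical helper)."""
    patterns = []
    for term in query.split():  # chain whitespace-delimited terms
        literal = re.escape(term)  # treat regex metacharacters literally
        if len(term) <= MAX_BOUNDARY_LEN:
            # Short terms only match as whole words.
            literal = rf"\b{literal}\b"
        # Assumption: case-insensitive matching.
        patterns.append(re.compile(literal, re.IGNORECASE))
    return patterns


def matches(text: str, query: str) -> bool:
    """Return True if every term of the query is found in the text (assumed AND)."""
    patterns = build_patterns(query)
    return bool(patterns) and all(p.search(text) for p in patterns)


print(matches("Na metal", "Na"))          # True: short term matches as a whole word
print(matches("NaCl solution", "Na"))     # False: "Na" is only a prefix of "NaCl"
print(matches("electrolyte", "electro"))  # True: long term, no word boundaries added
print(matches("Li2O", "Li.O"))            # False: "." is matched literally
```

One caveat with this sketch: `\b` only matches next to a word character, so a short term that starts or ends with punctuation, such as `(99%)`, would not match when wrapped this way.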

cypress bot commented Aug 28, 2025

datalab    Run #3830

Project datalab
Branch Review ml-evs/fix-regex-search-sanitization
Run status Passed #3830
Run duration 08m 00s
Commit eee34c8e7e: Merge fdb658893b8a48abf21011f1243e4d2732563925 into 203e6a972e892d7d80e516c6c917...
Committer Matthew Evans

Test results
Failures 0
Flaky 0
Pending 0
Skipped 0
Passing 336

@ml-evs ml-evs force-pushed the ml-evs/fix-regex-search-sanitization branch from 925f612 to 40c619a Compare August 29, 2025 00:02
@ml-evs ml-evs changed the title from "Improve regex sanitisation in item search" to "Improve regex item search: implicit word boundaries, chaining and literal matches" Aug 29, 2025
@ml-evs ml-evs force-pushed the ml-evs/fix-regex-search-sanitization branch 2 times, most recently from 1036ffa to 98301aa Compare August 29, 2025 00:11
@ml-evs ml-evs marked this pull request as ready for review August 29, 2025 00:11
codecov bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.07%. Comparing base (203e6a9) to head (6681a9e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
+ Coverage   80.05%   80.07%   +0.01%     
==========================================
  Files          70       70              
  Lines        4729     4732       +3     
==========================================
+ Hits         3786     3789       +3     
  Misses        943      943              
Files with missing lines Coverage Δ
pydatalab/src/pydatalab/routes/v0_1/items.py 86.40% <100.00%> (+0.12%) ⬆️

@ml-evs ml-evs requested a review from be-smith August 29, 2025 09:18
ml-evs added 2 commits August 29, 2025 22:36
- sanitize user input with brackets
- match on implicit word boundaries
- chain whitespace delimited queries
@ml-evs ml-evs force-pushed the ml-evs/fix-regex-search-sanitization branch from d604630 to 6681a9e Compare August 29, 2025 21:36
@ml-evs ml-evs added the enhancement New feature or request label Aug 29, 2025
@ml-evs ml-evs force-pushed the ml-evs/fix-regex-search-sanitization branch from 93752d6 to c2c407f Compare August 31, 2025 17:37
@ml-evs ml-evs force-pushed the ml-evs/fix-regex-search-sanitization branch from c2c407f to fdb6588 Compare September 1, 2025 12:07
Member

@jdbocarsly jdbocarsly left a comment

Seems to work very nicely! The one thing is that I don't understand the exact rules, i.e. when word boundaries are added around short vs. long queries. A slightly longer writeup of the behavior somewhere would be nice, so we can reference it for future debugging and additional enhancements.

Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Search function in synthesis information block not working correctly
2 participants