feat: implement paginated search processing for large time ranges #79

digizeph · 2025-09-04T21:29:17Z

Summary

Implements efficient paginated search processing to handle large time ranges (days/weeks) without memory issues, with real-time progress tracking showing success/failure counts.

Key Features

Paginated Processing: Process 1000-item pages sequentially instead of loading all items into memory
Real-time Progress Display: Live updates showing Processed X files, found Y messages | Page N (succeeded: A, failed: B) @ timestamp
Per-file SQLite Commits: Enhanced data safety with commits after each file instead of 10k message batches
Memory Efficient: Constant memory usage regardless of query time range length
Chronological Progress: Shows timestamp progression through time ranges

Technical Improvements

Simplified Architecture: Reduced from 4 channels to 2 with structured progress updates
Eliminated Complex String Parsing: Clean ProgressUpdate enum with structured data
Enhanced Database Safety: Fixed SQLite PRAGMA issues and reduced potential data loss
Clean Progress Display: Suppressed verbose logging during progress mode
Performance Optimized: 75% reduction in channel overhead and eliminated string manipulation

Benefits

Scalability: Handles multi-day/week queries without memory constraints
User Experience: Clear real-time feedback with success/failure visibility
Data Safety: Per-file commits minimize potential data loss on crashes
Maintainability: Simplified codebase with better error handling
Backward Compatibility: Small queries work identically to before

Example Output

⠁ Processed 145 files, found 25847 messages | Page 3 (succeeded: 132, failed: 13) @ 2025-08-01 02:30 UTC
⠂ Processed 146 files, found 25891 messages | Page 3 (succeeded: 133, failed: 13) @ 2025-08-01 02:30 UTC
⠃ Processed 147 files, found 25936 messages | Page 3 (succeeded: 134, failed: 13) @ 2025-08-01 02:30 UTC

Resolves issue #76 with large time range queries that previously caused long delay and poor user feedback.

Replace memory-intensive bulk item loading with efficient pagination approach: **Core Changes:** - Add `build_broker()` method to SearchFilters for reusable broker configuration - Replace single `to_broker_items()` call with page-by-page processing (1000 items per page) - Process pages sequentially while maintaining parallel file processing within pages **Progress Display Improvements:** - Replace static progress bar with dynamic spinner showing real-time file processing - Display: "Processed X files, found Y messages | Page N (files, total) @ timestamp" - Show chronological progress through time ranges with first item timestamps **Database Safety:** - Implement per-file SQLite commits using WriterMessage enum with FileComplete signals - Fix SQLite PRAGMA statements to use proper `pragma_update()` method - Reduce potential data loss from 10k+ messages to single file scope **Benefits:** - Constant memory usage regardless of query time range length - Faster processing start (no wait for complete item enumeration) - Better user feedback with real-time progress and time progression - Enhanced data safety with per-file database commits - Scalable to multi-day/week queries without memory constraints **Testing:** - Add pagination logic tests with small page sizes - Verify broker configuration and filtering works correctly - Maintain backward compatibility for existing small queries

Replace complex progress display with simplified real-time tracking: **Simplified Architecture:** - Replace 4 channels (sender, pb_sender, page_sender, status_sender) with 2 channels - Add structured ProgressUpdate enum with FileComplete and PageStarted variants - Eliminate complex string parsing with clean data structures **Real-time Progress Display:** - Success/failure counts update immediately as each file completes processing - Format: "Processed X files, found Y messages | Page N (succeeded: A, failed: B) @ timestamp" - Remove complex timeout-based polling with simple blocking receiver loop **Code Quality Improvements:** - Eliminate 20+ lines of fragile string manipulation and reconstruction - Remove race conditions from multiple channel coordination - Reduce synchronization overhead by 75% (4 channels -> 2 channels) - Simplify progress thread from complex try_recv() polling to clean for-loop **User Experience:** - Clean progress display without log spam during progress mode - Real-time feedback showing processing success rates - Immediate visual feedback as individual files complete **Performance Benefits:** - Reduced CPU overhead from eliminated string parsing - Lower memory allocations from structured data vs string reconstruction - Improved thread coordination without busy-waiting patterns

digizeph added 2 commits September 4, 2025 14:12

digizeph merged commit f9eefa8 into main Sep 4, 2025
1 check passed

digizeph deleted the feat/handling-long-query-ranges branch September 4, 2025 21:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: implement paginated search processing for large time ranges #79

feat: implement paginated search processing for large time ranges #79

Uh oh!

digizeph commented Sep 4, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: implement paginated search processing for large time ranges #79

feat: implement paginated search processing for large time ranges #79

Uh oh!

Conversation

digizeph commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Features

Technical Improvements

Benefits

Example Output

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

digizeph commented Sep 4, 2025 •

edited

Loading