Fix/qdrant collection selection #66

lpi-tn · 2025-09-30T15:10:00Z

This pull request improves the logic for classifying document slices per Qdrant collection, updates related tests, and makes minor code cleanups. The main focus is on more accurately mapping document slices to their respective collections, especially for multilingual and monolingual scenarios.

Improvements to collection classification logic:

Refactored the classify_documents_per_collection function in qdrant_handler.py to use a more robust approach for determining the correct collection for each document slice. The function now checks for both multilingual and monolingual collection naming patterns, and logs an error if a matching collection is not found.
Updated the return type initialization in classify_documents_per_collection from a defaultdict to a standard dictionary for clarity and consistency.
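
As a rough illustration of the classification described above, here is a minimal sketch. It is not the code from qdrant_handler.py: the Slice type, the "collection_<lang>_<model>" naming pattern, the "mul" tag for multilingual collections, and the choice to try the multilingual name first are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Set

logger = logging.getLogger(__name__)

# Assumption: multilingual collections carry this tag in place of a language code.
MULTILINGUAL_TAG = "mul"


@dataclass(frozen=True)
class Slice:
    """Hypothetical stand-in for a document slice."""

    slice_id: str
    lang: str
    model_name: str


def classify_slices_per_collection(
    collection_names: List[str], slices: List[Slice]
) -> Dict[str, Set[str]]:
    """Group slice ids under the collection that matches their model and language."""
    known = set(collection_names)
    # Plain dict rather than defaultdict, so only matched collections appear as keys.
    result: Dict[str, Set[str]] = {}
    for s in slices:
        model = s.model_name.lower()
        # Assumed naming patterns; the real ones may differ.
        multilingual = f"collection_{MULTILINGUAL_TAG}_{model}"
        monolingual = f"collection_{s.lang.lower()}_{model}"
        if multilingual in known:
            target = multilingual
        elif monolingual in known:
            target = monolingual
        else:
            # No matching collection: log an error and skip the slice.
            logger.error(
                "No collection found for slice %s (lang=%s, model=%s)",
                s.slice_id, s.lang, s.model_name,
            )
            continue
        result.setdefault(target, set()).add(s.slice_id)
    return result
```

With a plain dictionary, a lookup for a collection that received no slices raises KeyError instead of silently creating an empty entry, which is one plausible reason for moving away from defaultdict.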

Testing enhancements:

Added a new test, test_should_handle_multiple_slices_for_same_collection_with_multi_lingual_collection_and_gibberish, to cover scenarios where document slices belong to multilingual collections or collections with unexpected names, ensuring the new classification logic works as intended.
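
The kind of scenario this test targets could look roughly like the sketch below. It is not the test from test_qdrant_handler.py: the classify helper, the tuple-based slices, the collection names, and the logger name are hypothetical and exist only to keep the example self-contained.

```python
import logging
import unittest
from typing import Dict, Iterable, List, Set, Tuple

logger = logging.getLogger("qdrant_handler_sketch")


def classify(
    collections: Iterable[str], slices: List[Tuple[str, str, str]]
) -> Dict[str, Set[str]]:
    """Hypothetical stand-in; slices are (slice_id, lang, model) tuples."""
    known = set(collections)
    out: Dict[str, Set[str]] = {}
    for slice_id, lang, model in slices:
        # Assumed naming patterns, multilingual candidate first.
        candidates = (f"collection_mul_{model}", f"collection_{lang}_{model}")
        target = next((c for c in candidates if c in known), None)
        if target is None:
            logger.error("No collection found for slice %s", slice_id)
            continue
        out.setdefault(target, set()).add(slice_id)
    return out


class ClassifySketchTest(unittest.TestCase):
    def test_multiple_slices_multilingual_and_gibberish(self):
        # One multilingual collection plus one with an unexpected name.
        collections = ["collection_mul_model-a", "qwertyuiop"]
        slices = [("s1", "fr", "model-a"), ("s2", "en", "model-a")]
        self.assertEqual(
            classify(collections, slices),
            {"collection_mul_model-a": {"s1", "s2"}},
        )

    def test_unmatched_slice_logs_error(self):
        # Only a gibberish collection exists, so the slice cannot be placed.
        with self.assertLogs("qdrant_handler_sketch", level="ERROR"):
            result = classify(["qwertyuiop"], [("s1", "fr", "model-a")])
        self.assertEqual(result, {})
```

Saved as a module, the sketch can be run with python -m unittest.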

Logging and code cleanup:

Added an info-level log statement in qdrant_syncronizer.py to indicate which collection is being processed during synchronization.
Simplified import statements and removed unused variables in qdrant_handler.py for better code hygiene.

Copilot

Pull Request Overview

This pull request improves the logic for classifying document slices per Qdrant collection by implementing a more robust approach for mapping document slices to their respective collections. The changes focus on better handling of multilingual and monolingual collection naming patterns.

Refactored classify_documents_per_collection function to use language and model-based collection matching
Added comprehensive test coverage for edge cases including multilingual collections with unexpected naming patterns
Improved code maintainability by simplifying imports and removing unused variables

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
qdrant_syncronizer.py	Added logging statement to track collection processing during synchronization
qdrant_handler.py	Completely refactored collection classification logic and cleaned up imports/unused variables
test_qdrant_handler.py	Added new test case for multilingual collections with non-standard naming patterns

lpi-tn added 3 commits September 30, 2025 16:46

More clean logical selection

ae11136

add log

64042e1

fix and test

71a66c2

lpi-tn requested review from Copilot, jmsevin and sandragjacinto September 30, 2025 15:10

Copilot AI reviewed Sep 30, 2025

welearn_datastack/modules/qdrant_handler.py (outdated, resolved)

welearn_datastack/modules/qdrant_handler.py (outdated, resolved)

welearn_datastack/modules/qdrant_handler.py (outdated, resolved)

Update welearn_datastack/modules/qdrant_handler.py

de68cff

Co-authored-by: Copilot <[email protected]>

jmsevin reviewed Sep 30, 2025

welearn_datastack/modules/qdrant_handler.py (outdated, resolved)

lpi-tn added 2 commits October 1, 2025 10:39

I regroup the two loops in one

0adb222

#66 (comment)

Merge remote-tracking branch 'origin/Fix/qdrant-collection-selection'…

d54c23b

… into Fix/qdrant-collection-selection # Conflicts: # welearn_datastack/modules/qdrant_handler.py

sandragjacinto reviewed Oct 1, 2025

welearn_datastack/nodes_workflow/QdrantSyncronizer/qdrant_syncronizer.py (outdated, resolved)

sandragjacinto reviewed Oct 1, 2025

welearn_datastack/modules/qdrant_handler.py (outdated, resolved)

sandragjacinto approved these changes Oct 1, 2025

lpi-tn and others added 3 commits October 1, 2025 11:07

Update welearn_datastack/nodes_workflow/QdrantSyncronizer/qdrant_sync…

99e0612

…ronizer.py Co-authored-by: Sandra Guerreiro <[email protected]>

removed from the loop

82792ef

Merge remote-tracking branch 'origin/Fix/qdrant-collection-selection'…

8389968

… into Fix/qdrant-collection-selection

lpi-tn merged commit cb713ea into main Oct 1, 2025
3 checks passed

lpi-tn deleted the Fix/qdrant-collection-selection branch October 1, 2025 09:07
