-
Notifications
You must be signed in to change notification settings - Fork 169
MemoryOptimizedSearch warm up optimization #2954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hi @MrFlap Before proceeding, could you verify how much this warm-up effective? Could you do 2 experiments and share numbers?
At the beginning of each experiment, please run below to drop cache: |
| import org.apache.lucene.index.SegmentInfo; | ||
| import org.apache.lucene.index.SegmentReader; | ||
| import org.apache.lucene.store.Directory; | ||
| import org.apache.lucene.index.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove * imports
| .filter(fieldInfo -> fieldInfo.attributes().containsKey(KNNVectorFieldMapper.KNN_FIELD)) | ||
| .filter(fieldInfo -> { | ||
| final MappedFieldType fieldType = mapperService.fieldType(fieldInfo.getName()); | ||
| // Check which warmup strategy to use. Currently, this will be partial warmup for Non-FSDirectory and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need partial warmup for Non-FSDirectory? Directory implementations will take care of memory issues right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure, this is just an abstraction of this code
| return StreamSupport.stream(leafReader.getFieldInfos().spliterator(), false) | ||
| .filter(fieldInfo -> isMemoryOptimizedSearchField(fieldInfo, mapperService, indexName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use for-loops please. returning a stream is not preferable as it can throw already closed exception if terminal operation is performed on it.
There is no effeciency gains here with stream api usage so its better to be defensive
| this.directory = directory; | ||
| } | ||
|
|
||
| private void pageFaultFile(String file) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why is this named pageFaultFile? loadFile or warmupFile should work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warmupFile should be fine, I named it pageFaultFile so that it wouldn't be confused with loading the file into a file pointer.
| for (int i = 0; i < input.length(); i += 4096) { | ||
| input.seek(i); | ||
| input.readByte(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify the idea here is that we fetch one byte so it kernel will fetch the 4kb page right?
| } | ||
|
|
||
| private void pageFaultFile(String file) throws IOException { | ||
| IndexInput input = directory.openInput(file, IOContext.DEFAULT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Lets use
IOContext.READONCEto avoid any side-effects - Also can we wrap this up in try with resources to make sure file is unmapped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the IOContext should be fine. Can you clarify what the second bullet point means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| IndexInput input = directory.openInput(file, IOContext.DEFAULT); | |
| try (IndexInput input = directory.openInput(file, IOContext.READONCE)) { | |
| // logic here | |
| } |
This takes of resource closure in case of error cases. It makes sure file is unmapped in case an error is thrown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, makes sense 👍
| @Override | ||
| public boolean warmUp(FieldInfo field) throws IOException { | ||
| final KNNEngine knnEngine = extractKNNEngine(field); | ||
| final List<String> engineFiles = KNNCodecUtil.getEngineFiles( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just fetch .vec files here to avoid multiple code flows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally going to do that, but there is a subtle difference with this. If we add the .vec file fetch in this method, we will load the .vec .faiss pair for each fieldInfo sequentially. This means that the .vec file might load over a .faiss file loaded in a previous call. If we load all of the .vec files first, then we know that they won't override any .faiss files.
| /** | ||
| * Fully warm up the index by loading every byte from disk, causing page faults | ||
| */ | ||
| private class FullFieldWarmUpStrategy implements FieldWarmUpStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not go with inner classes. Can we create a package warmup and then have separate classes in there. If we are moving to strategy pattern then native cache should also be moved ideally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, are we just trying to avoid inner classes in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally private inner classes are hard to test. breaking it into its own class with making it package-private will generally achieve the same goal and will help cover more cases
| } | ||
|
|
||
| @SneakyThrows | ||
| public void testWarmUpMemoryOptimizedSearcher_multipleSegments() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is okay but its not tight enough, anyway we can mock and verify the warmup method is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean to verify that FieldWarmUpStrategy::warmup is called? We can probably mock to use a custom FieldWarmUpStrategy that returns the files that it touches somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall changes makes sense , things which is making really hard to understand is putting eveerything in one class , let's re-factor liitle bit,
- Put a Warm Up Class with partial and full load strategy.
These are the test cases I can think of we can put
- Test Warm-Up with FSDirectory: Verify full warm-up strategy loads .faiss files for FSDirectory, no off-heap cache used, and logs confirm field warm-up.
- Test Warm-Up with Non-FSDirectory: Verify partial warm-up strategy triggers null vector search for non-FSDirectory, no off-heap cache, and logs confirm field warm-up.
- Test Warm-Up with Multiple Segments: Verify warm-up handles multiple segments, all eligible fields warmed, no off-heap cache, and logs confirm fields per segment.
- Test Full-Precision Vector Loading: Verify .vec files loaded before .faiss, search works post-warm-up, and no off-heap cache used.
- Test Warm-Up with Multiple Fields: Verify all k-NN fields in a segment warmed, logs confirm all fields, and returned set includes all field names.
- Test Warm-Up with No Eligible Fields: Verify empty field set handled, logs “no fields found,” empty set returned, no exceptions.
- Test Warm-Up with Empty Segment: Verify empty segment (no documents) handled, empty set returned, no exceptions.
- Test Warm-Up with Missing Engine Files: Verify missing .faiss files logged as warnings, field skipped, no exceptions.
- Test Warm-Up with Invalid Vector Data Type: Verify invalid VECTOR_DATA_TYPE_FIELD skipped, logs error/warning, no exceptions.
- Test Warm-Up with Closed LeafReader: Verify AlreadyClosedException caught, logged, empty set returned, no crash.
- Test IOException in Full Warm-Up: Simulate IOException in warmUpFile, verify error logged, field skipped, warm-up continues.
- Test IOException in Vector Loading: Simulate IOException in loadFullPrecisionVectors, verify error logged, warm-up continues.
- Test Null MapperService: Verify null MapperService handled, fields skipped, empty set returned, no exceptions.
- Test Warm-Up with Large Segment: Verify warm-up scales with 10,000 documents, completes in reasonable time, no memory errors.
- Test Warm-Up with Many Fields: Verify warm-up scales with 50 k-NN fields, all warmed, linear performance scaling.
- Test Warm-Up Strategy Invocation: Verify correct strategy (FullWarmUpStrategy or PartialWarmUpStrategy) called per field using mock.
- Test Strategy Selection Based on Directory: Verify FullWarmUpStrategy for FSDirectory, PartialWarmUpStrategy for non-FSDirectory using mocks.
- Test Warm-Up During Shard Recovery: Verify warm-up works during shard recovery, fields loaded, search functional post-recovery.
- Test Warm-Up with Concurrent Queries: Verify warm-up doesn’t block k-NN searches, search results correct during/after warm-up.
- Test Directory Resource Cleanup: Verify IndexInput closed in warmUpFile, no resource leaks using mock.
- Test Searcher Resource Cleanup: Verify Engine.Searcher closed in warmup(), no resource leaks using mock
| import java.util.Set; | ||
| import java.nio.file.Path; | ||
| import java.nio.file.Paths; | ||
| import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:- Remove *
| import static org.opensearch.knn.common.KNNConstants.SPACE_TYPE; | ||
| import static org.opensearch.knn.common.KNNConstants.VECTOR_DATA_TYPE_FIELD; | ||
| import static org.opensearch.knn.common.FieldInfoExtractor.extractKNNEngine; | ||
| import static org.opensearch.knn.common.KNNConstants.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
|
Whoops, for some reason the import changes didn't go through in KNNIndexShard.java |
|
I'm going to squash the commits once all tests pass and you all think it looks good @0ctopus13prime @shatejas @Vikasht34, but all feedback has been incorporated. |
7ffaf8c to
3fda638
Compare
|
|
||
| import java.io.IOException; | ||
|
|
||
| public abstract class FieldWarmUpStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT : Java doc please!
| import org.apache.lucene.store.FilterDirectory; | ||
| import org.opensearch.common.lucene.Lucene; | ||
|
|
||
| public class FieldWarmUpStrategyFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT : Java doc please!
| private Directory directory; | ||
| private LeafReader leafReader; | ||
|
|
||
| public FieldWarmUpStrategyFactory setDirectory(Directory directory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT : @Setter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason it doesn't return the FieldWarmUpStrategyFactory when I use Setter. Is this intended behavior?
| try (IndexInput input = directory.openInput(file, IOContext.READONCE)) { | ||
| for (int i = 0; i < input.length(); i += 4096) { | ||
| input.seek(i); | ||
| input.readByte(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, do you recall all pages loaded regardless readByte or readBytes(byte[]) are called?
| ArrayList<String> warmedUp = new ArrayList<>(); | ||
|
|
||
| for (FieldInfo field : memOptSearchFields) { | ||
| boolean warm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try {
if (fieldWarmUpStrategy.warmUp(field)) {
warmedUp.add(field.getName());
}
} catch (IOException e) {
log.error("Failed to warm up field: {}: {}", field.getName(), e.toString());
}
| vectorValues.vectorValue(iter.docID()); | ||
| } | ||
| } catch (IOException e) { | ||
| log.error("Failed to load vec file for field: {}: {}", field.getName(), e.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it's good to pass e to logger
log.error("Failed to load vec file for field: {}", field.getName(), e);
|
@MrFlap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for incorporating all the tests !! , Will wait for all tests to pass.
…ch-project#2954) Signed-off-by: Andrew Klepchick <[email protected]>
5837ea3 to
c1fec8f
Compare
|
@0ctopus13prime 32x |
|
@0ctopus13prime fp16 |
|
@MrFlap |
Description
Adds warmup procedures for memory optimized search.
Related Issues
Resolves #2939
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.