
Conversation

@MrFlap (Contributor) commented Oct 21, 2025

Description

Adds warmup procedures for memory optimized search.

Related Issues

Resolves #2939

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@0ctopus13prime (Collaborator) commented Oct 22, 2025

Hi @MrFlap
Thank you for the PR. Overall it looks good, though I can see some small refactoring opportunities + a few nitpicks; we can revisit those after the benchmark.

Before proceeding, could you verify how effective this warm-up is?
Ideally, if all bytes are loaded into the page cache, search latency right after this warm-up should be consistent, so the gap between p50 and p99 should be small enough.

Could you do 2 experiments and share numbers?

  1. FP16, Cohere 10M -> For Faiss index only
  2. Quantized vectors, Cohere 10M -> Both Faiss + Lucene .vec file

At the beginning of each experiment, please run below to drop cache:

# Show current memory cache usage
free -h

# Drop page cache + dentries + inodes (full cache flush)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

@0ctopus13prime changed the title Lucene on faiss warmup → MemoryOptimizedSearch warm up optimization Oct 22, 2025
import org.apache.lucene.index.SegmentInfo;
import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.index.*;
Collaborator:

remove * imports

.filter(fieldInfo -> fieldInfo.attributes().containsKey(KNNVectorFieldMapper.KNN_FIELD))
.filter(fieldInfo -> {
final MappedFieldType fieldType = mapperService.fieldType(fieldInfo.getName());
// Check which warmup strategy to use. Currently, this will be partial warmup for Non-FSDirectory and
Collaborator:

Do we need partial warmup for Non-FSDirectory? Directory implementations will take care of memory issues right?

Contributor Author:

I'm not really sure; this is just an abstraction of the existing code.

Comment on lines 119 to 120
return StreamSupport.stream(leafReader.getFieldInfos().spliterator(), false)
.filter(fieldInfo -> isMemoryOptimizedSearchField(fieldInfo, mapperService, indexName));
Collaborator:

Can we use for-loops please? Returning a stream is not preferable, as it can throw an already-closed exception if a terminal operation is performed on it later.

There are no efficiency gains here from the Stream API, so it's better to be defensive.
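The defensive rewrite suggested above might look like the sketch below (Lucene's FieldInfo and the eligibility check are simplified to String and a Predicate here; these stand-ins are hypothetical, not the PR's actual signatures). The key point is that the list is built eagerly while the reader is known to be open, so no terminal operation can later hit an AlreadyClosedException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Eagerly materialize the filtered fields instead of returning a lazy Stream.
public class EagerFilterDemo {

    public static List<String> memOptSearchFields(List<String> fieldInfos, Predicate<String> eligible) {
        List<String> result = new ArrayList<>();
        for (String fieldInfo : fieldInfos) {
            if (eligible.test(fieldInfo)) {
                result.add(fieldInfo); // evaluated now, while the reader is open
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fields = memOptSearchFields(
            List.of("knn_field", "text_field", "knn_field_2"),
            name -> name.startsWith("knn")
        );
        System.out.println(fields); // [knn_field, knn_field_2]
    }
}
```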

this.directory = directory;
}

private void pageFaultFile(String file) throws IOException {
Collaborator:

nit: why is this named pageFaultFile? loadFile or warmupFile should work

Contributor Author:

warmupFile should be fine; I named it pageFaultFile so that it wouldn't be confused with loading the file into a file pointer.

Comment on lines 234 to 237
for (int i = 0; i < input.length(); i += 4096) {
input.seek(i);
input.readByte();
}
Collaborator:

Just to clarify: the idea here is that we fetch one byte so the kernel will fault in the whole 4 KB page, right?
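The stride-by-page-size trick can be sketched outside Lucene with plain java.io (the file name and PageStrideWarmup class here are hypothetical): reading one byte per 4096-byte stride causes the kernel to fault in the whole page, so the loop performs only ceil(fileLength / 4096) reads rather than one per byte.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Touch one byte per OS page to pull a whole file into the page cache.
public class PageStrideWarmup {

    static final int PAGE_SIZE = 4096;

    // Returns how many single-byte reads the stride loop performs.
    public static long warm(Path file) throws IOException {
        long touches = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (long i = 0; i < raf.length(); i += PAGE_SIZE) {
                raf.seek(i);
                raf.readByte(); // one byte is enough: the kernel fetches the page
                touches++;
            }
        }
        return touches;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("warmup", ".bin");
        Files.write(tmp, new byte[10_000]); // just under 2.5 pages
        System.out.println(warm(tmp)); // 3 touches for 10_000 bytes
        Files.delete(tmp);
    }
}
```

Note this assumes the Directory is backed by mmap/OS files; readBytes over a larger buffer would warm the same pages, just with fewer syscall-level operations.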

}

private void pageFaultFile(String file) throws IOException {
IndexInput input = directory.openInput(file, IOContext.DEFAULT);
Collaborator:

  • Let's use IOContext.READONCE to avoid any side effects
  • Also, can we wrap this in try-with-resources to make sure the file is unmapped?

Contributor Author:

Changing the IOContext should be fine. Can you clarify what the second bullet point means?

@shatejas (Collaborator) commented Oct 23, 2025:

Suggested change
IndexInput input = directory.openInput(file, IOContext.DEFAULT);
try (IndexInput input = directory.openInput(file, IOContext.READONCE)) {
// logic here
}

This takes care of resource closure in error cases; it makes sure the file is unmapped if an error is thrown.

Contributor Author:

Ahh, makes sense 👍

@Override
public boolean warmUp(FieldInfo field) throws IOException {
final KNNEngine knnEngine = extractKNNEngine(field);
final List<String> engineFiles = KNNCodecUtil.getEngineFiles(
Collaborator:

Can't we just fetch .vec files here to avoid multiple code flows?

Contributor Author:

I was originally going to do that, but there is a subtle difference. If we add the .vec file fetch in this method, we will load the .vec/.faiss pair for each fieldInfo sequentially. This means that a .vec file might load over pages of a .faiss file loaded in a previous call. If we load all of the .vec files first, then we know they won't evict any .faiss files.
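The ordering argument above can be sketched as a simple two-pass partition (the WarmupOrder class and file names are hypothetical illustrations, not the PR's code): warming every .vec file before any .faiss file means no later .vec load can evict pages of an already-warmed .faiss file.

```java
import java.util.ArrayList;
import java.util.List;

// Order warmup so full-precision vector files come before engine files.
public class WarmupOrder {

    // Returns the warmup order: every .vec file first, then everything else.
    public static List<String> order(List<String> files) {
        List<String> ordered = new ArrayList<>();
        for (String f : files) {
            if (f.endsWith(".vec")) ordered.add(f); // full-precision vectors first
        }
        for (String f : files) {
            if (!f.endsWith(".vec")) ordered.add(f); // then .faiss and the rest
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(order(List.of("_0.faiss", "_0.vec", "_1.faiss", "_1.vec")));
        // [_0.vec, _1.vec, _0.faiss, _1.faiss]
    }
}
```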

/**
* Fully warm up the index by loading every byte from disk, causing page faults
*/
private class FullFieldWarmUpStrategy implements FieldWarmUpStrategy {
Collaborator:

Let's not go with inner classes. Can we create a warmup package and have separate classes in there? If we are moving to the strategy pattern, then the native cache should ideally be moved as well.

Contributor Author:

Sure, are we just trying to avoid inner classes in general?

Collaborator:

Generally, private inner classes are hard to test. Breaking it into its own package-private class will achieve the same goal and help cover more cases.

}

@SneakyThrows
public void testWarmUpMemoryOptimizedSearcher_multipleSegments() {
Collaborator:

This test is okay, but it's not tight enough. Is there any way we can mock and verify the warmup method is called?

Contributor Author:

You mean verify that FieldWarmUpStrategy::warmUp is called? We can probably mock with a custom FieldWarmUpStrategy that records the files it touches somehow.

@Vikasht34 (Collaborator) left a comment:

Overall the changes make sense. What makes this really hard to understand is putting everything in one class; let's refactor a little bit:

  1. Put the warm-up logic in its own class, with partial and full load strategies.

These are the test cases I can think of:

  • Test Warm-Up with FSDirectory: Verify full warm-up strategy loads .faiss files for FSDirectory, no off-heap cache used, and logs confirm field warm-up.
  • Test Warm-Up with Non-FSDirectory: Verify partial warm-up strategy triggers null vector search for non-FSDirectory, no off-heap cache, and logs confirm field warm-up.
  • Test Warm-Up with Multiple Segments: Verify warm-up handles multiple segments, all eligible fields warmed, no off-heap cache, and logs confirm fields per segment.
  • Test Full-Precision Vector Loading: Verify .vec files loaded before .faiss, search works post-warm-up, and no off-heap cache used.
  • Test Warm-Up with Multiple Fields: Verify all k-NN fields in a segment warmed, logs confirm all fields, and returned set includes all field names.
  • Test Warm-Up with No Eligible Fields: Verify empty field set handled, logs “no fields found,” empty set returned, no exceptions.
  • Test Warm-Up with Empty Segment: Verify empty segment (no documents) handled, empty set returned, no exceptions.
  • Test Warm-Up with Missing Engine Files: Verify missing .faiss files logged as warnings, field skipped, no exceptions.
  • Test Warm-Up with Invalid Vector Data Type: Verify invalid VECTOR_DATA_TYPE_FIELD skipped, logs error/warning, no exceptions.
  • Test Warm-Up with Closed LeafReader: Verify AlreadyClosedException caught, logged, empty set returned, no crash.
  • Test IOException in Full Warm-Up: Simulate IOException in warmUpFile, verify error logged, field skipped, warm-up continues.
  • Test IOException in Vector Loading: Simulate IOException in loadFullPrecisionVectors, verify error logged, warm-up continues.
  • Test Null MapperService: Verify null MapperService handled, fields skipped, empty set returned, no exceptions.
  • Test Warm-Up with Large Segment: Verify warm-up scales with 10,000 documents, completes in reasonable time, no memory errors.
  • Test Warm-Up with Many Fields: Verify warm-up scales with 50 k-NN fields, all warmed, linear performance scaling.
  • Test Warm-Up Strategy Invocation: Verify correct strategy (FullWarmUpStrategy or PartialWarmUpStrategy) called per field using mock.
  • Test Strategy Selection Based on Directory: Verify FullWarmUpStrategy for FSDirectory, PartialWarmUpStrategy for non-FSDirectory using mocks.
  • Test Warm-Up During Shard Recovery: Verify warm-up works during shard recovery, fields loaded, search functional post-recovery.
  • Test Warm-Up with Concurrent Queries: Verify warm-up doesn’t block k-NN searches, search results correct during/after warm-up.
  • Test Directory Resource Cleanup: Verify IndexInput closed in warmUpFile, no resource leaks using mock.
  • Test Searcher Resource Cleanup: Verify Engine.Searcher closed in warmup(), no resource leaks using mock
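
The strategy-invocation tests in this list don't strictly need Mockito; a hand-rolled recording fake works too. This sketch simplifies the PR's FieldWarmUpStrategy to take a String field name (the real interface takes a FieldInfo), so treat the names here as illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Verify the per-field warmup loop calls the strategy, using a recording fake.
public class RecordingStrategyDemo {

    interface FieldWarmUpStrategy {
        boolean warmUp(String fieldName);
    }

    static class RecordingStrategy implements FieldWarmUpStrategy {
        final List<String> warmed = new ArrayList<>();

        @Override
        public boolean warmUp(String fieldName) {
            warmed.add(fieldName); // record the call instead of touching disk
            return true;
        }
    }

    // Stand-in for the per-segment loop under test.
    static List<String> runWarmup(List<String> fields, FieldWarmUpStrategy strategy) {
        List<String> warmedUp = new ArrayList<>();
        for (String field : fields) {
            if (strategy.warmUp(field)) {
                warmedUp.add(field);
            }
        }
        return warmedUp;
    }

    public static void main(String[] args) {
        RecordingStrategy fake = new RecordingStrategy();
        runWarmup(List.of("vec_field_a", "vec_field_b"), fake);
        System.out.println(fake.warmed); // both fields were passed to the strategy
    }
}
```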

import java.util.Set;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
Collaborator:

nit:- Remove *

import static org.opensearch.knn.common.KNNConstants.SPACE_TYPE;
import static org.opensearch.knn.common.KNNConstants.VECTOR_DATA_TYPE_FIELD;
import static org.opensearch.knn.common.FieldInfoExtractor.extractKNNEngine;
import static org.opensearch.knn.common.KNNConstants.*;
Collaborator:

Same here

@MrFlap (Contributor) commented Oct 23, 2025

Whoops, for some reason the import changes didn't go through in KNNIndexShard.java

@MrFlap (Contributor Author) commented Oct 23, 2025

I'm going to squash the commits once all tests pass and you all think it looks good @0ctopus13prime @shatejas @Vikasht34, but all feedback has been incorporated.

@MrFlap force-pushed the lucene-on-faiss-warmup branch from 7ffaf8c to 3fda638 on October 23, 2025 21:55

import java.io.IOException;

public abstract class FieldWarmUpStrategy {
Collaborator:

NIT : Java doc please!

import org.apache.lucene.store.FilterDirectory;
import org.opensearch.common.lucene.Lucene;

public class FieldWarmUpStrategyFactory {
Collaborator:

NIT : Java doc please!

private Directory directory;
private LeafReader leafReader;

public FieldWarmUpStrategyFactory setDirectory(Directory directory) {
Collaborator:

NIT : @Setter

Contributor Author:

For some reason it doesn't return the FieldWarmUpStrategyFactory when I use Setter. Is this intended behavior?
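That is Lombok's intended behavior: a plain @Setter generates void setters, so the factory can't be chained. Lombok's @Accessors(chain = true) makes generated setters return this instead. The hand-written equivalent is sketched below (field types simplified to String; the class names mirror the PR's but the bodies are illustrative):

```java
// A fluent (chained) setter returns this, which plain Lombok @Setter does not.
public class ChainedFactoryDemo {

    static class FieldWarmUpStrategyFactory {
        private String directory;
        private String leafReader;

        FieldWarmUpStrategyFactory setDirectory(String directory) {
            this.directory = directory;
            return this; // returning this is what enables chaining
        }

        FieldWarmUpStrategyFactory setLeafReader(String leafReader) {
            this.leafReader = leafReader;
            return this;
        }

        String describe() {
            return directory + "/" + leafReader;
        }
    }

    public static void main(String[] args) {
        String s = new FieldWarmUpStrategyFactory()
            .setDirectory("fsDir")
            .setLeafReader("segment_0")
            .describe();
        System.out.println(s); // fsDir/segment_0
    }
}
```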

try (IndexInput input = directory.openInput(file, IOContext.READONCE)) {
for (int i = 0; i < input.length(); i += 4096) {
input.seek(i);
input.readByte();
Collaborator:

Curious: do you recall whether all pages get loaded regardless of whether readByte or readBytes(byte[]) is called?

ArrayList<String> warmedUp = new ArrayList<>();

for (FieldInfo field : memOptSearchFields) {
boolean warm;
Collaborator:

try {
    if (fieldWarmUpStrategy.warmUp(field)) {
        warmedUp.add(field.getName());
    }
} catch (IOException e) {
    log.error("Failed to warm up field: {}: {}", field.getName(), e.toString());
}

vectorValues.vectorValue(iter.docID());
}
} catch (IOException e) {
log.error("Failed to load vec file for field: {}: {}", field.getName(), e.toString());
Collaborator:

Generally it's good to pass e to the logger:

log.error("Failed to load vec file for field: {}", field.getName(), e);

@0ctopus13prime (Collaborator):
@MrFlap
Also, could you post the results for the FP16 and 32x quantization cases respectively?

@Vikasht34 (Collaborator) left a comment:

Thanks for incorporating all the tests!! Will wait for all tests to pass.

@MrFlap force-pushed the lucene-on-faiss-warmup branch from 5837ea3 to c1fec8f on October 24, 2025 20:58
@MrFlap (Contributor Author) commented Oct 24, 2025

@0ctopus13prime 32x:
Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,0,min
Min cumulative indexing time across primary shards,,0,min
Median cumulative indexing time across primary shards,,0,min
Max cumulative indexing time across primary shards,,0,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,0,min
Cumulative merge count of primary shards,,0,
Min cumulative merge time across primary shards,,0,min
Median cumulative merge time across primary shards,,0,min
Max cumulative merge time across primary shards,,0,min
Cumulative merge throttle time of primary shards,,0,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0,min
Max cumulative merge throttle time across primary shards,,0,min
Cumulative refresh time of primary shards,,0,min
Cumulative refresh count of primary shards,,14,
Min cumulative refresh time across primary shards,,0,min
Median cumulative refresh time across primary shards,,0,min
Max cumulative refresh time across primary shards,,0,min
Cumulative flush time of primary shards,,0,min
Cumulative flush count of primary shards,,7,
Min cumulative flush time across primary shards,,0,min
Median cumulative flush time across primary shards,,0,min
Max cumulative flush time across primary shards,,0,min
Total Young Gen GC time,,0.027,s
Total Young Gen GC count,,2,
Total Old Gen GC time,,0,s
Total Old Gen GC count,,0,
Store size,,31.08053415827453,GB
Translog size,,3.5855919122695923e-07,GB
Heap used for segments,,0,MB
Heap used for doc values,,0,MB
Heap used for terms,,0,MB
Heap used for norms,,0,MB
Heap used for points,,0,MB
Heap used for stored fields,,0,MB
Segment count,,37,
Min Throughput,prod-queries,287.95,ops/s
Mean Throughput,prod-queries,464.55,ops/s
Median Throughput,prod-queries,482.34,ops/s
Max Throughput,prod-queries,493.91,ops/s
50th percentile latency,prod-queries,18.259749995195307,ms
90th percentile latency,prod-queries,19.341466900368687,ms
99th percentile latency,prod-queries,20.406325711373942,ms
99.9th percentile latency,prod-queries,47.77802453149796,ms
99.99th percentile latency,prod-queries,223.0032652234396,ms
100th percentile latency,prod-queries,223.10549300163984,ms
50th percentile service time,prod-queries,18.259749995195307,ms
90th percentile service time,prod-queries,19.341466900368687,ms
99th percentile service time,prod-queries,20.406325711373942,ms
99.9th percentile service time,prod-queries,47.77802453149796,ms
99.99th percentile service time,prod-queries,223.0032652234396,ms
100th percentile service time,prod-queries,223.10549300163984,ms
error rate,prod-queries,0.00,%
Mean recall@k,prod-queries,0.75,
Mean recall@1,prod-queries,0.91,

@MrFlap (Contributor Author) commented Oct 24, 2025

@0ctopus13prime fp16:
Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,0.9724,min
Min cumulative indexing time across primary shards,,0,min
Median cumulative indexing time across primary shards,,0.15175,min
Max cumulative indexing time across primary shards,,0.19615,min
Cumulative indexing throttle time of primary shards,,0,min
Min cumulative indexing throttle time across primary shards,,0,min
Median cumulative indexing throttle time across primary shards,,0,min
Max cumulative indexing throttle time across primary shards,,0,min
Cumulative merge time of primary shards,,0.19891666666666669,min
Cumulative merge count of primary shards,,2,
Min cumulative merge time across primary shards,,0,min
Median cumulative merge time across primary shards,,0,min
Max cumulative merge time across primary shards,,0.18803333333333333,min
Cumulative merge throttle time of primary shards,,0.0495,min
Min cumulative merge throttle time across primary shards,,0,min
Median cumulative merge throttle time across primary shards,,0,min
Max cumulative merge throttle time across primary shards,,0.0495,min
Cumulative refresh time of primary shards,,0.48813333333333336,min
Cumulative refresh count of primary shards,,34,
Min cumulative refresh time across primary shards,,0,min
Median cumulative refresh time across primary shards,,0.09416666666666668,min
Max cumulative refresh time across primary shards,,0.1016,min
Cumulative flush time of primary shards,,0,min
Cumulative flush count of primary shards,,1,
Min cumulative flush time across primary shards,,0,min
Median cumulative flush time across primary shards,,0,min
Max cumulative flush time across primary shards,,0,min
Total Young Gen GC time,,0.016,s
Total Young Gen GC count,,1,
Total Old Gen GC time,,0,s
Total Old Gen GC count,,0,
Store size,,0.591529805213213,GB
Translog size,,1.0812873849645257,GB
Heap used for segments,,0,MB
Heap used for doc values,,0,MB
Heap used for terms,,0,MB
Heap used for norms,,0,MB
Heap used for points,,0,MB
Heap used for stored fields,,0,MB
Segment count,,46,
Min Throughput,warmup-indices,7.93,ops/s
Mean Throughput,warmup-indices,7.93,ops/s
Median Throughput,warmup-indices,7.93,ops/s
Max Throughput,warmup-indices,7.93,ops/s
100th percentile latency,warmup-indices,125.19258499378338,ms
100th percentile service time,warmup-indices,125.19258499378338,ms
error rate,warmup-indices,0.00,%
Min Throughput,prod-queries,154.15,ops/s
Mean Throughput,prod-queries,389.45,ops/s
Median Throughput,prod-queries,414.39,ops/s
Max Throughput,prod-queries,474.27,ops/s
50th percentile latency,prod-queries,17.929102999914903,ms
90th percentile latency,prod-queries,22.143237998534463,ms
99th percentile latency,prod-queries,36.78591821764713,ms
99.9th percentile latency,prod-queries,165.76362053233635,ms
99.99th percentile latency,prod-queries,240.6351248842792,ms
100th percentile latency,prod-queries,241.05393800709862,ms
50th percentile service time,prod-queries,17.929102999914903,ms
90th percentile service time,prod-queries,22.143237998534463,ms
99th percentile service time,prod-queries,36.78591821764713,ms
99.9th percentile service time,prod-queries,165.76362053233635,ms
99.99th percentile service time,prod-queries,240.6351248842792,ms
100th percentile service time,prod-queries,241.05393800709862,ms
error rate,prod-queries,0.00,%
Mean recall@k,prod-queries,0.02,
Mean recall@1,prod-queries,0.02,

@0ctopus13prime (Collaborator):
@MrFlap
Oh, recall is 2%? That does not seem right... sorry, I thought it was 92%.
Can we rerun with a single search client in OSB? Sometimes it gives bad recall when there are multiple clients.
