
Conversation

@atesgoral commented Oct 30, 2024

Release GVL for parallel search operations

This PR improves multi-threading performance by releasing Ruby's Global VM Lock (GVL) during Faiss search operations, allowing multiple threads to perform searches in parallel.

Changes

  1. Release GVL during search - Wrap search operations in rb_thread_call_without_gvl to allow parallel execution
  2. Ensure thread-safety - Only release GVL for frozen (immutable) indexes to prevent concurrent modifications

Usage

index = Faiss::IndexFlatL2.new(dimensions)
index.add(vectors)
index.freeze  # Makes index immutable and enables parallel searches

index.search(query_vectors, k) # GVL gets released while search is being performed

Notes

  • Fully backward compatible - non-frozen indexes work as before
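
For example (an illustrative sketch, not part of this PR's test suite; dimensions, vectors, query_vectors and k are placeholders as in the Usage snippet above), several threads can search the same frozen index concurrently:

index = Faiss::IndexFlatL2.new(dimensions)
index.add(vectors)
index.freeze  # no further mutation; searches may now overlap

results = 4.times.map {
  Thread.new { index.search(query_vectors, k) }
}.map(&:value)  # Thread#value joins each thread and returns its search result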

@ankane (Owner) commented Nov 1, 2024

Hi @atesgoral, thanks for the PR! However, this doesn't seem safe to do without more locking (as add can now be called while search is running). I'm not sure I'd like to maintain the additional complexity, but happy to take another look if you decide to implement it.

https://github.com/facebookresearch/faiss/wiki/Threads-and-asynchronous-calls

@tenderlove

@ankane are you sure that's true? According to the link you sent:

Python interface releases the Global Interpreter Lock for all calls, so using python multithreading will effectively use several cores.

I'm not familiar with the Python extension, but it seems like we should be in the same boat? Am I missing something?

@ankane (Owner) commented Nov 1, 2024

Will check out the Python code, but this causes the test below to consume all available CPU and hang (which doesn't happen when the GVL is held).

--- a/test/index_test.rb
+++ b/test/index_test.rb
@@ -279,17 +279,18 @@ class IndexTest < Minitest::Test
       [1, 1, 2, 1],
       [5, 4, 6, 5],
       [1, 2, 1, 2]
-    ]
+    ] * 100
     index = Faiss::IndexFlatL2.new(4)
     index.add(objects)
 
     concurrency = 0
     max_concurrency = 0
 
-    threads = 2.times.map {
+    threads = 100.times.map {
       Thread.new {
         concurrency += 1
         max_concurrency = [max_concurrency, concurrency].max
+        index.add(objects)
         index.search(objects, 3)
         concurrency -= 1
       }

@atesgoral (Author) commented Nov 2, 2024

@ankane Thanks for having a look.

I didn't need more data or 100 iterations to get it to lock up. I haven't looked at faiss in detail yet, but I see the word "lock" all over the repo. This is enough to cause a deadlock(?):

diff --git a/test/index_test.rb b/test/index_test.rb
index 64e093b..095b8ea 100644
--- a/test/index_test.rb
+++ b/test/index_test.rb
@@ -290,7 +290,13 @@ class IndexTest < Minitest::Test
       Thread.new {
         concurrency += 1
         max_concurrency = [max_concurrency, concurrency].max
+        puts "adding"
+        index.add(objects)
+        puts "added"
+        sleep(10)
+        puts "searching"
         index.search(objects, 3)
+        puts "searched"
         concurrency -= 1
       }
     }

I say "deadlock(?)" because it doesn't act like a permanent deadlock; something gives eventually and the test ends with a success. 🤔

@obie commented Nov 5, 2024

keeping 👀 on this

@atesgoral (Author)

@ankane I can rejig this to add a read-only mode to an index so that the GVL can be released in read-only use cases.

Option 1

index.read_only! locks the instance into a read-only mode where read operations release the GVL and write operations raise (see the sketch after Option 3).

Option 2

Faiss::Index.load_read_only does the same as above, but at index load time.

Option 3

index.read_only or a Faiss::ReadOnly::Index facade that returns a read-only interface where write operations either don't exist or raise. This is only safe if the application ensures that all access to the index goes through a singleton returning the facade.
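
Roughly, Option 1 might look like this from Ruby (read_only! and the error class here are hypothetical, just to show the shape):

index = Faiss::IndexFlatL2.new(4)
index.add(objects)
index.read_only!           # hypothetical: locks the index into read-only mode

index.search(objects, 3)   # read path releases the GVL
index.add(objects)         # write path raises (e.g. a hypothetical Faiss::ReadOnlyError)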

Co-authored-by: Ufuk Kayserilioglu <[email protected]>
Co-authored-by: Aaron Patterson <[email protected]>

Move without_gvl to utils

Add parallelism test

Simplify config
@tavianator force-pushed the gvl-release branch 2 times, most recently from a016bd4 to 525069c on August 13, 2025
@tavianator

I updated this PR to hang the decision to drop the GVL off of index.frozen?. Once you freeze an index, you can no longer mutate it so it becomes safe to allow parallel search calls. Let me know if you like that approach!

@atesgoral changed the title from "[WIP] GVL release" to "Add support for .freeze for a read-only mode that releases the GVL" on Aug 13, 2025
@atesgoral marked this pull request as ready for review on August 13, 2025

self.search(n, objects.read_ptr(), k, distances.write_ptr(), labels.write_ptr());
if (wrapper.is_frozen()) {
without_gvl([&] {

This would be helpful to add to Rice. pybind11 does it like this:

https://pybind11.readthedocs.io/en/stable/advanced/misc.html#global-interpreter-lock-gil


I'd be happy to add it to Rice! I don't think it's easy to write an API similar to the pybind11 one, since the Ruby GVL functions all take a callback, but I can add something like this without_gvl easily.

Any suggestions for where to put it? I'm not familiar with the code structure of Rice.


Created a PR for Rice here: ruby-rice/rice#313

@ankane (Owner) commented Aug 15, 2025

Hi @tavianator, thanks for sharing. I like the simplicity of this approach. Can you share a benchmark of the performance improvement?

Also, I agree with @cfis that it'd be nice to add the GVL code to Rice.

Add a new check to all mutating methods that we're not operating on a
frozen Index.  After that, it should be safe to drop the GVL for search
on a frozen Index, since nothing can be mutating it in parallel.
@atesgoral (Author) commented Aug 16, 2025

@ankane It looks like a win.

tl;dr:

  • Single-threaded baseline: 979.61 queries/sec
  • Best multi-threaded (unfrozen): 1036.73 queries/sec
  • Best multi-threaded (frozen): 3584.14 queries/sec
  • Overall improvement from freezing: 245.7%
  • Maximum scaling achieved: 3.66x (45.8% parallel efficiency)

I got Claude Code to vibe-code these benchmark scripts: 2367f1c

Output of the more intense one:

GVL Release Intensive Benchmark for Faiss Index
==================================================
Configuration:
  Dimensions: 256
  Index vectors: 50000
  Queries per iteration: 100
  Iterations: 10
  K neighbors: 100
  Thread counts to test: [1, 2, 4, 8]

Generating random data... done!
Creating and training index... done! (50000 vectors in index)

Running benchmarks...
--------------------------------------------------

Testing with 1 thread(s):
  Unfrozen index: 1.021s (979.61 queries/sec)
  Frozen index:   1.012s (988.1 queries/sec)

Testing with 2 thread(s):
  Unfrozen index: 1.018s (982.68 queries/sec)
  Frozen index:   0.566s (1767.32 queries/sec)
  → Freezing improved performance by 79.8% (1.8x speedup)

Testing with 4 thread(s):
  Unfrozen index: 1.038s (963.32 queries/sec)
  Frozen index:   0.292s (3425.27 queries/sec)
  → Freezing improved performance by 255.6% (3.56x speedup)

Testing with 8 thread(s):
  Unfrozen index: 0.965s (1036.73 queries/sec)
  Frozen index:   0.279s (3584.14 queries/sec)
  → Freezing improved performance by 245.7% (3.46x speedup)

==================================================
Performance Summary:
--------------------------------------------------
Thread Count | Unfrozen QPS | Frozen QPS | Improvement
--------------------------------------------------
           1 |       979.61 |      988.1 | 0%
           2 |       982.68 |    1767.32 | +79.8%
             | (1.0x scaling) | (1.8x scaling) |
           4 |       963.32 |    3425.27 | +255.6%
             | (0.98x scaling) | (3.5x scaling) |
           8 |      1036.73 |    3584.14 | +245.7%
             | (1.06x scaling) | (3.66x scaling) |

==================================================
Key Findings:
  • Single-threaded baseline: 979.61 queries/sec
  • Best multi-threaded (unfrozen): 1036.73 queries/sec
  • Best multi-threaded (frozen): 3584.14 queries/sec
  • Overall improvement from freezing: 245.7%
  • Maximum scaling achieved: 3.66x (45.8% parallel efficiency)
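
For reference, a harness along these lines (a minimal sketch, not the actual script in 2367f1c) is what produces numbers like the above:

require "faiss"
require "benchmark"

DIM = 256
index = Faiss::IndexFlatL2.new(DIM)
index.add(Array.new(50_000) { Array.new(DIM) { rand } })
queries = Array.new(100) { Array.new(DIM) { rand } }

def run(index, queries, threads:)
  # Time the wall-clock duration of all threads searching the same index.
  Benchmark.realtime do
    Array.new(threads) {
      Thread.new { 10.times { index.search(queries, 100) } }
    }.each(&:join)
  end
end

puts "unfrozen (8 threads): %.3fs" % run(index, queries, threads: 8)
index.freeze
puts "frozen   (8 threads): %.3fs" % run(index, queries, threads: 8)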

@atesgoral (Author)

@cfis Is a new Rice release with the GVL-free function calls on the horizon?

@cfis commented Oct 13, 2025

Yes, will try to push out a release this week. Have been updating documentation first, but almost done with that.
