@LHT129 LHT129 commented Oct 10, 2025

  • The vt pool wastes too much time during fast search in the HGraph index

Summary by Sourcery

Optimize k-means clustering performance by reducing contention in the resource pool and tuning the HGraph index parameters dynamically.

Enhancements:

  • Introduce thread-local caching in ResourceObjectPool (with a configurable local pool capacity) to minimize global mutex contention when taking and returning objects.
  • Compute HGraph index degree as max(32, dim/8) and use it to format the index creation string for better adaptation to data dimensionality.
  • Mark the HGraph index as immutable after building to accelerate subsequent search queries.
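
To make the first enhancement concrete, here is a minimal sketch of a pool with a thread-local front cache. It is not the code from this PR: `LocalCachedPool`, `Scratch`, and `SharedSize()` are hypothetical names, and the capacity of 4 is taken from the `kLocalPoolCapacity` value shown in the diff. The sketch resets objects on both the local and the shared path, which is an assumption on my side rather than a statement about the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical pooled type: anything that exposes Reset().
struct Scratch {
    int marks = 0;
    void Reset() { marks = 0; }
};

template <typename T>
class LocalCachedPool {
public:
    std::shared_ptr<T> TakeOne() {
        auto& local = get_local_pool();
        if (!local.empty()) {  // fast path: no mutex involved
            auto obj = std::move(local.front());
            local.pop_front();
            obj->Reset();
            return obj;
        }
        {
            std::lock_guard<std::mutex> lock(mutex_);  // slow path: shared pool
            if (!shared_.empty()) {
                auto obj = std::move(shared_.front());
                shared_.pop_front();
                obj->Reset();
                return obj;
            }
        }
        return std::make_shared<T>();  // both pools empty: allocate a new object
    }

    // Takes the pointer by reference and leaves the caller's copy empty.
    void ReturnOne(std::shared_ptr<T>& obj) {
        if (!obj) {
            return;
        }
        auto& local = get_local_pool();
        if (local.size() < kLocalCapacity) {
            local.push_back(std::move(obj));
            return;
        }
        std::lock_guard<std::mutex> lock(mutex_);  // local cache full: spill over
        shared_.push_back(std::move(obj));
    }

    std::size_t SharedSize() {
        std::lock_guard<std::mutex> lock(mutex_);
        return shared_.size();
    }

private:
    // Limitation of this sketch: a function-local thread_local is shared by
    // every pool instance of the same T that runs on the same thread.
    static std::deque<std::shared_ptr<T>>& get_local_pool() {
        thread_local std::deque<std::shared_ptr<T>> local;
        return local;
    }

    static constexpr std::size_t kLocalCapacity = 4;
    std::mutex mutex_;
    std::deque<std::shared_ptr<T>> shared_;
};
```

With this shape, a thread that takes and returns one object per query never touches the mutex after its first allocation; only the overflow beyond the local capacity goes through the shared deque.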

@LHT129 LHT129 self-assigned this Oct 10, 2025
@LHT129 LHT129 added the kind/improvement (code improvements: variable/function renaming, refactoring, etc.) and version/0.18 labels Oct 10, 2025

sourcery-ai bot commented Oct 10, 2025

Reviewer's Guide

This PR optimizes object pooling by introducing thread-local caches to minimize global locking and enhances HGraph index setup in KMeans by parameterizing the degree and marking the index as immutable.

Class diagram for updated ResourceObjectPool with thread-local pool

classDiagram
class ResourceObjectPool {
    +TakeOne() std::shared_ptr<T>
    +ReturnOne(std::shared_ptr<T>&)
    +GetSize() uint64_t
    -resize(uint64_t)
    -get_local_pool() std::deque<std::shared_ptr<T>>&
    -pool_ std::unique_ptr<Deque<std::shared_ptr<T>>>
    -pool_size_ std::atomic<uint64_t>
    -owned_allocator_ std::shared_ptr<Allocator>
    -kLocalPoolCapacity static const uint64_t
}
class Deque
class Allocator
ResourceObjectPool --> Deque : uses
ResourceObjectPool --> Allocator : owns

Class diagram for updated KMeansCluster HGraph index setup

classDiagram
class KMeansCluster {
    +find_nearest_one_with_hgraph(const float* query, ...)
    -dim_ int
    -allocator_ Allocator
    -thread_pool_ ThreadPool
}
class InnerIndexInterface {
    +FastCreateIndex(string, param)
}
class HGraphIndex {
    +Build(base)
    +SetImmutable()
}
class Dataset {
    +Make()
}
KMeansCluster --> InnerIndexInterface : creates index
KMeansCluster --> HGraphIndex : builds and sets immutable
KMeansCluster --> Dataset : uses

File-Level Changes

Change: Add per-thread local caching in ResourceObjectPool to reduce contention
Details:
  • Introduce a thread-local deque via get_local_pool()
  • Modify TakeOne() to first attempt retrieval from the local pool, then from the global pool with atomic size updates
  • Modify ReturnOne() to push to the local pool up to capacity before falling back to the global pool
  • Add the kLocalPoolCapacity constant and use atomic fetch_add/fetch_sub for pool_size_
  • Refactor lock scopes and ensure objects are reset upon reuse
Files: src/utils/resource_object_pool.h

Change: Parameterize HGraph index creation and set immutability in KMeansCluster
Details:
  • Compute max_degree dynamically as max(32, dim_/8)
  • Use fmt::format to build the HGraph index descriptor string
  • Replace the hardcoded degree value with the formatted max_degree
  • Invoke SetImmutable() on the built HGraph before search
Files: src/impl/cluster/kmeans_cluster.cpp
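
As an illustration of the second change, the descriptor string could be assembled as sketched below. `make_hgraph_descriptor` is a hypothetical helper, not a function from the PR, and it uses `std::to_string` instead of `fmt::format` so that it builds with the standard library alone. The `hgraph|<degree>|fp32` layout is inferred from the literal `"hgraph|32|fp32"` that the diff removes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: derives the degree from the dimension and splices it
// into the descriptor in place of the previously hardcoded 32.
inline std::string make_hgraph_descriptor(int dim) {
    const int max_degree = std::max(32, dim / 8);
    return "hgraph|" + std::to_string(max_degree) + "|fp32";
}
```

For `dim = 128` this yields the old literal `hgraph|32|fp32`; for `dim = 1024` it yields `hgraph|128|fp32`.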

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `src/utils/resource_object_pool.h:71-75` </location>
<code_context>
-        std::unique_lock<std::mutex> lock(mutex_);
-        if (pool_->empty()) {
+        auto& local_pool = this->get_local_pool();
+        if (not local_pool.empty()) {
+            std::shared_ptr<T> obj = local_pool.front();
+            local_pool.pop_front();
+            return obj;
+        }
</code_context>

<issue_to_address>
**suggestion:** Consider resetting objects from the local pool before returning.

Objects from the local pool are returned without Reset(), unlike those from the shared pool. This may cause inconsistent state if Reset() is required before reuse. Please review whether Reset() should be called for local pool objects as well.

```suggestion
        if (not local_pool.empty()) {
            std::shared_ptr<T> obj = local_pool.front();
            local_pool.pop_front();
            if (obj) {
                obj->Reset();
            }
            return obj;
        }
```
</issue_to_address>
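
A small sketch of why a missing `Reset()` can matter, under the assumption that the pooled object carries per-query state. `VisitedList` and `run_query` are hypothetical stand-ins; whether the real pooled type behaves like this is exactly what the comment asks the author to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pooled object: a per-query "visited" bitmap.
struct VisitedList {
    explicit VisitedList(std::size_t n) : seen(n, false) {}
    void Mark(std::size_t id) { seen[id] = true; }
    bool IsMarked(std::size_t id) const { return seen[id]; }
    void Reset() { seen.assign(seen.size(), false); }
    std::vector<bool> seen;
};

// Simulates one query that visits node `id`. Returns true if the node already
// looked visited before this query touched it, i.e. the state was stale.
inline bool run_query(VisitedList& vl, std::size_t id, bool reset_first) {
    if (reset_first) {
        vl.Reset();
    }
    const bool stale = vl.IsMarked(id);
    vl.Mark(id);
    return stale;
}
```

Reusing the same `VisitedList` for a second query without resetting makes `run_query` report the node as already visited, so a search would skip it; resetting first restores the expected behaviour.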

### Comment 2
<location> `src/impl/cluster/kmeans_cluster.cpp:255` </location>
<code_context>
     param.allocator_ = std::make_shared<SafeAllocator>(this->allocator_);
     param.thread_pool_ = this->thread_pool_;
     param.metric_ = MetricType::METRIC_TYPE_L2SQR;
+    auto max_degree = std::max(32, dim_ / 8);

-    auto hgraph = InnerIndexInterface::FastCreateIndex("hgraph|32|fp32", param);
</code_context>

<issue_to_address>
**question:** Check for potential integer division issues with dim_ / 8.

If dim_ is less than 8, max_degree will default to 32. Please verify this is correct for small values of dim_.
</issue_to_address>
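
The boundary behaviour can be checked at compile time with a sketch like the one below. `max_degree_for` is a hypothetical wrapper around the expression from the diff, and it assumes `dim_` is a plain `int` as the class diagram lists it (with a wider integer type, `std::max(32, dim_ / 8)` would not deduce a single argument type).

```cpp
#include <algorithm>
#include <cassert>

// Same expression as in the diff; `dim` stands in for the member dim_.
constexpr int max_degree_for(int dim) {
    return std::max(32, dim / 8);
}

// Integer division truncates, so every dimension below 264 stays at the floor.
static_assert(max_degree_for(4) == 32, "4 / 8 == 0, floor of 32 applies");
static_assert(max_degree_for(128) == 32, "128 / 8 == 16, floor of 32 applies");
static_assert(max_degree_for(263) == 32, "263 / 8 == 32 after truncation");
static_assert(max_degree_for(264) == 33, "first dimension above the floor");
static_assert(max_degree_for(1024) == 128, "1024 / 8 == 128");
```

So for small `dim_` the result is simply the previous hardcoded value of 32, and the degree only starts to grow once `dim_` reaches 264.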


private:
std::shared_ptr<Allocator> owned_allocator_{nullptr};
static const uint64_t kLocalPoolCapacity = 4; // Tunable parameter

style: rename kLocalPoolCapacity to local_pool_capacity_ or LOCAL_POOL_CAPACITY

private:
inline void
resize(uint64_t size) {

remove unrelated change

return;
}
{
std::lock_guard<std::mutex> lock(mutex_);

change std::lock_guard to std::scoped_lock
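
For context on this suggestion, a minimal sketch of the difference, assuming a C++17 toolchain. `Counter`, `increment`, and `transfer` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <mutex>

struct Counter {
    std::mutex mutex;
    int value = 0;
};

// Single mutex: std::scoped_lock behaves like std::lock_guard here, and the
// template argument is deduced from the constructor argument.
inline void increment(Counter& c) {
    std::scoped_lock lock(c.mutex);
    ++c.value;
}

// Several mutexes: std::scoped_lock acquires all of them with a
// deadlock-avoidance algorithm, which std::lock_guard cannot do.
// `from` and `to` must be distinct objects.
inline void transfer(Counter& from, Counter& to, int amount) {
    std::scoped_lock lock(from.mutex, to.mutex);
    from.value -= amount;
    to.value += amount;
}
```

For the single-mutex scope in this diff the change is behaviour-neutral; the benefit is a uniform locking idiom that extends to multi-mutex scopes.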

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

@@            Coverage Diff             @@
##             main    #1219      +/-   ##
==========================================
+ Coverage   91.42%   91.52%   +0.10%     
==========================================
  Files         318      318              
  Lines       17622    17621       -1     
==========================================
+ Hits        16111    16128      +17     
+ Misses       1511     1493      -18     
Flag        Coverage Δ
cpp         91.52% <92.59%> (+0.10%) ⬆️

Components  Coverage Δ
common      91.78% <ø> (ø)
datacell    92.95% <ø> (-0.04%) ⬇️
index       90.60% <ø> (+0.16%) ⬆️
simd        100.00% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ae9220f...25c2eab.

@inabao inabao left a comment

LGTM

- vt pool waste too much time on fast search in HGraph Index

Signed-off-by: LHT129 <[email protected]>

Labels

kind/improvement Code improvements (variable/function renaming, refactoring, etc. ) size/L version/0.18

3 participants