@norrishuang
Contributor

awsopensearch: add vector normalization for cosine similarity. This fixes the recall-rate issue seen with the Cohere dataset.

alwayslove2013 previously approved these changes Oct 16, 2025
    if self.case_config.metric_type.upper() == "COSINE":
        log.info("cosine dataset need normalize.")
        return True
    return False
Collaborator

The need_normalize_cosine function is intended for databases that do NOT natively support the cosine similarity metric. When testing cosine-based datasets, it normalizes all training and query vectors before the test, so that an L2 (Euclidean) or IP (Inner Product) search returns the same ranking as cosine similarity and the recall calculation stays accurate. It is a workaround for cases where cosine similarity is not directly supported.
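
To illustrate why normalization makes the metrics interchangeable, here is a minimal sketch in plain Python. It is not the repository's code: the helper names (normalize, dot, cosine, l2_squared, rank) and the toy vectors are made up for this example. For unit vectors, IP equals cosine similarity and squared L2 distance equals 2 - 2 * cosine, so all three produce the same neighbor ordering.

```python
import math


def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def dot(a, b):
    """Inner product (IP) of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_squared(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank(vectors, score, reverse=False):
    """Indices of vectors sorted by score (ascending unless reverse=True)."""
    return sorted(range(len(vectors)), key=lambda i: score(vectors[i]), reverse=reverse)


# Hypothetical toy data with deliberately different vector lengths.
train = [[0.0, 2.0], [4.0, 4.0], [-3.0, 3.0], [1.0, 0.0]]
query = [2.0, 1.0]

unit_train = [normalize(v) for v in train]
unit_query = normalize(query)

# Ground truth: cosine similarity on the raw vectors (higher is closer).
by_cosine = rank(train, lambda v: cosine(query, v), reverse=True)
# After normalization, IP (higher is closer) and L2 (lower is closer) agree with it.
by_ip = rank(unit_train, lambda v: dot(unit_query, v), reverse=True)
by_l2 = rank(unit_train, lambda v: l2_squared(unit_query, v))
# Without normalization, L2 ranks by a different notion of closeness.
by_raw_l2 = rank(train, lambda v: l2_squared(query, v))

assert by_cosine == by_ip == by_l2
assert by_raw_l2 != by_cosine
```

On this toy data the raw L2 ordering differs from the cosine ordering, which is the kind of mismatch that would lower measured recall if a cosine dataset were searched with L2 on unnormalized vectors.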

For databases that already support cosine similarity, we do not recommend enabling this feature. The goal is to maintain consistent benchmarking across different databases and ensure fair testing conditions.

@alwayslove2013 dismissed their stale review October 16, 2025 02:08

My bad. Let me take that back.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: norrishuang
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

