[v1.33] 1-bit RQ #144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v1-33/main
Are you sure you want to change the base?
Conversation
Orca Security Scan Summary — checks: Infrastructure as Code, SAST, Secrets, Vulnerabilities (View in Orca).

Great to see you again! Thanks for the contribution.
@@ -114,26 +116,53 @@ When SQ is enabled, Weaviate boosts recall by over-fetching compressed results.

:::caution Technical preview

- Rotational quantization (RQ) was added in **`v1.32`** as a **technical preview**.<br/><br/>
+ **8-bit Rotational quantization (RQ)** was added in **`v1.32`** as a **technical preview**.<br/>
Should we use the "preview" nomenclature as Alvin proposed in the QAgent discussion?
Also, is RQ still in preview?
Will change all "technical preview" into "preview" for this release and use it going forward.
8 bit is GA
This means that the feature is still under development and may change in future releases, including potential breaking changes.
**We do not recommend using this feature in production environments at this time.**

:::

- **Rotational quantization (RQ)** is an untrained 8-bit quantization technique that provides 4x compression while maintaining 98-99% recall on most datasets. Unlike SQ, RQ requires no training phase and can be enabled immediately at index creation. RQ works in two steps:
+ **Rotational quantization (RQ)** is an untrained quantization technique that provides significant compression while maintaining high recall on most datasets. Unlike SQ, RQ requires no training phase and can be enabled immediately at index creation. RQ is available in two variants: **8-bit RQ** and **1-bit RQ**.
"Untrained" sounds a bit unusual to me, and a bit negative maybe. Wdyt about "non-parametric" or just leaving it out?
### 8-bit RQ

8-bit RQ provides 4x compression while maintaining 98-99% recall on most datasets. RQ works in two steps:
"on most datasets" -> "in internal testing"?
Good point, let's not overpromise
1. **Fast pseudorandom rotation**: The input vector is transformed using a fast rotation based on the Walsh-Hadamard Transform. This rotation takes approximately 7-10 microseconds for a 1536-dimensional vector. The output dimension is rounded up to the nearest multiple of 64.

2. **Scalar quantization**: Each entry of the rotated vector is quantized to an 8-bit integer. The minimum and maximum values of each individual rotated vector define the quantization interval.
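The two steps above can be sketched in NumPy. This is an illustrative toy only, not Weaviate's implementation: the real rotation is a randomized Walsh-Hadamard Transform, approximated here by a plain orthonormal WHT, and all names are invented for the example.

```python
import numpy as np

def hadamard_rotate(v):
    # Plain orthonormal Walsh-Hadamard Transform; stands in for RQ's fast
    # pseudorandom rotation. Assumes len(v) is a power of two (RQ itself
    # pads the dimension up to a multiple of 64).
    h = np.asarray(v, dtype=np.float64).copy()
    n = len(h)
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            a = h[i : i + step].copy()
            b = h[i + step : i + 2 * step].copy()
            h[i : i + step] = a + b
            h[i + step : i + 2 * step] = a - b
        step *= 2
    return h / np.sqrt(n)  # scale so the rotation preserves vector norms

def rq8_encode(v):
    # Step 2: per-vector 8-bit scalar quantization of the rotated vector.
    r = hadamard_rotate(v)
    lo, hi = r.min(), r.max()  # this vector's own quantization interval
    codes = np.round((r - lo) / (hi - lo) * 255).astype(np.uint8)
    return codes, lo, hi

def rq8_decode(codes, lo, hi):
    return lo + codes.astype(np.float64) * (hi - lo) / 255.0

rng = np.random.default_rng(0)
v = rng.normal(size=64)
codes, lo, hi = rq8_encode(v)      # 64 bytes instead of 256 as fp32 (4x)
approx = rq8_decode(codes, lo, hi)
rel_err = np.linalg.norm(hadamard_rotate(v) - approx) / np.linalg.norm(v)
```

Because the rotation is orthonormal, distances can be estimated directly in the rotated space, and the per-vector interval keeps the quantization error small.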
### 1-bit RQ

1-bit RQ is an untrained asymmetric quantization method that provides close to 32x compression as dimensionality increases. This method is inspired by 1-bit RaBitQ and works as follows:
Do we need to mention RaBitQ in our docs at all? I don't see much upside. Maybe we could broadly mention just once somewhere that they share some similarities
2. **Asymmetric quantization**:
   - **Data vectors**: Quantized using 1 bit per dimension by storing only the sign of each entry
   - **Query vectors**: Scalar quantized using 5 bits per dimension during search
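As a toy illustration of how sign-only data codes can still be scored against a finely quantized query, the sketch below estimates an inner product asymmetrically. This is an assumption-laden example, not Weaviate's exact estimator: the real method operates on rotated vectors and the scoring details are simplified here.

```python
import numpy as np

def encode_data_1bit(x):
    # Data vector: keep only the sign of each entry -> 1 bit per dimension.
    return np.where(x >= 0, 1.0, -1.0)

def dequantized_query_5bit(q):
    # Query vector: scalar-quantize to 2**5 = 32 levels at search time,
    # then score with the dequantized values.
    lo, hi = q.min(), q.max()
    codes = np.round((q - lo) / (hi - lo) * 31)
    return lo + codes * (hi - lo) / 31

rng = np.random.default_rng(1)
x = rng.normal(size=64)  # stored vector (imagine it already rotated)
q = rng.normal(size=64)  # incoming query

# Asymmetric estimate: coarse 1-bit codes on one side, fine 5-bit values
# on the other. `ref` uses the exact query so the remaining difference is
# only the query's quantization error.
est = np.dot(encode_data_1bit(x), dequantized_query_5bit(q))
ref = np.dot(encode_data_1bit(x), q)
```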
Oh wow this is interesting. How do we compare an array of signs to these 5-bit arrays?
Will clarify with the team and add later
The rotation step provides multiple benefits. It tends to reduce the quantization interval and decrease quantization error by distributing values more uniformly. It also distributes the distance information more evenly across all dimensions, providing a better starting point for distance estimation.

- It's worth noting that RQ rounds up dimensions to multiples of 64 which means that low-dimensional data (< 64 or 128 dimensions) might result in less than optimal compression.
+ It's worth noting that both RQ variants round up the number of dimensions to a multiple of 64, which means that low-dimensional data (< 64 or 128 dimensions) might result in less than optimal compression.
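The effect of the padding on the compression ratio is simple arithmetic; the sketch below ignores the small per-vector metadata (such as the interval bounds), so the ratios are slightly optimistic.

```python
import math

def rq_compression(dims, bits):
    # Ratio of 32-bit float storage to RQ storage, where the dimension
    # count is first rounded up to the next multiple of 64.
    padded = math.ceil(dims / 64) * 64
    return (32 * dims) / (bits * padded)

print(rq_compression(1536, 8))  # 4.0  -> 8-bit RQ, 1536 needs no padding
print(rq_compression(1536, 1))  # 32.0 -> 1-bit RQ approaches 32x
print(rq_compression(32, 8))    # 2.0  -> padding 32 -> 64 dims halves the ratio
```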
I think we should clarify where we are talking about each dimension and cases like here where we talk about the "number of dimensions".
While inspired by extended RaBitQ, this implementation differs significantly for performance reasons. It uses fast pseudorandom rotations instead of truly random rotations.
I am inclined to suggest leaving out these interleaved discussion of RQ vs RaBitQ.
RaBitQ isn't an available option to the user so I'm not sure how useful these comparisons are.
Most users will be interested in choosing between available algos in Weaviate.
If we want to keep these comments, we could maybe have one subsection where we acknowledge the inspiration and make comparisons there.
Wdyt
True, not much value in this info. I just left one reference here mentioning the inspiration from RaBitQ and linking to the original paper
While inspired by extended RaBitQ, this implementation differs significantly for performance reasons. It uses fast pseudorandom rotations instead of truly random rotations and it employs scalar quantization instead of RaBitQ's encoding algorithm, which becomes prohibitively slow with more bits per entry.
From the user perspective, 1-bit RQ is not a separate quantization method, but rather a configuration setting for RQ.
I'm not sure about this sentence.
- This (from the user perspective) implies that the rest of the docs are not for user consumption.
- I thought it was pretty self evident that 1-bit RQ is a config setting
Will remove this, it's actually a leftover from when I wanted to include a code snippet
@@ -15,13 +15,18 @@ import JavaCode from '!!raw-loader!/\_includes/code/howto/java/src/test/java/io/

:::caution Technical preview

- Rotational quantization (RQ) was added in **`v1.32`** as a **technical preview**.<br/><br/>
+ **8-bit Rotational quantization (RQ)** was added in **`v1.32`** as a **technical preview**.<br/>
Same comment re preview vs technical preview
This means that the feature is still under development and may change in future releases, including potential breaking changes.
**We do not recommend using this feature in production environments at this time.**

:::

- [**Rotational quantization (RQ)**](../../concepts/vector-quantization.md#rotational-quantization) is a fast untrained vector compression technique that offers 4x compression while retaining almost perfect recall (98-99% on most datasets).
+ [**Rotational quantization (RQ)**](../../concepts/vector-quantization.md#rotational-quantization) is a fast untrained vector compression technique. Two RQ variants are available in Weaviate:
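A minimal sketch of how enabling each variant might look with the Python client v4. This assumes `Configure.VectorIndex.Quantizer.rq` accepts a `bits` parameter and that the collection names are placeholders; check the client reference for your version, as parameter names may differ.

```python
import weaviate
from weaviate.classes.config import Configure

client = weaviate.connect_to_local()

# 8-bit RQ (~4x compression); bits=8 is assumed to be the default
client.collections.create(
    "ArticlesRQ8",
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.rq(bits=8),
    ),
)

# 1-bit RQ (close to 32x compression at high dimensionality)
client.collections.create(
    "ArticlesRQ1",
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.rq(bits=1),
    ),
)

client.close()
```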
Same comment re untrained
**What's being changed:**

Docs for 1-bit rotational quantization (RQ).

**Type of change:**

**How has this been tested?**

`yarn start`