-
Notifications
You must be signed in to change notification settings - Fork 1
AIBrix KV cache API #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
158fc25
to
50b87af
Compare
42693cd
to
553489b
Compare
// store all the tokens (including prefix tokens and current tokens | ||
// cached in the chunk), which will be used to avoid hash conflicts. | ||
// | ||
// In its metadata, we store the namespace (i.e., `ns_`), which will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also describe the intended usage for namespaces: e.g., is it intended to be used in a multi-tenant deployment scheme to distinguish the different tenants/models that store their KV chunks into the same KV cache?
// compare it with the md5sum in the metadata. If they are the same, | ||
// we consider the chunk is valid. Otherwise, we consider the chunk is | ||
// corrupted. By far, we don't use the md5sum of the tensors to alleviate | ||
// the compute overhead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain why this decision is made: e.g., for simplicity or for performance?
BTW: "By far" => "So far".
|
||
private: | ||
std::shared_ptr<Buffer> buffer_; | ||
// number of prefix tokens and current tokens in the chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define the number of prefix tokens seems more natural here. In the comment, you can define "total #. tokens in the prompt up to this chunk = <num_prefix_tokens> + <chunk_size>".
public: | ||
static std::unique_ptr<Object> Create() __attribute__((used)) { | ||
return std::static_pointer_cast<Object>( | ||
std::unique_ptr<KVCacheChunk>{new KVCacheChunk()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about simply std::make_unique<KVCacheChunk>()
here instead of new
explicitly?
|
||
void Construct(const ObjectMeta& meta) override; | ||
|
||
int GetChunkSize() { return chunk_size_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const method
const std::vector<std::vector<std::pair<LLMKV, LLMKV>>>& kv_tensor); | ||
|
||
Status Query(const std::vector<int>& prefix, const std::vector<int>& tokens, | ||
std::vector<std::vector<std::pair<LLMKV, LLMKV>>>& kv_tensor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use pointer type for output parameters.
static Status Make(std::shared_ptr<KVCacheChunkBuilder>& chunk_builder, | ||
RPCClient& rpc_client, int tensor_nbytes, int layer, | ||
int chunk_size, const std::string& kv_cache_ns, | ||
ObjectID chunk_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When querying/loading KV chunk from vineyard, the shape of the chunk , e.g., KV tensors dimensions are actually determined by the existing vineyard objects, so there is no need to repeat them here.
Alternatively, if the shape is specified here, it should be validated against the shape loaded from vineyard.
|
||
namespace vineyard { | ||
|
||
std::string md5(const std::string& content) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add method comment to point out this method produce a MD5 as a human-readable string, instead of the byte sequence.
} | ||
candidates.insert(candidates.end(), tokens.begin() + i, | ||
tokens.begin() + i + chunkSize); | ||
auto currHash = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An explicit type specifier is more suitable here so it's clear what is the hash value (bit-)size.
* The function thus produces a series of interdependent hash values, each | ||
* influenced by the previous hash. | ||
*/ | ||
Status computeChunkHashesForTokens(const std::vector<int>& tokens, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Use pointer type for output parameter.
-
Please update the comment to mention the hash value bit-size.
std::unique_ptr<KVCacheChunk>{new KVCacheChunk()}); | ||
} | ||
|
||
void Construct(const ObjectMeta& meta) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this method can only be called after meta
has been populated by fetching object metadata & data (since you're populating buffers also)? If so, please clearly document this or refer readers to the base class for this requirement.
553489b
to
27531d8
Compare
- rolling hash based approach to maintain the token lineage - S3-FIFO inspired sync and GC mechanism Signed-off-by: DwyaneShi <[email protected]>
some unit tests fail if using arrow 18.0.0 Signed-off-by: DwyaneShi <[email protected]>
Signed-off-by: DwyaneShi <[email protected]>
Signed-off-by: DwyaneShi <[email protected]>
Signed-off-by: DwyaneShi <[email protected]>
27531d8
to
a034f48
Compare
RETURN_ON_ASSERT(meta.GetKeyValue<int>(KVCacheChunk::kFieldNameLayer) == | ||
layer_); | ||
// We assume it's not possilbe to have same name and md5 of all tokens | ||
RETURN_ON_ASSERT(meta.GetKeyValue<std::string>(KVCacheChunk::kFieldNameMd5) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- `We assume that it's not possible for two different sequences of tokens to have the same name and md5.
- How safe is this assumption? Can we afford to compare the sequence of tokens here to guard against hash-collision?
~KVCacheChunkBuilder() = default; | ||
|
||
private: | ||
Status Construct(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add API comments here, in particular about its intended usage.
|
||
RETURN_ON_ASSERT(rpc_client_.Connected()); | ||
|
||
VLOG(100) << "Constructing " << ObjectIDToString(chunk_id_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of constructing a new KVCacheChunk instance, the chunk id initialized by Make
above is an invalid id. Is chunk_id_
here still the invalid id? If not, when and where does it gets initialized?
} | ||
|
||
// fetch from remote | ||
if (object == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section (from line 140 to 147) could be better structured as:
if (object == nullptr) { | |
if (!status.ok()) { | |
... | |
return Status::ObjectNotExists(); | |
} | |
if (object_meta.IsLocal()) { | |
object = rpc_client_.GetObject(chunk_id_); | |
RETURN_ON_ASSERT(object != nullptr); | |
return OkStatus(); | |
} | |
// fetch from remote | |
... |
RETURN_ON_ERROR(rpc_client_.ClusterInfo(cluster_info)); | ||
std::string rpc_endpoint = | ||
cluster_info[object_meta.GetInstanceId()].value("rpc_endpoint", ""); | ||
if (!rpc_endpoint.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!rpc_endpoint.empty()) { | |
if (rpc_endpoint.empty()) { | |
// log error obtaining rpc endpoint | |
return error_status; | |
} | |
// continue with the remaining steps |
This is more specific than return a generic object is nullptr
error below.
Summary
This PR introduces AIBrixBlobStorage, a new storage layer for KV tensors in which each unit of storage corresponds to a "chunk" of tokens (a fixed-size grouping of tokens). This approach improves storage efficiency and metadata management by:
Design Details
1. Chunk-based Storage Model: AIBrixBlobStorage stores KV tensors at the chunk level, allowing efficient retrieval and storage by grouping data in fixed-size chunks of tokens.
2. S3-FIFO Replacement Policy
3. Chunk Naming
name = namespace + "_" + hash(hash(previous chunk) + tokens of current chunk)
.4. TTL-based Global GC
access_time
label, which is only pushed to the global metadata during LocalSync to reduce frequent updates.5. KVCacheChunk Abstraction
KVCacheChunk represents a fixed number of tokens within a chunk and organizes its object blob with:
Metadata in each chunk includes:
For more details on the S3-FIFO policy, refer to the post here.