Skip to content

AIBrix KV cache API #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: dev
Choose a base branch
from
Open

AIBrix KV cache API #6

wants to merge 5 commits into from

Conversation

DwyaneShi
Copy link
Collaborator

@DwyaneShi DwyaneShi commented Nov 6, 2024

Summary

This PR introduces AIBrixBlobStorage, a new storage layer for KV tensors in which each unit of storage corresponds to a "chunk" of tokens (a fixed-size grouping of tokens). This approach improves storage efficiency and metadata management by:

  • Reducing metadata overhead compared to the v6d v1 API.
  • Avoiding the access overhead of the Posix File API seen in v6d v2 API.

Design Details

1. Chunk-based Storage Model: AIBrixBlobStorage stores KV tensors at the chunk level, allowing efficient retrieval and storage by grouping data in fixed-size chunks of tokens.

2. S3-FIFO Replacement Policy

  • The S3-FIFO policy optimizes chunk retention by being scan-resistant and recognizing frequently accessed ("hot") chunks.
  • The Main FIFO list mirrors the global chunk list.
  • The LocalSync function periodically persists new chunks in the Main FIFO list to the global chunk list. Persisted chunks evicted from the Main FIFO list will be deleted from the global chunk list., maintaining a streamlined cache.

3. Chunk Naming

  • Each chunk has a unique name calculated by name = namespace + "_" + hash(hash(previous chunk) + tokens of current chunk).
  • This hashing mechanism provides unique chunk names that support efficient lookup and conflict avoidance
  • This naming mechanism also includes the chunk lineage info.

4. TTL-based Global GC

  • Each chunk’s last access time is recorded in its access_time label, which is only pushed to the global metadata during LocalSync to reduce frequent updates.
  • The GlobalGC function periodically checks all chunks for expiration by TTL.

5. KVCacheChunk Abstraction

KVCacheChunk represents a fixed number of tokens within a chunk and organizes its object blob with:

  • All KV tensors stored first.
  • All tokens (including prefix and current tokens) appended afterward, allowing for exact token list matching.

Metadata in each chunk includes:

  • Namespace: Used as a chunk name prefix and to list all chunks.
  • Access time: Tracks chunk access for TTL-based GC.
  • md5sum: Covers all tokens to detect corruption when reconstructing chunks. This checksum excludes tensors to reduce compute overhead.

For more details on the S3-FIFO policy, refer to the post here.

// store all the tokens (including prefix tokens and current tokens
// cached in the chunk), which will be used to avoid hash conflicts.
//
// In its metadata, we store the namespace (i.e., `ns_`), which will

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also describe the intended usage for namespaces: e.g., is it intended to be used in a multi-tenant deployment scheme to distinguish the different tenants/models that store their KV chunks into the same KV cache?

// compare it with the md5sum in the metadata. If they are the same,
// we consider the chunk is valid. Otherwise, we consider the chunk is
// corrupted. By far, we don't use the md5sum of the tensors to alleviate
// the compute overhead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain why this decision is made: e.g., for simplicity or for performance?

BTW: "By far" => "So far".


private:
std::shared_ptr<Buffer> buffer_;
// number of prefix tokens and current tokens in the chunk

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define the number of prefix tokens seems more natural here. In the comment, you can define "total #. tokens in the prompt up to this chunk = <num_prefix_tokens> + <chunk_size>".

public:
static std::unique_ptr<Object> Create() __attribute__((used)) {
return std::static_pointer_cast<Object>(
std::unique_ptr<KVCacheChunk>{new KVCacheChunk()});

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simply std::make_unique<KVCacheChunk>() here instead of new explicitly?


void Construct(const ObjectMeta& meta) override;

int GetChunkSize() { return chunk_size_; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const method

const std::vector<std::vector<std::pair<LLMKV, LLMKV>>>& kv_tensor);

Status Query(const std::vector<int>& prefix, const std::vector<int>& tokens,
std::vector<std::vector<std::pair<LLMKV, LLMKV>>>& kv_tensor);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pointer type for output parameters.

static Status Make(std::shared_ptr<KVCacheChunkBuilder>& chunk_builder,
RPCClient& rpc_client, int tensor_nbytes, int layer,
int chunk_size, const std::string& kv_cache_ns,
ObjectID chunk_id);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When querying/loading KV chunk from vineyard, the shape of the chunk , e.g., KV tensors dimensions are actually determined by the existing vineyard objects, so there is no need to repeat them here.

Alternatively, if the shape is specified here, it should be validated against the shape loaded from vineyard.


namespace vineyard {

std::string md5(const std::string& content) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add method comment to point out this method produce a MD5 as a human-readable string, instead of the byte sequence.

}
candidates.insert(candidates.end(), tokens.begin() + i,
tokens.begin() + i + chunkSize);
auto currHash =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An explicit type specifier is more suitable here so it's clear what is the hash value (bit-)size.

* The function thus produces a series of interdependent hash values, each
* influenced by the previous hash.
*/
Status computeChunkHashesForTokens(const std::vector<int>& tokens,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Use pointer type for output parameter.

  2. Please update the comment to mention the hash value bit-size.

std::unique_ptr<KVCacheChunk>{new KVCacheChunk()});
}

void Construct(const ObjectMeta& meta) override;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this method can only be called after meta has been populated by fetching object metadata & data (since you're populating buffers also)? If so, please clearly document this or refer readers to the base class for this requirement.

- rolling hash based approach to maintain the token lineage
- S3-FIFO inspired sync and GC mechanism

Signed-off-by: DwyaneShi <[email protected]>
some unit tests fail if using arrow 18.0.0

Signed-off-by: DwyaneShi <[email protected]>
Signed-off-by: DwyaneShi <[email protected]>
RETURN_ON_ASSERT(meta.GetKeyValue<int>(KVCacheChunk::kFieldNameLayer) ==
layer_);
// We assume it's not possilbe to have same name and md5 of all tokens
RETURN_ON_ASSERT(meta.GetKeyValue<std::string>(KVCacheChunk::kFieldNameMd5) ==

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. `We assume that it's not possible for two different sequences of tokens to have the same name and md5.
  2. How safe is this assumption? Can we afford to compare the sequence of tokens here to guard against hash-collision?

~KVCacheChunkBuilder() = default;

private:
Status Construct();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add API comments here, in particular about its intended usage.


RETURN_ON_ASSERT(rpc_client_.Connected());

VLOG(100) << "Constructing " << ObjectIDToString(chunk_id_);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of constructing a new KVCacheChunk instance, the chunk id initialized by Make above is an invalid id. Is chunk_id_ here still the invalid id? If not, when and where does it gets initialized?

}

// fetch from remote
if (object == nullptr) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section (from line 140 to 147) could be better structured as:

Suggested change
if (object == nullptr) {
if (!status.ok()) {
...
return Status::ObjectNotExists();
}
if (object_meta.IsLocal()) {
object = rpc_client_.GetObject(chunk_id_);
RETURN_ON_ASSERT(object != nullptr);
return OkStatus();
}
// fetch from remote
...

RETURN_ON_ERROR(rpc_client_.ClusterInfo(cluster_info));
std::string rpc_endpoint =
cluster_info[object_meta.GetInstanceId()].value("rpc_endpoint", "");
if (!rpc_endpoint.empty()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!rpc_endpoint.empty()) {
if (rpc_endpoint.empty()) {
// log error obtaining rpc endpoint
return error_status;
}
// continue with the remaining steps

This is more specific than return a generic object is nullptr error below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants