
[Store] Enable Client SSD Offload And Storage Persistence #437


Open
SgtPepperr wants to merge 35 commits into main

Conversation

SgtPepperr commented:

I am currently working on the KVcache SSD offload feature for the client side in the Mooncake project. The primary implementation approach is:

  • Initialize: the storage path for persisted files is specified during the initialization phase. If the storage path is invalid, subsequent persistence operations will fail.
  • Put: for each successful put request, the client also writes the data to the designated storage path using POSIX file write operations.
  • Get: for each failed get request, the client attempts to locate the corresponding KVcache under the local storage path. If found, the data is read from the file and returned.

Persistence is enabled through a compile-time parameter, USE_CLIENT_PERSISTENCE.
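
As a rough illustration of the flow above (a sketch only: PersistObject, TryLoadFromDisk, and kStoragePath are made-up names, and error handling is simplified):

    #include <fcntl.h>
    #include <unistd.h>
    #include <optional>
    #include <string>

    static const std::string kStoragePath = "/mnt/ssd/mooncake";  // assumed path

    #ifdef USE_CLIENT_PERSISTENCE
    // After a successful put, mirror the object to the storage path
    // with plain POSIX writes.
    bool PersistObject(const std::string& key, const std::string& value) {
        const std::string path = kStoragePath + "/" + key;
        int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return false;
        ssize_t n = ::write(fd, value.data(), value.size());
        ::close(fd);
        return n == static_cast<ssize_t>(value.size());
    }

    // On a failed get, fall back to the local file before giving up.
    std::optional<std::string> TryLoadFromDisk(const std::string& key) {
        const std::string path = kStoragePath + "/" + key;
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return std::nullopt;
        std::string buf;
        char chunk[4096];
        ssize_t n;
        while ((n = ::read(fd, chunk, sizeof(chunk))) > 0) buf.append(chunk, n);
        ::close(fd);
        if (n < 0) return std::nullopt;
        return buf;
    }
    #endif  // USE_CLIENT_PERSISTENCE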

[TODO] File write operations are currently performed synchronously; the thread-pool asynchronous interfaces are already implemented and will replace the synchronous path in a subsequent commit once debugging is complete.

The current code represents the initial implementation of KVcache persistence on the client side. Future work will focus on refining the existing implementation, including adding comments, removing redundant code, improving readability and extensibility, adding test code, and updating documentation.

…gen-style comments to all header files
- Removed redundant code and outdated comments
- Optimized function execution logic in LocalFile
… add related description in doc"

This reverts commit 159442d.

revert old high-level api test and doc modification
…roduce the storage path through environment variables.
- Refactor write operations to use a thread pool for async file I/O
- Fix a potential double-unlock bug by adding an atomic is_locked_ flag
- Add corrupted-file cleanup on write failure:
  - Auto-delete files with failed writes in the destructor
  - Prevent subsequent reads of corrupted data
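
A rough sketch of the double-unlock guard and the corrupted-file cleanup from this commit (class and member names are illustrative, not the PR's actual LocalFile code):

    #include <atomic>
    #include <cstdio>
    #include <string>

    class LocalFileSketch {
    public:
        LocalFileSketch(std::string path, std::FILE* f)
            : path_(std::move(path)), file_(f) {}

        ~LocalFileSketch() {
            if (file_) std::fclose(file_);
            // Cleanup on write failure: delete the file so a later get
            // can never read partially written (corrupted) data.
            if (write_failed_) std::remove(path_.c_str());
        }

        void MarkWriteFailed() { write_failed_ = true; }

        // Idempotent unlock: exchange() guarantees the underlying lock
        // is released at most once, even if Unlock() is called twice.
        void Unlock() {
            if (is_locked_.exchange(false)) {
                // release the real file lock here (e.g. flock)
            }
        }

    private:
        std::string path_;
        std::FILE* file_ = nullptr;
        std::atomic<bool> is_locked_{true};
        bool write_failed_ = false;
    };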
xiaguan (Collaborator) commented Jun 5, 2025:

Thanks a lot for the contribution! This PR is a bit on the large side—would it be possible to break it up into smaller pieces?

BTW, we probably don't need USE_CLIENT_PERSISTENCE here, since this feature doesn't introduce any new dependencies.

@stmatengss stmatengss requested a review from Copilot June 5, 2025 12:01
Copilot's review comment was marked as outdated.

@stmatengss stmatengss requested a review from xiaguan June 16, 2025 06:51
SgtPepperr (Author) commented:

Updates Implemented:

1. Asynchronous Persistence
Implemented file persistence using thread pools for asynchronous writes (a thread-pool sketch follows this list).

2. Support for All Upper-layer Interfaces
Unified support for all Python top-level interfaces:
○ Get operations: get, get_buffer, get_size
○ Put operations: put, put_parts
○ Remove operations: remove, remove_all, tearDownAll

Clarifications:
a. For get/put operations, file I/O operations are encapsulated at the client layer, hiding implementation details from upper layers.
b. remove deletes both memory and disk KV caches. remove_all deletes all data in subdirectories. tearDownAll performs no file removal.

3. Persistence Isolation
Each cluster's persistent data resides in an isolated directory, which makes remove_all and cluster restarts straightforward (see the second sketch after this list).
Mechanism:
a. During master initialization, a microsecond-precision timestamp is generated as the session ID
b. Clients request this session ID from the master and use storage_root_path/session-id as the persistence directory
c. All clients in a cluster share the same DFS persistence location

4. Storage Path Configuration
The storage path is now read from an environment variable during client initialization instead of being passed directly through the top-level interfaces (also covered in the second sketch):
○ Checks that the storage path exists
○ Disables persistence if the path doesn't exist

5. Extended Replica Types
Expanded the replica Descriptor to support two kinds of replicas (third sketch below):
○ memory type: uses the transfer engine
○ disk type: uses file I/O operations
Read operations automatically select the access method based on the descriptor type.

6. File Read-Write Locks
Added thread-safe file access through read-write locks (fourth sketch below).

7. Code Quality Improvements
○ Added comprehensive comments
○ Removed redundant code
○ Enhanced readability and extensibility

8. Testing Infrastructure
○ Added C++/Python test cases for persistence functionality
○ Enabled SSD offload testing in CI/CD pipelines
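
To make item 1 concrete, here is a minimal thread-pool sketch in the spirit of the new thread_pool.h; the class name and all details are illustrative and certainly differ from the PR's actual implementation:

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Illustrative pool: workers drain a task queue until shutdown.
    class ThreadPoolSketch {
    public:
        explicit ThreadPoolSketch(size_t n) {
            for (size_t i = 0; i < n; ++i)
                workers_.emplace_back([this] {
                    for (;;) {
                        std::function<void()> task;
                        {
                            std::unique_lock<std::mutex> lk(mu_);
                            cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                            if (stop_ && tasks_.empty()) return;
                            task = std::move(tasks_.front());
                            tasks_.pop();
                        }
                        task();  // e.g. a persistence write, run off the hot path
                    }
                });
        }
        ~ThreadPoolSketch() {
            { std::lock_guard<std::mutex> lk(mu_); stop_ = true; }
            cv_.notify_all();
            for (auto& w : workers_) w.join();
        }
        void enqueue(std::function<void()> task) {
            { std::lock_guard<std::mutex> lk(mu_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex mu_;
        std::condition_variable cv_;
        bool stop_ = false;
    };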
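
Items 3 and 4 together can be pictured like this; the function names and the environment variable name are assumptions for illustration only (std::chrono::steady_clock appears here because the Copilot review below notes it is what the PR uses):

    #include <chrono>
    #include <cstdlib>
    #include <filesystem>
    #include <optional>
    #include <string>

    // Master side: a microsecond-precision timestamp serves as the session ID.
    std::string MakeSessionId() {
        auto now = std::chrono::steady_clock::now().time_since_epoch();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(now);
        return std::to_string(us.count());
    }

    // Client side: read storage_root_path from an environment variable
    // (name assumed) and derive the per-cluster persistence directory;
    // persistence is disabled when the path does not exist.
    std::optional<std::filesystem::path> ResolvePersistDir(const std::string& session_id) {
        const char* root = std::getenv("MOONCAKE_STORAGE_ROOT_PATH");
        if (root == nullptr || !std::filesystem::exists(root)) return std::nullopt;
        return std::filesystem::path(root) / session_id;
    }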
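
Item 5 could look roughly as follows; the file summary below confirms types.h now uses a variant for memory/disk info, but the field names here are invented for the sketch:

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <variant>

    struct MemoryDescriptor { uintptr_t addr; size_t size; };  // read via transfer engine
    struct DiskDescriptor   { std::string file_path; };        // read via file I/O

    struct ReplicaDescriptor {
        std::variant<MemoryDescriptor, DiskDescriptor> info;
    };

    // The read path dispatches on whichever alternative is active.
    void ReadReplica(const ReplicaDescriptor& r) {
        if (auto* mem = std::get_if<MemoryDescriptor>(&r.info)) {
            // issue a transfer-engine read from mem->addr, mem->size bytes
        } else if (auto* disk = std::get_if<DiskDescriptor>(&r.info)) {
            // open disk->file_path and read the object back from SSD
        }
    }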
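
And item 6, assuming std::shared_mutex (the PR may well use a different primitive): readers take a shared lock so concurrent gets don't block each other, while writers take an exclusive one.

    #include <mutex>
    #include <shared_mutex>

    class FileLockSketch {
    public:
        void ReadLocked()  { std::shared_lock<std::shared_mutex> lk(mu_); /* read file */ }
        void WriteLocked() { std::unique_lock<std::shared_mutex> lk(mu_); /* write file */ }
    private:
        std::shared_mutex mu_;
    };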

Future Work:

1. Metadata Management Migration
Move file metadata management from the client to the master node.

2. Async Batch Operations
Support asynchronous file reading in the batchget interface.

@SgtPepperr SgtPepperr changed the title [MooncakeStore] Enable Client SSD Offload And Storage Persistence [Store] Enable Client SSD Offload And Storage Persistence Jun 17, 2025
@stmatengss stmatengss requested a review from Copilot June 17, 2025 07:02
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR introduces client‐side support for SSD offload and storage persistence in the Mooncake project by implementing file‐based persistence alongside in‐memory caching. Key changes include:

  • New thread pool implementation and asynchronous storage operations.
  • Enhancements to master service/client for session ID handling.
  • Implementation of file storage backend and local file I/O for persisting KVcache data.

Reviewed Changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Summary per file:
• thread_pool.cpp / thread_pool.h: Added a new thread pool for asynchronous task processing.
• master_service.cpp / master_client.cpp: Added session ID generation and retrieval for the storage backend.
• local_file.cpp / file_storage_backend.cpp: Implemented RAII-based file I/O and filesystem storage operations.
• client.cpp: Modified client logic to integrate storage persistence support.
• types.h: Updated the replica descriptor to use a variant for memory/disk info.
• Other .cpp/.h files: Updated related components (e.g. rpc_service, CMakeLists.txt) accordingly.
Comments suppressed due to low confidence (2)

mooncake-store/src/master_service.cpp:92

  • [nitpick] Consider adding a comment explaining why std::chrono::steady_clock is used for generating the session ID. Clarifying the rationale can help future maintainers understand that the monotonic nature of steady_clock is leveraged intentionally.
session_id_ = std::to_string(

mooncake-store/src/file_storage_backend.cpp:17

  • Ensure that the behavior when a file already exists (returning FILE_OPEN_FAIL) aligns with the overall design and document this decision. If overwriting is not intended, an explicit comment clarifying this policy would be beneficial.
if(std::filesystem::exists(path) == true) {

Quoted hunk under review:

    write_thread_pool_.enqueue([backend = storage_backend_, key, value = std::move(value)] {
        backend->StoreObject(key, value);
    });
Copilot AI commented Jun 17, 2025:


[nitpick] Consider handling or logging the error code returned from backend->StoreObject in the asynchronous task. This would improve error monitoring and facilitate debugging if persistence operations fail.

Suggested change:

    - backend->StoreObject(key, value);
    + ErrorCode err = backend->StoreObject(key, value);
    + if (err != ErrorCode::OK) {
    +     LOG(ERROR) << "store_object_failed key=" << key
    +                << " error_code=" << static_cast<int>(err);
    + }


@stmatengss stmatengss self-assigned this Jun 17, 2025

Quoted hunk under review:

// Client persistent thread pool for async operations
ThreadPool write_thread_pool_;
std::shared_ptr<StorageBackend> storage_backend_;
A collaborator commented:

Could we move the thread pool into the StorageManager, since it's meant for the StorageManager rather than the client?

Quoted hunk under review:

 * - Slice-based format (for scattered data)
* - Contiguous string format
*/
class StorageBackend {
A collaborator commented:

No need for an abstract class here, since we don't have a second backend anyway?

@james0zan james0zan requested review from stmatengss and xiaguan June 18, 2025 02:46
james0zan (Member) commented:

Great PR!
Before merging, please include an evaluation of compatibility with the HA feature in this PR, as well as detailed performance testing. It's important to ensure that the throughput of the SSD layer is comparable to the raw performance of 3FS.

…issue of significant performance degradation when writing files with put.