-
Notifications
You must be signed in to change notification settings - Fork 290
[Store] Enable Client SSD Offload And Storage Persistence #437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ated description in doc
…gen-style comments to all header files - Removed redundant code and outdated comments - Optimized function execution logic in LocalFile
… add related description in doc" This reverts commit 159442d. revert old high-level api test and doc modification
…roduce the storage path through environment variables.
- Refactor write operations to use thread pool for async file I/O - Fix potential double-unlock bug by adding atomic is_locked_ flag - Add corrupted file cleanup on write failure: - Auto-delete files with failed writes in destructor - Prevent subsequent reads of corrupted data
Thanks a lot for the contribution! This PR is a bit on the large side—would it be possible to break it up into smaller pieces? BTW, we probably don't need USE_CLIENT_PERSISTENCE here, since this feature doesn't introduce any new dependencies. |
* Remove precompilation parameters to simplify build configuration * Add session ID mechanism for cluster isolation: - Master node now generates unique session IDs on initialization - All persistent operations are scoped under session-specific subdirectories
…r than store_py.cpp
…upport file and memory type
Updates Implemented:1. Asynchronous Persistence 2. Support for All Upper-layer Interfaces Clarifications: 3. Persistence Isolation 4. Storage Path Configuration 5. Extended Replica Types 6. File Read-Write Locks 7. Code Quality Improvements 8. Testing Infrastructure Future Work:1. Metadata Management Migration 2. Async Batch Operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces client‐side support for SSD offload and storage persistence in the Mooncake project by implementing file‐based persistence alongside in‐memory caching. Key changes include:
- New thread pool implementation and asynchronous storage operations.
- Enhancements to master service/client for session ID handling.
- Implementation of file storage backend and local file I/O for persisting KVcache data.
Reviewed Changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
thread_pool.cpp / thread_pool.h | Added a new thread pool for asynchronous task processing. |
master_service.cpp / master_client.cpp | Added session ID generation and retrieval for storage backend. |
local_file.cpp / file_storage_backend.cpp | Implemented RAII-based file I/O and filesystem storage operations. |
client.cpp | Modified client logic to integrate storage persistence support. |
types.h | Updated replica descriptor to use a variant for memory/disk info. |
Other .cpp/.h files | Updated related components (e.g. rpc_service, CMakeLists.txt) accordingly. |
Comments suppressed due to low confidence (2)
mooncake-store/src/master_service.cpp:92
- [nitpick] Consider adding a comment explaining why std::chrono::steady_clock is used for generating the session ID. Clarifying the rationale can help future maintainers understand that the monotonic nature of steady_clock is leveraged intentionally.
session_id_ = std::to_string(
mooncake-store/src/file_storage_backend.cpp:17
- Ensure that the behavior when a file already exists (returning FILE_OPEN_FAIL) aligns with the overall design and document this decision. If overwriting is not intended, an explicit comment clarifying this policy would be beneficial.
if(std::filesystem::exists(path) == true) {
} | ||
|
||
write_thread_pool_.enqueue([backend = storage_backend_, key, value = std::move(value)] { | ||
backend->StoreObject(key, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider handling or logging the error code returned from backend->StoreObject in the asynchronous task. This would improve error monitoring and facilitate debugging if persistence operations fail.
backend->StoreObject(key, value); | |
ErrorCode err = backend->StoreObject(key, value); | |
if (err != ErrorCode::OK) { | |
LOG(ERROR) << "store_object_failed key=" << key << " error_code=" << static_cast<int>(err); | |
} |
Copilot uses AI. Check for mistakes.
|
||
// Client persistent thread pool for async operations | ||
ThreadPool write_thread_pool_; | ||
std::shared_ptr<StorageBackend> storage_backend_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move the thread pool into the StorageManager? since it's meant for the StorageManager, not the client.
* - Slice-based format (for scattered data) | ||
* - Contiguous string format | ||
*/ | ||
class StorageBackend { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an abstract here, since we don’t have a second backend anyway?
Great PR! |
…issue of significant performance degradation when writing files with put.
I am currently working on the KVcache SSD offload feature for the client side in the Mooncake project. The primary implementation approach involves
The persistence functionality is enabled through a precompiled parameter
USE_CLIENT_PERSISTENCE
.[TODO] Currently, file write operations are still performed synchronously, but the related thread pool asynchronous interfaces have been implemented and will be modified to asynchronous operations in a subsequent commit after debugging is completed.
The current code represents the initial implementation of KVcache persistence on the client side. Future work will focus on refining the existing implementation, including adding comments, removing redundant code, improving readability and extensibility, adding test code, and updating documentation.