Persistence subsystem refactoring #4133
Conversation
Thanks @leventov. It's going to take me a second to properly parse through all 16k words, but I'll try to sometime later today. After a quick read, just some initial thoughts (feel free to chime in if it seems like I misunderstood something):

Persister

I can put in the "Persister" (I think in our previous discussions I called it an Adapter, but maybe we can compromise somewhere. Store? Storage? Interface?). I've been eager to rig this up with Redis, and also to try directly leveraging Nix's caching by manually creating

Just a note, Loader vs BasePersistenceLoader has already been decoupled to some extent, but to disambiguate, loader might be better named

```python
@dataclass
class EntryKey:
    execution_hash: str
    hash: str
    cache_type: CacheType
    path: Optional[Path]


# Created is particular to the proposed "Persister/Adapter method"
class Store:
    @abstractmethod
    def get(
        self,
        entry: EntryKey,  # feel free to build execution key from this
        **kwargs: Any,  # Whatever other args might be needed
    ) -> Optional[bytes]:
        """
        Retrieve an entry from the persistence subsystem.
        """

    @abstractmethod
    def put(
        self,
        cache: Cache,
        converter: BaseConverter,
        **kwargs: Any,
    ) -> None:
        """
        Store EntryMetadata on self if needed
        """
        blob: bytes = converter.to_blob(cache)
        # Do whatever with the blob, potentially switching on loader.
        ...

    @abstractmethod
    def post_execution_hook(
        self,
        cell: CellImpl,
        runner: cell_runner.Runner,
        run_result: cell_runner.RunResult,
    ) -> None:
        # Do something like cleanup here
        ...

    @abstractmethod
    def __del__(self) -> None:
        """
        __del__ over on_exit
        Suggested, since this should be a singleton; only deleted on shutdown.
        Motivation: no current, easy, shutdown hook.
        """
        ...
```

Cache

I'm not sure what's gained by renaming Cache -> Entry. Also, in your description, this removes a fair amount of effort and care required for managing State and UI objects in the original Cache spec:

```python
class Cache:
    defs: dict[Name, Any]
    hash: str
    stateful_refs: set[str]
    cache_type: CacheType
    hit: bool
```

What I will do is expose execution_hash: str on the Cache object (which is computed prior to the content-hash variables and should be a sub for your base seed for your execution key, which you might want to bake the file contents / path into).

App

I don't think this requires serialization changes so much as just a user (
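To make the Store/Persister interface above concrete, here is a minimal sketch of what a Redis-backed implementation could look like. This is illustrative only and not the code from this PR or #4147: it assumes the third-party redis-py client, invents a key scheme from the EntryKey fields, and assumes execution_hash is exposed on the Cache object as discussed above.

```python
# Illustrative sketch only -- not the implementation from this PR or #4147.
# Assumes the redis-py client ("pip install redis"), a made-up key scheme,
# and that Cache exposes execution_hash as discussed in this thread.
from typing import Any, Optional

import redis


class RedisStore(Store):
    def __init__(self, url: str = "redis://localhost:6379/0") -> None:
        self._client = redis.Redis.from_url(url)

    def _key(self, entry: EntryKey) -> str:
        # Hypothetical key layout; a real scheme would follow the design doc.
        return f"marimo:{entry.cache_type}:{entry.execution_hash}:{entry.hash}"

    def get(self, entry: EntryKey, **kwargs: Any) -> Optional[bytes]:
        # redis-py returns bytes or None, matching the interface above.
        return self._client.get(self._key(entry))

    def put(self, cache: Cache, converter: BaseConverter, **kwargs: Any) -> None:
        blob: bytes = converter.to_blob(cache)
        entry = EntryKey(
            execution_hash=cache.execution_hash,  # assumed to be exposed on Cache
            hash=cache.hash,
            cache_type=cache.cache_type,
            path=None,
        )
        self._client.set(self._key(entry), blob)

    def post_execution_hook(self, cell, runner, run_result) -> None:
        pass  # nothing to clean up in this sketch

    def __del__(self) -> None:
        self._client.close()
```

The only point of the sketch is that get/put map naturally onto a key-value backend once EntryKey carries everything needed to address an entry.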
```python
@dataclass
class EntryKey:
    execution_hash: str
    hash: str
    cache_type: CacheType
    path: Optional[Path]
```

Execution hash - is this the equivalent for
If these changes are OK for you, I'll add
```python
@abstractmethod
def put(
    self,
    cache: Cache,
    converter: BaseConverter,
    **kwargs: Any,
) -> None:
```

As part of my wanderings in the design space (software abstraction and interface space, in this case), I considered providing an argument like converter (object) to
Not sure what you mean by this. Why should the logic be different from what is proposed in the doc? In the current proposal, cleanup may be triggered right from
https://stackoverflow.com/a/1481512/ argues against __del__; any objections to using https://docs.python.org/3/library/atexit.html?
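For reference, the atexit alternative boils down to registering the singleton's cleanup at startup; a minimal sketch (SingletonStore and its cleanup method are invented names for illustration):

```python
import atexit


class SingletonStore:
    """Stand-in for the proposed Store singleton; names here are invented."""

    def cleanup(self) -> None:
        # Flush pending writes, close connections, release lock files, etc.
        print("store cleaned up")


_store = SingletonStore()
# Runs at normal interpreter shutdown, without relying on __del__ semantics
# (which are not guaranteed to run during interpreter teardown).
atexit.register(_store.cleanup)
```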
The main reason why I want to rename is that "Cache" is just a confusing name for this abstraction: "cache" is universally understood as the store as a whole, not a single entry within it. Also, it's a misnomer because "cache" is something ephemeral (even if on-disk), while
Having said that, we cannot actually remove nor even rename
If my description does this, it's my ignorance and I should change the description. I didn't intend to affect the existing functionality. I thought I already did this with putting
All
I got Store working with Redis yesterday, and was working on a remote Nix cache and tests, but I'll push what I have up. I think having something concrete to review will go a long way.
I'm not sure it should be at the abstract level, the
Maybe; we'd have to look over marimo teardown. It's not system exit, it's runtime when this would need to trigger
You'll see the pattern I use for Redis; this isn't a problem with just config. I think it makes sense to not change serialization much. Top-level function changes will potentially make notebooks more dangerous, and we're considering a push to statically load apps for edit mode.
@leventov do you think you could build on #4147 (after merge) in a way that doesn't disrupt current functionality? Ideally starting with a proof of concept that can be expanded with successive PRs? I'm still not 100% clear on your implementation plans, and your doc doesn't explain why it's an improvement (even if it's self-evident to you: what are the limitations of the current mechanism, and how does your proposal address those concerns? Cleanup and concurrency are fair, but solutions like Redis with LRU turned on already address those). I've quickly read through your doc several times, and it needs a bit more organization and concision. Happy to put work where I think there is synergy (like #4147).
@dmadisetti I already rebased the code on top of the
Focusing on Redis contradicts the desideratum in the first message of #3176:
Redis specifically is not durable (and if there is a durable mode, I don't trust it: I don't trust Redis in general and avoid it whenever possible); it's an extra stateful runtime that has to be kept around, whereas it seems many serverless options can work (or piggy-backing on other server-side things the user already runs, such as a Git server).

Another big point is deduplication. Entry/object separation is needed basically to enable deduplication. For the persistent execution use case, deduplication should be a big deal because it's expected that there will be lots of cells with differing inputs or codes (and hence different

Entries (execution keys more specifically) are also designed to be extensible in the future to accommodate versions; I discuss them in #3176.

What I'm implementing is not a "new" design, it's exactly the design that you (seemingly) agreed with at the end of the #3176 discussion, already with concessions from my perspective. The doc committed in this PR is just an elaboration of that design, because what seemed "simple" at the hand-wavy level and could be described in a couple of sentences turned out to require a lot of careful adjustment of certain details about locking, operation sequencing, formats, etc.
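To illustrate the deduplication point with a toy model (this is not the on-disk format from the doc): entries are keyed by execution key but only reference a content-addressed object, so many entries with different codes/inputs but identical outputs share a single stored blob.

```python
import hashlib
from typing import Dict, Optional

# Toy in-memory model of entry/object separation; the storage, locking, and
# formats in the actual design doc are far more involved.
objects: Dict[str, bytes] = {}   # content hash -> blob, stored once
entries: Dict[str, str] = {}     # execution key -> content hash


def put(execution_key: str, blob: bytes) -> None:
    content_hash = hashlib.sha256(blob).hexdigest()
    objects.setdefault(content_hash, blob)  # deduplicated across entries
    entries[execution_key] = content_hash


def get(execution_key: str) -> Optional[bytes]:
    content_hash = entries.get(execution_key)
    return objects.get(content_hash) if content_hash else None


# Two cells with different execution keys but identical results store one blob:
put("cell-a:exec-hash-1", b"same result")
put("cell-b:exec-hash-7", b"same result")
assert len(entries) == 2 and len(objects) == 1
```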
📝 Summary
Implements the first stage of persistence subsystem refactoring, as discussed at length in #3176.
🔍 Description of Changes
This PR contains only the textual detailed design and implementation plan so far. This is the first version of the design and plan that looks consistent to me (no internal contradictions) and complete (sufficient for implementation): no high-level hand-waving and descriptions, just specific implementation details and algorithms.
The hardest part was getting the inter-process locking and the "new snapshot creation" + "entry writing" machinery correct, especially in the presence of non-cooperative synchronization systems like Dropbox. I iterated through a lot of designs internally and kept finding race conditions in them until I arrived at the one described in the document. I don't claim it's optimal (likely not), but at least I'm relatively confident that it's free of concurrency bugs.
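For readers unfamiliar with the building blocks involved, the sketch below shows the naive primitives only: an O_EXCL lock file and write-temp-then-rename. The machinery in the document exists precisely because primitives like these are not sufficient on their own under Dropbox-style syncing, but they illustrate what "entry writing" has to guard against. Paths and names are invented.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(lock_path: Path):
    # O_CREAT | O_EXCL fails if the file already exists, giving a crude
    # cross-process mutex on a shared directory (no fcntl required).
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def write_entry_atomically(target: Path, blob: bytes) -> None:
    # Write to a temp file in the same directory, then rename: os.replace is
    # atomic on a single filesystem, so readers never see half-written entries.
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=".tmp-entry-")
    with os.fdopen(fd, "wb") as f:
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, target)


# Usage sketch:
# with exclusive_lock(cache_dir / "snapshot.lock"):
#     write_entry_atomically(cache_dir / "entry.bin", blob)
```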
I'll add the actual code implementation of this detailed plan soon (I have already started writing it with Claude Code), and I expect it to take an order of magnitude less time than it took to create the plan itself (more than a month), unless new concurrency or other problems are found in the process.
I suggest that reviewers (@mscolnick and @dmadisetti) also reason and review at the level of the text description, because the code is going to be even bigger in volume than this description.
📋 Checklist
📜 Reviewers
@mscolnick @dmadisetti