
[discussion] (de)serialize data in "C"? #60

@r2evans

Description


The notion that redis stores strings is fine, but R differs from most languages in that strings can be particularly punishing. When retrieving larger objects (e.g., a 1000-row frame), retrieving the JSON (or however it was stringified, depending on the creation mechanism) as a string, bringing it into R memory, and then deserializing from that string can be much less efficient than it strictly needs to be.

What are your thoughts on including (which means writing from scratch, I believe) inline (de)serialization of data?

In my use-case, we have a rather large cache in redis holding relatively large amounts of data. The efficiency of in-memory caching of large objects is not my point here (a partner company is hosting the redis instance and pushing data to it in a cloud). The long-term storage is an arrow datamart, but many other (non-R) apps use redis as a cache. The total dataset is in the millions of rows, but each redis object is a 300-1000 row (70+ column) frame. Just deserializing a 300-row frame takes an extra 60MB above what is actually used once deserialized, because of R's global string pool, and all apps load hundreds of thousands of these frames at once, so that 60MB adds up. (For reference, the toJSON(dat) strings are between 465K-1553K characters. Not huge individually, but thousands of these add up.)
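For illustration, the distinction can be sketched at the R level (a hypothetical sketch, not this package's API; it assumes jsonlite is available, and the frame shape roughly mirrors the use-case above):

```r
library(jsonlite)

# A frame roughly the shape described above: 300 rows, 70 numeric columns
dat <- as.data.frame(replicate(70, rnorm(300)))

# String path: the entire JSON payload is materialized as one R character
# scalar (which lives in R's global string pool) before deserialization.
json <- toJSON(dat)
dat_from_json <- fromJSON(json)

# Raw path: R's native serialization yields a raw vector, so the payload
# never touches the string pool on the way in or out.
bytes <- serialize(dat, connection = NULL)
dat_from_raw <- unserialize(bytes)
```

The point of doing this in C would be that the client could hand the bytes returned by redis straight to the deserializer as a raw buffer, without ever materializing an intermediate R character scalar.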

Clearly this doesn't need to support every serialization mechanism that can work with redis, but supporting a couple of industry standards, say R's native serialization and JSON, might suffice.
