@jyeshe jyeshe commented Sep 1, 2025

What/Why

This PR complements #440.

It enables syncing from an LMDB store to any other store by allowing iteration over all LMDB key-value pairs (visiting every branch and leaf of all hashpath trees).
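
A minimal sketch of the fold-style copy described above, with ETS tables standing in for both stores. The names `sync_sketch`, `sync/2`, and `demo/0` are hypothetical and are not hb_store or elmdb functions; the actual sync in this PR goes through the hb_store behaviour.

```erlang
-module(sync_sketch).
-export([sync/2, demo/0]).

%% Copy every {Key, Value} entry from the From table into the To table
%% and return the number of entries visited.
%% Assumption: ETS tables stand in for the source and destination stores.
sync(From, To) ->
    ets:foldl(
        fun({Key, Value}, Count) ->
            true = ets:insert(To, {Key, Value}),
            Count + 1
        end,
        0,
        From
    ).

%% Small self-check: fill a source table, sync it, compare both tables.
demo() ->
    From = ets:new(from_store, [ordered_set]),
    To = ets:new(to_store, [ordered_set]),
    true = ets:insert(From, [
        {<<"proc">>, <<"group">>},
        {<<"proc/name">>, <<"example">>},
        {<<"proc/0/cursor">>, <<"abc">>}
    ]),
    3 = sync(From, To),
    %% Both tables are ordered sets, so equal contents list identically.
    true = ets:tab2list(From) =:= ets:tab2list(To),
    ets:delete(From),
    ets:delete(To),
    ok.
```

Because the fold only needs "visit every entry" from the source and "write one entry" on the destination, the same shape should work for any pair of stores that can offer those two operations.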

Additional notes

It depends on twilson63/elmdb-rs#12

jyeshe commented Sep 1, 2025

1> FromLmdbStore = #{~"store-module" => hb_store_lmdb, ~"name" => ~"cache-mainnet", ~"resolve" => false}.
#{<<"name">> => <<"cache-mainnet">>,<<"resolve">> => false,
  <<"store-module">> => hb_store_lmdb}
2> ToLmdbStore2 = #{~"store-module" => hb_store_lmdb, ~"name" => ~"cache-mainnet-lmdb2"}.
#{<<"name">> => <<"cache-mainnet-lmdb2">>,
  <<"store-module">> => hb_store_lmdb}
3> hb_store:sync(FromLmdbStore, ToLmdbStore2).
ok
4> f(FromLmdbStore). FromLmdbStore = #{~"store-module" => hb_store_lmdb, ~"name" => ~"cache-mainnet"}.
ok
5> FromLmdbStore = #{~"store-module" => hb_store_lmdb, ~"name" => ~"cache-mainnet"}.
#{<<"name">> => <<"cache-mainnet">>,
  <<"store-module">> => hb_store_lmdb}
6> {ok, FromEntries} = hb_store:list(FromLmdbStore, ~"/"), length(FromEntries).
173
7> Groups = [X || X <- FromEntries, composite == hb_store:type(ToLmdbStore2, X)], length(Groups).
173
8> Paths = [<<G/binary, "/", F/binary>> || G <- Groups, F <- element(2, hb_store:list(ToLmdbStore2, G))], hd(Paths).
<<"-UNEqPQpf9GS-rI2lMblUgHFaWOTbEnC_sCV00DY6WE/name">>
9> SimplePaths = [P || P <- Paths, hb_store:type(ToLmdbStore2, P) == simple], length(SimplePaths).
1406
10> [SP || SP <- SimplePaths, hb_store:read(ToLmdbStore2, SP) =/= hb_store:read(FromLmdbStore, SP)].
[]
11> ProcAssignments = ~"assignments/yegjj-WALfaZLP2sqSdpmWJu6aLk1C2pYiPBUv2Ns2s".
<<"assignments/yegjj-WALfaZLP2sqSdpmWJu6aLk1C2pYiPBUv2Ns2s">>
12> Slots = [<<ProcAssignments/binary, "/", S/binary>> || S <- element(2, hb_store:list(ToLmdbStore2, ProcAssignments))], length(Slots).
6
13> SlotListings = [<<S/binary, "/", L/binary>> || S <- Slots, L <- element(2, hb_store:list(ToLmdbStore2, S))], hd(SlotListings).
<<"assignments/yegjj-WALfaZLP2sqSdpmWJu6aLk1C2pYiPBUv2Ns2s/0/cursor">>
14> [SL || SL <- SlotListings, hb_store:read(ToLmdbStore2, SL) =/= hb_store:read(FromLmdbStore, SL)].
[]

@jyeshe jyeshe marked this pull request as ready for review September 1, 2025 12:28
@jyeshe jyeshe requested a review from twilson63 September 1, 2025 12:28
@jyeshe jyeshe changed the title from "eFor139/list groups" to "feat: list root groups of an LMDB store (FOR-139)" Sep 1, 2025
@jyeshe jyeshe force-pushed the for139/list-groups branch 2 times, most recently from 84faa1f to df1611f Compare September 1, 2025 12:35
 list(Opts, Path) ->
     case file:list_dir(add_prefix(Opts, Path)) of
-        {ok, Files} -> {ok, lists:map(fun hb_util:bin/1, Files)};
+        {ok, Files} ->
Collaborator

I think the need for this arises more from a default config issue than from the code? hb_opts currently defines cache-mainnet/lmdb as the default, then a fallback FS store at cache-mainnet IIRC. Better to change that config (FS store => cache-mainnet/fs?) than to add code here.

Author

done

-spec list(map(), binary()) -> {ok, [binary()]} | {error, term()}.
list(Opts, <<"/">>) ->
#{ <<"db">> := DBInstance } = find_env(Opts),
case elmdb:match_prefix(DBInstance, <<"group:">>) of
Collaborator

Any reason to change the LMDB Rust side from :list to match_prefix? When I first read this I assumed we meant matching the value with a prefix, which could be much more expensive?

Author

Hey @samcamwilliams, the reason for match_prefix and the new KV entry with the group: prefix on the key is that we need a way to index the groups, so that we don't have to iterate over the whole database to find them.
It follows this approach to satisfy the requirement of a generic sync based on the hb_store behaviour.
If we could have more specialized implementations, we could adopt another strategy, such as iterating over all the keys in the source store and copying them to the destination store.
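
To illustrate the index idea above: because keys are kept in sorted order, a prefix scan can seek to the first `group:` key and stop at the first key that no longer matches, without visiting the rest of the database. This is a rough sketch only, assuming an ETS `ordered_set` as a stand-in for LMDB. The `match_prefix/2` below is a hypothetical helper written for this example, not the elmdb function of the same name.

```erlang
-module(prefix_scan_sketch).
-export([match_prefix/2, demo/0]).

%% Return, in order, the binary keys of an ordered_set table that start
%% with Prefix. Assumption: the ETS table stands in for the LMDB database.
match_prefix(Tab, Prefix) ->
    %% ets:next/2 returns the key *after* its argument, so check whether
    %% Prefix itself is a key before seeking past it.
    First =
        case ets:member(Tab, Prefix) of
            true -> Prefix;
            false -> ets:next(Tab, Prefix)
        end,
    collect(Tab, Prefix, First, []).

collect(_Tab, _Prefix, '$end_of_table', Acc) ->
    lists:reverse(Acc);
collect(Tab, Prefix, Key, Acc) ->
    Size = byte_size(Prefix),
    case Key of
        <<Prefix:Size/binary, _/binary>> ->
            collect(Tab, Prefix, ets:next(Tab, Key), [Key | Acc]);
        _ ->
            %% Keys are sorted, so the first mismatch ends the scan.
            lists:reverse(Acc)
    end.

demo() ->
    Tab = ets:new(sketch_store, [ordered_set]),
    true = ets:insert(Tab, [
        {<<"a">>, <<"group">>},
        {<<"a/name">>, <<"x">>},
        {<<"group:a">>, <<>>},
        {<<"group:b">>, <<>>},
        {<<"z">>, <<"v">>}
    ]),
    [<<"group:a">>, <<"group:b">>] = match_prefix(Tab, <<"group:">>),
    ets:delete(Tab),
    ok.
```

The trade-off raised in this thread still applies: the scan is cheap, but it relies on one extra index key being written per group.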

Author

removed match_prefix in favor of the iterator/fold

 make_group(Opts, GroupName) when is_map(Opts), is_binary(GroupName) ->
-    write(Opts, GroupName, <<"group">>);
+    write(Opts, GroupName, <<"group">>),
+    write(Opts, <<"group:", GroupName/binary>>, <<"">>);
Collaborator

Why make another new key? Is there a way around this? The line 442 is already a bit of a nuisance (the existence of ID/keys in the DB implies that ID is a group), but is passable. Adding 2 extra keys to the DB per group will add complexity long-term, I think.

Author

@jyeshe jyeshe Sep 1, 2025

I have left it to allow a single call for hb_store:type on groups. But as we've just discussed, we intend to apply a global iterator approach. cc @twilson63

Author

Removed, since we can now get the next process with the iterator.

@samcamwilliams
Collaborator

Left some comments!

@jyeshe jyeshe changed the title from "feat: list root groups of an LMDB store (FOR-139)" to "feat: iterate entries to sync from LMDB store (FOR-139)" Sep 2, 2025
@jyeshe jyeshe force-pushed the for139/list-groups branch 5 times, most recently from b1aed7e to 9f12523 Compare September 2, 2025 13:52
@jyeshe jyeshe force-pushed the for139/list-groups branch from 9ef8ab2 to 5683f90 Compare September 9, 2025 19:22
@jyeshe jyeshe force-pushed the for139/list-groups branch 2 times, most recently from 748b30d to 6013e0c Compare September 30, 2025 17:58
With it, iterate_start and iterate_cont are no more exported
@jyeshe jyeshe force-pushed the for139/list-groups branch from 6013e0c to da05265 Compare October 1, 2025 14:10
@jyeshe jyeshe merged commit 22dc07a into for139/store-sync Oct 6, 2025
@jyeshe jyeshe deleted the for139/list-groups branch October 6, 2025 11:23

jyeshe commented Oct 6, 2025

Left some comments!

@samcamwilliams This is going to be helpful for merging hydration results from different push machines and for cleaning up or resetting certain processes. I have addressed all the comments and merged into the base branch for139/store-sync.
