fix: shutdown connect on relevant settings changes #1413

Open: filipecabaco wants to merge 2 commits into main from fix/shutdown-connect-on-settings-change

Conversation

filipecabaco
Member

What kind of change does this PR introduce?

shutdown connect on relevant settings changes

vercel bot commented Jun 5, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 skipped deployment: realtime-demo, status Ignored, updated Jun 6, 2025 0:18am (UTC)

coveralls commented Jun 5, 2025

Coverage Status

coverage: 83.655% (+0.6%) from 83.027%
when pulling 531b961 on fix/shutdown-connect-on-settings-change
into 42fc73d on main.

@filipecabaco filipecabaco force-pushed the fix/shutdown-connect-on-settings-change branch 2 times, most recently from 8d0bb7a to 9f42738 Compare June 5, 2025 22:49
@filipecabaco filipecabaco force-pushed the fix/shutdown-connect-on-settings-change branch from 9f42738 to 823aa27 Compare June 5, 2025 22:51
Member

@edgurgel edgurgel left a comment

Added a couple of comments, but they are not blockers.

Comment on lines +23 to +25
# Warm cache to avoid Cachex and Ecto.Sandbox ownership issues
Cachex.put!(Cache, {{:get_tenant_by_external_id, 1}, [tenant1.external_id]}, {:cached, tenant1})
Cachex.put!(Cache, {{:get_tenant_by_external_id, 1}, [tenant2.external_id]}, {:cached, tenant2})
Member

Given that this test runs with async: false, the cache warming can probably be removed completely.

Member Author

that will be changed 😂 forgot

Member Author

For some reason it's breaking with async: true, so I'm going to avoid it for now.

Member

@edgurgel edgurgel Jun 8, 2025

Right, that's what I meant. If you are leaving async: false, then just remove the Cachex.put! calls here. They shouldn't be needed.

})
when is_map_key(changes, :jwt_jwks) or is_map_key(changes, :jwt_secret) do
Phoenix.PubSub.broadcast!(Realtime.PubSub, "realtime:operations:" <> external_id, :disconnect)
maybe_invalidate_cache(changeset)
Member

Is there a race condition here?

What if the invalidation takes a long time and the database reconnects before the tenant cache has been discarded 🤔 In that case it might still connect to the database with the old settings.

Member Author

Honestly, I have no clue... we might just need to trust Cachex? 😅


defp maybe_invalidate_cache(%Ecto.Changeset{changes: changes, valid?: true, data: %{external_id: external_id}})
when changes != %{} do
Tenants.Cache.invalidate_tenant_cache(external_id)
Member

Should this be calling distributed_invalidate_tenant_cache? Otherwise a remote node won't have the new settings the next time Connect.lookup_or_start_connection is called.

And that's what I mean by race condition:

  • Tell all nodes to invalidate the cache (broadcast)
  • Tell Connect to shut down (broadcast)

There is no real guarantee which of the two broadcasts above will be processed first on each node.

Then at some point Connect.lookup_or_start_connection will be called, and it's not 100% clear that the new settings will be available, if this makes sense.

🤔
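
The ordering concern above can be sketched with a small, self-contained Elixir model. This is only an illustration under stated assumptions: `RaceSketch`, its struct fields, and the `:invalidate` / `:disconnect` events are hypothetical stand-ins, not the actual Realtime, Cachex, or Phoenix.PubSub code. It models a single node that reconnects immediately after a disconnect and reads the tenant settings through its local cache.

```elixir
defmodule RaceSketch do
  # Hypothetical model of one node: its local tenant cache and the settings
  # that the current database connection was started with.
  defstruct cache: :old_secret, connected_with: :old_secret

  # Stands in for the tenant row in the database after the settings update.
  @db_settings :new_secret

  # Cache invalidation: drop the cached tenant so the next lookup reads the database.
  def handle(state, :invalidate), do: %{state | cache: nil}

  # Disconnect followed by an immediate reconnect (an assumption of this model).
  # The reconnect uses the cached tenant if present, otherwise the database value.
  def handle(state, :disconnect) do
    settings = state.cache || @db_settings
    %{state | cache: settings, connected_with: settings}
  end

  # Apply the events in the order in which this node happens to process them.
  def run(events), do: Enum.reduce(events, %__MODULE__{}, &handle(&2, &1))
end

# Invalidation is processed first: the reconnect sees the new settings.
IO.inspect(RaceSketch.run([:invalidate, :disconnect]).connected_with)
# => :new_secret

# Disconnect is processed first: the reconnect reads the stale cache entry.
IO.inspect(RaceSketch.run([:disconnect, :invalidate]).connected_with)
# => :old_secret
```

In this model the stale outcome only goes away if each node is guaranteed to handle the invalidation before the reconnect reads the cache. Whether the real system gives that guarantee is exactly the open question in this thread.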

@edgurgel edgurgel self-requested a review June 8, 2025 22:03
3 participants