-
Notifications
You must be signed in to change notification settings - Fork 946
Avoid unnecessary CMS DB hits for pages that will 404 #15628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15628 +/- ##
==========================================
+ Coverage 79.29% 79.30% +0.01%
==========================================
Files 159 161 +2
Lines 8347 8443 +96
==========================================
+ Hits 6619 6696 +77
- Misses 1728 1747 +19 ☔ View full report in Codecov by Sentry. |
dc54a7d to
dbd3748
Compare
8be921d to
c7873f7
Compare
|
Talking with Pmac, we're gonna try using postgres as a DB-backed networked cache, possibly/probably with a locmem cache on each pod. |
dbd3748 to
b42a955
Compare
845f97e to
df5b15d
Compare
|
So the whole db-based cache thing didn't work out, because getting from the cache also triggers the invalidation check, which then can result in an error when called on a readonly postgres DB (which is the situation for the web pods). Instead, new-new plan:
|
2af4211 to
eabc17c
Compare
Saves 11 SQL queries on the releasnotes page by cacheing the country code lookup for an hour. Tested on /en-US/firefox/132.0.1/releasenotes/ Cold cache: 14 queries / 2066ms Warm cache: 3 queries / 222ms
…CMS for a page ...because if the page isn't in the lookahead, we can avoid touching the DB just to ultimately return a 404
…cache In order to balance the need for a distributed cache with the speed of a local-memory cache, we've come up with a couple of helper functions that wrap the following behaviour: * If it's in the local-memory cache, return that immediately. * If it's not, fall back to the DB cache, and if the key exists there, return that, cacheing it in local memory again on the way through * If the local memory cache and DB cache both miss, just return the default value for the helper function * Set the value in the local memory cache and DB cache at (almost) the same time * If the DB cache is not reachable (eg the DB is a read-only replica), log this loudly, as it's a sign the helper has not been used appropriately, but still set the local-memory version for now, to prevent total failure. IMPORTANT: before this can be used in production, we need to create the cache table in the database with ./manage.py createcachetable AFTER this code has been deployed. This sounds a bit chicken-and-egg but we hopefully can do it via direct DB connection in the worst case.
eabc17c to
edf49d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevejalim This makes sense to me. The only situation I can think of right now that could break this is if we ever decide to use the RoutablePageMixin: https://docs.wagtail.org/en/stable/reference/contrib/routablepage.html#routable-page-mixin
Maybe we could check if that app is installed and, if a page inherits from that class, we never raise a 404 for paths that start with that page's URL.
Yeah, I remember discussing this with someone else and we agreed routable pages would be a problem with this approach - but because we (currently) don't use them, it wasn't a big concern.
I'm thinking along similar lines - there must be a way to introspect the page and get all the routes that defined via the |
This changeset adds a lookahead check before we call
wagtail.views.serveto know whether it's worth asking the CMS to serve a page.The idea is that if the page isn't in the lookahead's data source, we can avoid touching the DB just to ultimately return a 404 earlier.
This will save us DB load, particularly when we get drive-by scans that pepper the site with irrelevant URLs.
Significant changes and points to review
Please be sceptical about this, particularly around cache invalidation/cache updating - e.g. when a page is published, unpublished or moved in the page tree.
Important: this change will need to go hand in hand with an infra update that does give us a networked cache (Redis or Memcached) - if we stick with LocMemCache, then while the pods can and will build their own lookahead in local cache, that cache will not be invalidated when a page is published, unpublished or moved.Issue / Bugzilla link
#15505 #14742
Testing
Details to come