Avoid unnecessary CMS DB hits for pages that will 404 #15628

stevejalim · 2024-12-03T13:28:15Z

This changeset adds a lookahead check before we call wagtail.views.serve to know whether it's worth asking the CMS to serve a page.

The idea is that if the page isn't in the lookahead's data source, we can return a 404 early instead of touching the DB only to end up returning a 404 anyway.

This will save us DB load, particularly when we get drive-by scans that pepper the site with irrelevant URLs.
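
A minimal sketch of the lookahead idea, for illustration only and not the code in this changeset. The names `normalise`, `serve_with_lookahead` and `known_paths` are hypothetical, `NotFound` stands in for Django's Http404, and the `cms_serve` callable stands in for wagtail.views.serve so that the example runs without Django or a database.

```python
from typing import Callable, Iterable


class NotFound(Exception):
    """Stand-in for django.http.Http404 so the sketch has no dependencies."""


def normalise(path: str) -> str:
    """Return the path with exactly one leading and one trailing slash."""
    stripped = path.strip("/")
    return f"/{stripped}/" if stripped else "/"


def serve_with_lookahead(
    path: str,
    known_paths: Iterable[str],
    cms_serve: Callable[[str], str],
) -> str:
    """Only call the DB-backed CMS view when the path is in the lookahead."""
    lookahead = {normalise(p) for p in known_paths}
    if normalise(path) not in lookahead:
        # Short-circuit: the CMS view is never called, so no DB queries run.
        raise NotFound(path)
    return cms_serve(path)
```

A drive-by request such as /wp-login.php would raise `NotFound` here without `cms_serve` ever being invoked, which is where the DB saving would come from.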

I used an AI to write some of this code - I bounced some ideas around with ChatGPT, and reworked some of the code suggestions to fit with Bedrock

Significant changes and points to review

Please be sceptical about this, particularly around cache invalidation/cache updating - e.g. when a page is published, unpublished or moved in the page tree.

Important: this change needs to go hand in hand with an infra update that gives us a networked cache (Redis or Memcached). If we stick with LocMemCache, each pod can and will build its own lookahead in local cache, but that cache will not be invalidated when a page is published, unpublished or moved.
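
To make the concern concrete, here is a toy model with no Django involved. Each hypothetical `Pod` holds its own in-process set, much as a per-process local-memory cache would, and the hypothetical `on_unpublish` handler only runs in the process where the CMS change happened.

```python
class Pod:
    """A web or CMS pod with a process-local lookahead cache."""

    def __init__(self, name: str, live_paths: set) -> None:
        self.name = name
        # Each pod builds its own copy; nothing is shared between pods.
        self.lookahead = set(live_paths)

    def on_unpublish(self, path: str) -> None:
        """Signal handler: only fires in the process that made the change."""
        self.lookahead.discard(path)


live = {"/en-US/about/", "/en-US/news/"}
cms_pod = Pod("cms", live)
web_pod = Pod("web", live)

# An editor unpublishes a page on the CMS pod.
cms_pod.on_unpublish("/en-US/news/")
```

After the unpublish, `cms_pod.lookahead` no longer contains /en-US/news/ but `web_pod.lookahead` still does. A networked cache would give both pods the same view.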

Issue / Bugzilla link

#15505 #14742

Testing

Details to come

codecov · 2024-12-03T13:36:12Z

Codecov Report

Attention: Patch coverage is 80.41237% with 19 lines in your changes missing coverage. Please review.

Project coverage is 79.30%. Comparing base (aa2f6a0) to head (2af4211).

Files with missing lines	Patch %	Lines
bedrock/cms/signal_handlers.py	51.35%	18 Missing ⚠️
bedrock/cms/apps.py	87.50%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #15628      +/-   ##
==========================================
+ Coverage   79.29%   79.30%   +0.01%     
==========================================
  Files         159      161       +2     
  Lines        8347     8443      +96     
==========================================
+ Hits         6619     6696      +77     
- Misses       1728     1747      +19


stevejalim · 2025-01-10T16:57:29Z

Talking with Pmac, we're gonna try using postgres as a DB-backed networked cache, possibly/probably with a locmem cache on each pod.

stevejalim · 2025-01-27T11:04:04Z

So the whole db-based cache thing didn't work out, because getting from the cache also triggers the invalidation check, which then can result in an error when called on a readonly postgres DB (which is the situation for the web pods).

Instead, new-new plan:

* A simple new model that holds the latest tree info in the DB, available to web and CMS pods.
* Web pods will only ever read from the DB table and cache the info in their locmem caches, so no networked-db-cache-invalidation pain should happen.
* We'll retain the signal-based approach to updating the tree info when CMS changes occur, and put it in the DB table instead.
* We'll add a cron job to keep that tree up to date, just in case we have strangeness around the signals.
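
The plan above could be sketched roughly as follows. This is an assumption-laden illustration, not the eventual implementation: `TreeSnapshotStore` stands in for the new DB table, `WebPodLookahead` for a web pod's local-memory cache, and the time-to-live refresh is my own assumption about how a pod would pick up changes.

```python
import time
from typing import Callable, Iterable


class TreeSnapshotStore:
    """Stand-in for the DB table that holds the latest page-tree info."""

    def __init__(self) -> None:
        self._paths = frozenset()
        self.reads = 0  # counts simulated DB reads

    def write(self, paths: Iterable[str]) -> None:
        # CMS side only: called from signal handlers and the scheduled rebuild.
        self._paths = frozenset(paths)

    def read(self) -> frozenset:
        # Web pods only ever read; they never write to the table.
        self.reads += 1
        return self._paths


class WebPodLookahead:
    """Read-only consumer that keeps a short-lived local copy of the tree."""

    def __init__(
        self,
        store: TreeSnapshotStore,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None
        self._expires_at = 0.0

    def contains(self, path: str) -> bool:
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            # Local copy is missing or stale: one read from the table.
            self._cached = self._store.read()
            self._expires_at = now + self._ttl
        return path in self._cached
```

With this shape a web pod can serve stale tree data for up to `ttl_seconds` after a CMS change, which is the trade-off for never writing to (or invalidating anything in) the database from a read-only pod.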

…cache

In order to balance the need for a distributed cache with the speed of a local-memory cache, we've come up with a couple of helper functions that wrap the following behaviour:

* If it's in the local-memory cache, return that immediately.
* If it's not, fall back to the DB cache, and if the key exists there, return that, caching it in local memory again on the way through.
* If the local-memory cache and DB cache both miss, just return the default value for the helper function.
* Set the value in the local-memory cache and DB cache at (almost) the same time.
* If the DB cache is not reachable (e.g. the DB is a read-only replica), log this loudly, as it's a sign the helper has not been used appropriately, but still set the local-memory version for now, to prevent total failure.

IMPORTANT: before this can be used in production, we need to create the cache table in the database with ./manage.py createcachetable AFTER this code has been deployed. This sounds a bit chicken-and-egg, but in the worst case we can hopefully do it via a direct DB connection.
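
The behaviour described in this commit message can be sketched as below. This is not the helper code from the commit: `DictCache`, `get_from_hybrid_cache` and `set_in_hybrid_cache` are hypothetical names, and both cache tiers are plain dict-backed stand-ins rather than Django cache backends, so the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)

_MISSING = object()  # sentinel, so a cached None is not mistaken for a miss


class DictCache:
    """Tiny dict-backed cache with get/set, used for both tiers here."""

    def __init__(self, read_only: bool = False) -> None:
        self._data = {}
        self._read_only = read_only

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value) -> None:
        if self._read_only:
            raise RuntimeError("cannot write: database is read-only")
        self._data[key] = value


def get_from_hybrid_cache(key, local, db, default=None):
    """Local cache first, then the DB cache, then the default."""
    value = local.get(key, _MISSING)
    if value is not _MISSING:
        return value
    value = db.get(key, _MISSING)
    if value is not _MISSING:
        local.set(key, value)  # re-warm the local cache on the way through
        return value
    return default


def set_in_hybrid_cache(key, value, local, db) -> None:
    """Write to both tiers; survive (but log) a failed DB-cache write."""
    local.set(key, value)
    try:
        db.set(key, value)
    except Exception:
        # Loud on purpose: writing from a read-only pod suggests misuse.
        logger.exception("Could not set %r in the DB cache", key)
```

Setting the local tier before attempting the DB write is one way to guarantee the "prevent total failure" behaviour; the ordering is a choice made for this sketch.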

maribedran

@stevejalim This makes sense to me. The only situation I can think of right now that could break this is if we ever decide to use the RoutablePageMixin: https://docs.wagtail.org/en/stable/reference/contrib/routablepage.html#routable-page-mixin

Maybe we could check if that app is installed and, if a page inherits from that class, we never raise a 404 for paths that start with that page's URL.

stevejalim · 2025-07-24T09:38:50Z

> @stevejalim This makes sense to me. The only situation I can think of right now that could break this is if we ever decide to use the RoutablePageMixin: https://docs.wagtail.org/en/stable/reference/contrib/routablepage.html#routable-page-mixin

Yeah, I remember discussing this with someone else and we agreed routable pages would be a problem with this approach - but because we (currently) don't use them, it wasn't a big concern.

> Maybe we could check if that app is installed and, if a page inherits from that class, we never raise a 404 for paths that start with that page's URL.

I'm thinking along similar lines - there must be a way to introspect the page and get all the routes that are defined via the @path decorator. I'll have a rummage through the Wagtail source when I get a moment.
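
A rough sketch of the prefix exemption suggested above, under stated assumptions. `worth_asking_cms` and `routable_prefixes` are hypothetical names, all paths are assumed to be normalised with a trailing slash, and how the set of routable page URLs would be discovered (or their individual routes introspected) is deliberately left out.

```python
def worth_asking_cms(path: str, known_paths: set, routable_prefixes: set) -> bool:
    """Decide whether a request should be passed on to the CMS at all."""
    if path in known_paths:
        return True
    # Sub-routes of a routable page are resolved by the page itself, so the
    # lookahead cannot list them; never short-circuit to a 404 under them.
    return any(path.startswith(prefix) for prefix in routable_prefixes)


known = {"/en-US/about/", "/en-US/blog/"}
routable = {"/en-US/blog/"}  # URLs of pages assumed to use RoutablePageMixin
```

Because the prefixes end in a slash, /en-US/blog/2025/07/ is deferred to the CMS while /en-US/blogger/ is not. The cost is that any junk URL under a routable page still reaches the DB.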

stevejalim force-pushed the 15505-bedrock-perf-pass branch from dc54a7d to dbd3748 Compare December 13, 2024 10:56

stevejalim force-pushed the 15505-add-precog-behaviour branch from 8be921d to c7873f7 Compare December 13, 2024 10:56

stevejalim force-pushed the 15505-bedrock-perf-pass branch from dbd3748 to b42a955 Compare January 13, 2025 13:18

Base automatically changed from 15505-bedrock-perf-pass to main January 13, 2025 13:30

stevejalim force-pushed the 15505-add-precog-behaviour branch 2 times, most recently from 845f97e to df5b15d Compare January 13, 2025 14:36

stevejalim mentioned this pull request Jan 14, 2025

Add helpers to support a 'hybrid cache' option that uses locmem + DB cache #15859

Merged

1 task

stevejalim force-pushed the 15505-add-precog-behaviour branch 2 times, most recently from 2af4211 to eabc17c Compare January 27, 2025 11:30

stevejalim added the labels Wagtail (Development related to our use of Wagtail CMS), WMO and FXC (Code relevant to both mozilla/bedrock (www.mozilla.org) and mozmeao/springfield (www.firefox.com)) and Backend (Server stuff yo) Jan 27, 2025

stevejalim added 5 commits July 21, 2025 13:03

Add cacheing to geo.valid_country_code for performance boost

d893bd3

Saves 11 SQL queries on the release notes page by caching the country code lookup for an hour. Tested on /en-US/firefox/132.0.1/releasenotes/. Cold cache: 14 queries / 2066ms. Warm cache: 3 queries / 222ms.
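
For illustration, caching a lookup for an hour can be sketched with a small time-based memoiser. This is not the code from the commit: `cache_for` is a hypothetical decorator using a plain in-process dict, and the `valid_country_code` below is a stand-in with a hard-coded set where the real function would query the DB.

```python
import time
from functools import wraps


def cache_for(seconds, clock=time.monotonic):
    """Memoise a function's results per argument tuple for `seconds`."""

    def decorator(func):
        store = {}  # args -> (expiry time, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value

        return wrapper

    return decorator


lookups = []  # records each simulated DB query


@cache_for(60 * 60)
def valid_country_code(code: str) -> bool:
    lookups.append(code)  # stands in for the SQL query
    return code.upper() in {"US", "GB", "BR"}
```

Repeated calls with the same code within the hour reuse the stored result, so only the first call for each code appends to `lookups`.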

Rename 'default' cache time to CACHE_TIME_SHORT to make it more meani…

560f8ae

…ngful a name

Add a cache-baced 'lookahead' to know whether it's worth hitting the …

5b161ce

…CMS for a page ...because if the page isn't in the lookahead, we can avoid touching the DB just to ultimately return a 404

REBASE ME - WIP on wrapped cache approach

edf49d4

stevejalim force-pushed the 15505-add-precog-behaviour branch from eabc17c to edf49d4 Compare July 21, 2025 12:03

maribedran reviewed Jul 21, 2025


3 participants
