
Nº02 // WRITING

Redis cache gotchas

I run a single Redis daemon on my prod box (a 15 GB Virtualmin host: nginx, PHP-FPM 8.3, MariaDB 10.11) that serves the object cache for half a dozen WordPress sites at once. Redis went in during a perf-tuning pass on 2026-05-19, alongside switching PHP-FPM’s process manager from dynamic to ondemand, setting a 3 GB InnoDB buffer pool, and bumping opcache.memory_consumption to 256. The honest punchline of the whole exercise: for the one query that actually hurt, Redis did nothing. Here are the traps I hit, all of them real.

One Redis, many tenants: use logical DBs, not key prefixes

Redis ships with 16 logical databases (0–15) in its default config (the databases directive). You select one with SELECT n inside a connection, or with redis-cli -n n from the shell. Instead of cramming every site into one keyspace with prefixes, I give each WordPress install its own logical DB via Till Krüss’s Redis Object Cache plugin and one line in wp-config.php:

define( 'WP_REDIS_DATABASE', 2 ); // site-c prod; staging is 4, etc.

The map on this box, which I keep written down because it matters:

Site            WP_REDIS_DATABASE
site-a          0
site-b          1
site-c (prod)   2
site-d          3
staging         4

The daemon itself is small and defensive: maxmemory 512mb, maxmemory-policy allkeys-lru. The reason for logical DBs over prefixes is one word: FLUSHDB. With separate DBs, clearing one site’s cache is redis-cli -n 4 FLUSHDB, and it physically cannot touch another tenant. In a shared keyspace there is no one-command way to clear a single site: FLUSHDB wipes every tenant at once, and a scoped clear means scripting a SCAN for the site’s prefix and deleting whatever matches. The map turns a footgun into a scalpel.
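To make the FLUSHDB-versus-prefix difference concrete, here is a toy in-memory model in Python. It is not a Redis client; the ToyRedis class, its methods, and the options:alloptions key name are illustrative assumptions. It shows a per-DB flush leaving the other tenants intact, and what a scoped clear degenerates into in a shared keyspace: a glob match over key names that has to be right every single time.

```python
from fnmatch import fnmatchcase


class ToyRedis:
    """In-memory stand-in for one Redis server with numbered logical DBs.

    Not a Redis client: it only models which keys survive FLUSHDB / FLUSHALL.
    """

    def __init__(self, databases: int = 16):
        # Redis's default config also has 16 databases (0-15).
        self.dbs = [dict() for _ in range(databases)]

    def set(self, db: int, key: str, value: str) -> None:
        self.dbs[db][key] = value

    def flushdb(self, db: int) -> None:
        # Like `redis-cli -n <db> FLUSHDB`: empties exactly one logical DB.
        self.dbs[db].clear()

    def flushall(self) -> None:
        # Like `redis-cli FLUSHALL`: empties every logical DB on the server.
        for keyspace in self.dbs:
            keyspace.clear()

    def delete_matching(self, db: int, pattern: str) -> int:
        # A "scoped flush" in a shared keyspace: find keys by glob pattern
        # (SCAN ... MATCH in real Redis) and delete them one by one.
        doomed = [k for k in self.dbs[db] if fnmatchcase(k, pattern)]
        for key in doomed:
            del self.dbs[db][key]
        return len(doomed)


SITE_DB = {"site-a": 0, "site-b": 1, "site-c": 2, "site-d": 3, "staging": 4}

# One logical DB per site: flushing staging cannot reach anyone else.
r = ToyRedis()
for site, db in SITE_DB.items():
    r.set(db, "options:alloptions", f"{site} options")
r.flushdb(SITE_DB["staging"])
surviving = [s for s, db in SITE_DB.items() if r.dbs[db]]
print(surviving)  # ['site-a', 'site-b', 'site-c', 'site-d']

# Shared keyspace: clearing one site depends on getting the pattern right.
shared = ToyRedis()
for site in SITE_DB:
    shared.set(0, f"{site}:options:alloptions", f"{site} options")
removed = shared.delete_matching(0, "staging:*")
print(removed, len(shared.dbs[0]))  # 1 4
```

A typo such as "s*" in the shared-keyspace pattern would match every site’s keys; the per-DB version has no pattern to get wrong.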

The 301 loop: a correct database is not a correct site

This is the one that cost me an evening. I cloned prod into staging.example.com — same server, separate Virtualmin domain, Redis DB 4. The procedure was the obvious one:

rsync -a /home/example/public_html/ /home/staging/public_html/
mysqldump --no-tablespaces example_wordpress | mysql staging
wp config set DB_NAME staging
wp config set WP_REDIS_DATABASE 4 --raw
wp search-replace https://example.com https://staging.example.com --skip-columns=guid
#   -> 127 replacements

Then I opened https://staging.example.com and was instantly 301-redirected to production. Every single request. The DB was perfect: the siteurl and home options read https://staging.example.com, and I checked them by hand.

The problem: wp search-replace updates the database but never touches Redis. WordPress caches the whole alloptions bundle in the object cache, so home_url() and site_url() in PHP were still returning the cached https://example.com. Then redirect_canonical() compared the request host (staging) against the cached canonical home (prod), decided they didn’t match, and fired a 301 to prod on every hit. A redirect loop caused purely by a stale object cache, on top of a database that was already 100% correct. The classic “I fixed it in the DB, why is it still wrong” trap.
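A rough Python model of the failure, assuming a simplified cache that loads all options on the first miss. ToySite, get_option, and canonical_redirect here are hypothetical stand-ins, not WordPress’s actual code. The database is updated out of band while the cache keeps the prod value, and the redirect decision follows the cache rather than the DB until the cache is cleared.

```python
from urllib.parse import urlsplit


class ToySite:
    """Toy model of WordPress option reads through an object cache.

    Not WordPress code: get_option() just mimics "cache first, DB on miss".
    """

    def __init__(self, db_options: dict):
        self.db = db_options  # stands in for the options table
        self.cache = {}       # stands in for this site's Redis DB

    def get_option(self, name: str) -> str:
        if "alloptions" not in self.cache:
            # Cache miss: copy every option out of the DB once, then serve from cache.
            self.cache["alloptions"] = dict(self.db)
        return self.cache["alloptions"][name]

    def canonical_redirect(self, request_host: str):
        # Rough analogue of redirect_canonical(): if the request host differs
        # from the (possibly cached) home host, send a 301 to home.
        home = self.get_option("home")
        if urlsplit(home).hostname != request_host:
            return 301, home
        return 200, None


site = ToySite({"home": "https://example.com", "siteurl": "https://example.com"})
site.get_option("home")  # cache gets warmed while the clone still holds prod values

# Out-of-band write, like wp search-replace: the DB changes, the cache does not.
site.db["home"] = "https://staging.example.com"
print(site.canonical_redirect("staging.example.com"))  # (301, 'https://example.com')

site.cache.clear()  # the model's equivalent of redis-cli -n 4 FLUSHDB
print(site.canonical_redirect("staging.example.com"))  # (200, None)
```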

The fix is one line:

redis-cli -n 4 FLUSHDB

The rule I wrote down afterward, verbatim: always FLUSHDB the cloned site’s Redis DB after a search-replace.

Out-of-band writes bypass invalidation — every time

The deeper lesson generalizes well beyond cloning. WordPress only keeps the object cache in sync when a write goes through its own API: update_option(), wp_update_post() and friends update or delete the matching cache entries themselves (via wp_cache_set(), wp_cache_delete(), clean_post_cache()). Anything that mutates the database out of band leaves Redis holding stale data with zero warning. Both bulk operations in this post qualify, wp search-replace and the mysqldump | mysql import, and so does any UPDATE typed by hand into the mysql client.

None of those touch the cache, because they write SQL directly. The cure is always the same: a scoped flush of that site’s DB right afterward. Which is exactly why FLUSHALL should never be in your normal vocabulary on a shared box: redis-cli FLUSHALL empties every logical DB on the server, so site-a (0), site-b (1), prod (2), site-d (3) and staging (4) all go cold at once. On a single-tenant server people reflexively type FLUSHALL; on this one it cold-starts five sites simultaneously. Always redis-cli -n N FLUSHDB.
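One way to make the scoped flush the only easy option is to generate the command from the site map instead of typing it. A minimal sketch, assuming a hypothetical flush_command() helper of your own: it never emits FLUSHALL, and it refuses unknown site names. That also guards against a bare redis-cli FLUSHDB with no -n, which quietly hits DB 0 (site-a on this box) because that is redis-cli’s default database.

```python
import shlex

# Site-to-DB map from this box; keep it in one place so scripts can't drift.
SITE_DB = {"site-a": 0, "site-b": 1, "site-c": 2, "site-d": 3, "staging": 4}


def flush_command(site: str) -> list:
    """Return the argv for a scoped flush of one site's Redis DB.

    Hypothetical helper: there is deliberately no way to ask it for FLUSHALL,
    and an unknown site raises instead of falling back to DB 0.
    """
    if site not in SITE_DB:
        raise KeyError(f"no Redis DB mapped for {site!r}")
    return ["redis-cli", "-n", str(SITE_DB[site]), "FLUSHDB"]


cmd = flush_command("staging")
print(shlex.join(cmd))  # redis-cli -n 4 FLUSHDB
# To actually run it: subprocess.run(cmd, check=True)
```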

rsync clobbers wp-config — your cache silently re-points

Refreshing staging from prod means rsyncing files from /home/example/public_html/, and that rsync overwrites wp-config.php with prod’s copy. Prod’s copy has WP_REDIS_DATABASE = 2. So the instant the rsync finishes, staging is pointed at prod’s Redis DB 2 (and prod’s DB creds) until I re-fix it. If I forget, staging reads and writes into the production keyspace — cross-tenant cache pollution that is subtle and genuinely nasty to debug, because nothing errors; it just serves the wrong site’s data.

The full refresh checklist exists precisely because every env-specific value gets clobbered by the source — files by rsync, DB options by the dump:

rsync -a /home/example/public_html/ /home/staging/public_html/
mysqldump --no-tablespaces example_wordpress | mysql staging
wp config set DB_NAME staging
wp config set DB_PASSWORD '...'
wp config set WP_REDIS_DATABASE 4 --raw   # <-- clobbered by rsync
wp search-replace https://example.com https://staging.example.com --skip-columns=guid
redis-cli -n 4 FLUSHDB                     # <-- THE flush
wp option update blog_public 0             # <-- clobbered by the DB import

Two different mechanisms clobber two different things: rsync eats WP_REDIS_DATABASE (a file constant), while the mysqldump | mysql import eats blog_public (a DB option). You have to re-apply both.
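Because the rsync clobber is silent, a cheap post-rsync check helps. A sketch in Python, assuming wp-config.php contains a single literal define( 'WP_REDIS_DATABASE', N ) in the format that wp config set ... --raw writes. The function names and the regex are hypothetical, and a commented-out define would fool it.

```python
import re

# Matches lines like: define( 'WP_REDIS_DATABASE', 4 );
# Assumes an integer value, single or double quotes, and one matching define.
REDIS_DB_RE = re.compile(
    r"""define\(\s*['"]WP_REDIS_DATABASE['"]\s*,\s*(\d+)\s*\)"""
)


def redis_db_in_config(config_text: str):
    """Return the WP_REDIS_DATABASE value found in wp-config.php text, or None."""
    match = REDIS_DB_RE.search(config_text)
    return int(match.group(1)) if match else None


def check_staging_config(config_text: str, expected_db: int = 4) -> None:
    """Raise if staging's wp-config.php points at the wrong Redis DB."""
    found = redis_db_in_config(config_text)
    if found != expected_db:
        raise ValueError(
            f"wp-config.php points at Redis DB {found}, expected {expected_db}; "
            "re-run wp config set WP_REDIS_DATABASE after the rsync"
        )


# Staging right after the rsync still carries prod's value.
just_rsynced = "define( 'WP_REDIS_DATABASE', 2 ); // site-c prod\n"
print(redis_db_in_config(just_rsynced))  # 2
```

In a refresh script you would feed it Path("/home/staging/public_html/wp-config.php").read_text() right after the rsync and before any wp command boots WordPress.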

And remember there are two caches, not one. On every code deploy I flush Redis (data) and OPcache (compiled bytecode); they’re independent layers and both bite. My deploy log for a feature I shipped on 2026-05-22 literally reads “Redis DB 2 + opcache flushed, front/admin 200.” Redis caches options and queried objects; OPcache (tuned in /etc/php/8.3/fpm/conf.d/99-opcache-tuning.ini) caches the bytecode FPM workers have already compiled from your .php files. Flush only Redis and you’ll debug a “fixed” file that still runs the old code; flush only OPcache and the new code runs against stale cached data. Under FPM the OPcache flush is usually systemctl reload php8.3-fpm, not a redis-cli command: different layer, different mechanism. Calling opcache_reset() from the CLI won’t help either, because the CLI has its own OPcache, separate from FPM’s.
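The two-layer flush can be scripted as a pair of commands. This is a sketch, not the deploy script behind the log line above: the function names are hypothetical, the service name php8.3-fpm matches this box, and it defaults to a dry run that only prints what it would execute.

```python
import subprocess


def deploy_flush_commands(redis_db: int, fpm_service: str = "php8.3-fpm") -> list:
    """Both cache layers to clear after a code deploy, one command each."""
    return [
        ["redis-cli", "-n", str(redis_db), "FLUSHDB"],  # data: one site's object cache
        ["systemctl", "reload", fpm_service],            # bytecode: usual way to reset FPM's OPcache
    ]


def run_deploy_flush(redis_db: int, dry_run: bool = True) -> list:
    commands = deploy_flush_commands(redis_db)
    for argv in commands:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            # check=True stops the deploy if either flush fails.
            subprocess.run(argv, check=True)
    return commands


run_deploy_flush(2)
```

Keeping both commands in one function is the point: neither layer can be flushed without the other.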

When Redis does nothing: the query that ignored the cache

Here’s the one that humbled me. Redis was already running on prod, so when a directory-style search page on one of my sites felt sluggish, I profiled it on 2026-05-27 fully confident the object cache would carry me. It provided zero benefit.

The reason is worth internalizing, and it’s true of a whole class of plugins, not any one of them. Redis only accelerates code written to WordPress’s object-cache API — WP_Query, get_option, get_post, transients. But plenty of plugins in the booking / CRM / e-commerce space run their reads as raw $wpdb->get_results() / $wpdb->query() straight at MySQL. Those queries never call wp_cache_get(), so the object cache never sees them, and Redis does precisely nothing for those paths. “Install Redis object cache” is not a blanket speedup; it only helps the code paths that actually ask the cache. If your hot path is hand-rolled SQL, the cache is invisible to it and you have to fix the query itself.
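A toy model of why the cache is invisible to hand-rolled SQL: both paths read the same data, but only the cache-aware one ever consults the cache. CountingDB, cached_get, and raw_get are illustrative names, and the dict stands in for Redis behind wp_cache_get()/wp_cache_set().

```python
class CountingDB:
    """Stand-in for MySQL that counts how many queries reach it."""

    def __init__(self, rows: dict):
        self.rows = rows
        self.queries = 0

    def select(self, key):
        self.queries += 1
        return self.rows[key]


object_cache = {}  # stands in for Redis behind the object-cache API


def cached_get(db: CountingDB, key):
    # Cache-aware path, the shape of get_option()/get_post(): ask the cache first.
    if key in object_cache:
        return object_cache[key]
    value = db.select(key)
    object_cache[key] = value
    return value


def raw_get(db: CountingDB, key):
    # Hand-rolled path, the shape of $wpdb->get_results(): straight to the DB,
    # so the object cache never hears about it.
    return db.select(key)


db = CountingDB({"listing:1": "Dentist", "listing:2": "Plumber"})
for _ in range(100):
    cached_get(db, "listing:1")
print(db.queries)  # 1

db.queries = 0
for _ in range(100):
    raw_get(db, "listing:1")
print(db.queries)  # 100
```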

That’s exactly what profiling showed me on this page. It assembled its list in a loop that ran several extra queries per result row, and it scaled dead-linearly: 77 queries for 11 results, 235 for 37, so (235 - 77) / (37 - 11) ≈ 6 extra queries per additional row. A textbook N+1, and (per the above) nothing the cache could touch. Two of the joins were hitting an unindexed table, so MySQL fell back to a full scan and a Block Nested Loop, with a GROUP BY forcing a temp table plus filesort on top.
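The N+1 shape, and the IN(...) batching that fixes it, can be reproduced in miniature with Python’s built-in sqlite3, with SQLite standing in for MariaDB and hypothetical listing/meta tables. For simplicity this toy issues one follow-up query per row rather than the roughly six the real page did, so its count grows as 1 + n while the batched version stays at 2.

```python
import sqlite3


def make_db(n_rows: int, extras_per_row: int = 3) -> sqlite3.Connection:
    """Build an in-memory DB with n_rows listings and a few meta rows each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE meta (listing_id INTEGER, k TEXT, v TEXT)")
    for i in range(1, n_rows + 1):
        conn.execute("INSERT INTO listing VALUES (?, ?)", (i, f"listing {i}"))
        for j in range(extras_per_row):
            conn.execute("INSERT INTO meta VALUES (?, ?, ?)", (i, f"k{j}", "v"))
    conn.commit()
    return conn


class CountingConn:
    """Wraps a connection and counts every query sent through query()."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.count = 0

    def query(self, sql: str, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def n_plus_one(db: CountingConn) -> dict:
    rows = db.query("SELECT id, name FROM listing")
    # One follow-up query per row: total queries grow linearly with results.
    return {
        rid: db.query("SELECT k, v FROM meta WHERE listing_id = ? ORDER BY k", (rid,))
        for rid, _name in rows
    }


def batched(db: CountingConn) -> dict:
    rows = db.query("SELECT id, name FROM listing")
    ids = [rid for rid, _name in rows]
    out = {rid: [] for rid in ids}
    if not ids:
        return out
    placeholders = ",".join("?" * len(ids))
    # One IN(...) query fetches the extras for every row at once.
    sql = (
        f"SELECT listing_id, k, v FROM meta WHERE listing_id IN ({placeholders}) "
        "ORDER BY listing_id, k"
    )
    for lid, k, v in db.query(sql, ids):
        out[lid].append((k, v))
    return out


for n in (11, 37):
    slow, fast = CountingConn(make_db(n)), CountingConn(make_db(n))
    assert n_plus_one(slow) == batched(fast)  # same data either way
    print(n, slow.count, fast.count)
# 11 12 2
# 37 38 2
```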

My low-risk fix was indexes — three of them, online DDL, instant on tables this small — covering the foreign-key columns the joins filtered on and the (status, tenant_id) pair the list was scoped by. The worst join went from type = ALL with a Block Nested Loop over hundreds of rows to a clean type = ref index lookup over a handful. Temp table and filesort: gone. Wall time ~62 ms → ~35 ms, roughly 44% faster, from three one-line ALTERs. The win came from the database, not the cache that was supposed to fix slow reads.
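SQLite can’t show MariaDB’s type = ALL / type = ref or a Block Nested Loop, but its EXPLAIN QUERY PLAN shows the same before/after shape: a full SCAN that becomes an index SEARCH once a composite index exists. A sketch with a hypothetical booking table scoped by (status, tenant_id); on MariaDB the equivalent would be ALTER TABLE ... ADD INDEX followed by EXPLAIN. The exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booking ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER, status TEXT, customer TEXT)"
)

# The list page's scoping filter (hypothetical schema and values).
QUERY = "SELECT id, customer FROM booking WHERE status = ? AND tenant_id = ?"
PARAMS = ("active", 7)


def plan(sql: str, params: tuple) -> list:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


before = plan(QUERY, PARAMS)  # full-table SCAN: every row is examined
conn.execute("CREATE INDEX idx_status_tenant ON booking (status, tenant_id)")
after = plan(QUERY, PARAMS)   # SEARCH ... USING INDEX idx_status_tenant

print(before)
print(after)
```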

Redis does eventually get to help here — but only at a layer I control. My deferred plan caches the assembled result set explicitly in my own code: key = md5 of the normalized request params, TTL 300 s, and crucially skip caching when a date filter is set, because date/availability results are too volatile and need their own narrower cache. WordPress transients route into Redis when an object cache is present, so a set_transient() there lands in Redis — the deliberate way to use it when the layer underneath won’t ask the cache on its own. But first you batch the N+1 with IN(...) sets to drop 235 queries down to ~12–15. Cache at the boundary you own, after you’ve killed the obvious waste — don’t reach for Redis to paper over a query layer that was never going to ask it for help.
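A sketch of that deferred plan in Python, with a dict-backed ResultCache standing in for set_transient()/get_transient(). The normalization rules (lowercased keys, trimmed values, empty filters dropped), the date parameter name, and the search: key prefix are all assumptions; the md5 key, the 300 s TTL, and the skip-when-dated rule come from the plan itself.

```python
import hashlib
import json
import time

TTL_SECONDS = 300


def cache_key(params: dict) -> str:
    """md5 of the normalized request params (normalization rules are assumptions)."""
    normalized = {
        k.strip().lower(): str(v).strip()
        for k, v in params.items()
        if str(v).strip() != ""  # drop empty filters so ?q=&x=1 keys like ?x=1
    }
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.md5(payload.encode("utf-8")).hexdigest()


class ResultCache:
    """Dict-backed stand-in for set_transient()/get_transient() with a TTL."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or self.clock() >= entry[1]:
            return None  # missing or expired (expired entries are simply ignored)
        return entry[0]

    def set(self, key, value, ttl: int = TTL_SECONDS):
        self.store[key] = (value, self.clock() + ttl)


def search(params: dict, run_query, cache: ResultCache):
    # Date/availability results are too volatile for the shared 300 s cache.
    if params.get("date"):
        return run_query(params)
    key = cache_key(params)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_query(params)
    cache.set(key, result)
    return result


# Usage with a fake clock and a fake query that records each call.
now = [0.0]
cache = ResultCache(clock=lambda: now[0])
calls = []


def fake_query(p):
    calls.append(p)
    return ["row for " + p.get("q", "")]


search({"q": "dentist", "city": "york"}, fake_query, cache)   # miss: runs query
search({"city": "york", "q": "dentist "}, fake_query, cache)  # same key: hit
now[0] += 301
search({"q": "dentist", "city": "york"}, fake_query, cache)   # expired: runs query
search({"q": "dentist", "date": "2026-06-01"}, fake_query, cache)  # never cached
print(len(calls))  # 3
```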