Caching & row-scoping

How BigQuery's cross-user result cache actually works under a shared service account, how it composes with materialization, and how row-level permission scoping fragments it — plus the materialize-unscoped pattern.

Two runtime behaviors change how much the precompute architecture buys you in practice: BigQuery's free result cache, and per-user row scoping. They pull in opposite directions, and the resolution is the same in both cases — materialize the shared, expensive work; keep the per-user part cheap.

The cross-user result cache

BigQuery's result cache is technically per-principal: cached results live in the running user's anonymous cache dataset, and a different user re-running identical SQL normally gets a fresh run.

It becomes cross-user in Nexus because every artifact query runs under one per-client service account (the warehouse credential, pooled by warehouseId). To BigQuery, all of a client's end-users are the same principal — so the per-user cache collapses into one cache shared across the entire user base. Second user, same artifact, same day → free instant result, zero bytes scanned.

Three conditions must hold for the hit, and core/canonical queries plus the shipped optimizations satisfy all three:

  1. Byte-identical SQL. A core/canonical query is stable shared text — the same for every user and every component. (The strongest argument for lifting parameters out of the core: a parameterless canonical query is identical for all users all day. Per-user filters and date ranges fork the cache into separate entries.)
  2. Deterministic SQL. This is what the CURRENT_* daily-anchor rewrite buys — without it, CURRENT_TIMESTAMP() differs per run and the cache never hits. That rewrite is the prerequisite for the cross-user benefit, not just the single-user one.
  3. Unchanged tables. Holds between refreshes — so in practice the cache is "shared within a business day, post-build," resetting on each dbt refresh, not a rolling 24h.

The core query is both the expensive part and the most-shared part (same spine for every user, every component), which makes it the highest-value thing to land in cache.

Cache vs. materialization — complementary, not redundant

It's tempting to conclude that materializing a core query makes the cache irrelevant. It doesn't — they cover different parts of the refresh cycle.

A materialized table is just a regular table, so component reads against it are cached exactly like before. What changes is the cost of a miss:

Cache hit Cache miss
Un-materialized instant expensive — full joins, full scan
Materialized instant cheap — scan one small, clustered table

Because a materialized table changes on every refresh (invalidating its cache), the two layers split the work cleanly:

  • Cache covers the warm path — repeat queries between refreshes, free.
  • Materialization covers the cold path — the first query after each refresh (cache empty) and any unique parameter combo, cheap.

So materialization makes the cache low-stakes rather than irrelevant — you stop depending on the hit. In the limit of a tiny serving table (a few MB, fully clustered, sub-second), the gap between free and cheap shrinks to nothing and the cache genuinely stops mattering.

One clean corollary: materializing freezes CURRENT_* at build time (the table is a snapshot as of the last refresh), so the determinism rewrite becomes moot for materialized paths — it stays relevant only for the un-materialized L1 tail. As you climb the materialization ladder, the whole cache + determinism layer fades to a backstop for the queries you haven't promoted yet.

Row-scoping fragments the cache

Permissioned artifacts inject a row filter at a -- SCOPE_FILTER marker in the query: the runner resolves the viewer's allowed resource IDs and replaces the marker with a literal predicate.

-- before: the artifact's core CTE
WHERE household_active = TRUE
  -- SCOPE_FILTER

-- after: per-viewer injection (empty scope → AND FALSE)
WHERE household_active = TRUE
  AND household_id IN ('id1', 'id2', …)   -- this viewer's exact ID set

Because the viewer's IDs are interpolated as literals, the SQL is byte-identical only across viewers with the identical resource-ID set. Two advisors with different books → different text → separate cache entries. The broad "second user gets it free" benefit survives only for:

  • unscoped / shared-scope views (firm-wide dashboards, no marker — everyone's text matches), and
  • same-scope cohorts (viewers who resolve to the same ID set).

By-design, you can't cache-share across users who are supposed to see different rows — there's nothing to share.

Stabilize the ID list

The injected IN (…) list must be deterministically ordered. If the scope resolver returns IDs in unstable order, the same viewer gets different SQL bytes load-to-load and misses their own cache. Sorting the IDs before injection is a near-free fix that restores the same-viewer and same-cohort hits — the highest-value cache action in a permissioned artifact.

The materialize-unscoped pattern

Since the cache can't broadcast across scopes, each viewer's read has to be cheap on its own — which is exactly what materialization gives you. Permissions are an argument for materializing the core, not a complication:

  1. Materialize the core unscoped, and carry the scope key as a column (household_id / advisor_id). The expensive joins run once, shared by everyone.
  2. Move the -- SCOPE_FILTER marker from the core CTE to the serving query. Scope is just another parameter to lift out of the materialization — never materialize per-viewer (that's the per-parameter explosion).
  3. The per-viewer scope becomes a cheap, partition-pruned filter on the shared table.

Security: the marker must be structurally enforced

Materializing changes the blast radius of a missing scope filter from "one over-broad result" to "a standing table of every viewer's data that any unmarked query exposes." And BigQuery-native row-level security cannot back you up here — under the shared service account every end-user is the same principal, so SESSION_USER()-based policies see nothing to filter on.

So enforcement has to be structural at the runner: register the scoped materialized tables and reject any query that references one without the -- SCOPE_FILTER marker. Do not leave "did you remember the scope filter?" to artifact-author discipline once the underlying table is the unscoped superset. (Today, an unmarked query passes through unscoped — a safe default when every artifact builds its own scoped spine, but a leak waiting to happen against a shared unscoped materialization.)

Reference