Performance
Warehouse-side optimizations dbt-nexus applies to the canonical core models — BigQuery partitioning, clustering, and informational PK/FK constraints — and how to measure their impact.
dbt-nexus's substrate is join-heavy by design. Every artifact query
typically stitches nexus_events ⋈ nexus_event_dimensions ⋈
nexus_event_measurements ⋈ nexus_entity_participants ⋈
nexus_entities ⋈ nexus_relationships, often 4–6 joins deep. This
trades raw query speed for schema flexibility (any source can emit
any dimension without DDL coordination) and write-time decoupling.
The package's job is to recover as much of that lost speed as possible at build time — pushing the canonical core models into shapes the warehouse can prune, hash-join, and reason about cheaply — without giving up the flexibility.
Pages
Shipped — build-time optimizations on the package's core models:
- BigQuery optimizations — Optional
partition + cluster + informational PK/FK on the six core models,
controlled by a single
nexus.warehouse_optimization.enabledtoggle. Default on for BigQuery, no-op on Snowflake. - Measuring impact — How to A/B query
cost with
dryRun, read the BigQuery query plan, and toggle the feature off for a controlled comparison.
Architecture — accelerating the read path above the substrate:
- Precompute architecture — The precompute ladder (at what grain to materialize), core & canonical queries as the unit, the ~6 grain tables the sparse join graph collapses to, and edges vs. paths for relationships.
- Caching & row-scoping — How the cross-user result cache works under a shared service account, how it composes with materialization across the refresh cycle, and how row-level scoping fragments it — with the materialize-unscoped pattern.
TL;DR
If you're on BigQuery and the feature flag is at its default, you already have:
- Monthly partition prune on
occurred_at-filtered queries againstnexus_events,nexus_event_dimensions,nexus_event_measurements, andnexus_entity_participants. Typically ~10× scan-byte reduction on a 90-day window. - Block prune via clustering on the columns artifact queries
actually filter (
event_name,source,entity_type,relationship_type,event_id). - Join elimination via BigQuery informational PK/FK constraints
— a
LEFT JOIN nexus_event_dimensionswhose dim columns aren't selected scans only the left side. BigQuery drops the join.
Together these compound: a date-windowed dashboard query that joins the substrate but selects narrowly often goes from "scan everything" to "scan one month of one table".
Scope of this section
The two build-time pages make the existing joins cheaper; the two architecture pages are about not running them at read time at all. Between them they cover query shape and materialization strategy. What remains genuinely out of scope:
- Incremental rebuild mechanics. Whether to materialize is covered in Precompute architecture; the mechanics of cheap incremental refresh (partition write-truncate, late-arriving data, identity-resolution recompute) are a separate topic.
- Snowflake build-time tuning. The toggle defaults off there because Snowflake clustering is billed background work, not free — opt in deliberately if you've measured a benefit. The read-path architecture is warehouse-agnostic; only the build-time partition/cluster specifics are BigQuery-shaped.
- Diagnosing an un-prunable query. A query that doesn't filter by
occurred_atcan't partition-prune; see Measuring impact to tell that apart from a build-time miss.