Performance

Warehouse-side optimizations dbt-nexus applies to the canonical core models — BigQuery partitioning, clustering, and informational PK/FK constraints — and how to measure their impact.

dbt-nexus's substrate is join-heavy by design. Every artifact query typically stitches nexus_eventsnexus_event_dimensionsnexus_event_measurementsnexus_entity_participantsnexus_entitiesnexus_relationships, often 4–6 joins deep. This trades raw query speed for schema flexibility (any source can emit any dimension without DDL coordination) and write-time decoupling.

The package's job is to recover as much of that lost speed as possible at build time — pushing the canonical core models into shapes the warehouse can prune, hash-join, and reason about cheaply — without giving up the flexibility.

Pages

Shipped — build-time optimizations on the package's core models:

  • BigQuery optimizations — Optional partition + cluster + informational PK/FK on the six core models, controlled by a single nexus.warehouse_optimization.enabled toggle. Default on for BigQuery, no-op on Snowflake.
  • Measuring impact — How to A/B query cost with dryRun, read the BigQuery query plan, and toggle the feature off for a controlled comparison.

Architecture — accelerating the read path above the substrate:

  • Precompute architecture — The precompute ladder (at what grain to materialize), core & canonical queries as the unit, the ~6 grain tables the sparse join graph collapses to, and edges vs. paths for relationships.
  • Caching & row-scoping — How the cross-user result cache works under a shared service account, how it composes with materialization across the refresh cycle, and how row-level scoping fragments it — with the materialize-unscoped pattern.

TL;DR

If you're on BigQuery and the feature flag is at its default, you already have:

  1. Monthly partition prune on occurred_at-filtered queries against nexus_events, nexus_event_dimensions, nexus_event_measurements, and nexus_entity_participants. Typically ~10× scan-byte reduction on a 90-day window.
  2. Block prune via clustering on the columns artifact queries actually filter (event_name, source, entity_type, relationship_type, event_id).
  3. Join elimination via BigQuery informational PK/FK constraints — a LEFT JOIN nexus_event_dimensions whose dim columns aren't selected scans only the left side. BigQuery drops the join.

Together these compound: a date-windowed dashboard query that joins the substrate but selects narrowly often goes from "scan everything" to "scan one month of one table".

Scope of this section

The two build-time pages make the existing joins cheaper; the two architecture pages are about not running them at read time at all. Between them they cover query shape and materialization strategy. What remains genuinely out of scope:

  • Incremental rebuild mechanics. Whether to materialize is covered in Precompute architecture; the mechanics of cheap incremental refresh (partition write-truncate, late-arriving data, identity-resolution recompute) are a separate topic.
  • Snowflake build-time tuning. The toggle defaults off there because Snowflake clustering is billed background work, not free — opt in deliberately if you've measured a benefit. The read-path architecture is warehouse-agnostic; only the build-time partition/cluster specifics are BigQuery-shaped.
  • Diagnosing an un-prunable query. A query that doesn't filter by occurred_at can't partition-prune; see Measuring impact to tell that apart from a build-time miss.