dbt-nexus LLM Context Pack

Compact briefing for LLMs that need to answer questions about the dbt-nexus package. Essential context for AI assistants working with entity resolution and event tracking.

Mission

The dbt-nexus package provides a way of structuring all company data in your data warehouse so it's operationally useful, not just good for dashboards. It's designed to help close sales, speed up customer support, and reduce churn by creating complete customer timelines from any data source.

Specifically, it's a standardized, source-agnostic dbt framework that lets data engineers quickly merge and organize any data source into a combined view of people, companies, and events. This enables organizations to consolidate scattered customer data (Gmail, Stripe, Shopify, etc.) into unified timelines that support teams, sales teams, and AI tools can actually use operationally.

Core Concepts

Primary Concepts

  • Entities: Unified table for all entities, distinguished by entity_type. Two classes: ER entities (persons, groups) that go through identity resolution, and non-ER entities (subscriptions, contracts, projects) that register directly via register_entities() with entity_resolution: false.
  • Events: Timestamped actions/occurrences that generate identifiers, traits, measurements, and state changes
  • Relationships: Connections between any two entities (e.g., person belongs to group, company owns subscription), distinguished by relationship_type
  • Measurements: Quantitative observations extracted from events (e.g., revenue, hours), following the EAV facet pattern
  • Entity States: SCD2 timeline in nexus_entity_states with dimension columns (STRING, e.g., lifecycle status) and measurement columns (NUMERIC, e.g., MRR), plus precomputed _delta columns for time-series queries

Key Processes

  • Entity Resolution: Recursive CTE-based deduplication using configurable matching rules
  • State Management: Timeline-based state tracking with derived state capabilities
  • Event Processing: Standardized event logging with identifier and trait extraction
  • Source Integration: Adapter pattern for connecting any data source

Architecture Layers

  1. Source Adapters: Transform source data into standardized formats
  2. Event Log: Core models for events, identifiers, traits, measurements (nexus_events, nexus_entity_identifiers, nexus_entity_traits, nexus_event_measurements, nexus_relationship_declarations)
  3. Entity Resolution: Deduplication logic producing resolved entities (nexus_resolved_person_identifiers, nexus_resolved_group_identifiers, nexus_resolved_entity_traits, nexus_resolved_relationship_declarations)
  4. State Management: Timeline tracking with derived states (nexus_states)
  5. Final Tables: Production-ready resolved entities and relationships (nexus_entities, nexus_relationships, nexus_entity_participants)

Demo Data

The package includes comprehensive demo data for exploration and testing:

Demo Data Sources

  • Gadget Shopify App Data: Shopify shop information from a custom Shopify app built in Gadget
  • Gmail Messages: Email records covering support tickets and billing communications
  • Google Calendar: Calendar events with meetings and appointments
  • Stripe Data: Billing and payment records with subscriptions

Demo Data Usage

  • Location: dbt_packages/nexus/ directory
  • Schema: Compiles to nexus_demo_data schema
  • Running: cd dbt_packages/nexus && dbt build
  • Configuration: Requires demo-data: +schema: demo_data in the consumer project's dbt_project.yml

Demo Data Value

  • Complete working example of the dbt-nexus data model
  • Multi-source customer journey scenarios
  • Entity resolution examples across sources
  • Realistic event timelines and state management

Canonical Entry Points

Key Models

  • Event Log: nexus_events, nexus_entity_identifiers, nexus_entity_traits, nexus_event_measurements, nexus_relationship_declarations
  • Entity Resolution: nexus_resolved_person_identifiers, nexus_resolved_group_identifiers, nexus_resolved_entity_traits, nexus_resolved_relationship_declarations
  • Final Tables: nexus_entities, nexus_relationships, nexus_entity_participants, nexus_entity_states
  • States: nexus_states (union of all state models), nexus_entity_states (SCD2 pivoted output with dimensions, measurements, and deltas)

Essential Macros

  • Entity Resolution: resolve_identifiers(), resolve_traits(), create_edges()
  • Event Processing: process_identifiers(), process_traits(), event_filter()
  • State Management: derived_state(), common_state_fields()
  • Utilities: unpivot_identifiers(), pivot_identifiers(), get_first_or_last_row(), finalize_entity()

Critical Configuration

  • nexus_max_recursion: Controls recursive CTE depth for identity resolution (default: 5)
  • sources: List defining which source systems provide which entity types
  • nexus model configs: Schema, materialization, and tag settings
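
To build intuition for nexus_max_recursion, here is a minimal Python sketch of depth-limited propagation over identifier edges. It is not the package's SQL and does not claim to mirror its recursive CTE; the function name resolve_with_depth_limit and the sample identifiers are hypothetical. The assumption it illustrates is only that a depth cap limits how many hops a match can travel, so long identifier chains may stay split when the cap is too low.

```python
from collections import defaultdict


def resolve_with_depth_limit(edges, max_depth):
    """Map each identifier to the smallest identifier reachable within max_depth hops.

    Hypothetical stand-in for a depth-capped recursive query, not the package's logic.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    resolved = {}
    for start in neighbors:
        frontier = {start}
        seen = {start}
        for _ in range(max_depth):
            # Expand one hop, keeping only identifiers not visited yet.
            frontier = {n for node in frontier for n in neighbors[node]} - seen
            if not frontier:
                break
            seen |= frontier
        resolved[start] = min(seen)
    return resolved


# A chain of five identifiers that all belong to the same person.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

shallow = resolve_with_depth_limit(chain, max_depth=2)
deep = resolve_with_depth_limit(chain, max_depth=5)

print(len(set(shallow.values())))  # more than one group: the chain is split
print(len(set(deep.values())))     # a single group: the chain is fully merged
```

A higher cap merges longer chains at the price of more work per run, which is the trade-off behind tuning this setting.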

Source Integration Pattern

Four-Layer Architecture

Organize each source's models in four layers:

  1. Base Layer: Raw SELECT * from source tables (e.g., base_{source}_{table})
  2. Normalized Layer: Clean, joined business entities (e.g., {source}_{entity})
  3. Intermediate Layer: Event-type specific formatting using Nexus macros
  4. Unioned Layer: Combined models using dbt_utils.union_relations()

Model Naming Convention

Sources must provide models that follow the naming convention {source_name}_{entity_type}_{data_type}; events and relationship declarations omit the entity type:

  • Events: {source}_events
  • Identifiers: {source}_person_identifiers, {source}_group_identifiers
  • Traits: {source}_person_traits, {source}_group_traits
  • Relationships: {source}_relationship_declarations

Example directory layout:

models/sources/{source_name}/
├── base/
│   ├── base_{source}_table1.sql
│   └── base_{source}_table2.sql
├── normalized/
│   ├── {source}_orders.sql
│   └── {source}_customers.sql
├── intermediate/
│   ├── {source}_order_events.sql
│   ├── {source}_order_person_identifiers.sql
│   └── {source}_order_person_traits.sql
└── {source}_events.sql
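
The naming convention can be turned into a quick checklist. The sketch below is a hypothetical helper (expected_source_models and missing_models are not part of the package) that lists the model names the convention implies for one source, assuming the source provides both person and group entities. A source that supplies only some entity types would legitimately omit some of these names.

```python
def expected_source_models(source, entity_types=("person", "group")):
    """Return the model names the naming convention implies for one source."""
    names = [f"{source}_events"]
    for entity_type in entity_types:
        names.append(f"{source}_{entity_type}_identifiers")
        names.append(f"{source}_{entity_type}_traits")
    names.append(f"{source}_relationship_declarations")
    return names


def missing_models(source, existing_models):
    """List convention-implied models that are absent from existing_models."""
    existing = set(existing_models)
    return [name for name in expected_source_models(source) if name not in existing]


# Hypothetical project state: only two Stripe models exist so far.
print(missing_models("stripe", ["stripe_events", "stripe_person_identifiers"]))
```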

State Management

State names follow the format {namespace}_{subject}[_{qualifier}] (e.g., subscription_lifecycle, sliderule_app_installation). State models produce two categories:

  • Dimensions (state_category = 'dimension'): Categorical state values
  • Measurements (state_category = 'measurement'): Numeric values

Both feed into nexus_entity_states, an SCD2 table where each dimension and measurement becomes its own column. Measurement columns get precomputed _delta columns (e.g., mrr_amount_delta) for efficient time-series queries. The first delta equals the initial value (change from 0 to initial amount).

Time-series query pattern: Use opening state + cumulative deltas instead of date-spine joins. See the MRR and Time-Series Query Patterns documentation.
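
The opening-state-plus-deltas idea can be sketched in a few lines of Python. The rows below are invented for illustration, and the column layout (valid_from, mrr_amount, mrr_amount_delta) is an assumption modeled on the description above rather than the exact schema of nexus_entity_states. The point is that a value at any date is the opening value plus the sum of deltas up to that date, with no date spine needed.

```python
from datetime import date
from itertools import accumulate

# Hypothetical SCD2 rows for one entity: (valid_from, mrr_amount, mrr_amount_delta).
rows = [
    (date(2024, 1, 1), 100, 100),   # first delta equals the initial value
    (date(2024, 3, 1), 150, 50),    # upgrade
    (date(2024, 6, 1), 0, -150),    # cancellation
]


def mrr_as_of(rows, as_of, opening=0):
    """Opening value plus the sum of all deltas effective on or before as_of."""
    return opening + sum(delta for valid_from, _, delta in rows if valid_from <= as_of)


def running_mrr(rows, opening=0):
    """Cumulative sum of deltas, one value per state change."""
    ordered = sorted(rows)
    totals = accumulate((delta for _, _, delta in ordered), initial=opening)
    return list(totals)[1:]  # drop the opening value itself


print(mrr_as_of(rows, date(2024, 4, 15)))
print(running_mrr(rows))
```

In this sample the running totals reproduce the stored mrr_amount values, which is a useful sanity check on any delta column.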

Derived states combine multiple base states using timeline merging logic.
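
As a rough illustration of timeline merging, the Python sketch below splits time at every boundary found in two base timelines and combines the values in effect within each slice. It is one common way to merge validity intervals and is not a description of the derived_state() macro; the function names, the sample rows, and the is_paying_user rule are all hypothetical.

```python
from datetime import date

END = date.max  # stands in for an open-ended valid_to


def value_at(timeline, moment):
    """Return the state value in effect at moment, or None if there is none."""
    for valid_from, valid_to, value in timeline:
        if valid_from <= moment < valid_to:
            return value
    return None


def merge_timelines(left, right, combine):
    """Split time at every boundary of either timeline and combine the values in effect."""
    boundaries = sorted({d for tl in (left, right) for f, t, _ in tl for d in (f, t)})
    merged = []
    for start, end in zip(boundaries, boundaries[1:]):
        value = combine(value_at(left, start), value_at(right, start))
        if merged and merged[-1][2] == value and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, value)  # extend an unchanged interval
        else:
            merged.append((start, end, value))
    return merged


# Hypothetical base states as (valid_from, valid_to, value), with valid_to exclusive.
installation = [(date(2024, 1, 1), date(2024, 5, 1), "installed"),
                (date(2024, 5, 1), END, "uninstalled")]
subscription = [(date(2024, 2, 1), END, "active")]


def is_paying_user(install_state, sub_state):
    """Example derivation rule: installed and actively subscribed."""
    return install_state == "installed" and sub_state == "active"


derived = merge_timelines(installation, subscription, is_paying_user)
for interval in derived:
    print(interval)
```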

Gotchas & Important Notes

Database Compatibility

  • Primary support: Snowflake and BigQuery (both fully tested and optimized)
  • Secondary: Postgres, Redshift, Databricks
  • Database-specific optimizations available for both Snowflake and BigQuery
  • Recursive CTEs behave differently across warehouses

Performance Considerations

  • Recursive entity resolution can be expensive; tune nexus_max_recursion carefully
  • Incremental models require careful handling of late-arriving data
  • Large identity graphs may need partitioning strategies

Common Pitfalls

  • Source models must exactly match expected schema (column names, types)
  • Entity resolution assumes transitivity (A=B, B=C → A=C)
  • State models require manual addition to nexus_states union
  • Event filtering depends on proper _ingested_at timestamps

Incremental Model Behavior

  • Event log models use _ingested_at for incremental filtering
  • Entity resolution models may need full refresh when logic changes
  • State models track changes over time, not point-in-time snapshots
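
A small Python sketch shows why filtering on _ingested_at, rather than on the event time, matters for late-arriving data. The rows and the new_rows helper are hypothetical, and the watermark comparison is a simplified assumption about how an incremental filter behaves, not the package's exact logic.

```python
from datetime import datetime

# Hypothetical event rows; only occurred_at and _ingested_at come from the text above.
events = [
    {"event_id": 1, "occurred_at": datetime(2024, 1, 1), "_ingested_at": datetime(2024, 1, 2)},
    {"event_id": 2, "occurred_at": datetime(2024, 1, 5), "_ingested_at": datetime(2024, 1, 6)},
    # Late-arriving: happened early, but landed in the warehouse last.
    {"event_id": 3, "occurred_at": datetime(2024, 1, 3), "_ingested_at": datetime(2024, 1, 9)},
]


def new_rows(rows, watermark, column):
    """Return rows whose timestamp in column is strictly after the watermark."""
    return [row for row in rows if row[column] > watermark]


# Suppose the previous run loaded events 1 and 2.
ingested_watermark = datetime(2024, 1, 6)
occurred_watermark = datetime(2024, 1, 5)

by_ingested = [r["event_id"] for r in new_rows(events, ingested_watermark, "_ingested_at")]
by_occurred = [r["event_id"] for r in new_rows(events, occurred_watermark, "occurred_at")]

print(by_ingested)  # the late event is picked up
print(by_occurred)  # the late event is silently skipped
```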

Quick Reference

Common Tasks

  • Explore demo data: Run cd dbt_packages/nexus && dbt build
  • Add new source: Define in sources var, create {source}_{entity}_{type} models
  • Create custom state: Make individual state model, add to nexus_states union
  • Debug entity resolution: Check nexus_entity_identifiers_edges for edge creation
  • Performance tuning: Adjust nexus_max_recursion, review incremental strategies

Troubleshooting

  • Missing identities: Verify source model naming and schema compliance
  • Recursive CTE errors: Check nexus_max_recursion setting and data quality
  • State timeline gaps: Ensure events have proper occurred_at timestamps
  • Incremental issues: Review _ingested_at values and watermark logic

Resources

  • Blog Post: Data Beyond Dashboards
  • Documentation: /docs/index.md
  • Demo Data Guide: /docs/tutorials/demo-data.md
  • Use Cases: /docs/explanations/use-cases.md
  • Model Reference: /docs/reference/models/
  • Macro Reference: /docs/reference/macros/
  • State Naming Guide: /models/nexus-models/states/STATES.md
  • Derived State Macro: /macros/states/DERIVED_STATE_MACRO.md
  • Configuration Guide: /docs/getting-started/configuration.md
  • Architecture Deep Dive: /docs/explanations/architecture.md

Real-World Applications (SlideRule Analytics)

Operational Use Cases

  • Timeline Apps: Complete customer context for support/sales teams
  • Daily Updates: Automated summaries of key business events
  • Email Marketing: Up-to-date customer lists and segmentation
  • Abandoned Setup Notifications: Automated onboarding outreach
  • AI Integration: Complete customer context for AI tools
  • Metrics & Dashboards: Consistent business metrics across all tools

Business Value

  • Faster customer support (complete context in one view)
  • Higher sales conversion (full customer timeline)
  • Reduced churn (proactive engagement based on events)
  • Operational flexibility (add/change tools without rebuilding integrations)