Package Overview

High-level overview of the dbt-nexus package for AI assistants.

Mission

The dbt-nexus package provides a standardized, source-agnostic dbt framework that transforms scattered customer data into unified, operationally useful views of people, companies, and events.

Core Purpose

  • Unify customer data from multiple sources (Gmail, Stripe, Shopify, etc.)
  • Resolve identities across systems using recursive CTE-based deduplication
  • Track state changes over time with timeline-based state management
  • Enable operational use of data for sales, support, and AI tools

Primary Entities

A Nexus entity is any business object that needs state tracking, relationships, or entity resolution. There are two classes:

ER Entities (Entity Resolution)

Entities that exist across multiple sources and need identity merging:

  • Persons: Identifiers like email, phone, user_id; traits like name
  • Groups: Identifiers like domain, company_id; traits like company name
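
The identity merging described above amounts to finding connected components: two source records belong to the same person or group if they share any identifier, directly or through a chain of other records. The package does this with recursive CTEs in SQL. The sketch below swaps in a union-find structure in Python to show the same idea. The function name, record keys, and sample values are hypothetical and are not part of dbt-nexus.

```python
from collections import defaultdict


def resolve_entities(records):
    """Group record ids that share any (identifier_type, value) pair.

    `records` maps a record id to a set of (identifier_type, value) tuples.
    Illustrative only: dbt-nexus itself uses recursive CTEs, not this function.
    """
    parent = {}

    def find(x):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # identifier -> first record id that carried it
    for record_id, identifiers in records.items():
        parent.setdefault(record_id, record_id)
        for identifier in identifiers:
            if identifier in first_seen:
                union(first_seen[identifier], record_id)
            else:
                first_seen[identifier] = record_id

    groups = defaultdict(set)
    for record_id in records:
        groups[find(record_id)].add(record_id)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample data: three records chain together through
# a shared email and a shared user_id; the fourth stands alone.
records = {
    "gmail:1": {("email", "ada@example.com")},
    "stripe:7": {("email", "ada@example.com"), ("user_id", "u_42")},
    "shopify:3": {("user_id", "u_42"), ("phone", "+15550100")},
    "gmail:2": {("email", "bob@example.com")},
}
resolved = resolve_entities(records)
```

Here `gmail:1` and `shopify:3` share no identifier directly, yet they resolve to the same entity because `stripe:7` links them. This transitive step is what the recursive part of a CTE-based approach has to compute.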

Non-ER Entities (Registered)

Entities from a single source that need state tracking or relationships but not identity merging:

  • Subscriptions, Contracts, Projects, Tasks: Registered via the register_entities() macro with entity_resolution: false
  • These entities participate in events, relationships, and states like any other entity

Events

Timestamped actions/occurrences that:

  • Generate identifiers and traits
  • Trigger state changes
  • Create relationship data
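
As a rough illustration of the first point, one event can fan out into several identifier rows and trait rows, each tied back to the event that produced it. The sketch below assumes a simple in-memory shape. The `Event` class, the `explode_event` function, and all field names are hypothetical and do not reflect the package's actual schema. State changes and relationships are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    # Hypothetical event shape, for illustration only.
    event_id: str
    occurred_at: datetime
    event_type: str
    source: str
    identifiers: dict = field(default_factory=dict)  # e.g. {"email": "..."}
    traits: dict = field(default_factory=dict)       # e.g. {"name": "..."}


def explode_event(event):
    """Return (identifier_rows, trait_rows) derived from a single event."""
    identifier_rows = [
        {
            "event_id": event.event_id,
            "occurred_at": event.occurred_at,
            "identifier_type": id_type,
            "identifier_value": value,
        }
        for id_type, value in event.identifiers.items()
    ]
    trait_rows = [
        {
            "event_id": event.event_id,
            "occurred_at": event.occurred_at,
            "trait_name": name,
            "trait_value": value,
        }
        for name, value in event.traits.items()
    ]
    return identifier_rows, trait_rows


event = Event(
    event_id="evt_1",
    occurred_at=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
    event_type="payment_succeeded",
    source="stripe",
    identifiers={"email": "ada@example.com", "user_id": "u_42"},
    traits={"name": "Ada Lovelace"},
)
identifier_rows, trait_rows = explode_event(event)
```

Because every row carries the `event_id` and timestamp, identifiers and traits remain traceable to the event that introduced them.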

States

Timeline-based tracking of entity conditions in nexus_entity_states, a Type 2 slowly changing dimension (SCD2) table:

  • Dimensions: Categorical states (e.g., lifecycle status)
  • Measurements: Numeric values (e.g., MRR, contract value)
  • Delta columns: Precomputed differences for efficient time-series queries
  • Timeline: valid_from, valid_to, is_current
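
To make the timeline and delta columns concrete, the sketch below turns a series of observations for one entity and one measurement into SCD2-style rows. Only valid_from, valid_to, and is_current come from the description above. The `value` and `delta` column names, the use of None for an open-ended valid_to, and the choice to treat the first row's delta as the full value are assumptions made for illustration.

```python
from datetime import date


def build_state_timeline(observations):
    """Build SCD2-style rows from (effective_date, value) observations.

    Illustrative sketch for a single entity and a single measurement.
    """
    # Keep only observations where the value actually changes.
    changes = []
    for effective_date, value in sorted(observations):
        if not changes or changes[-1][1] != value:
            changes.append((effective_date, value))

    rows = []
    for i, (valid_from, value) in enumerate(changes):
        is_last = i == len(changes) - 1
        previous_value = changes[i - 1][1] if i > 0 else 0  # assumption: start from 0
        rows.append(
            {
                "valid_from": valid_from,
                # Assumption: the current row has an open-ended valid_to.
                "valid_to": None if is_last else changes[i + 1][0],
                "is_current": is_last,
                "value": value,
                "delta": value - previous_value,
            }
        )
    return rows


# Hypothetical MRR history: unchanged in February, upgraded in March,
# cancelled in June.
mrr_observations = [
    (date(2024, 1, 1), 100),
    (date(2024, 2, 1), 100),
    (date(2024, 3, 1), 150),
    (date(2024, 6, 1), 0),
]
timeline = build_state_timeline(mrr_observations)
deltas = [row["delta"] for row in timeline]
```

With precomputed deltas, a time-series query such as net MRR change per month can sum the `delta` column over rows whose valid_from falls in that month, without comparing each row to its predecessor at query time.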

Architecture Layers

  1. Source Adapters: Transform source data into standardized formats
  2. Event Log: Core models for events, identifiers, traits
  3. Entity Resolution: Deduplication logic producing resolved entities
  4. State Management: Timeline tracking with derived states
  5. Final Tables: Production-ready resolved entities

Each source's models should follow a four-layer pattern, separate from the package-level layers above:

  1. Base Layer: Raw SELECT * from source tables
  2. Normalized Layer: Clean, joined business entities with explicit field selection
  3. Intermediate Layer: Event-type specific formatting using Nexus macros
  4. Unioned Layer: Combined models using dbt_utils.union_relations()

Separating raw access, cleaning, event formatting, and unioning into distinct models keeps each concern isolated, which supports data quality, maintainability, and scalability.
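
The unioned layer is the step where differently shaped intermediate models are stacked into one relation. dbt_utils.union_relations() does this in SQL: it fills columns missing from a relation with nulls and adds a _dbt_source_relation column recording where each row came from. The Python sketch below imitates that behavior on in-memory rows to show the resulting shape. The relation names and sample rows are hypothetical.

```python
def union_relations(relations):
    """Stack rows from several relations into one list of uniform dicts.

    `relations` maps a relation name to a list of row dicts. Columns that a
    relation lacks are filled with None, loosely imitating how
    dbt_utils.union_relations() fills missing columns with nulls.
    """
    # Collect the superset of columns, preserving first-seen order.
    columns = []
    for rows in relations.values():
        for row in rows:
            for column in row:
                if column not in columns:
                    columns.append(column)

    unioned = []
    for name, rows in relations.items():
        for row in rows:
            out = {"_dbt_source_relation": name}
            out.update({column: row.get(column) for column in columns})
            unioned.append(out)
    return unioned


# Hypothetical intermediate models for two event types.
relations = {
    "gmail_message_events": [
        {"event_id": "g1", "event_type": "message_sent", "subject": "Invoice"},
    ],
    "stripe_payment_events": [
        {"event_id": "s1", "event_type": "payment_succeeded", "amount": 4900},
    ],
}
events = union_relations(relations)
```

Every output row has the same keys, so downstream models can read one relation without knowing which source or event type produced a given row.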

Key Benefits

  • Operational Data: Goes beyond dashboards to data that drives actions
  • Source Agnostic: Works with any data source following naming conventions
  • Entity Resolution: Automatic deduplication across systems
  • State Tracking: Timeline-based state management
  • AI Ready: Structured data suited to AI/ML applications

Database Support

  • Primary: Snowflake, BigQuery (fully tested and optimized)
  • Secondary: PostgreSQL, Redshift, Databricks
  • Recursive CTEs: Optimized for each supported database

Demo Data

Comprehensive demo data includes:

  • Gmail messages with support tickets and billing
  • Google Calendar events and meetings
  • Stripe billing and payment records
  • Shopify shop information and events

Real-World Applications

  • Customer Support: Complete context in one view
  • Sales Teams: Full customer timeline for better conversion
  • AI Integration: Structured data for AI tools
  • Marketing: Up-to-date customer lists and segmentation
  • Operations: Automated notifications and workflows