Package Overview
High-level overview of the dbt-nexus package for AI assistants.
Mission
The dbt-nexus package provides a standardized, source-agnostic dbt framework that transforms scattered customer data into unified, operationally useful views of people, companies, and events.
Core Purpose
- Unify customer data from multiple sources (Gmail, Stripe, Shopify, etc.)
- Resolve identities across systems using recursive CTE-based deduplication
- Track state changes over time with timeline-based state management
- Enable operational use of data for sales, support, and AI tools
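To make the identity resolution step concrete, here is a minimal sketch of recursive CTE-based deduplication, run against an in-memory SQLite database from Python. It is not the package's actual SQL: the identifier_edges table, its columns, and the choice of the smallest identifier as the resolved ID are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical input: pairs of identifiers that were observed together.
conn.executescript("""
CREATE TABLE identifier_edges (id_a TEXT, id_b TEXT);
INSERT INTO identifier_edges VALUES
  ('email:ann@example.com', 'phone:555-0100'),
  ('phone:555-0100',        'user_id:42'),
  ('email:bob@example.com', 'user_id:77');
""")

RESOLVE_SQL = """
WITH RECURSIVE
undirected AS (
  -- Treat every pair as a two-way link.
  SELECT id_a AS src, id_b AS dst FROM identifier_edges
  UNION
  SELECT id_b, id_a FROM identifier_edges
),
reachable(root, node) AS (
  -- Every identifier reaches itself.
  SELECT src, src FROM undirected
  UNION
  -- Follow one more link from anything already reached.
  SELECT r.root, u.dst
  FROM reachable AS r
  JOIN undirected AS u ON u.src = r.node
)
-- Assumption: the smallest identifier in a cluster names the cluster.
SELECT node AS identifier, MIN(root) AS resolved_id
FROM reachable
GROUP BY node
ORDER BY node
"""

resolved = dict(conn.execute(RESOLVE_SQL).fetchall())
for identifier, resolved_id in resolved.items():
    print(f"{identifier} -> {resolved_id}")
```

In this sample, the three identifiers linked through a shared phone number collapse into one resolved ID, and the two remaining identifiers form a second one. Using UNION rather than UNION ALL in the recursive step discards pairs that were already produced, which is what lets the recursion terminate when the links form a cycle.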
Primary Entities
A Nexus entity is any business object that needs state tracking, relationships, or entity resolution. There are two classes:
ER Entities (Entity Resolution)
Entities that exist across multiple sources and need identity merging:
- Persons: Identifiers like email, phone, user_id; traits like name
- Groups: Identifiers like domain, company_id; traits like company name
Non-ER Entities (Registered)
Entities from a single source that need state tracking or relationships but not identity merging:
- Subscriptions, Contracts, Projects, Tasks: Registered via the register_entities() macro with entity_resolution: false
- Participate in events, relationships, and states like any other entity
Events
Timestamped actions/occurrences that:
- Generate identifiers and traits
- Trigger state changes
- Create relationship data
States
Timeline-based tracking of entity conditions in nexus_entity_states (SCD2):
- Dimensions: Categorical states (e.g., lifecycle status)
- Measurements: Numeric values (e.g., MRR, contract value)
- Delta columns: Precomputed differences for efficient time-series queries
- Timeline: valid_from, valid_to, is_current
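The timeline columns above can be illustrated with a small sketch that turns raw observations of one measurement into SCD2-style rows. This is an assumption-laden illustration, not the package's implementation: the StateRow fields other than valid_from, valid_to, and is_current, the build_timeline helper, the use of None as the open-ended valid_to, and the empty delta on the first row are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class StateRow:
    entity_id: str
    state_name: str
    value: float
    value_delta: Optional[float]   # change versus the previous row (hypothetical column)
    valid_from: datetime
    valid_to: Optional[datetime]   # None marks the open-ended current row (assumption)
    is_current: bool


def build_timeline(
    entity_id: str,
    state_name: str,
    observations: List[Tuple[datetime, float]],
) -> List[StateRow]:
    """Collapse repeated values and emit one row per actual state change."""
    changes: List[Tuple[datetime, float]] = []
    for observed_at, value in sorted(observations):
        # Only a changed value opens a new row; repeats extend the open one.
        if not changes or changes[-1][1] != value:
            changes.append((observed_at, value))

    rows: List[StateRow] = []
    for i, (observed_at, value) in enumerate(changes):
        is_last = i == len(changes) - 1
        rows.append(
            StateRow(
                entity_id=entity_id,
                state_name=state_name,
                value=value,
                value_delta=None if i == 0 else value - changes[i - 1][1],
                valid_from=observed_at,
                # A row ends exactly when the next one begins.
                valid_to=None if is_last else changes[i + 1][0],
                is_current=is_last,
            )
        )
    return rows


rows = build_timeline(
    "sub_1",
    "mrr",
    [
        (datetime(2024, 1, 1), 100.0),
        (datetime(2024, 2, 1), 100.0),  # unchanged, so no new row
        (datetime(2024, 3, 1), 150.0),
    ],
)
for row in rows:
    print(row)
```

Storing the delta on each row means a question such as "net MRR change in March" can be answered by summing value_delta over rows whose valid_from falls in that month, without self-joining the timeline.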
Architecture Layers
- Source Adapters: Transform source data into standardized formats
- Event Log: Core models for events, identifiers, traits
- Entity Resolution: Deduplication logic producing resolved entities
- State Management: Timeline tracking with derived states
- Final Tables: Production-ready resolved entities
Recommended Source Structure
Sources should follow a four-layer architecture pattern:
- Base Layer: Raw SELECT * from source tables
- Normalized Layer: Clean, joined business entities with explicit field selection
- Intermediate Layer: Event-type specific formatting using Nexus macros
- Unioned Layer: Combined models using dbt_utils.union_relations()
This pattern ensures data quality, maintainability, and scalability while providing clear separation of concerns.
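As a rough illustration of how the four layers stack, the sketch below models each layer as a SQLite view built from Python. Every table, view, and column name here is hypothetical. In a real project each view would be a dbt model, the intermediate layer would use the Nexus macros, and the final step would use dbt_utils.union_relations(); a plain UNION ALL stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical raw source tables standing in for a warehouse schema.
CREATE TABLE raw_invoices (id TEXT, customer_email TEXT, amount_cents INTEGER, created_at TEXT);
CREATE TABLE raw_refunds  (id TEXT, invoice_id TEXT, amount_cents INTEGER, created_at TEXT);
INSERT INTO raw_invoices VALUES ('in_1', ' Ann@Example.com ', 5000, '2024-01-05');
INSERT INTO raw_refunds  VALUES ('re_1', 'in_1', 2000, '2024-01-10');

-- 1. Base layer: raw SELECT * from each source table.
CREATE VIEW base_invoices AS SELECT * FROM raw_invoices;
CREATE VIEW base_refunds  AS SELECT * FROM raw_refunds;

-- 2. Normalized layer: explicit fields, cleaned values, joins.
CREATE VIEW normalized_invoices AS
SELECT id AS invoice_id,
       LOWER(TRIM(customer_email)) AS customer_email,
       created_at AS invoiced_at
FROM base_invoices;

CREATE VIEW normalized_refunds AS
SELECT r.id AS refund_id,
       LOWER(TRIM(i.customer_email)) AS customer_email,
       r.created_at AS refunded_at
FROM base_refunds AS r
JOIN base_invoices AS i ON i.id = r.invoice_id;

-- 3. Intermediate layer: one model per event type, in a shared shape.
CREATE VIEW int_invoice_created_events AS
SELECT 'invoice created' AS event_name, invoice_id AS source_record_id,
       customer_email, invoiced_at AS occurred_at
FROM normalized_invoices;

CREATE VIEW int_refund_issued_events AS
SELECT 'refund issued' AS event_name, refund_id AS source_record_id,
       customer_email, refunded_at AS occurred_at
FROM normalized_refunds;

-- 4. Unioned layer: a single relation of events for this source.
CREATE VIEW source_events AS
SELECT * FROM int_invoice_created_events
UNION ALL
SELECT * FROM int_refund_issued_events;
""")

events = conn.execute(
    "SELECT event_name, customer_email FROM source_events ORDER BY occurred_at"
).fetchall()
for event in events:
    print(event)
```

Because cleaning happens once in the normalized layer, both event models see the same lower-cased, trimmed email, so the events can later be matched to the same person.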
Key Benefits
- Operational Data: Data that drives actions, not only dashboards
- Source Agnostic: Works with any data source that follows the package's naming conventions
- Entity Resolution: Automatic deduplication across systems
- State Tracking: Timeline-based state management
- AI Ready: Structured data suited to AI/ML applications
Database Support
- Primary: Snowflake, BigQuery (fully tested and optimized)
- Secondary: PostgreSQL, Redshift, Databricks
- Recursive CTEs: Optimized for each supported database
Demo Data
Comprehensive demo data includes:
- Gmail messages with support tickets and billing
- Google Calendar events and meetings
- Stripe billing and payment records
- Shopify shop information and events
Real-World Applications
- Customer Support: Complete context in one view
- Sales Teams: Full customer timeline for better conversion
- AI Integration: Structured data for AI tools
- Marketing: Up-to-date customer lists and segmentation
- Operations: Automated notifications and workflows