Architecture Overview

Core tables, key joins, and the design principles behind dbt-nexus.

Nexus transforms raw data from any number of sources into a fixed set of core tables that describe who your entities are, what they did, how they relate, and what changed over time. Every query — whether from an analyst, an LLM, or a semantic layer — works against these tables.

Entities include both resolved identities (persons and groups that go through entity resolution) and registered objects (subscriptions, contracts, projects, etc.) that have their own lifecycle, state, or relationships.


Core Tables

erDiagram
    nexus_events {
        string event_id PK
        timestamp occurred_at
        string event_name
        string event_type
        string source
    }

    nexus_entities {
        string entity_id PK
        string entity_type
        string name "dynamic trait"
        string email "dynamic trait"
        string risk_tier "dynamic trait"
        timestamp first_interaction_at
        timestamp last_interaction_at
    }

    nexus_entity_participants {
        string entity_participant_id PK
        string event_id FK
        string entity_id FK
        string entity_type
        string role
        timestamp occurred_at
    }

    nexus_relationships {
        string relationship_id PK
        string entity_a_id FK
        string entity_b_id FK
        string relationship_type
        string relationship_direction
        boolean is_active
    }

    nexus_entity_states {
        string entity_state_id PK
        string entity_id FK
        timestamp valid_from
        timestamp valid_to
        boolean is_current
        string ___ "dynamic dimension columns"
        numeric ___ "dynamic measurement columns"
        numeric ___ "dynamic delta columns"
    }

    nexus_event_measurements {
        string event_id FK
        timestamp occurred_at
        string source
        float ___ "dynamic measurement columns"
    }

    nexus_event_dimensions {
        string event_id FK
        timestamp occurred_at
        string source
        boolean ___ "dynamic boolean columns"
        string ___ "dynamic string columns"
    }

    nexus_attribution_model_results {
        string attribution_model_result_id PK
        string touchpoint_event_id FK
        string attributed_event_id FK
        string entity_id FK
        string attribution_model_name
        timestamp touchpoint_occurred_at
    }

    nexus_entity_identifiers_to_entity_id {
        string identifier_id PK
        string entity_id FK
        string entity_type
        string identifier_type
        string identifier_value
    }

    nexus_events ||--o{ nexus_entity_participants : "participants"
    nexus_entities ||--o{ nexus_entity_participants : "events"
    nexus_entities ||--o{ nexus_relationships : "entity_a"
    nexus_entities ||--o{ nexus_relationships : "entity_b"
    nexus_entities ||--o{ nexus_entity_states : "state timeline"
    nexus_events ||--o| nexus_event_measurements : "measurements"
    nexus_events ||--o| nexus_event_dimensions : "dimensions"
    nexus_events ||--o{ nexus_attribution_model_results : "attributed"
    nexus_events ||--o{ nexus_attribution_model_results : "touchpoint"
    nexus_entities ||--o{ nexus_attribution_model_results : "entity"
    nexus_entities ||--o{ nexus_entity_identifiers_to_entity_id : "identifiers"

Table Descriptions

Table Purpose Primary Key
nexus_events Every event from every source, unified into a single schema. event_id
nexus_entities All entities with pivoted trait columns: pre-resolution traits from source extraction and computed traits from post-resolution models (churn scores, derived names, external data merges). Includes ER and non-ER entities. All columns discovered at compile time. entity_id
nexus_entity_participants Bridge table linking events to entities. One row per entity-role per event. entity_participant_id
nexus_relationships Entity-to-entity relationships (e.g., person belongs to company). Both entity_a_id and entity_b_id reference nexus_entities. relationship_id
nexus_entity_states Temporal state timeline per entity (SCD2). Dimension columns (STRING) for categorical states, measurement columns (NUMERIC) for quantitative values, plus precomputed _delta columns for efficient time-series queries. entity_state_id
nexus_event_measurements Pivoted quantitative observations per event (revenue, tax, expense, etc.). One row per event. Measurement columns are discovered at compile time. event_id
nexus_event_dimensions Pivoted categorical properties per event (is*revenue_earned, invoice_id, etc.). One row per event. Boolean dimensions (is* prefix) default to FALSE; string dimensions default to NULL. Columns are discovered at compile time. event_id
nexus_attribution_model_results Maps marketing touchpoints to conversion events, with model-specific attribution weights. attribution_model_result_id
nexus_entity_identifiers_to_entity_id Maps every known identifier (all emails, phones, domains, etc.) to the resolved entity_id. Use this to look up entities — the traits on nexus_entities only show the latest value. identifier_id

Dynamic Columns

Four core tables have columns that are generated at dbt compile time by querying the intermediate tables for distinct values:

  • nexus_entities — one column per trait name (e.g., name, email, domain, company_name) from pre-resolution source traits, plus one column per computed trait (e.g., risk_tier, display_name) from post-resolution computed trait models. Both kinds are discovered at compile time and appear as regular columns. See Entities for details.
  • nexus_entity_states — one column per dimension (STRING, e.g., subscription_lifecycle) and per measurement (NUMERIC, e.g., mrr_amount), plus a precomputed _delta column for each measurement (e.g., mrr_amount_delta). Varies by project based on which states are defined.
  • nexus_event_measurements — one column per measurement name (e.g., revenue, tax, expense). Varies by project based on which measurements sources extract.
  • nexus_event_dimensions — one column per dimension name (e.g., is_revenue_earned, invoice_id). is_ prefix columns are BOOLEAN (default FALSE); others are STRING (default NULL). Varies by project based on which dimensions sources extract.

This means adding a new source that extracts a new trait, state, measurement, or dimension automatically produces the corresponding column on the next dbt build — no manual schema changes required.

Metadata Tables

Each core table has a companion metadata table that catalogs what exists in the data:

Metadata Table Describes
nexus_events_metadata Event names, types, sources, custom columns
nexus_entity_traits_metadata Available trait names per entity type
nexus_entity_identifiers_metadata Available identifier types per entity type
nexus_states_metadata State names, possible values, counts
nexus_event_measurements_metadata Measurement names, units, value statistics
nexus_event_dimensions_metadata Dimension names, types, sources, event counts
nexus_relationships_metadata Relationship types and entity type combinations
nexus_attribution_model_results_metadata Attribution model names and fields

These metadata tables are what LLMs and semantic layers use to discover what columns and values are available without scanning the full tables.


Key Joins

The two central tables are nexus_events and nexus_entities. nexus_entity_participants is the bridge between them.

Never Join on Traits

Trait columns on nexus_entities (e.g., email, salesforce_contact_id, salesforce_account_id) are descriptive attributes, not join keys. When identity resolution merges multiple identifiers into a single entity, traits are collapsed via max() — only one value survives. Joining on a trait can silently point to the wrong source record.

  • To connect entities to each other: use nexus_relationships.
  • To get data about an entity: read trait columns from nexus_entities.
  • To look up an entity by identifier: use nexus_entity_identifiers_to_entity_id.
  • Never use a trait column as a foreign key to reach another table.

For example, if a household entity has a client_agreement trait, read it directly from nexus_entities — don't join the household's salesforce_account_id trait back to salesforce_accounts to get the value. The trait already contains the answer.

Events for an entity:

Examples: "Show me all events for jane@example.com", "What happened with this customer recently?", "Give me a timeline for this person"

SELECT nexus_events.*, nexus_entities.email
FROM nexus_events
JOIN nexus_entity_participants ON nexus_events.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
WHERE nexus_entities.email = 'jane@example.com'

Entities for events from a source:

Examples: "Who are all the people in Stripe?", "Which companies have Gmail activity?", "List entities from Notion"

SELECT nexus_entities.*, nexus_events.event_name
FROM nexus_entities
JOIN nexus_entity_participants ON nexus_entities.entity_id = nexus_entity_participants.entity_id
JOIN nexus_events ON nexus_entity_participants.event_id = nexus_events.event_id
WHERE nexus_events.source = 'stripe'

Relationships between entities:

Examples: "What companies does this person belong to?", "Which clients does this employee have?", "Show me all members of Acme Corp"

nexus_relationships connects two entities. entity_a and entity_b are both foreign keys to nexus_entities. Use relationship_type to filter and relationship_direction to understand directionality:

SELECT
    nexus_entities.name AS company_name,
    nexus_relationships.relationship_type,
    nexus_relationships.is_active
FROM nexus_entity_identifiers_to_entity_id
JOIN nexus_relationships
    ON nexus_entity_identifiers_to_entity_id.entity_id = nexus_relationships.entity_a_id
JOIN nexus_entities
    ON nexus_relationships.entity_b_id = nexus_entities.entity_id
WHERE nexus_entity_identifiers_to_entity_id.entity_type = 'person'
    AND nexus_entity_identifiers_to_entity_id.identifier_type = 'email'
    AND nexus_entity_identifiers_to_entity_id.identifier_value = 'jane@example.com'
    AND nexus_relationships.relationship_type = 'membership'

Participant roles:

Examples: "Who sent this email?", "Who was assigned to this task?", "List the recipients of this calendar invite", "Which contact is on this invoice?"

A single event can involve multiple entities in different roles. The role column on nexus_entity_participants distinguishes them (e.g., sender vs recipient, contact vs assignee). Filter by role when you care about a specific side of the interaction:

SELECT nexus_entities.name, nexus_entity_participants.role
FROM nexus_events
JOIN nexus_entity_participants ON nexus_events.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
WHERE nexus_events.event_id = 'evt_abc123'

Measurements:

Examples: "What was my revenue by source?", "Total revenue by year", "How much did I spend on expenses last quarter?"

SELECT source, SUM(revenue) AS total_revenue
FROM nexus_event_measurements
GROUP BY 1

Measurements by entity:

Examples: "Revenue by customer", "Total expenses by vendor", "Hours worked per client"

SELECT nexus_entities.name, SUM(nexus_event_measurements.revenue) AS total_revenue
FROM nexus_event_measurements
JOIN nexus_entity_participants ON nexus_event_measurements.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
GROUP BY 1

Measurements with dimensions (metrics):

Examples: "What's my revenue this year?", "Revenue by customer", "Total earned revenue by source"

Dimensions tell you which events to aggregate for a given business concept. Measurements tell you what value to aggregate. Together they form metrics:

SELECT
    EXTRACT(YEAR FROM m.occurred_at) AS year,
    SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
WHERE d.is_revenue_earned
GROUP BY 1
ORDER BY 1

Revenue by customer with dimensions:

SELECT
    nexus_entities.name AS customer,
    SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
JOIN nexus_entity_participants ON m.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
WHERE d.is_revenue_earned
  AND nexus_entities.entity_type = 'group'
GROUP BY 1
ORDER BY 2 DESC

Attribution:

Examples: "What touchpoint drove each purchase?", "Which marketing channel led to this conversion?", "Show me the last touch attribution for all sales"

Attribution results are per-entity — an event can involve multiple entities, each with their own touchpoint history. Join through entity_id to match:

SELECT
    nexus_events.event_name,
    nexus_events.occurred_at,
    nexus_entities.name,
    nexus_entities.email,
    nexus_attribution_model_results.source AS touchpoint_source,
    nexus_attribution_model_results.attribution_model_name
FROM nexus_events
JOIN nexus_entity_participants
    ON nexus_events.event_id = nexus_entity_participants.event_id
JOIN nexus_entities
    ON nexus_entity_participants.entity_id = nexus_entities.entity_id
JOIN nexus_attribution_model_results
    ON nexus_attribution_model_results.attributed_event_id = nexus_events.event_id
    AND nexus_attribution_model_results.entity_id = nexus_entities.entity_id
WHERE nexus_events.event_name = 'purchase'
ORDER BY nexus_events.occurred_at DESC

Look up an entity by any identifier:

Examples: "Find this person by their old email", "Look up who has this phone number", "Which entity owns this domain?"

The email column on nexus_entities is the latest email trait — but after entity resolution, an entity may have multiple known emails, phone numbers, etc. Use nexus_entity_identifiers_to_entity_id to find an entity by any identifier ever associated with them. Always filter by entity_type because different entity types (e.g., a person and a group) can share the same identifier value:

SELECT nexus_entities.*
FROM nexus_entity_identifiers_to_entity_id
JOIN nexus_entities
    ON nexus_entity_identifiers_to_entity_id.entity_id = nexus_entities.entity_id
WHERE nexus_entity_identifiers_to_entity_id.entity_type = 'person'
    AND nexus_entity_identifiers_to_entity_id.identifier_type = 'email'
    AND nexus_entity_identifiers_to_entity_id.identifier_value = 'jane@oldcompany.com'

Current state of an entity:

Examples: "What's this person's current billing status?", "Is this customer still active?", "What subscription tier is this company on?"

SELECT nexus_entity_states.*
FROM nexus_entity_identifiers_to_entity_id
JOIN nexus_entity_states
    ON nexus_entity_identifiers_to_entity_id.entity_id = nexus_entity_states.entity_id
WHERE nexus_entity_identifiers_to_entity_id.entity_type = 'person'
    AND nexus_entity_identifiers_to_entity_id.identifier_type = 'email'
    AND nexus_entity_identifiers_to_entity_id.identifier_value = 'jane@oldcompany.com'
    AND nexus_entity_states.is_current = TRUE

State at a specific date:

Examples: "What was this subscription's MRR on January 1st?", "What was the customer's lifecycle status last quarter?"

SELECT entity_id, subscription_lifecycle, mrr_amount
FROM nexus_entity_states
WHERE entity_type = 'subscription'
  AND valid_from <= '2025-06-01'
  AND (valid_to > '2025-06-01' OR valid_to IS NULL)

Time-series: MRR over time (opening state + deltas):

Examples: "Show me MRR over time", "How did active subscription count change this year?", "Chart our revenue growth"

The _delta columns on nexus_entity_states are precomputed differences between each state version. The first delta for each entity equals the initial value (the change from "nothing" to "something"). This enables an efficient time-series pattern that avoids expensive date-spine joins:

-- Step 1: Opening state — total MRR at the start of the period
WITH opening_state AS (
    SELECT COALESCE(SUM(COALESCE(mrr_amount, 0)), 0) as opening_mrr
    FROM nexus_entity_states
    WHERE entity_type = 'subscription'
      AND valid_from <= '2025-01-01'
      AND (valid_to > '2025-01-01' OR valid_to IS NULL)
),

-- Step 2: All deltas within the period
deltas AS (
    SELECT
        DATE(valid_from) as change_date,
        SUM(COALESCE(mrr_amount_delta, 0)) as daily_delta
    FROM nexus_entity_states
    WHERE entity_type = 'subscription'
      AND valid_from > '2025-01-01'
      AND valid_from <= '2026-01-01'
    GROUP BY DATE(valid_from)
),

-- Step 3: Cumulative sum from opening state
running AS (
    SELECT
        change_date,
        daily_delta,
        (SELECT opening_mrr FROM opening_state)
          + SUM(daily_delta) OVER (ORDER BY change_date) as running_mrr
    FROM deltas
)

SELECT
    '2025-01-01' as change_date,
    0 as daily_delta,
    (SELECT opening_mrr FROM opening_state) as running_mrr
UNION ALL
SELECT * FROM running
ORDER BY change_date

This scales with the number of state changes — not entities times days. See MRR and Time-Series Query Patterns for more examples including active counts, churn, and company-level aggregation.

Source-specific event fields:

Examples: "Get the Stripe invoice ID for this event", "What subscription is attached to this event?", "Show me the FreshBooks expense category"

nexus_events has a fixed schema across all sources. Source-specific columns (like invoice_id, subscription, subscription_status) stay on the source event table. Join on event_id to access them:

SELECT
    nexus_events.event_name,
    nexus_events.occurred_at,
    stripe_events.invoice_id,
    stripe_events.subscription,
    stripe_events.subscription_status
FROM nexus_events
JOIN stripe_events ON nexus_events.event_id = stripe_events.event_id
WHERE nexus_events.source = 'stripe'

Normalized models for current record state:

Examples: "List all unpaid Stripe invoices", "Show me active subscriptions", "Which FreshBooks clients have outstanding balances?"

Nexus events capture what happened over time. But sometimes you need the current state of a source record — not its history. Each source has normalized models that reflect the latest snapshot from the source system. Use these when filtering by current status would be tedious to reconstruct from events:

SELECT
    stripe_invoices.id,
    stripe_invoices.customer_name,
    stripe_invoices.status,
    stripe_invoices.amount_remaining,
    stripe_invoices.due_at
FROM stripe_invoices
WHERE stripe_invoices.status = 'open'
ORDER BY stripe_invoices.due_at ASC