Architecture Overview
Core tables, key joins, and the design principles behind dbt-nexus.
Nexus transforms raw data from any number of sources into a fixed set of core tables that describe who your entities are, what they did, how they relate, and what changed over time. Every query — whether from an analyst, an LLM, or a semantic layer — works against these tables.
Entities include both resolved identities (persons and groups that go through entity resolution) and registered objects (subscriptions, contracts, projects, etc.) that have their own lifecycle, state, or relationships.
Core Tables
erDiagram
nexus_events {
string event_id PK
timestamp occurred_at
string event_name
string event_type
string source
}
nexus_entities {
string entity_id PK
string entity_type
string name "dynamic trait"
string email "dynamic trait"
string risk_tier "dynamic trait"
timestamp first_interaction_at
timestamp last_interaction_at
}
nexus_entity_participants {
string entity_participant_id PK
string event_id FK
string entity_id FK
string entity_type
string role
timestamp occurred_at
}
nexus_relationships {
string relationship_id PK
string entity_a_id FK
string entity_b_id FK
string relationship_type
string relationship_direction
boolean is_active
}
nexus_entity_states {
string entity_state_id PK
string entity_id FK
timestamp valid_from
timestamp valid_to
boolean is_current
string ___ "dynamic dimension columns"
numeric ___ "dynamic measurement columns"
numeric ___ "dynamic delta columns"
}
nexus_event_measurements {
string event_id FK
timestamp occurred_at
string source
float ___ "dynamic measurement columns"
}
nexus_event_dimensions {
string event_id FK
timestamp occurred_at
string source
boolean ___ "dynamic boolean columns"
string ___ "dynamic string columns"
}
nexus_attribution_model_results {
string attribution_model_result_id PK
string touchpoint_event_id FK
string attributed_event_id FK
string entity_id FK
string attribution_model_name
timestamp touchpoint_occurred_at
}
nexus_entity_identifiers_to_entity_id {
string identifier_id PK
string entity_id FK
string entity_type
string identifier_type
string identifier_value
}
nexus_events ||--o{ nexus_entity_participants : "participants"
nexus_entities ||--o{ nexus_entity_participants : "events"
nexus_entities ||--o{ nexus_relationships : "entity_a"
nexus_entities ||--o{ nexus_relationships : "entity_b"
nexus_entities ||--o{ nexus_entity_states : "state timeline"
nexus_events ||--o| nexus_event_measurements : "measurements"
nexus_events ||--o| nexus_event_dimensions : "dimensions"
nexus_events ||--o{ nexus_attribution_model_results : "attributed"
nexus_events ||--o{ nexus_attribution_model_results : "touchpoint"
nexus_entities ||--o{ nexus_attribution_model_results : "entity"
nexus_entities ||--o{ nexus_entity_identifiers_to_entity_id : "identifiers"
Table Descriptions
| Table | Purpose | Primary Key |
|---|---|---|
| nexus_events | Every event from every source, unified into a single schema. | event_id |
| nexus_entities | All entities with pivoted trait columns: pre-resolution traits from source extraction and computed traits from post-resolution models (churn scores, derived names, external data merges). Includes ER and non-ER entities. All columns discovered at compile time. | entity_id |
| nexus_entity_participants | Bridge table linking events to entities. One row per entity-role per event. | entity_participant_id |
| nexus_relationships | Entity-to-entity relationships (e.g., person belongs to company). Both entity_a_id and entity_b_id reference nexus_entities. |
relationship_id |
| nexus_entity_states | Temporal state timeline per entity (SCD2). Dimension columns (STRING) for categorical states, measurement columns (NUMERIC) for quantitative values, plus precomputed _delta columns for efficient time-series queries. |
entity_state_id |
| nexus_event_measurements | Pivoted quantitative observations per event (revenue, tax, expense, etc.). One row per event. Measurement columns are discovered at compile time. | event_id |
| nexus_event_dimensions | Pivoted categorical properties per event (is*revenue_earned, invoice_id, etc.). One row per event. Boolean dimensions (is* prefix) default to FALSE; string dimensions default to NULL. Columns are discovered at compile time. |
event_id |
| nexus_attribution_model_results | Maps marketing touchpoints to conversion events, with model-specific attribution weights. | attribution_model_result_id |
| nexus_entity_identifiers_to_entity_id | Maps every known identifier (all emails, phones, domains, etc.) to the resolved entity_id. Use this to look up entities — the traits on nexus_entities only show the latest value. |
identifier_id |
Dynamic Columns
Four core tables have columns that are generated at dbt compile time by querying the intermediate tables for distinct values:
- nexus_entities — one column per trait name (e.g.,
name,email,domain,company_name) from pre-resolution source traits, plus one column per computed trait (e.g.,risk_tier,display_name) from post-resolution computed trait models. Both kinds are discovered at compile time and appear as regular columns. See Entities for details. - nexus_entity_states — one column per dimension (STRING, e.g.,
subscription_lifecycle) and per measurement (NUMERIC, e.g.,mrr_amount), plus a precomputed_deltacolumn for each measurement (e.g.,mrr_amount_delta). Varies by project based on which states are defined. - nexus_event_measurements — one column per measurement name (e.g.,
revenue,tax,expense). Varies by project based on which measurements sources extract. - nexus_event_dimensions — one column per dimension name (e.g.,
is_revenue_earned,invoice_id).is_prefix columns are BOOLEAN (default FALSE); others are STRING (default NULL). Varies by project based on which dimensions sources extract.
This means adding a new source that extracts a new trait, state, measurement, or
dimension automatically produces the corresponding column on the next
dbt build — no manual schema changes required.
Metadata Tables
Each core table has a companion metadata table that catalogs what exists in the data:
| Metadata Table | Describes |
|---|---|
nexus_events_metadata |
Event names, types, sources, custom columns |
nexus_entity_traits_metadata |
Available trait names per entity type |
nexus_entity_identifiers_metadata |
Available identifier types per entity type |
nexus_states_metadata |
State names, possible values, counts |
nexus_event_measurements_metadata |
Measurement names, units, value statistics |
nexus_event_dimensions_metadata |
Dimension names, types, sources, event counts |
nexus_relationships_metadata |
Relationship types and entity type combinations |
nexus_attribution_model_results_metadata |
Attribution model names and fields |
These metadata tables are what LLMs and semantic layers use to discover what columns and values are available without scanning the full tables.
Key Joins
The two central tables are nexus_events and nexus_entities.
nexus_entity_participants is the bridge between them.
Never Join on Traits
Trait columns on nexus_entities (e.g., email, salesforce_contact_id,
salesforce_account_id) are descriptive attributes, not join keys. When
identity resolution merges multiple identifiers into a single entity, traits
are collapsed via max() — only one value survives. Joining on a trait can
silently point to the wrong source record.
- To connect entities to each other: use
nexus_relationships. - To get data about an entity: read trait columns from
nexus_entities. - To look up an entity by identifier: use
nexus_entity_identifiers_to_entity_id. - Never use a trait column as a foreign key to reach another table.
For example, if a household entity has a client_agreement trait, read it
directly from nexus_entities — don't join the household's
salesforce_account_id trait back to salesforce_accounts to get the value.
The trait already contains the answer.
Events for an entity:
Examples: "Show me all events for jane@example.com", "What happened with this customer recently?", "Give me a timeline for this person"
SELECT nexus_events.*, nexus_entities.email
FROM nexus_events
JOIN nexus_entity_participants ON nexus_events.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
WHERE nexus_entities.email = 'jane@example.com'
Entities for events from a source:
Examples: "Who are all the people in Stripe?", "Which companies have Gmail activity?", "List entities from Notion"
SELECT nexus_entities.*, nexus_events.event_name
FROM nexus_entities
JOIN nexus_entity_participants ON nexus_entities.entity_id = nexus_entity_participants.entity_id
JOIN nexus_events ON nexus_entity_participants.event_id = nexus_events.event_id
WHERE nexus_events.source = 'stripe'
Relationships between entities:
Examples: "What companies does this person belong to?", "Which clients does this employee have?", "Show me all members of Acme Corp"
nexus_relationships connects two entities. entity_a and entity_b are both
foreign keys to nexus_entities. Use relationship_type to filter and
relationship_direction to understand directionality:
SELECT
nexus_entities.name AS company_name,
nexus_relationships.relationship_type,
nexus_relationships.is_active
FROM nexus_entity_identifiers_to_entity_id
JOIN nexus_relationships
ON nexus_entity_identifiers_to_entity_id.entity_id = nexus_relationships.entity_a_id
JOIN nexus_entities
ON nexus_relationships.entity_b_id = nexus_entities.entity_id
WHERE nexus_entity_identifiers_to_entity_id.entity_type = 'person'
AND nexus_entity_identifiers_to_entity_id.identifier_type = 'email'
AND nexus_entity_identifiers_to_entity_id.identifier_value = 'jane@example.com'
AND nexus_relationships.relationship_type = 'membership'
Participant roles:
Examples: "Who sent this email?", "Who was assigned to this task?", "List the recipients of this calendar invite", "Which contact is on this invoice?"
A single event can involve multiple entities in different roles. The role
column on nexus_entity_participants distinguishes them (e.g., sender vs
recipient, contact vs assignee). Filter by role when you care about a
specific side of the interaction:
SELECT nexus_entities.name, nexus_entity_participants.role
FROM nexus_events
JOIN nexus_entity_participants ON nexus_events.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
WHERE nexus_events.event_id = 'evt_abc123'
Measurements:
Examples: "What was my revenue by source?", "Total revenue by year", "How much did I spend on expenses last quarter?"
SELECT source, SUM(revenue) AS total_revenue
FROM nexus_event_measurements
GROUP BY 1
Measurements by entity:
Examples: "Revenue by customer", "Total expenses by vendor", "Hours worked per client"
SELECT nexus_entities.name, SUM(nexus_event_measurements.revenue) AS total_revenue
FROM nexus_event_measurements
JOIN nexus_entity_participants ON nexus_event_measurements.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
GROUP BY 1
Measurements with dimensions (metrics):
Examples: "What's my revenue this year?", "Revenue by customer", "Total earned revenue by source"
Dimensions tell you which events to aggregate for a given business concept. Measurements tell you what value to aggregate. Together they form metrics:
SELECT
EXTRACT(YEAR FROM m.occurred_at) AS year,
SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
WHERE d.is_revenue_earned
GROUP BY 1
ORDER BY 1
Revenue by customer with dimensions:
SELECT
nexus_entities.name AS customer,
SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
JOIN nexus_entity_participants ON m.event_id = nexus_entity_participants.event_id
JOIN nexus_entities ON nexus_entity_participants.entity_id = nexus_entities.entity_id
WHERE d.is_revenue_earned
AND nexus_entities.entity_type = 'group'
GROUP BY 1
ORDER BY 2 DESC
Attribution:
Examples: "What touchpoint drove each purchase?", "Which marketing channel led to this conversion?", "Show me the last touch attribution for all sales"
Attribution results are per-entity — an event can involve multiple entities,
each with their own touchpoint history. Join through entity_id to match:
SELECT
nexus_events.event_name,
nexus_events.occurred_at,
nexus_entities.name,
nexus_entities.email,
nexus_attribution_model_results.source AS touchpoint_source,
nexus_attribution_model_results.attribution_model_name
FROM nexus_events
JOIN nexus_entity_participants
ON nexus_events.event_id = nexus_entity_participants.event_id
JOIN nexus_entities
ON nexus_entity_participants.entity_id = nexus_entities.entity_id
JOIN nexus_attribution_model_results
ON nexus_attribution_model_results.attributed_event_id = nexus_events.event_id
AND nexus_attribution_model_results.entity_id = nexus_entities.entity_id
WHERE nexus_events.event_name = 'purchase'
ORDER BY nexus_events.occurred_at DESC
Look up an entity by any identifier:
Examples: "Find this person by their old email", "Look up who has this phone number", "Which entity owns this domain?"
The email column on nexus_entities is the latest email trait — but after
entity resolution, an entity may have multiple known emails, phone numbers, etc.
Use nexus_entity_identifiers_to_entity_id to find an entity by any identifier
ever associated with them. Always filter by entity_type because different
entity types (e.g., a person and a group) can share the same identifier value:
SELECT nexus_entities.*
FROM nexus_entity_identifiers_to_entity_id
JOIN nexus_entities
ON nexus_entity_identifiers_to_entity_id.entity_id = nexus_entities.entity_id
WHERE nexus_entity_identifiers_to_entity_id.entity_type = 'person'
AND nexus_entity_identifiers_to_entity_id.identifier_type = 'email'
AND nexus_entity_identifiers_to_entity_id.identifier_value = 'jane@oldcompany.com'
Current state of an entity:
Examples: "What's this person's current billing status?", "Is this customer still active?", "What subscription tier is this company on?"
SELECT nexus_entity_states.*
FROM nexus_entity_identifiers_to_entity_id
JOIN nexus_entity_states
ON nexus_entity_identifiers_to_entity_id.entity_id = nexus_entity_states.entity_id
WHERE nexus_entity_identifiers_to_entity_id.entity_type = 'person'
AND nexus_entity_identifiers_to_entity_id.identifier_type = 'email'
AND nexus_entity_identifiers_to_entity_id.identifier_value = 'jane@oldcompany.com'
AND nexus_entity_states.is_current = TRUE
State at a specific date:
Examples: "What was this subscription's MRR on January 1st?", "What was the customer's lifecycle status last quarter?"
SELECT entity_id, subscription_lifecycle, mrr_amount
FROM nexus_entity_states
WHERE entity_type = 'subscription'
AND valid_from <= '2025-06-01'
AND (valid_to > '2025-06-01' OR valid_to IS NULL)
Time-series: MRR over time (opening state + deltas):
Examples: "Show me MRR over time", "How did active subscription count change this year?", "Chart our revenue growth"
The _delta columns on nexus_entity_states are precomputed differences
between each state version. The first delta for each entity equals the initial
value (the change from "nothing" to "something"). This enables an efficient
time-series pattern that avoids expensive date-spine joins:
-- Step 1: Opening state — total MRR at the start of the period
WITH opening_state AS (
SELECT COALESCE(SUM(COALESCE(mrr_amount, 0)), 0) as opening_mrr
FROM nexus_entity_states
WHERE entity_type = 'subscription'
AND valid_from <= '2025-01-01'
AND (valid_to > '2025-01-01' OR valid_to IS NULL)
),
-- Step 2: All deltas within the period
deltas AS (
SELECT
DATE(valid_from) as change_date,
SUM(COALESCE(mrr_amount_delta, 0)) as daily_delta
FROM nexus_entity_states
WHERE entity_type = 'subscription'
AND valid_from > '2025-01-01'
AND valid_from <= '2026-01-01'
GROUP BY DATE(valid_from)
),
-- Step 3: Cumulative sum from opening state
running AS (
SELECT
change_date,
daily_delta,
(SELECT opening_mrr FROM opening_state)
+ SUM(daily_delta) OVER (ORDER BY change_date) as running_mrr
FROM deltas
)
SELECT
'2025-01-01' as change_date,
0 as daily_delta,
(SELECT opening_mrr FROM opening_state) as running_mrr
UNION ALL
SELECT * FROM running
ORDER BY change_date
This scales with the number of state changes — not entities times days. See MRR and Time-Series Query Patterns for more examples including active counts, churn, and company-level aggregation.
Source-specific event fields:
Examples: "Get the Stripe invoice ID for this event", "What subscription is attached to this event?", "Show me the FreshBooks expense category"
nexus_events has a fixed schema across all sources. Source-specific columns
(like invoice_id, subscription, subscription_status) stay on the source
event table. Join on event_id to access them:
SELECT
nexus_events.event_name,
nexus_events.occurred_at,
stripe_events.invoice_id,
stripe_events.subscription,
stripe_events.subscription_status
FROM nexus_events
JOIN stripe_events ON nexus_events.event_id = stripe_events.event_id
WHERE nexus_events.source = 'stripe'
Normalized models for current record state:
Examples: "List all unpaid Stripe invoices", "Show me active subscriptions", "Which FreshBooks clients have outstanding balances?"
Nexus events capture what happened over time. But sometimes you need the current state of a source record — not its history. Each source has normalized models that reflect the latest snapshot from the source system. Use these when filtering by current status would be tedious to reconstruct from events:
SELECT
stripe_invoices.id,
stripe_invoices.customer_name,
stripe_invoices.status,
stripe_invoices.amount_remaining,
stripe_invoices.due_at
FROM stripe_invoices
WHERE stripe_invoices.status = 'open'
ORDER BY stripe_invoices.due_at ASC