Entities
How entities work in dbt-nexus — identity resolution, traits (pre-resolution and computed), entity types, and the nexus_entities table.
An entity is any business object that Nexus tracks: a person, a company, a
subscription, a contract. Entities live in nexus_entities — one row per
entity, with trait columns discovered dynamically at compile time.
Two Kinds of Traits
Entity properties come from two pipeline stages. Both produce columns on
nexus_entities. Consumers don't need to know which stage produced a column.
Pre-Resolution Traits (Stage 1)
Source-observed properties extracted from events. These are keyed by
identifier_type / identifier_value because the resolved entity_id isn't
known yet. They flow through identity resolution:
{source}_entity_traits → nexus_entity_traits → identity resolution
→ nexus_resolved_entity_traits → nexus_entities
Examples: email, name, plan, country, domain.
Traits are not join keys. When identity resolution merges multiple identifiers,
finalize_entities collapses trait values via max(), so only one value survives
per trait column. Never use a trait column as a foreign key to join back to
source tables. Use nexus_relationships to connect entities and
nexus_entity_identifiers_to_entity_id to look up entities by identifier.
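To make the warning above concrete, here is a minimal Python sketch of what a max()-style collapse does when two identifiers resolve to the same entity. The sample rows, the entity id ent_1, and the function name collapse_traits are hypothetical; the real logic lives in finalize_entities and may differ in detail.

```python
from collections import defaultdict

# Hypothetical resolved trait rows: (entity_id, trait_name, trait_value).
# Two identifiers were merged into ent_1, each carrying its own email.
resolved_traits = [
    ("ent_1", "email", "ann@work.example"),
    ("ent_1", "email", "ann@home.example"),
    ("ent_1", "plan", "pro"),
]

def collapse_traits(rows):
    """Keep one value per (entity_id, trait_name), mimicking a max() aggregate."""
    values = defaultdict(list)
    for entity_id, trait_name, trait_value in rows:
        values[(entity_id, trait_name)].append(trait_value)
    # max() on strings keeps the greatest value by code point;
    # a warehouse collation may order strings differently.
    return {key: max(vals) for key, vals in values.items()}

collapsed = collapse_traits(resolved_traits)
print(collapsed[("ent_1", "email")])
```

In this sketch only one of the two emails survives, so a join from the email column back to a source table would silently miss every source row keyed by the other address.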
Pre-resolution traits are defined in source models. Any source with entities
configured in dbt_project.yml must provide a {source}_entity_traits model
producing the standard EAV schema:
| Column | Description |
|---|---|
| entity_trait_id | Surrogate key |
| event_id | Originating event |
| entity_type | person, group, etc. |
| identifier_type | email, user_id, etc. |
| identifier_value | The actual identifier |
| trait_name | e.g., name, plan |
| trait_value | String value |
| source | Source system name |
| occurred_at | When observed |
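As an illustration of the EAV shape, the sketch below unpivots one source event into rows with the columns listed above. The event payload, the hash-based surrogate key, the source name shop, and the function name event_to_trait_rows are all assumptions made for the example. In a real project this would be a dbt SQL model, and Nexus has its own key-generation macro.

```python
import hashlib

COLUMNS = [
    "entity_trait_id", "event_id", "entity_type", "identifier_type",
    "identifier_value", "trait_name", "trait_value", "source", "occurred_at",
]

def event_to_trait_rows(event, source):
    """Emit one EAV row per non-null trait observed on an event."""
    rows = []
    for trait_name, trait_value in event["traits"].items():
        if trait_value is None:
            continue  # unobserved traits produce no row
        # Hypothetical surrogate key: a hash of the fields that make the row unique.
        key = "|".join([event["event_id"], trait_name, str(trait_value)])
        rows.append({
            "entity_trait_id": hashlib.md5(key.encode()).hexdigest(),
            "event_id": event["event_id"],
            "entity_type": "person",
            "identifier_type": "email",
            "identifier_value": event["email"],
            "trait_name": trait_name,
            "trait_value": str(trait_value),  # trait values are stored as strings
            "source": source,
            "occurred_at": event["occurred_at"],
        })
    return rows

event = {
    "event_id": "evt_1",
    "email": "ann@example.com",
    "occurred_at": "2024-01-01T00:00:00Z",
    "traits": {"name": "Ann", "plan": "pro", "country": None},
}
trait_rows = event_to_trait_rows(event, source="shop")
```

Each row is keyed by identifier_type and identifier_value rather than entity_id, which is what lets it pass through identity resolution later.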
Computed Traits (Stage 2)
Post-resolution properties derived from resolved entities, states, events, or
external data. These are keyed directly by entity_id — identity resolution has
already happened, so no identifier mapping is needed.
nexus_resolved_entity_traits ──→ computed trait models ──→ nexus_computed_traits
nexus_entity_states ──────────→                                      │
nexus_events ─────────────────→                                      ▼
                                                              nexus_entities
Examples:
- Derived properties: display_name from coalesce(name, first_name || ' ' || last_name, email)
- Analytical model output: risk_tier from a churn survival model
- External dataset merges: demographic data joined on resolved address
- Cross-source resolution: best name across multiple sources using a trust hierarchy
Computed traits run after entity resolution and states. They cannot create
circular dependencies because they don't feed back into identity resolution —
they only flow forward into nexus_entities.
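The display_name derivation mentioned above can be sketched in Python as follows. This assumes the warehouse treats || as NULL-propagating (the result is NULL if either side is NULL), which holds in many SQL dialects but should be checked for yours. The helper names are hypothetical.

```python
def concat_or_none(*parts):
    """SQL-style ||: the result is None (NULL) if any operand is None."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def coalesce(*values):
    """Return the first non-None value, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

def display_name(traits):
    """Mirror coalesce(name, first_name || ' ' || last_name, email)."""
    return coalesce(
        traits.get("name"),
        concat_or_none(traits.get("first_name"), " ", traits.get("last_name")),
        traits.get("email"),
    )

print(display_name({"first_name": "Ann", "email": "ann@example.com"}))
```

With a missing last_name the concatenation yields no value, so the fallback moves on to email instead of emitting a half-built name.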
Schema
Computed trait models produce EAV rows with a simpler schema than pre-resolution traits (no identifiers, no event linkage):
| Column | Description |
|---|---|
| computed_trait_id | Surrogate key (prefix ct) |
| entity_id | Resolved entity |
| entity_type | person, group, etc. |
| trait_name | e.g., risk_tier, display_name |
| trait_value | String value |
| source | Source or system |
Configuration
Computed traits follow the same config pattern as states — an
explicit list of model names under nexus.computed_traits:
vars:
  nexus:
    computed_traits:
      - sendowl_computed_traits
      - verisk_demographic_traits
Each model in the list must produce the schema above. The
process_computed_traits() macro unions them into nexus_computed_traits.
finalize_entities() pivots the trait names into columns on nexus_entities
alongside pre-resolution traits.
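The union-then-pivot flow described above can be modeled in a few lines of Python. This is only a sketch of the idea, not the macros' actual SQL: the function names, the sample rows, the income_band trait, and the max()-style tie-break are assumptions.

```python
def union_models(*models):
    """Concatenate rows from several computed trait models (like UNION ALL)."""
    return [row for model in models for row in model]

def pivot_traits(rows):
    """Turn EAV rows into one dict per entity, with one key per trait_name."""
    entities = {}
    for row in rows:
        entity = entities.setdefault(row["entity_id"], {"entity_id": row["entity_id"]})
        current = entity.get(row["trait_name"])
        # Assumed max()-style tie-break when a trait appears more than once.
        if current is None or row["trait_value"] > current:
            entity[row["trait_name"]] = row["trait_value"]
    return entities

# Hypothetical output of the two models named in the config example.
sendowl_computed_traits = [
    {"entity_id": "ent_1", "entity_type": "person",
     "trait_name": "risk_tier", "trait_value": "high", "source": "sendowl"},
]
verisk_demographic_traits = [
    {"entity_id": "ent_1", "entity_type": "person",
     "trait_name": "income_band", "trait_value": "B", "source": "verisk"},
    {"entity_id": "ent_2", "entity_type": "person",
     "trait_name": "income_band", "trait_value": "A", "source": "verisk"},
]

computed = union_models(sendowl_computed_traits, verisk_demographic_traits)
wide = pivot_traits(computed)
```

Each distinct trait_name becomes a column, and an entity with no row for a trait simply has no value there, which corresponds to NULL in the wide table.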
Example: Churn Risk Tier
{{ config(materialized='table', tags=['nexus', 'computed-traits']) }}

with churn_scores as (
    select entity_id, entity_type, risk_tier
    from {{ ref('sendowl_churn_risk_scores') }}
    where risk_tier is not null
)

select
    {{ nexus.create_nexus_id('computed_trait',
        ['entity_id', "'risk_tier'", 'risk_tier']) }} as computed_trait_id,
    entity_id,
    entity_type,
    'risk_tier' as trait_name,
    risk_tier as trait_value,
    'sendowl' as source
from churn_scores
After dbt build, risk_tier appears as a column on nexus_entities:
SELECT entity_id, name, email, risk_tier
FROM nexus_entities
WHERE risk_tier = 'high'
Pipeline
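The full entity pipeline has five stages. Stages 1-3 are pre-resolution; stages 4-5 are post-resolution. These pipeline stage numbers differ from the two trait stages described above: computed traits (trait Stage 2) are pipeline Stage 4.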
flowchart TD
    subgraph stage1 [Stage 1: Source Extraction]
        srcTraits["{source}_entity_traits"]
        srcIdent["{source}_entity_identifiers"]
    end
    subgraph stage2 [Stage 2: Identity Resolution]
        nxTraits["nexus_entity_traits"]
        nxIdent["nexus_entity_identifiers"]
        edges["nexus_entity_identifiers_edges"]
        resolved["nexus_resolved_{type}_identifiers"]
        resolvedTraits["nexus_resolved_entity_traits"]
    end
    subgraph stage3 [Stage 3: States]
        states["nexus_entity_states"]
    end
    subgraph stage4 [Stage 4: Computed Traits]
        ctModels["computed trait models"]
        nxCT["nexus_computed_traits"]
    end
    subgraph stage5 [Stage 5: Entity Table]
        entities["nexus_entities"]
    end
    srcTraits --> nxTraits --> resolvedTraits
    srcIdent --> nxIdent --> edges --> resolved
    resolved --> resolvedTraits
    resolvedTraits --> entities
    resolvedTraits --> ctModels
    states --> ctModels
    ctModels --> nxCT --> entities
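One way to sanity-check that the flowchart has no circular dependencies is to load its edges into a topological sorter. The sketch below copies the edges from the diagram into Python's standard graphlib module (Python 3.9+). It is an illustration only and not part of Nexus or dbt, which compute their own build order.

```python
from graphlib import TopologicalSorter

# Each key depends on the models in its set (edges copied from the flowchart).
dependencies = {
    "nexus_entity_traits": {"{source}_entity_traits"},
    "nexus_entity_identifiers": {"{source}_entity_identifiers"},
    "nexus_entity_identifiers_edges": {"nexus_entity_identifiers"},
    "nexus_resolved_{type}_identifiers": {"nexus_entity_identifiers_edges"},
    "nexus_resolved_entity_traits": {
        "nexus_entity_traits",
        "nexus_resolved_{type}_identifiers",
    },
    "computed trait models": {"nexus_resolved_entity_traits", "nexus_entity_states"},
    "nexus_computed_traits": {"computed trait models"},
    "nexus_entities": {"nexus_resolved_entity_traits", "nexus_computed_traits"},
}

# static_order() raises graphlib.CycleError if the graph contains a cycle.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order[-1])
```

Because nothing downstream of identity resolution points back into it, the sort succeeds and nexus_entities comes out last.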
Subtopics
- Entity Resolution — the algorithm that merges identifiers into resolved entities
- Entity Types — ER vs non-ER entities, when to promote a concept to an entity
Related
- States — temporal state tracking per entity (SCD2)
- Sources — how to create source trait and identifier models
- Architecture Overview — core tables and key joins