Entities

How entities work in dbt-nexus — identity resolution, traits (pre-resolution and computed), entity types, and the nexus_entities table.

An entity is any business object that Nexus tracks: a person, a company, a subscription, a contract. Entities live in nexus_entities — one row per entity, with trait columns discovered dynamically at compile time.

Two Kinds of Traits

Entity properties come from two pipeline stages. Both produce columns on nexus_entities. Consumers don't need to know which stage produced a column.

Pre-Resolution Traits (Stage 1)

Source-observed properties extracted from events. These are keyed by identifier_type / identifier_value because the resolved entity_id isn't known yet. They flow through identity resolution:

{source}_entity_traits → nexus_entity_traits → identity resolution
    → nexus_resolved_entity_traits → nexus_entities

Examples: email, name, plan, country, domain.

Traits are not join keys. When identity resolution merges multiple identifiers, finalize_entities collapses trait values via max() — only one value survives per trait column. Never use a trait column as a foreign key to join back to source tables. Use nexus_relationships to connect entities and nexus_entity_identifiers_to_entity_id to look up entities by identifier.
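
The collapse can be illustrated with a minimal sketch. It assumes a warehouse that accepts a VALUES list in a FROM clause (for example Postgres, Snowflake, or DuckDB) and uses a simplified, made-up trait shape rather than the package's actual intermediate tables:

```sql
-- Hypothetical resolved trait rows: one entity observed with two emails.
with resolved_traits as (
    select *
    from (values
        ('ent_1', 'email',   'ada@example.com'),
        ('ent_1', 'email',   'ada@work.example'),
        ('ent_1', 'country', 'GB')
    ) as t (entity_id, trait_name, trait_value)
)

-- Pivot with max(): each trait column keeps exactly one value per entity.
select
    entity_id,
    max(case when trait_name = 'email'   then trait_value end) as email,
    max(case when trait_name = 'country' then trait_value end) as country
from resolved_traits
group by entity_id
```

The result is a single row for ent_1 whose email is whichever value sorts highest (here ada@work.example). The other address is no longer on the row, so a join from a source table on email would silently miss it.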

Pre-resolution traits are defined in source models. Any source with entities configured in dbt_project.yml must provide a {source}_entity_traits model producing the standard EAV schema:

| Column | Description |
| --- | --- |
| entity_trait_id | Surrogate key |
| event_id | Originating event |
| entity_type | person, group, etc. |
| identifier_type | email, user_id, etc. |
| identifier_value | The actual identifier |
| trait_name | e.g., name, plan |
| trait_value | String value |
| source | Source system name |
| occurred_at | When observed |
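
A source model producing this schema might look like the sketch below. The source name myapp, the upstream model myapp_events, and its columns (event_id, email, full_name, occurred_at) are hypothetical. The 'entity_trait' prefix passed to create_nexus_id is also an assumption, modeled on the computed-trait example later on this page:

```sql
-- Hypothetical model: myapp_entity_traits.sql
{{ config(materialized='table') }}

with events as (
    -- myapp_events and its columns are illustrative, not part of dbt-nexus
    select event_id, email, full_name, occurred_at
    from {{ ref('myapp_events') }}
    where email is not null
      and full_name is not null
)

select
    {{ nexus.create_nexus_id('entity_trait',
       ['event_id', "'email'", 'email', "'name'"]) }} as entity_trait_id,
    event_id,
    'person' as entity_type,
    'email' as identifier_type,       -- keyed by identifier, not entity_id
    email as identifier_value,
    'name' as trait_name,
    full_name as trait_value,
    'myapp' as source,
    occurred_at
from events
```

Each additional trait (plan, country, and so on) would add another select of the same shape, combined with union all.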

Computed Traits (Stage 2)

Post-resolution properties derived from resolved entities, states, events, or external data. These are keyed directly by entity_id — identity resolution has already happened, so no identifier mapping is needed.

nexus_resolved_entity_traits ──→ computed trait models ──→ nexus_computed_traits
nexus_entity_states ──────────→                                      │
nexus_events ─────────────────→                                      ▼
                                                          nexus_entities

Examples:

  • Derived properties: display_name from coalesce(name, first_name || ' ' || last_name, email)
  • Analytical model output: risk_tier from a churn survival model
  • External dataset merges: demographic data joined on resolved address
  • Cross-source resolution: best name across multiple sources using a trust hierarchy

Computed traits run after entity resolution and states. They cannot create circular dependencies because they don't feed back into identity resolution — they only flow forward into nexus_entities.

Schema

Computed trait models produce EAV rows with a simpler schema than pre-resolution traits (no identifiers, no event linkage):

| Column | Description |
| --- | --- |
| computed_trait_id | Surrogate key (prefix ct) |
| entity_id | Resolved entity |
| entity_type | person, group, etc. |
| trait_name | e.g., risk_tier, display_name |
| trait_value | String value |
| source | Source or system |
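
As a sketch of a derived property in this schema, the display_name example above could be written roughly as follows. It assumes nexus_resolved_entity_traits exposes entity_id, entity_type, trait_name, and trait_value, which this page does not confirm, and the 'nexus' source label is a placeholder:

```sql
{{ config(materialized='table', tags=['nexus', 'computed-traits']) }}

with pivoted as (
    -- Assumed columns on nexus_resolved_entity_traits; adjust to the real schema
    select
        entity_id,
        entity_type,
        max(case when trait_name = 'name'       then trait_value end) as name,
        max(case when trait_name = 'first_name' then trait_value end) as first_name,
        max(case when trait_name = 'last_name'  then trait_value end) as last_name,
        max(case when trait_name = 'email'      then trait_value end) as email
    from {{ ref('nexus_resolved_entity_traits') }}
    where entity_type = 'person'
    group by entity_id, entity_type
),

named as (
    select
        entity_id,
        entity_type,
        -- Where || propagates NULL, a missing name part falls through to email
        coalesce(name, first_name || ' ' || last_name, email) as display_name
    from pivoted
)

select
    {{ nexus.create_nexus_id('computed_trait',
       ['entity_id', "'display_name'"]) }} as computed_trait_id,
    entity_id,
    entity_type,
    'display_name' as trait_name,
    display_name as trait_value,
    'nexus' as source
from named
where display_name is not null
```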

Configuration

Computed traits follow the same config pattern as states — an explicit list of model names under nexus.computed_traits:

vars:
  nexus:
    computed_traits:
      - sendowl_computed_traits
      - verisk_demographic_traits

Each model in the list must produce the schema above. The process_computed_traits() macro unions them into nexus_computed_traits. finalize_entities() pivots the trait names into columns on nexus_entities alongside pre-resolution traits.
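
Conceptually, the union step amounts to something like the sketch below. This is an illustration of the idea only, not the package's process_computed_traits() macro, whose internals are not shown on this page:

```sql
-- Illustrative only: a hand-written equivalent of unioning the configured models.
{% set trait_models = var('nexus', {}).get('computed_traits', []) %}

{% for model_name in trait_models %}
select
    computed_trait_id,
    entity_id,
    entity_type,
    trait_name,
    trait_value,
    source
from {{ ref(model_name) }}
{% if not loop.last %}union all{% endif %}
{% endfor %}
```

With an empty list this sketch renders no SQL at all, so a real implementation would need a fallback for that case.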

Example: Churn Risk Tier

{{ config(materialized='table', tags=['nexus', 'computed-traits']) }}

with churn_scores as (
    select entity_id, entity_type, risk_tier
    from {{ ref('sendowl_churn_risk_scores') }}
    where risk_tier is not null
)

select
    {{ nexus.create_nexus_id('computed_trait',
       ['entity_id', "'risk_tier'", 'risk_tier']) }} as computed_trait_id,
    entity_id,
    entity_type,
    'risk_tier' as trait_name,
    risk_tier as trait_value,
    'sendowl' as source
from churn_scores

After dbt build, risk_tier appears as a column on nexus_entities:

SELECT entity_id, name, email, risk_tier
FROM nexus_entities
WHERE risk_tier = 'high'

Pipeline

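The full entity pipeline has five stages. Stages 1-3 are pre-resolution. Stages 4-5 are post-resolution. These numbers count the whole pipeline, so the "Stage 1" and "Stage 2" labels in the trait headings above correspond to Stages 1 and 4 here.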

flowchart TD
    subgraph stage1 [Stage 1: Source Extraction]
        srcTraits["{source}_entity_traits"]
        srcIdent["{source}_entity_identifiers"]
    end

    subgraph stage2 [Stage 2: Identity Resolution]
        nxTraits["nexus_entity_traits"]
        nxIdent["nexus_entity_identifiers"]
        edges["nexus_entity_identifiers_edges"]
        resolved["nexus_resolved_{type}_identifiers"]
        resolvedTraits["nexus_resolved_entity_traits"]
    end

    subgraph stage3 [Stage 3: States]
        states["nexus_entity_states"]
    end

    subgraph stage4 [Stage 4: Computed Traits]
        ctModels["computed trait models"]
        nxCT["nexus_computed_traits"]
    end

    subgraph stage5 [Stage 5: Entity Table]
        entities["nexus_entities"]
    end

    srcTraits --> nxTraits --> resolvedTraits
    srcIdent --> nxIdent --> edges --> resolved
    resolved --> resolvedTraits
    resolvedTraits --> entities
    resolvedTraits --> ctModels
    states --> ctModels
    ctModels --> nxCT --> entities

Subtopics

  • Entity Resolution — the algorithm that merges identifiers into resolved entities
  • Entity Types — ER vs non-ER entities, when to promote a concept to an entity