Trait-Derived States

How to derive lifecycle states from entity trait events, with patterns for state cycling (re-leading, upgrade/downgrade) using gap detection.

Some lifecycle states aren't triggered by explicit business events like "enrollment" or "subscription_created." Instead, they're implied by the presence of certain entity traits — for example, a person becomes a "lead" when they first provide an email address.

This page documents the pattern for deriving states from trait events, including how to handle state cycling (e.g., lead → stale → lead).

The Problem

A naive approach checks a boolean flag like has_contact_info on a snapshot table. This works for current-state queries but has two drawbacks:

  1. No event linkage. The state row has no trigger_event_id, so you can't trace it back to the form submission, phone call, or import that created the lead.
  2. No timeline. You only know the person currently has contact info, not when they first provided it. The state_entered_at has to be approximated (e.g., from first_interaction_at or _created_at).

The Solution: Join Traits to Resolved Identifiers

nexus_entity_traits stores every trait event with its originating event_id, but uses pre-resolution identifiers (not entity IDs). To get the resolved entity_id, join through the identity resolution table:

SELECT
    rpi.person_id AS entity_id,
    t.event_id,
    t.trait_name,
    t.trait_value,
    t.occurred_at
FROM nexus_entity_traits t
INNER JOIN nexus_resolved_person_identifiers rpi
    ON t.identifier_type = rpi.identifier_type
    AND t.identifier_value = rpi.identifier_value
WHERE t.entity_type = 'person'
  AND t.trait_name IN ('email', 'phone', 'mobile_phone')
  AND t.trait_value IS NOT NULL
  AND TRIM(t.trait_value) != ''

This gives you:

  • entity_id — the resolved person, ready to join to other state/event tables.
  • event_id — the specific event that carried the trait, usable as trigger_event_id.
  • occurred_at — the exact timestamp when the trait was provided.
  • Full history — every trait event, not just the latest.

Why Not nexus_resolved_entity_traits?

nexus_resolved_entity_traits is convenient (it already has entity_id) but has two limitations:

  1. Regenerated IDs. The entity_trait_id is recomputed with different hash inputs than nexus_entity_traits, so you can't join between them.
  2. Latest-only. The resolution macro applies ROW_NUMBER partitioned by (entity_id, trait_name) and keeps only the most recent value. You lose the full trait timeline, which matters for gap detection and cycling.

State Cycling with Gap Detection

Many state machines have cycles: lead → stale → lead, or active → downgraded → active. If the cycle is driven by recurring trait or event activity, the burst detection pattern handles it without recursion.

The Pattern

  1. Order events by occurred_at per entity.
  2. Compute gaps between consecutive events with LAG.
  3. Flag burst starts wherever the gap exceeds a threshold (or is NULL for the first event).
  4. Assign burst numbers with a conditional running SUM.
  5. Summarize each burst to get the entry timestamp, exit timestamp, and triggering event.
WITH contact_traits AS (
    SELECT
        rpi.person_id AS entity_id,
        t.event_id,
        t.occurred_at,
        DATEDIFF(day,
            LAG(t.occurred_at) OVER (
                PARTITION BY rpi.person_id ORDER BY t.occurred_at
            ),
            t.occurred_at
        ) AS days_since_prev
    FROM nexus_entity_traits t
    INNER JOIN nexus_resolved_person_identifiers rpi
        ON t.identifier_type = rpi.identifier_type
        AND t.identifier_value = rpi.identifier_value
    WHERE t.entity_type = 'person'
      AND t.trait_name IN ('email', 'phone', 'mobile_phone')
      AND t.trait_value IS NOT NULL
      AND TRIM(t.trait_value) != ''
),

lead_bursts AS (
    SELECT
        entity_id,
        event_id,
        occurred_at,
        SUM(
            CASE WHEN days_since_prev IS NULL OR days_since_prev > 45
                 THEN 1 ELSE 0 END
        ) OVER (PARTITION BY entity_id ORDER BY occurred_at) AS burst_num
    FROM contact_traits
),

burst_summary AS (
    SELECT
        entity_id,
        burst_num,
        MIN(occurred_at) AS lead_entered_at,
        MIN_BY(event_id, occurred_at) AS lead_event_id,
        DATEADD(day, 45, MAX(occurred_at)) AS stale_at
    FROM lead_bursts
    GROUP BY entity_id, burst_num
)

Each row in burst_summary produces:

  • A lead state row at lead_entered_at with trigger_event_id = lead_event_id.
  • A stale state row at stale_at (if stale_at <= CURRENT_TIMESTAMP()).

Rolling vs. Fixed Clocks

The burst pattern naturally supports a rolling clock: the stale threshold is computed from MAX(occurred_at) (the last activity in the burst), not MIN(occurred_at) (the first). This means continued engagement resets the clock.

If you need a fixed clock (e.g., 45 days from entering the state regardless of subsequent activity), use MIN(occurred_at) for the stale calculation instead.

Performance

The entire pattern is a single-pass window function per entity. No recursion, no self-joins, no iterative CTEs. Snowflake handles this efficiently even over millions of trait events because:

  • LAG and SUM windows are partitioned by entity_id (narrow partitions).
  • The GROUP BY in burst_summary reduces the row count dramatically.

Other Cycling Patterns

The burst detection pattern generalizes beyond lead/stale:

Pattern Active state Inactive state Gap threshold Trigger
Lead lifecycle lead stale 45 days Contact-info traits
Engagement tier active dormant 30 days Any interaction event
Subscription upgrade/downgrade premium standard N/A (event-driven) Plan change events
Re-engagement engaged lapsed 90 days Purchase or login events

For event-driven cycling (like upgrade/downgrade), the same burst detection works but the gap threshold is replaced by the event semantics — each upgrade event starts a new "premium" burst, each downgrade starts a new "standard" burst.

Combining with Event-Driven States

Trait-derived states (lead, stale) typically coexist with event-driven states (enrolled, paying, churned) in the same state model. The state_changes CTE unions them together, and the deduplication step resolves any timestamp collisions using stage_rank:

state_changes AS (
    SELECT * FROM pre_enrollment_states   -- lead rows (trait-derived)
    UNION ALL
    SELECT * FROM stale_lead_states       -- stale rows (synthetic)
    UNION ALL
    SELECT * FROM event_states            -- enrolled, paying, etc. (event-driven)
)

Assign stage_rank values so that event-driven states always take precedence over trait-derived states at the same timestamp. For example, if a person's enrollment event has the same occurred_at as a lead trait, the enrollment wins.