Trait-Derived States
How to derive lifecycle states from entity trait events, with patterns for state cycling (re-leading, upgrade/downgrade) using gap detection.
Some lifecycle states aren't triggered by explicit business events like "enrollment" or "subscription_created." Instead, they're implied by the presence of certain entity traits — for example, a person becomes a "lead" when they first provide an email address.
This page documents the pattern for deriving states from trait events, including how to handle state cycling (e.g., lead → stale → lead).
The Problem
A naive approach checks a boolean flag like has_contact_info on a snapshot
table. This works for current-state queries but has two drawbacks:
- No event linkage. The state row has no
trigger_event_id, so you can't trace it back to the form submission, phone call, or import that created the lead. - No timeline. You only know the person currently has contact info, not
when they first provided it. The
state_entered_athas to be approximated (e.g., fromfirst_interaction_ator_created_at).
The Solution: Join Traits to Resolved Identifiers
nexus_entity_traits stores every trait event with its originating event_id,
but uses pre-resolution identifiers (not entity IDs). To get the resolved
entity_id, join through the identity resolution table:
SELECT
rpi.person_id AS entity_id,
t.event_id,
t.trait_name,
t.trait_value,
t.occurred_at
FROM nexus_entity_traits t
INNER JOIN nexus_resolved_person_identifiers rpi
ON t.identifier_type = rpi.identifier_type
AND t.identifier_value = rpi.identifier_value
WHERE t.entity_type = 'person'
AND t.trait_name IN ('email', 'phone', 'mobile_phone')
AND t.trait_value IS NOT NULL
AND TRIM(t.trait_value) != ''
This gives you:
entity_id— the resolved person, ready to join to other state/event tables.event_id— the specific event that carried the trait, usable astrigger_event_id.occurred_at— the exact timestamp when the trait was provided.- Full history — every trait event, not just the latest.
Why Not nexus_resolved_entity_traits?
nexus_resolved_entity_traits is convenient (it already has entity_id) but
has two limitations:
- Regenerated IDs. The
entity_trait_idis recomputed with different hash inputs thannexus_entity_traits, so you can't join between them. - Latest-only. The resolution macro applies
ROW_NUMBERpartitioned by(entity_id, trait_name)and keeps only the most recent value. You lose the full trait timeline, which matters for gap detection and cycling.
State Cycling with Gap Detection
Many state machines have cycles: lead → stale → lead, or active → downgraded → active. If the cycle is driven by recurring trait or event activity, the burst detection pattern handles it without recursion.
The Pattern
- Order events by
occurred_atper entity. - Compute gaps between consecutive events with
LAG. - Flag burst starts wherever the gap exceeds a threshold (or is NULL for the first event).
- Assign burst numbers with a conditional running
SUM. - Summarize each burst to get the entry timestamp, exit timestamp, and triggering event.
WITH contact_traits AS (
SELECT
rpi.person_id AS entity_id,
t.event_id,
t.occurred_at,
DATEDIFF(day,
LAG(t.occurred_at) OVER (
PARTITION BY rpi.person_id ORDER BY t.occurred_at
),
t.occurred_at
) AS days_since_prev
FROM nexus_entity_traits t
INNER JOIN nexus_resolved_person_identifiers rpi
ON t.identifier_type = rpi.identifier_type
AND t.identifier_value = rpi.identifier_value
WHERE t.entity_type = 'person'
AND t.trait_name IN ('email', 'phone', 'mobile_phone')
AND t.trait_value IS NOT NULL
AND TRIM(t.trait_value) != ''
),
lead_bursts AS (
SELECT
entity_id,
event_id,
occurred_at,
SUM(
CASE WHEN days_since_prev IS NULL OR days_since_prev > 45
THEN 1 ELSE 0 END
) OVER (PARTITION BY entity_id ORDER BY occurred_at) AS burst_num
FROM contact_traits
),
burst_summary AS (
SELECT
entity_id,
burst_num,
MIN(occurred_at) AS lead_entered_at,
MIN_BY(event_id, occurred_at) AS lead_event_id,
DATEADD(day, 45, MAX(occurred_at)) AS stale_at
FROM lead_bursts
GROUP BY entity_id, burst_num
)
Each row in burst_summary produces:
- A lead state row at
lead_entered_atwithtrigger_event_id = lead_event_id. - A stale state row at
stale_at(ifstale_at <= CURRENT_TIMESTAMP()).
Rolling vs. Fixed Clocks
The burst pattern naturally supports a rolling clock: the stale threshold is
computed from MAX(occurred_at) (the last activity in the burst), not
MIN(occurred_at) (the first). This means continued engagement resets the
clock.
If you need a fixed clock (e.g., 45 days from entering the state regardless
of subsequent activity), use MIN(occurred_at) for the stale calculation
instead.
Performance
The entire pattern is a single-pass window function per entity. No recursion, no self-joins, no iterative CTEs. Snowflake handles this efficiently even over millions of trait events because:
LAGandSUMwindows are partitioned byentity_id(narrow partitions).- The
GROUP BYinburst_summaryreduces the row count dramatically.
Other Cycling Patterns
The burst detection pattern generalizes beyond lead/stale:
| Pattern | Active state | Inactive state | Gap threshold | Trigger |
|---|---|---|---|---|
| Lead lifecycle | lead | stale | 45 days | Contact-info traits |
| Engagement tier | active | dormant | 30 days | Any interaction event |
| Subscription upgrade/downgrade | premium | standard | N/A (event-driven) | Plan change events |
| Re-engagement | engaged | lapsed | 90 days | Purchase or login events |
For event-driven cycling (like upgrade/downgrade), the same burst detection works but the gap threshold is replaced by the event semantics — each upgrade event starts a new "premium" burst, each downgrade starts a new "standard" burst.
Combining with Event-Driven States
Trait-derived states (lead, stale) typically coexist with event-driven states
(enrolled, paying, churned) in the same state model. The state_changes CTE
unions them together, and the deduplication step resolves any timestamp
collisions using stage_rank:
state_changes AS (
SELECT * FROM pre_enrollment_states -- lead rows (trait-derived)
UNION ALL
SELECT * FROM stale_lead_states -- stale rows (synthetic)
UNION ALL
SELECT * FROM event_states -- enrolled, paying, etc. (event-driven)
)
Assign stage_rank values so that event-driven states always take precedence
over trait-derived states at the same timestamp. For example, if a person's
enrollment event has the same occurred_at as a lead trait, the enrollment
wins.