Troubleshooting Entity Resolution
Debugging entity resolution issues — missing merges, over-merging, and resolution verification.
Entities Not Merging
Symptom: Records that should be the same entity have different entity_id
values even though they share a common identifier like email.
Root cause: A source is not emitting the shared identifier in its
*_entity_identifiers model. If Gmail emits email as an identifier but Notion
only emits it as a trait, there's no edge connecting the two records.
Fix: Ensure all sources that should participate in resolution include the
shared identifier in their *_entity_identifiers models.
Debugging Steps
1. Check for duplicate identifiers in the final table:
SELECT
identifier_value,
identifier_type,
COUNT(DISTINCT entity_id) as entity_count,
STRING_AGG(DISTINCT entity_id, ', ') as entity_ids
FROM nexus_entity_identifiers_to_entity_id
WHERE entity_type = 'person'
AND identifier_type = 'email'
GROUP BY identifier_value, identifier_type
HAVING COUNT(DISTINCT entity_id) > 1
ORDER BY entity_count DESC
If this returns results, the same email resolved to multiple entities — the merge failed.
2. Check what identifiers each source contributes:
SELECT
source,
entity_type,
identifier_type,
COUNT(*) as count,
COUNT(DISTINCT identifier_value) as unique_values
FROM nexus_entity_identifiers
GROUP BY source, entity_type, identifier_type
ORDER BY source, entity_type
Compare sources. If one source is missing an identifier type that others have, that's likely the issue.
3. Check if edges exist between the identifiers:
SELECT *
FROM nexus_entity_identifiers_edges
WHERE entity_type = 'person'
AND (identifier_value_a = 'problem@email.com'
OR identifier_value_b = 'problem@email.com')
No results means no edge was created — the identifier isn't co-occurring with other identifiers in any event.
Over-Merging
Symptom: Unrelated entities are merged into one — a company entity contains records from multiple unrelated companies.
Root cause: A promiscuous identifier (shared email, generic domain, placeholder value) is connecting records that shouldn't be linked.
Fix: Use edge quality autofilters or filter the problematic identifier at the source level. See Edge Quality.
Debugging Steps
1. Find entities with too many identifiers:
SELECT
entity_id,
entity_type,
COUNT(DISTINCT identifier_value) as identifier_count,
COUNT(DISTINCT identifier_type) as type_count
FROM nexus_entity_identifiers_to_entity_id
GROUP BY entity_id, entity_type
HAVING COUNT(DISTINCT identifier_value) > 20
ORDER BY identifier_count DESC
2. Inspect the identifiers of a suspicious entity:
SELECT identifier_type, identifier_value
FROM nexus_entity_identifiers_to_entity_id
WHERE entity_id = 'ent_suspect...'
ORDER BY identifier_type
Look for the identifier that's connecting unrelated records — it's usually a shared email or generic domain.
3. Check edge distributions for that identifier:
SELECT *
FROM edge_distributions
WHERE identifier_value_a = 'suspect@shared.com'
Verifying Resolution is Working
Check that resolution is actually reducing identifier count:
SELECT
entity_type,
COUNT(DISTINCT identifier_value) as total_identifiers,
COUNT(DISTINCT entity_id) as resolved_entities,
ROUND(1.0 - COUNT(DISTINCT entity_id) * 1.0
/ NULLIF(COUNT(DISTINCT identifier_value), 0), 3) as merge_rate
FROM nexus_entity_identifiers_to_entity_id
GROUP BY entity_type
A merge_rate of 0 means no merging is happening. A rate above 0.5 might
indicate over-merging.
Check trait resolution for a specific entity:
SELECT entity_id, trait_name, trait_value
FROM nexus_resolved_entity_traits
WHERE entity_id = 'ent_abc123...'
ORDER BY trait_name
Key Models to Check
| Model | What It Shows |
|---|---|
nexus_entity_identifiers |
All source identifiers (pre-resolution) |
nexus_entity_identifiers_edges |
Edges connecting identifiers (post-filter) |
nexus_resolved_{type}_identifiers |
Resolution output per entity type |
nexus_resolved_entity_traits |
Resolved traits per entity |
nexus_entity_identifiers_to_entity_id |
Final identifier → entity_id mapping |
nexus_entities |
Final entity table with pivoted traits |
edge_distributions |
Edge quality analysis (unfiltered) |
Common Fixes
- Missing merges: Add the shared identifier to the source's
*_entity_identifiersintermediate model - Over-merging: Filter the promiscuous identifier at source level, or enable
error_autofilter: truein edge quality config - Slow resolution: Reduce
nexus.max_recursionto 3 (default 5). Check edge count — if edges are in the millions, deduplication may not be working - Non-deterministic results: Ensure
edge_idis set correctly in source identifier models. Usingevent_idalone asedge_idwhen events have multiple participants can create false connections