Troubleshooting Attribution Issues
Common attribution issues and how to diagnose them, including power user analysis, deduplication debugging, and performance troubleshooting.
This guide helps diagnose common issues in the Nexus attribution system, with real examples and debugging queries.
Common Attribution Issues
Issue 1: Lower Batch Count Than Expected
Symptom: nexus_touchpoint_path_batches has significantly fewer rows than
nexus_touchpoints
Expected: Batch count should be roughly 50-70% of touchpoint count after deduplication Actual: May see batch counts that are 30-40% of touchpoint count
Root Cause: High deduplication rates due to power users or internal users
Diagnosis Queries
Step 1: Check Overall Deduplication Impact
-- Compare touchpoint counts through the pipeline
SELECT 'Raw touchpoints' as stage, count(*) as count FROM nexus_touchpoints
UNION ALL
SELECT 'After deduplication' as stage, count(*) as count
FROM nexus_touchpoint_paths
UNION ALL
SELECT 'Final batches' as stage, count(*) as count
FROM nexus_touchpoint_path_batches
ORDER BY count DESC
Step 2: Identify Power Users
-- Find people with extreme touchpoint counts
WITH dedup_analysis AS (
SELECT
p.person_id,
count(*) as total_touchpoints,
count(CASE WHEN duplicate_touchpoint = false THEN 1 END) as touchpoints_kept,
count(CASE WHEN duplicate_touchpoint = true THEN 1 END) as touchpoints_removed
FROM (
SELECT
t.*,
p.person_id,
CASE
WHEN lag(t.attribution_deduplication_key) OVER (
PARTITION BY p.person_id
ORDER BY t.occurred_at
) = t.attribution_deduplication_key
THEN true
ELSE false
END as duplicate_touchpoint
FROM nexus_touchpoints t
INNER JOIN nexus_person_participants p ON t.touchpoint_event_id = p.event_id
) dedup_check
GROUP BY p.person_id
)
SELECT
person_id,
total_touchpoints,
touchpoints_removed,
round(touchpoints_removed * 100.0 / total_touchpoints, 2) as percent_removed
FROM dedup_analysis
WHERE total_touchpoints > 1000 -- Focus on power users
ORDER BY total_touchpoints DESC
LIMIT 10
Step 3: Analyze Power User Behavior
-- Check if these are internal users
SELECT
e.event_name,
count(*) as event_count,
count(DISTINCT p.person_id) as unique_people
FROM nexus_events e
INNER JOIN nexus_person_participants p ON e.event_id = p.event_id
WHERE p.person_id IN ('per_39d2cd41d42a1e4e35899fa3d6a51a3d') -- Replace with actual power user IDs
GROUP BY e.event_name
ORDER BY event_count DESC
LIMIT 20
Expected Findings: Power User Patterns
Internal User Indicators:
- Extreme daily usage: 25-73 events per day for months
- Admin activities: Dashboard, my-listings, create-listing, notifications
- Brand searches: Google organic with no UTM campaigns
- Consistent attribution: Same dedup key (Google organic) for long periods
- High event variety: 100-1,500+ unique event types
Example Power User Profile:
Person ID: per_39d2cd41d42a1e4e35899fa3d6a51a3d
- 12,012 total events over 474 days (25 events/day average)
- 9,887 touchpoints → 45 kept (99.54% deduplication)
- Primary activities: Dashboard (4,806), Homepage (4,014), Admin pages
- Attribution: 100% Google organic referral traffic
- Pattern: Daily user accessing internal features via Google search
Resolution
This is typically expected behavior:
- ✅ High deduplication rates (90%+) for internal users are normal
- ✅ Effective noise filtering prevents internal usage from skewing attribution
- ✅ Real customer attribution is preserved while internal activity is deduplicated
Action: No fix needed - the system is working as designed to filter internal user noise.
Issue 2: Attribution Rate Lower Than Expected
Symptom: Only 60-70% of events receive attribution when expecting 80%+
Root Cause: Legitimate unattributed events (direct traffic, pre-touchpoint events)
Diagnosis Queries
Check Unattributed Event Breakdown
-- Analyze why events don't have attribution
WITH unattributed_events AS (
SELECT e.*
FROM nexus_events e
LEFT JOIN nexus_touchpoint_paths tp ON e.event_id = tp.event_id
WHERE tp.event_id IS NULL
),
person_touchpoint_check AS (
SELECT
ue.event_id,
ue.occurred_at,
pp.person_id,
CASE
WHEN pwt.person_id IS NOT NULL THEN 'Person HAS touchpoints elsewhere'
ELSE 'Person has NO touchpoints at all'
END as person_touchpoint_status
FROM unattributed_events ue
LEFT JOIN nexus_person_participants pp ON ue.event_id = pp.event_id
LEFT JOIN (
SELECT DISTINCT pp.person_id
FROM nexus_person_participants pp
INNER JOIN nexus_touchpoints tp ON pp.event_id = tp.touchpoint_event_id
) pwt ON pp.person_id = pwt.person_id
)
SELECT
person_touchpoint_status,
count(*) as event_count,
round(count(*) * 100.0 / sum(count(*)) OVER(), 2) as percentage
FROM person_touchpoint_check
WHERE person_id IS NOT NULL
GROUP BY person_touchpoint_status
Expected Results:
- ~60%: People with no touchpoints (direct traffic) → Normal
- ~30%: Events before person's first touchpoint → Normal
- ~10%: Other edge cases → Acceptable
Issue 3: Performance Problems
Symptom: Attribution models taking too long to run or timing out
Common Causes and Solutions
1. Cartesian Product Explosion
-- BAD: Creates massive row explosion
FROM touchpoints t
INNER JOIN events e ON t.person_id = e.person_id
WHERE t.occurred_at < e.event_occurred_at
Solution: Use MAX aggregation strategy (implemented in
nexus_touchpoint_paths)
-- GOOD: Find latest touchpoint first, then join
WITH latest_touchpoint_times AS (
SELECT event_id, MAX(touchpoint_occurred_at) as latest_touchpoint_at
FROM events e
INNER JOIN touchpoints t ON e.person_id = t.person_id
AND t.occurred_at < e.event_occurred_at
AND datediff('day', t.occurred_at, e.event_occurred_at) <= 90
GROUP BY event_id
)
2. Missing Attribution Window
- Problem: No time limit on touchpoint attribution
- Solution: Add 90-day attribution window to prevent runaway joins
3. No Materialization Strategy
- Problem: Models running as views instead of tables
- Solution: Use
materialized='table'for attribution models
Issue 4: Duplicate Attribution Results
Symptom: Events getting multiple touchpoints when expecting 1:1 relationship
Diagnosis
Check for Timestamp Ties
-- Find events with multiple touchpoints at same timestamp
SELECT
event_id,
count(*) as touchpoint_count
FROM nexus_touchpoint_paths
GROUP BY event_id
HAVING count(*) > 1
LIMIT 10
Solution: Tie-breaker logic (implemented in nexus_touchpoint_paths)
-- Add deterministic tie-breaker
ROW_NUMBER() OVER (
PARTITION BY event_id
ORDER BY touchpoint_id -- Deterministic ordering
) as tie_breaker_rank
Debugging Attribution Deduplication
Understanding High Deduplication Rates
Normal Deduplication Scenarios:
- Page refreshes: Same page, same attribution, within seconds
- Session continuation: Same campaign context across page views
- Internal users: Employees/power users with consistent Google organic traffic
Problematic Deduplication Scenarios:
- Different campaigns getting same dedup key
- Cross-session deduplication when sessions should be separate
- Time-based issues where old touchpoints affect new ones
Deduplication Analysis Queries
Check Dedup Key Distribution
-- Understand what's being deduplicated
SELECT
t.channel,
t.touchpoint_type,
t.referrer,
count(*) as touchpoint_count,
count(DISTINCT t.touchpoint_id) as unique_touchpoints,
count(DISTINCT p.person_id) as unique_people,
round(count(*) / count(DISTINCT p.person_id), 2) as avg_touchpoints_per_person
FROM nexus_touchpoints t
INNER JOIN nexus_person_participants p ON t.touchpoint_event_id = p.event_id
WHERE t.attribution_deduplication_key = 'att_dedup_3612d05614d4825cab013879906fe684' -- Replace with problematic key
GROUP BY t.channel, t.touchpoint_type, t.referrer
ORDER BY touchpoint_count DESC
Analyze Power User Patterns
-- Check if high deduplication users are internal
SELECT
p.person_id,
count(DISTINCT e.event_id) as total_events,
count(DISTINCT date(e.occurred_at)) as unique_days_active,
datediff('day', min(e.occurred_at), max(e.occurred_at)) as total_days_span,
count(DISTINCT e.event_id) / datediff('day', min(e.occurred_at), max(e.occurred_at)) as avg_events_per_day,
count(DISTINCT e.event_name) as unique_event_types
FROM nexus_events e
INNER JOIN nexus_person_participants p ON e.event_id = p.event_id
WHERE p.person_id = 'per_39d2cd41d42a1e4e35899fa3d6a51a3d' -- Replace with power user ID
GROUP BY p.person_id
Power User Identification
Internal User Red Flags:
- >20 events per day consistently
- >200 unique days active over long periods
- Admin page access:
/dashboard,/my-listings,/create-listing - Brand searches: Google organic with no UTM campaigns
- >100 unique event types (exploring all features)
Legitimate Power User vs Internal User:
- Legitimate: High activity but focused on core user flows (search, book, manage)
- Internal: High activity across admin features, testing flows, unusual page access
Performance Optimization Checklist
Model Configuration
- ✅ Use
materialized='table'for all attribution models - ✅ Add appropriate
tagsfor organization - ✅ Include 90-day attribution window
- ✅ Use MAX aggregation strategy to avoid cartesian products
Query Optimization
- ✅ Partition window functions by
person_id - ✅ Order by
occurred_atfor temporal logic - ✅ Use deterministic tie-breakers for duplicate timestamps
- ✅ Include proper indexes on
person_id,event_id,touchpoint_id
Data Quality Validation
- ✅ Validate 1:1 event-to-touchpoint relationship
- ✅ Check attribution coverage rates (expect 60-75%)
- ✅ Monitor deduplication rates (expect 40-60% for normal users)
- ✅ Identify and analyze power users separately
Expected Attribution Metrics
Healthy Attribution System
- Attribution Coverage: 60-75% of events
- Deduplication Rate: 40-60% overall (higher for power users)
- Batch Compression: 5-7:1 ratio (events to batches)
- Processing Time: <2 minutes for 10M+ events
Red Flags
- <50% attribution coverage: Missing touchpoint sources
- >80% deduplication rate across all users: Dedup key too broad
- >15x row explosion: Missing attribution window or cartesian product
- >10 minute processing: Performance optimization needed
Real-World Example: Power User Analysis
Case Study: per_39d2cd41d42a1e4e35899fa3d6a51a3d
Profile:
- 12,012 events over 474 days (25 events/day)
- 9,887 touchpoints → 45 kept (99.54% deduplication)
- 315 unique days active over 15+ months
- 91 unique event types
Touchpoint Patterns:
- 4,014 homepage visits from Google organic
- 2,854 dashboard visits over 288 unique days
- 1,278 admin requests (
/requests/wyndham) - 432 listing creation activities
Attribution Analysis:
- 100% Google organic referral traffic
- No UTM campaigns (direct brand searches)
- Consistent daily usage of internal features
- Admin-level access patterns
Conclusion: Clear internal user or super power user. High deduplication rate (99.54%) is expected and correct - prevents internal usage from skewing customer attribution data.
Action: No fix needed. Consider filtering these users from marketing attribution reports if desired, but the deduplication is working correctly.
Debugging Workflow
1. Check Pipeline Health
-- Get overall attribution pipeline metrics
SELECT
'nexus_events' as table_name, count(*) as row_count FROM nexus_events
UNION ALL
SELECT 'nexus_touchpoints' as table_name, count(*) as row_count FROM nexus_touchpoints
UNION ALL
SELECT 'nexus_touchpoint_paths' as table_name, count(*) as row_count FROM nexus_touchpoint_paths
UNION ALL
SELECT 'nexus_touchpoint_path_batches' as table_name, count(*) as row_count FROM nexus_touchpoint_path_batches
ORDER BY row_count DESC
2. Analyze Deduplication Impact
-- Check deduplication effectiveness
SELECT
attribution_deduplication_key,
count(*) as total_touchpoints,
count(DISTINCT person_id) as unique_people,
avg_touchpoints_per_person = count(*) / count(DISTINCT person_id)
FROM nexus_touchpoints t
INNER JOIN nexus_person_participants p ON t.touchpoint_event_id = p.event_id
GROUP BY attribution_deduplication_key
HAVING count(*) > 10000 -- Focus on high-volume dedup keys
ORDER BY total_touchpoints DESC
3. Identify Power Users
-- Find users with suspicious activity levels
SELECT
person_id,
count(DISTINCT event_id) as total_events,
count(DISTINCT date(occurred_at)) as unique_days_active,
count(DISTINCT event_id) / count(DISTINCT date(occurred_at)) as avg_events_per_day
FROM nexus_events e
INNER JOIN nexus_person_participants p ON e.event_id = p.event_id
GROUP BY person_id
HAVING count(DISTINCT event_id) / count(DISTINCT date(occurred_at)) > 20 -- >20 events/day
ORDER BY total_events DESC
4. Validate Attribution Logic
-- Ensure 1:1 event-to-touchpoint relationship
SELECT
CASE
WHEN count(DISTINCT event_id) = count(*) THEN '✅ Perfect 1:1 relationship'
ELSE '❌ Multiple touchpoints per event detected'
END as validation_result,
count(*) as total_attribution_records,
count(DISTINCT event_id) as unique_events
FROM nexus_touchpoint_paths
When to Investigate vs Accept
Investigate Further
- Attribution coverage <50%: Likely missing touchpoint sources
- Processing time >10 minutes: Performance optimization needed
- Row explosion >20x: Cartesian product or missing attribution window
- Zero deduplication: Dedup key not working
Accept as Normal
- Attribution coverage 60-75%: Typical for web analytics
- High deduplication for power users (80%+): Expected for internal users
- Batch compression 5-7:1: Efficient batching working correctly
- Processing time <5 minutes: Good performance for large datasets
Performance Monitoring
Key Metrics to Track
- Attribution coverage rate: Target 60-75%
- Deduplication rate: Target 40-60% overall
- Batch compression ratio: Target 5-7:1
- Processing time: Target <5 minutes for 10M events
- Power user identification: Monitor users with >1000 touchpoints
Health Check Query
-- Comprehensive attribution health check
SELECT
'Attribution Coverage' as metric,
round(
count(DISTINCT tp.event_id) * 100.0 /
count(DISTINCT e.event_id), 2
) as percentage
FROM nexus_events e
LEFT JOIN nexus_touchpoint_paths tp ON e.event_id = tp.event_id
UNION ALL
SELECT
'Deduplication Rate' as metric,
round(
(count(DISTINCT t.touchpoint_id) - count(DISTINCT tp.last_touchpoint_id)) * 100.0 /
count(DISTINCT t.touchpoint_id), 2
) as percentage
FROM nexus_touchpoints t
LEFT JOIN nexus_touchpoint_paths tp ON t.touchpoint_id = tp.last_touchpoint_id
UNION ALL
SELECT
'Batch Compression Ratio' as metric,
round(
count(DISTINCT tp.touchpoint_path_id) * 1.0 /
count(DISTINCT tb.touchpoint_batch_id), 2
) as ratio
FROM nexus_touchpoint_paths tp
LEFT JOIN nexus_touchpoint_path_batches tb ON tp.touchpoint_batch_id = tb.touchpoint_batch_id
The Nexus attribution system is designed to handle these scenarios gracefully while maintaining data quality and performance.