Nexus Data

Some data sources don't contain individual user events — they deliver pre-aggregated numbers. Ad platforms report daily spend and impressions per ad. Search consoles report daily clicks and position per keyword. Analytics platforms report daily sessions per channel. None of these rows represent a single person doing a single thing; each row is already a rollup.

These sources still belong in Nexus. They share timelines, dashboards, and cross-source queries with behavioral events. But they follow a different pattern than sources with identity resolution.

This document describes that pattern.

When to Use This Pattern

Use the pre-aggregated source pattern when all of the following are true:

Rows are aggregates, not events. Each row represents a daily (or periodic) rollup of metrics for a set of dimensions — not a single user action.
No person identifiers. The data has no email, user ID, cookie, or other identifier that could feed identity resolution.
The value is in the numbers. The source exists to answer "how much?" and "what's trending?" — not "who did what?"

Examples

Source	Grain	Why Pre-Aggregated
Google Ads	ad × device × network × day	Aggregated ad metrics from Google Ads API
Facebook Ads (Meta)	ad × day	Aggregated ad metrics from Meta Marketing API
Google Search Console	keyword × page × country × device × day	Aggregated search metrics from GSC API
Bing Ads	campaign × ad group × keyword × day	Aggregated ad metrics from Microsoft Ads
Google Analytics (sessions)	channel × medium × day	Aggregated session counts

Counter-Examples (Don't Use This Pattern)

Source	Why Not
Segment	Individual page views and tracks with `user_id` / `anonymous_id`
Stripe	Individual invoice and payment events with `customer_id`
Salesforce	Individual contact and opportunity records with email / account

Architecture Overview

Pre-aggregated sources use the same four-layer structure as behavioral sources but produce a different set of outputs. The key differences:

Concern	Behavioral Source	Pre-Aggregated Source
`event_type`	Domain-specific (`order`, `appointment`, etc.)	Always `metric`
`event_name`	Verb phrase (`order placed`)	Noun phrase (`ad performance`, `search impressions`)
Entity identifiers	Yes (person, group)	No (`entities: []`)
Entity traits	Yes	No
Relationships	Often	No
Measurements	Sometimes	Always — this is the point
Dimensions	Sometimes	Always — this is the point
Attribution	Often	No

Output Models

A pre-aggregated source produces three union-layer outputs (compared to five or more for a behavioral source):

models/sources/{source}/
├── {source}_sources.yml              # Source definitions
├── base/
│   └── base_{source}_*.sql           # Thin views on raw tables
├── normalized/
│   └── {source}_*.sql                # Joins, deduplication, type cleanup
├── intermediate/
│   ├── {source}_*_events.sql         # Event formatting
│   ├── {source}_*_measurements.sql   # Unpivoted measurements
│   └── {source}_*_dimensions.sql     # Unpivoted dimensions
├── {source}_events.sql               # Union of event types
├── {source}_event_measurements.sql   # Union of measurement types
└── {source}_event_dimensions.sql     # Union of dimension types

No _entity_identifiers, _entity_traits, or _relationship_declarations models.

Pattern Details

1. Event Type: Always `metric`

All pre-aggregated events use event_type = 'metric'. This signals to downstream consumers that the row is an aggregate, not a discrete occurrence. Quantitative data lives in the measurements table.

select
    {{ nexus.create_nexus_id('event', ['date', 'ad_id', ...]) }} as event_id,
    cast(date as timestamp_tz) as occurred_at,
    'metric' as event_type,
    'ad performance' as event_name,
    1 as significance,
    'google_ads' as source,
    ...

2. Event Names: Cross-Platform by Design

Event names should be source-agnostic so the same name works across platforms. This enables cross-channel queries without knowing which platforms are configured.

Good Event Name	Why
`ad performance`	Works for Google Ads, Facebook Ads, Bing Ads
`search impressions`	Works for Google Search Console, Bing Webmaster
`search query impressions`	Works for any search console with keyword data

Filter by source to narrow to a specific platform:

WHERE event_name = 'ad performance' AND source = 'google_ads'

3. Measurements: Unpivoted Quantitative Values

Every quantitative metric from the source should be unpivoted into the measurements EAV table. This is the primary reason pre-aggregated sources exist in Nexus — it makes cross-source numeric comparison possible.

-- One CTE per measurement, then union all
spend_measurement as (
    select
        {{ nexus.create_nexus_id('event_measurement', ['event_id', "'spend'"]) }}
            as event_measurement_id,
        event_id,
        'spend' as measurement_name,
        spend as value,
        'USD' as value_unit,
        occurred_at,
        source
    from ad_events
    where spend is not null
),
...
select * from spend_measurement
union all
select * from impressions_measurement
union all
select * from clicks_measurement

What to Include

Include as Measurement	Exclude from Measurements
Spend, impressions, clicks, reach	CTR, CPC, CPM (derived ratios)
Conversions, conversion value	Frequency (derived ratio)
Video views, revenue	Position (ordinal, needs weighted avg)

Rule of thumb: If it's additive (you can SUM it across rows and get a meaningful number), it's a measurement. If it's a ratio or average, exclude it and let consumers recalculate from the base measurements.

Position (e.g., GSC average position) is a special case — it requires impression-weighted averaging. It can live on the event row for convenience but should not be a measurement.

Measurement Names: Cross-Platform Parity

Use the same measurement name across platforms for the same concept:

Concept	Measurement Name	Facebook Ads Source	Google Ads Source
Money spent	`spend`	`spend`	`cost_micros / 1e6`
Times shown	`impressions`	`impressions`	`impressions`
Link/ad clicks	`clicks`	`inline_link_clicks`	`clicks`
Conversion count	`conversions`	(not yet modeled)	`conversions`

4. Dimensions: Unpivoted Categorical Attributes

Every categorical attribute (campaign name, device type, ad network) should be unpivoted into the dimensions EAV table. This enables filtering and grouping across sources without source-specific knowledge.

campaign_name_dimension as (
    select
        {{ nexus.create_nexus_id('event_dimension', ['event_id', "'campaign_name'"]) }}
            as event_dimension_id,
        event_id,
        'campaign_name' as dimension_name,
        campaign_name as dimension_value,
        occurred_at,
        source
    from ad_events
    where campaign_name is not null
),
...

What to Include

All string/categorical fields that a consumer might want to filter or group by:

Ad hierarchy: campaign_id, campaign_name, ad_group_id, ad_group_name, ad_id, ad_name, ad_set_id, adset_name
Device and network: device, ad_network_type, search_type
Account: account_id, customer_id
Geography: country

Dimension Names: Cross-Platform Parity

Where possible, use the same dimension name for the same concept across platforms. Where platforms have genuinely different concepts, use platform-specific names:

Concept	Google Ads	Facebook Ads	Notes
Campaign	`campaign_name`	`campaign_name`	Same
Ad group	`ad_group_name`	`adset_name`	Different platform concepts
Ad	`ad_name`	`ad_name`	Same
Device	`device`	(not available)	Google Ads only
Network	`ad_network_type`	(not applicable)	Google Ads only

5. No Entity Resolution

Pre-aggregated sources have no person or group identifiers. Configure them with empty entities:

# dbt_project.yml
vars:
  nexus:
    sources:
      google_ads:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: []          # No identity resolution
        attribution: false    # No attribution

6. Event ID: Deterministic from Grain Columns

The event ID must be deterministic and unique for the source's grain. Include all grain columns — not just the ones that seem important:

-- Google Ads: grain is ad × device × network × day
{{ nexus.create_nexus_id('event', [
    'date', 'customer_id', 'ad_id', 'ad_group_id',
    'campaign_id', 'device', 'ad_network_type'
]) }} as event_id

-- Facebook Ads: grain is ad × day
{{ nexus.create_nexus_id('event', ['date', 'ad_id']) }} as event_id

-- GSC search impressions: grain is country × device × search_type × day
{{ nexus.create_nexus_id('event', [
    'date', 'country', 'device', 'search_type'
]) }} as event_id

If any grain column is missing from the ID, you'll get collisions and silently lose rows.

Cross-Source Queries

The primary payoff of this pattern is cross-source queries via the measurements and dimensions tables. These work without knowing which platforms are configured.

Total Ad Spend Across Platforms

SELECT
    DATE_TRUNC('month', e.occurred_at) AS month,
    e.source,
    SUM(m.value) AS total_spend
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'ad performance'
  AND m.measurement_name = 'spend'
GROUP BY 1, 2
ORDER BY 1, 2

Impressions by Campaign (Any Platform)

SELECT
    d.dimension_value AS campaign_name,
    e.source,
    SUM(m.value) AS impressions
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
JOIN nexus_event_dimensions d ON e.event_id = d.event_id
WHERE e.event_name = 'ad performance'
  AND m.measurement_name = 'impressions'
  AND d.dimension_name = 'campaign_name'
GROUP BY 1, 2
ORDER BY 3 DESC

Ad Spend vs Organic Search (Metric Events Across Types)

SELECT
    DATE_TRUNC('month', occurred_at) AS month,
    'Ad Spend' AS metric,
    SUM(m.value) AS value
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'ad performance'
  AND m.measurement_name = 'spend'
GROUP BY 1

UNION ALL

SELECT
    DATE_TRUNC('month', occurred_at),
    'Organic Clicks',
    SUM(m.value)
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'search impressions'
  AND m.measurement_name = 'clicks'
GROUP BY 1
ORDER BY 1

Derived Ratios: Don't Store, Recalculate

A common mistake is storing CTR, CPC, CPM, or average position as measurements. These are derived ratios — they cannot be summed, and averaging them across rows produces incorrect results.

Wrong (averaging pre-computed ratios):

-- DO NOT DO THIS
AVG(ctr)  -- Gives equal weight to a row with 1 impression and a row with 10,000

Correct (recalculating from base measurements):

SUM(clicks) / NULLIF(SUM(impressions), 0) AS ctr,
SUM(spend) / NULLIF(SUM(clicks), 0) AS cpc,
SUM(spend) / NULLIF(SUM(impressions), 0) * 1000 AS cpm

Derived ratios can live on the event row for convenience (single-row lookups), but should never be in the measurements table.

Adding a New Pre-Aggregated Source

Checklist

Identify the grain. What combination of columns makes each row unique? Document this explicitly in the source YAML description.
Create base views. One select * view per raw table.
Create normalized models. Join dimension tables (ad history, campaign history) to the fact table. Deduplicate history tables using qualify row_number() over (partition by id order by updated_at desc) = 1. Filter to meaningful rows (WHERE spend > 0 OR impressions > 0).
Create intermediate event model. Format as nexus events with event_type = 'metric'. Use a source-agnostic event_name. Include all source columns on the event row for downstream extraction.
Create intermediate measurement model. One CTE per additive metric. Exclude derived ratios. Use cross-platform measurement names.
Create intermediate dimension model. One CTE per categorical attribute. Use cross-platform dimension names where possible.
Create union models. Wrap intermediates with dbt_utils.union_relations() for future extensibility (even if there's only one intermediate today).
Configure dbt_project.yml. Set events: true, measurements: true, dimensions: true, entities: [], attribution: false.
Document the source. Create a client-specific doc covering event schema, measurements, dimensions, grain, and query examples.

Template

A minimal pre-aggregated source with one event type needs 8 files:

models/sources/{source}/
├── {source}_sources.yml
├── base/
│   └── base_{source}_{fact_table}.sql
├── normalized/
│   └── {source}_{entity}.sql
├── intermediate/
│   ├── {source}_{entity}_events.sql
│   ├── {source}_{entity}_measurements.sql
│   └── {source}_{entity}_dimensions.sql
├── {source}_events.sql
├── {source}_event_measurements.sql
└── {source}_event_dimensions.sql

If the source has history/dimension tables to join (ad names, campaign names), add base views for those too.

Relationship to Other Patterns

Recommended Source Model Structure: Pre-aggregated sources follow the same four-layer pattern but skip entity models.
Event Measurements: The measurements facet documentation covers the full EAV pattern, measurement theory, and cross-source querying.
Event Dimensions: The dimensions facet documentation covers the full EAV pattern and cross-source filtering.
Create Source Models: The identity resolution guide — use that for behavioral sources with person/group identifiers.

When to Use This Pattern

Examples

Counter-Examples (Don't Use This Pattern)

Architecture Overview

Output Models

Pattern Details

1. Event Type: Always metric

2. Event Names: Cross-Platform by Design

3. Measurements: Unpivoted Quantitative Values

What to Include

Measurement Names: Cross-Platform Parity

4. Dimensions: Unpivoted Categorical Attributes

What to Include

Dimension Names: Cross-Platform Parity

5. No Entity Resolution

6. Event ID: Deterministic from Grain Columns

Cross-Source Queries

Total Ad Spend Across Platforms

Impressions by Campaign (Any Platform)

Ad Spend vs Organic Search (Metric Events Across Types)

Derived Ratios: Don't Store, Recalculate

Adding a New Pre-Aggregated Source

Checklist

Template

Relationship to Other Patterns

1. Event Type: Always `metric`