Pre-Aggregated Sources

Pattern for modeling pre-aggregated data (ad platforms, search analytics, etc.) as nexus metric events with measurements and dimensions — no entity resolution, no identity, just numbers.

Some data sources don't contain individual user events — they deliver pre-aggregated numbers. Ad platforms report daily spend and impressions per ad. Search consoles report daily clicks and position per keyword. Analytics platforms report daily sessions per channel. None of these rows represent a single person doing a single thing; each row is already a rollup.

These sources still belong in Nexus. They share timelines, dashboards, and cross-source queries with behavioral events. But they follow a different pattern than sources with identity resolution.

This document describes that pattern.


When to Use This Pattern

Use the pre-aggregated source pattern when all of the following are true:

  1. Rows are aggregates, not events. Each row represents a daily (or periodic) rollup of metrics for a set of dimensions — not a single user action.
  2. No person identifiers. The data has no email, user ID, cookie, or other identifier that could feed identity resolution.
  3. The value is in the numbers. The source exists to answer "how much?" and "what's trending?" — not "who did what?"

Examples

Source Grain Why Pre-Aggregated
Google Ads ad × device × network × day Aggregated ad metrics from Google Ads API
Facebook Ads (Meta) ad × day Aggregated ad metrics from Meta Marketing API
Google Search Console keyword × page × country × device × day Aggregated search metrics from GSC API
Bing Ads campaign × ad group × keyword × day Aggregated ad metrics from Microsoft Ads
Google Analytics (sessions) channel × medium × day Aggregated session counts

Counter-Examples (Don't Use This Pattern)

Source Why Not
Segment Individual page views and tracks with user_id / anonymous_id
Stripe Individual invoice and payment events with customer_id
Salesforce Individual contact and opportunity records with email / account

Architecture Overview

Pre-aggregated sources use the same four-layer structure as behavioral sources but produce a different set of outputs. The key differences:

Concern Behavioral Source Pre-Aggregated Source
event_type Domain-specific (order, appointment, etc.) Always metric
event_name Verb phrase (order placed) Noun phrase (ad performance, search impressions)
Entity identifiers Yes (person, group) No (entities: [])
Entity traits Yes No
Relationships Often No
Measurements Sometimes Always — this is the point
Dimensions Sometimes Always — this is the point
Attribution Often No

Output Models

A pre-aggregated source produces three union-layer outputs (compared to five or more for a behavioral source):

models/sources/{source}/
├── {source}_sources.yml              # Source definitions
├── base/
│   └── base_{source}_*.sql           # Thin views on raw tables
├── normalized/
│   └── {source}_*.sql                # Joins, deduplication, type cleanup
├── intermediate/
│   ├── {source}_*_events.sql         # Event formatting
│   ├── {source}_*_measurements.sql   # Unpivoted measurements
│   └── {source}_*_dimensions.sql     # Unpivoted dimensions
├── {source}_events.sql               # Union of event types
├── {source}_event_measurements.sql   # Union of measurement types
└── {source}_event_dimensions.sql     # Union of dimension types

No _entity_identifiers, _entity_traits, or _relationship_declarations models.


Pattern Details

1. Event Type: Always metric

All pre-aggregated events use event_type = 'metric'. This signals to downstream consumers that the row is an aggregate, not a discrete occurrence. Quantitative data lives in the measurements table.

select
    {{ nexus.create_nexus_id('event', ['date', 'ad_id', ...]) }} as event_id,
    cast(date as timestamp_tz) as occurred_at,
    'metric' as event_type,
    'ad performance' as event_name,
    1 as significance,
    'google_ads' as source,
    ...

2. Event Names: Cross-Platform by Design

Event names should be source-agnostic so the same name works across platforms. This enables cross-channel queries without knowing which platforms are configured.

Good Event Name Why
ad performance Works for Google Ads, Facebook Ads, Bing Ads
search impressions Works for Google Search Console, Bing Webmaster
search query impressions Works for any search console with keyword data

Filter by source to narrow to a specific platform:

WHERE event_name = 'ad performance' AND source = 'google_ads'

3. Measurements: Unpivoted Quantitative Values

Every quantitative metric from the source should be unpivoted into the measurements EAV table. This is the primary reason pre-aggregated sources exist in Nexus — it makes cross-source numeric comparison possible.

-- One CTE per measurement, then union all
spend_measurement as (
    select
        {{ nexus.create_nexus_id('event_measurement', ['event_id', "'spend'"]) }}
            as event_measurement_id,
        event_id,
        'spend' as measurement_name,
        spend as value,
        'USD' as value_unit,
        occurred_at,
        source
    from ad_events
    where spend is not null
),
...
select * from spend_measurement
union all
select * from impressions_measurement
union all
select * from clicks_measurement

What to Include

Include as Measurement Exclude from Measurements
Spend, impressions, clicks, reach CTR, CPC, CPM (derived ratios)
Conversions, conversion value Frequency (derived ratio)
Video views, revenue Position (ordinal, needs weighted avg)

Rule of thumb: If it's additive (you can SUM it across rows and get a meaningful number), it's a measurement. If it's a ratio or average, exclude it and let consumers recalculate from the base measurements.

Position (e.g., GSC average position) is a special case — it requires impression-weighted averaging. It can live on the event row for convenience but should not be a measurement.

Measurement Names: Cross-Platform Parity

Use the same measurement name across platforms for the same concept:

Concept Measurement Name Facebook Ads Source Google Ads Source
Money spent spend spend cost_micros / 1e6
Times shown impressions impressions impressions
Link/ad clicks clicks inline_link_clicks clicks
Conversion count conversions (not yet modeled) conversions

4. Dimensions: Unpivoted Categorical Attributes

Every categorical attribute (campaign name, device type, ad network) should be unpivoted into the dimensions EAV table. This enables filtering and grouping across sources without source-specific knowledge.

campaign_name_dimension as (
    select
        {{ nexus.create_nexus_id('event_dimension', ['event_id', "'campaign_name'"]) }}
            as event_dimension_id,
        event_id,
        'campaign_name' as dimension_name,
        campaign_name as dimension_value,
        occurred_at,
        source
    from ad_events
    where campaign_name is not null
),
...

What to Include

All string/categorical fields that a consumer might want to filter or group by:

  • Ad hierarchy: campaign_id, campaign_name, ad_group_id, ad_group_name, ad_id, ad_name, ad_set_id, adset_name
  • Device and network: device, ad_network_type, search_type
  • Account: account_id, customer_id
  • Geography: country

Dimension Names: Cross-Platform Parity

Where possible, use the same dimension name for the same concept across platforms. Where platforms have genuinely different concepts, use platform-specific names:

Concept Google Ads Facebook Ads Notes
Campaign campaign_name campaign_name Same
Ad group ad_group_name adset_name Different platform concepts
Ad ad_name ad_name Same
Device device (not available) Google Ads only
Network ad_network_type (not applicable) Google Ads only

5. No Entity Resolution

Pre-aggregated sources have no person or group identifiers. Configure them with empty entities:

# dbt_project.yml
vars:
  nexus:
    sources:
      google_ads:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: []          # No identity resolution
        attribution: false    # No attribution

6. Event ID: Deterministic from Grain Columns

The event ID must be deterministic and unique for the source's grain. Include all grain columns — not just the ones that seem important:

-- Google Ads: grain is ad × device × network × day
{{ nexus.create_nexus_id('event', [
    'date', 'customer_id', 'ad_id', 'ad_group_id',
    'campaign_id', 'device', 'ad_network_type'
]) }} as event_id

-- Facebook Ads: grain is ad × day
{{ nexus.create_nexus_id('event', ['date', 'ad_id']) }} as event_id

-- GSC search impressions: grain is country × device × search_type × day
{{ nexus.create_nexus_id('event', [
    'date', 'country', 'device', 'search_type'
]) }} as event_id

If any grain column is missing from the ID, you'll get collisions and silently lose rows.


Cross-Source Queries

The primary payoff of this pattern is cross-source queries via the measurements and dimensions tables. These work without knowing which platforms are configured.

Total Ad Spend Across Platforms

SELECT
    DATE_TRUNC('month', e.occurred_at) AS month,
    e.source,
    SUM(m.value) AS total_spend
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'ad performance'
  AND m.measurement_name = 'spend'
GROUP BY 1, 2
ORDER BY 1, 2

Impressions by Campaign (Any Platform)

SELECT
    d.dimension_value AS campaign_name,
    e.source,
    SUM(m.value) AS impressions
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
JOIN nexus_event_dimensions d ON e.event_id = d.event_id
WHERE e.event_name = 'ad performance'
  AND m.measurement_name = 'impressions'
  AND d.dimension_name = 'campaign_name'
GROUP BY 1, 2
ORDER BY 3 DESC

Ad Spend vs Organic Search (Metric Events Across Types)

SELECT
    DATE_TRUNC('month', occurred_at) AS month,
    'Ad Spend' AS metric,
    SUM(m.value) AS value
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'ad performance'
  AND m.measurement_name = 'spend'
GROUP BY 1

UNION ALL

SELECT
    DATE_TRUNC('month', occurred_at),
    'Organic Clicks',
    SUM(m.value)
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'search impressions'
  AND m.measurement_name = 'clicks'
GROUP BY 1
ORDER BY 1

Derived Ratios: Don't Store, Recalculate

A common mistake is storing CTR, CPC, CPM, or average position as measurements. These are derived ratios — they cannot be summed, and averaging them across rows produces incorrect results.

Wrong (averaging pre-computed ratios):

-- DO NOT DO THIS
AVG(ctr)  -- Gives equal weight to a row with 1 impression and a row with 10,000

Correct (recalculating from base measurements):

SUM(clicks) / NULLIF(SUM(impressions), 0) AS ctr,
SUM(spend) / NULLIF(SUM(clicks), 0) AS cpc,
SUM(spend) / NULLIF(SUM(impressions), 0) * 1000 AS cpm

Derived ratios can live on the event row for convenience (single-row lookups), but should never be in the measurements table.


Adding a New Pre-Aggregated Source

Checklist

  1. Identify the grain. What combination of columns makes each row unique? Document this explicitly in the source YAML description.

  2. Create base views. One select * view per raw table.

  3. Create normalized models. Join dimension tables (ad history, campaign history) to the fact table. Deduplicate history tables using qualify row_number() over (partition by id order by updated_at desc) = 1. Filter to meaningful rows (WHERE spend > 0 OR impressions > 0).

  4. Create intermediate event model. Format as nexus events with event_type = 'metric'. Use a source-agnostic event_name. Include all source columns on the event row for downstream extraction.

  5. Create intermediate measurement model. One CTE per additive metric. Exclude derived ratios. Use cross-platform measurement names.

  6. Create intermediate dimension model. One CTE per categorical attribute. Use cross-platform dimension names where possible.

  7. Create union models. Wrap intermediates with dbt_utils.union_relations() for future extensibility (even if there's only one intermediate today).

  8. Configure dbt_project.yml. Set events: true, measurements: true, dimensions: true, entities: [], attribution: false.

  9. Document the source. Create a client-specific doc covering event schema, measurements, dimensions, grain, and query examples.

Template

A minimal pre-aggregated source with one event type needs 8 files:

models/sources/{source}/
├── {source}_sources.yml
├── base/
│   └── base_{source}_{fact_table}.sql
├── normalized/
│   └── {source}_{entity}.sql
├── intermediate/
│   ├── {source}_{entity}_events.sql
│   ├── {source}_{entity}_measurements.sql
│   └── {source}_{entity}_dimensions.sql
├── {source}_events.sql
├── {source}_event_measurements.sql
└── {source}_event_dimensions.sql

If the source has history/dimension tables to join (ad names, campaign names), add base views for those too.


Relationship to Other Patterns

  • Recommended Source Model Structure: Pre-aggregated sources follow the same four-layer pattern but skip entity models.
  • Event Measurements: The measurements facet documentation covers the full EAV pattern, measurement theory, and cross-source querying.
  • Event Dimensions: The dimensions facet documentation covers the full EAV pattern and cross-source filtering.
  • Create Source Models: The identity resolution guide — use that for behavioral sources with person/group identifiers.