Pre-Aggregated Sources
Pattern for modeling pre-aggregated data (ad platforms, search analytics, etc.) as nexus metric events with measurements and dimensions — no entity resolution, no identity, just numbers.
Some data sources don't contain individual user events — they deliver pre-aggregated numbers. Ad platforms report daily spend and impressions per ad. Search consoles report daily clicks and position per keyword. Analytics platforms report daily sessions per channel. None of these rows represent a single person doing a single thing; each row is already a rollup.
These sources still belong in Nexus. They share timelines, dashboards, and cross-source queries with behavioral events. But they follow a different pattern than sources with identity resolution.
This document describes that pattern.
When to Use This Pattern
Use the pre-aggregated source pattern when all of the following are true:
- Rows are aggregates, not events. Each row represents a daily (or periodic) rollup of metrics for a set of dimensions — not a single user action.
- No person identifiers. The data has no email, user ID, cookie, or other identifier that could feed identity resolution.
- The value is in the numbers. The source exists to answer "how much?" and "what's trending?" — not "who did what?"
Examples
| Source | Grain | Why Pre-Aggregated |
|---|---|---|
| Google Ads | ad × device × network × day | Aggregated ad metrics from Google Ads API |
| Facebook Ads (Meta) | ad × day | Aggregated ad metrics from Meta Marketing API |
| Google Search Console | keyword × page × country × device × day | Aggregated search metrics from GSC API |
| Bing Ads | campaign × ad group × keyword × day | Aggregated ad metrics from Microsoft Ads |
| Google Analytics (sessions) | channel × medium × day | Aggregated session counts |
Counter-Examples (Don't Use This Pattern)
| Source | Why Not |
|---|---|
| Segment | Individual page views and tracks with user_id / anonymous_id |
| Stripe | Individual invoice and payment events with customer_id |
| Salesforce | Individual contact and opportunity records with email / account |
Architecture Overview
Pre-aggregated sources use the same four-layer structure as behavioral sources but produce a different set of outputs. The key differences:
| Concern | Behavioral Source | Pre-Aggregated Source |
|---|---|---|
event_type |
Domain-specific (order, appointment, etc.) |
Always metric |
event_name |
Verb phrase (order placed) |
Noun phrase (ad performance, search impressions) |
| Entity identifiers | Yes (person, group) | No (entities: []) |
| Entity traits | Yes | No |
| Relationships | Often | No |
| Measurements | Sometimes | Always — this is the point |
| Dimensions | Sometimes | Always — this is the point |
| Attribution | Often | No |
Output Models
A pre-aggregated source produces three union-layer outputs (compared to five or more for a behavioral source):
models/sources/{source}/
├── {source}_sources.yml # Source definitions
├── base/
│ └── base_{source}_*.sql # Thin views on raw tables
├── normalized/
│ └── {source}_*.sql # Joins, deduplication, type cleanup
├── intermediate/
│ ├── {source}_*_events.sql # Event formatting
│ ├── {source}_*_measurements.sql # Unpivoted measurements
│ └── {source}_*_dimensions.sql # Unpivoted dimensions
├── {source}_events.sql # Union of event types
├── {source}_event_measurements.sql # Union of measurement types
└── {source}_event_dimensions.sql # Union of dimension types
No _entity_identifiers, _entity_traits, or _relationship_declarations
models.
Pattern Details
1. Event Type: Always metric
All pre-aggregated events use event_type = 'metric'. This signals to
downstream consumers that the row is an aggregate, not a discrete occurrence.
Quantitative data lives in the measurements table.
select
{{ nexus.create_nexus_id('event', ['date', 'ad_id', ...]) }} as event_id,
cast(date as timestamp_tz) as occurred_at,
'metric' as event_type,
'ad performance' as event_name,
1 as significance,
'google_ads' as source,
...
2. Event Names: Cross-Platform by Design
Event names should be source-agnostic so the same name works across platforms. This enables cross-channel queries without knowing which platforms are configured.
| Good Event Name | Why |
|---|---|
ad performance |
Works for Google Ads, Facebook Ads, Bing Ads |
search impressions |
Works for Google Search Console, Bing Webmaster |
search query impressions |
Works for any search console with keyword data |
Filter by source to narrow to a specific platform:
WHERE event_name = 'ad performance' AND source = 'google_ads'
3. Measurements: Unpivoted Quantitative Values
Every quantitative metric from the source should be unpivoted into the measurements EAV table. This is the primary reason pre-aggregated sources exist in Nexus — it makes cross-source numeric comparison possible.
-- One CTE per measurement, then union all
spend_measurement as (
select
{{ nexus.create_nexus_id('event_measurement', ['event_id', "'spend'"]) }}
as event_measurement_id,
event_id,
'spend' as measurement_name,
spend as value,
'USD' as value_unit,
occurred_at,
source
from ad_events
where spend is not null
),
...
select * from spend_measurement
union all
select * from impressions_measurement
union all
select * from clicks_measurement
What to Include
| Include as Measurement | Exclude from Measurements |
|---|---|
| Spend, impressions, clicks, reach | CTR, CPC, CPM (derived ratios) |
| Conversions, conversion value | Frequency (derived ratio) |
| Video views, revenue | Position (ordinal, needs weighted avg) |
Rule of thumb: If it's additive (you can SUM it across rows and get a
meaningful number), it's a measurement. If it's a ratio or average, exclude it
and let consumers recalculate from the base measurements.
Position (e.g., GSC average position) is a special case — it requires impression-weighted averaging. It can live on the event row for convenience but should not be a measurement.
Measurement Names: Cross-Platform Parity
Use the same measurement name across platforms for the same concept:
| Concept | Measurement Name | Facebook Ads Source | Google Ads Source |
|---|---|---|---|
| Money spent | spend |
spend |
cost_micros / 1e6 |
| Times shown | impressions |
impressions |
impressions |
| Link/ad clicks | clicks |
inline_link_clicks |
clicks |
| Conversion count | conversions |
(not yet modeled) | conversions |
4. Dimensions: Unpivoted Categorical Attributes
Every categorical attribute (campaign name, device type, ad network) should be unpivoted into the dimensions EAV table. This enables filtering and grouping across sources without source-specific knowledge.
campaign_name_dimension as (
select
{{ nexus.create_nexus_id('event_dimension', ['event_id', "'campaign_name'"]) }}
as event_dimension_id,
event_id,
'campaign_name' as dimension_name,
campaign_name as dimension_value,
occurred_at,
source
from ad_events
where campaign_name is not null
),
...
What to Include
All string/categorical fields that a consumer might want to filter or group by:
- Ad hierarchy:
campaign_id,campaign_name,ad_group_id,ad_group_name,ad_id,ad_name,ad_set_id,adset_name - Device and network:
device,ad_network_type,search_type - Account:
account_id,customer_id - Geography:
country
Dimension Names: Cross-Platform Parity
Where possible, use the same dimension name for the same concept across platforms. Where platforms have genuinely different concepts, use platform-specific names:
| Concept | Google Ads | Facebook Ads | Notes |
|---|---|---|---|
| Campaign | campaign_name |
campaign_name |
Same |
| Ad group | ad_group_name |
adset_name |
Different platform concepts |
| Ad | ad_name |
ad_name |
Same |
| Device | device |
(not available) | Google Ads only |
| Network | ad_network_type |
(not applicable) | Google Ads only |
5. No Entity Resolution
Pre-aggregated sources have no person or group identifiers. Configure them with empty entities:
# dbt_project.yml
vars:
nexus:
sources:
google_ads:
enabled: true
events: true
measurements: true
dimensions: true
entities: [] # No identity resolution
attribution: false # No attribution
6. Event ID: Deterministic from Grain Columns
The event ID must be deterministic and unique for the source's grain. Include all grain columns — not just the ones that seem important:
-- Google Ads: grain is ad × device × network × day
{{ nexus.create_nexus_id('event', [
'date', 'customer_id', 'ad_id', 'ad_group_id',
'campaign_id', 'device', 'ad_network_type'
]) }} as event_id
-- Facebook Ads: grain is ad × day
{{ nexus.create_nexus_id('event', ['date', 'ad_id']) }} as event_id
-- GSC search impressions: grain is country × device × search_type × day
{{ nexus.create_nexus_id('event', [
'date', 'country', 'device', 'search_type'
]) }} as event_id
If any grain column is missing from the ID, you'll get collisions and silently lose rows.
Cross-Source Queries
The primary payoff of this pattern is cross-source queries via the measurements and dimensions tables. These work without knowing which platforms are configured.
Total Ad Spend Across Platforms
SELECT
DATE_TRUNC('month', e.occurred_at) AS month,
e.source,
SUM(m.value) AS total_spend
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'ad performance'
AND m.measurement_name = 'spend'
GROUP BY 1, 2
ORDER BY 1, 2
Impressions by Campaign (Any Platform)
SELECT
d.dimension_value AS campaign_name,
e.source,
SUM(m.value) AS impressions
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
JOIN nexus_event_dimensions d ON e.event_id = d.event_id
WHERE e.event_name = 'ad performance'
AND m.measurement_name = 'impressions'
AND d.dimension_name = 'campaign_name'
GROUP BY 1, 2
ORDER BY 3 DESC
Ad Spend vs Organic Search (Metric Events Across Types)
SELECT
DATE_TRUNC('month', occurred_at) AS month,
'Ad Spend' AS metric,
SUM(m.value) AS value
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'ad performance'
AND m.measurement_name = 'spend'
GROUP BY 1
UNION ALL
SELECT
DATE_TRUNC('month', occurred_at),
'Organic Clicks',
SUM(m.value)
FROM nexus_events e
JOIN nexus_event_measurements m ON e.event_id = m.event_id
WHERE e.event_name = 'search impressions'
AND m.measurement_name = 'clicks'
GROUP BY 1
ORDER BY 1
Derived Ratios: Don't Store, Recalculate
A common mistake is storing CTR, CPC, CPM, or average position as measurements. These are derived ratios — they cannot be summed, and averaging them across rows produces incorrect results.
Wrong (averaging pre-computed ratios):
-- DO NOT DO THIS
AVG(ctr) -- Gives equal weight to a row with 1 impression and a row with 10,000
Correct (recalculating from base measurements):
SUM(clicks) / NULLIF(SUM(impressions), 0) AS ctr,
SUM(spend) / NULLIF(SUM(clicks), 0) AS cpc,
SUM(spend) / NULLIF(SUM(impressions), 0) * 1000 AS cpm
Derived ratios can live on the event row for convenience (single-row lookups), but should never be in the measurements table.
Adding a New Pre-Aggregated Source
Checklist
-
Identify the grain. What combination of columns makes each row unique? Document this explicitly in the source YAML description.
-
Create base views. One
select *view per raw table. -
Create normalized models. Join dimension tables (ad history, campaign history) to the fact table. Deduplicate history tables using
qualify row_number() over (partition by id order by updated_at desc) = 1. Filter to meaningful rows (WHERE spend > 0 OR impressions > 0). -
Create intermediate event model. Format as nexus events with
event_type = 'metric'. Use a source-agnosticevent_name. Include all source columns on the event row for downstream extraction. -
Create intermediate measurement model. One CTE per additive metric. Exclude derived ratios. Use cross-platform measurement names.
-
Create intermediate dimension model. One CTE per categorical attribute. Use cross-platform dimension names where possible.
-
Create union models. Wrap intermediates with
dbt_utils.union_relations()for future extensibility (even if there's only one intermediate today). -
Configure
dbt_project.yml. Setevents: true,measurements: true,dimensions: true,entities: [],attribution: false. -
Document the source. Create a client-specific doc covering event schema, measurements, dimensions, grain, and query examples.
Template
A minimal pre-aggregated source with one event type needs 8 files:
models/sources/{source}/
├── {source}_sources.yml
├── base/
│ └── base_{source}_{fact_table}.sql
├── normalized/
│ └── {source}_{entity}.sql
├── intermediate/
│ ├── {source}_{entity}_events.sql
│ ├── {source}_{entity}_measurements.sql
│ └── {source}_{entity}_dimensions.sql
├── {source}_events.sql
├── {source}_event_measurements.sql
└── {source}_event_dimensions.sql
If the source has history/dimension tables to join (ad names, campaign names), add base views for those too.
Relationship to Other Patterns
- Recommended Source Model Structure: Pre-aggregated sources follow the same four-layer pattern but skip entity models.
- Event Measurements: The measurements facet documentation covers the full EAV pattern, measurement theory, and cross-source querying.
- Event Dimensions: The dimensions facet documentation covers the full EAV pattern and cross-source filtering.
- Create Source Models: The identity resolution guide — use that for behavioral sources with person/group identifiers.