Event Dimensions
Cross-source categorical analysis through the dimensions facet — tag events with boolean and string properties using the standard EAV pattern.
Event dimensions bring categorical business concepts — is_revenue_earned,
is_expense_incurred, is_cash_received, invoice_id — into the nexus
compressed schema, making them queryable across sources without source-specific
knowledge.
Dimensions are the mirror image of
measurements. While measurements label
quantitative values (revenue = $100), dimensions label categorical
properties of events (is_revenue_earned = true, invoice_id = 'in_abc123').
Together, they close the gap between raw event data and metric-ready analysis.
The Problem
Measurements solved cross-source labeling for quantitative values. Asking "What
was my revenue?" no longer requires knowing that Stripe stores amounts in cents
or that Upwork calls them "Fixed-price" transactions — the revenue measurement
column handles that.
But a metric is not just a measurement. To compute "Total Revenue," an analyst needs two pieces of information:
- Which measurement column? —
revenue(measurements solve this) - Which events to aggregate over? — ??? (nothing currently solves this cross-source)
Without a cross-source dimension, the analyst must know that for Stripe it's
event_name = 'invoice paid', for Upwork it's event_name = 'payment received'
(but not deposit received), and for FreshBooks it's
event_name = 'invoice paid' (but not payment received, because that overlaps
with invoice paid for 184 of 186 payments). That is source-specific knowledge —
the exact thing measurements exist to eliminate.
Dimensions close this gap.
The Solution: Dimensions as a Facet
Dimensions follow the same faceted extraction pattern as measurements, identifiers, traits, states, and attribution. The intermediate EAV table stores:
- The entity is the event (
event_id) - The attribute is the dimension name (
dimension_name) - The value is the categorical payload (
dimension_value)
A single event can produce multiple dimension rows, each tagging a different cross-source property:
Source Record: Stripe Invoice #in_abc123 (paid)
→ dimension: dimension_name = 'is_revenue_earned', dimension_value = 'true'
→ dimension: dimension_name = 'invoice_id', dimension_value = 'in_abc123'
Source Record: Stripe Invoice #in_abc123 (created)
→ dimension: dimension_name = 'invoice_id', dimension_value = 'in_abc123'
→ (no is_revenue_earned row — creating an invoice is not earning revenue)
Source Record: Upwork Transaction #789 (Fixed-price payment)
→ dimension: dimension_name = 'is_revenue_earned', dimension_value = 'true'
Source Record: Upwork Transaction #790 (Withdrawal)
→ (no is_revenue_earned row — deposits are not revenue)
The dimension row is only present when the condition is true. Absence means
false. The core table pivots these into boolean columns with FALSE as the
default.
Dimensions vs States vs Events
These three concepts are often confused. Each serves a different purpose:
Events are things that happened. event_name = 'invoice paid' already tells
you what occurred. If you can answer a question by filtering on event_name,
you do not need a dimension.
States track entity lifecycle over time. "The subscription went from active
to canceled." States are about entities (persons, groups, subscriptions), not
events. Nexus already has nexus_entity_states for this.
Dimensions are cross-source business concepts applied to events. They exist when:
- The concept spans multiple sources or event types — you cannot determine
the answer from
event_namealone - It requires data-level reasoning — the engineer needs to examine the actual data to make the determination (e.g., "this payment is already covered by an invoice paid event, don't tag it")
- It's a property of the event, not a lifecycle state of an entity
Dimensions are concepts, not traditional source-level columns. The same concept can be derived completely differently depending on the source, the event type, or even the time period. The dimension model for each source encodes that reasoning independently; the pivoted result is a single, clean column that analysts use without source-specific knowledge.
Example: Cross-Source Derivation
is_revenue_earned is the quintessential dimension: Stripe invoice paid,
Upwork payment received, and FreshBooks invoice paid are all different
event_name values across different sources that all mean "revenue was earned."
The dimension unifies them.
A business unit dimension like business_unit ('dtc' or 'partner') might
apply to nearly all sources — but for timx it's derived from client_id, for
GA4 from the property ID or URL pattern, and for Invoca from phone number
routing. Each source's dimension model encodes its own derivation. The analyst
just sees business_unit = 'dtc'.
Example: Schema Drift Over Time
Dimensions also absorb schema drift within a single source. A call_type
dimension ('sales' or 'service') might be derived from a keypress signal in
historical data (2023–2025), but from an AI-managed classification field after a
phone system migration. The dimension model encodes the temporal boundary once:
CASE
WHEN occurred_at < '2025-03-01' THEN
CASE WHEN keypress = '1' THEN 'sales' ELSE 'service' END
WHEN ai_call_type IS NOT NULL THEN ai_call_type
ELSE 'unknown'
END as dimension_value
Downstream consumers see a stable call_type column regardless of era. The
schema change is invisible outside the dimension extraction model.
What Is Not a Dimension
A concept like is_deal_closed is not a dimension — it's either just event
filtering (WHERE event_name = 'deal closed') or a state (entity lifecycle).
Naming Convention: is_{noun}_{past_participle}
Boolean dimensions follow the pattern
is_{business_concept}_{qualifying_action}:
- The noun is the business concept (often a measurement name): revenue, expense, cash, lead
- The past participle is the qualifying condition — why this event is canonical for that concept: earned, incurred, received, created
- The
is_prefix marks the column as boolean and enables auto-casting in the pivot
| Dimension | What it says | Noun | Participle |
|---|---|---|---|
is_revenue_earned |
Revenue was earned on this event | revenue | earned |
is_cash_received |
Cash hit the bank on this event | cash | received |
is_expense_incurred |
An expense was taken on | expense | incurred |
is_tax_deductible |
This expense qualifies for tax deduction | tax | deductible |
is_lead_created |
A lead was created (system of record) | lead | created |
The convention reads like natural English in a WHERE clause:
WHERE d.is_revenue_earned. An LLM seeing that column name knows exactly what
to filter without documentation.
Non-boolean dimensions (string-valued like invoice_id) use plain descriptive
names without the is_ prefix.
Enabling Cross-Source Timing Analysis
The naming convention naturally enables comparative analysis between different
business events. For example, is_revenue_earned (Stripe invoice paid) and
is_cash_received (QuickBooks deposit) tag the same revenue at two different
moments — when the business earned it vs when cash settled in the bank:
-- "Compare when revenue was earned vs when cash was received"
SELECT
DATE_TRUNC(m.occurred_at, MONTH) AS month,
SUM(CASE WHEN d.is_revenue_earned THEN m.revenue END) AS revenue_earned,
SUM(CASE WHEN d.is_cash_received THEN m.revenue END) AS cash_received
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
GROUP BY 1
ORDER BY 1
The column names frame the analysis without any documentation or source-specific knowledge.
Which Concepts Deserve a Dimension?
Not every business concept needs a dimension. The test:
Does the concept span multiple sources, and does the business have a clear answer to "which events represent this concept?"
revenue: Yes. Multiple sources emit revenue events (Stripe, Upwork, FreshBooks). The business knows which events mean "revenue was earned." →is_revenue_earnedexpense: Yes. Expenses come from FreshBooks, QuickBooks, bank feeds. The business knows which events are real expenses vs transfers. →is_expense_incurredlead: Yes. GA4 fires a lead event from the website, Salesforce creates the CRM record. The business picks the system of record. →is_lead_creatednum_nights: No. "How many nights?" is inherently contextual. There is no cross-source "nights were realized" event. The analyst should specify the event type directly.
The list of standard dimensions should be short — truly cross-source business concepts only.
Schema
Core Table: nexus_event_dimensions
The analyst-facing table. One row per event, one column per dimension. Dimension
columns are discovered dynamically at compile time — new dimensions from new
sources appear automatically on the next dbt build.
| Column | Type | Required | Description |
|---|---|---|---|
event_id |
STRING | ✅ | FK to the event |
occurred_at |
TIMESTAMP | ✅ | When (denormalized from event) |
source |
STRING | ✅ | Source system |
{dimension_name} |
BOOLEAN or STRING | ❌ | One column per dimension. is_ prefix → BOOLEAN (default FALSE). Others → STRING (default NULL). |
The event_id linkage connects dimensions to the full nexus entity graph
through Events → Participants → Entities, enabling queries like "Revenue by
customer where is_revenue_earned" without any source-specific joins.
Type convention: Dimensions prefixed with is_ are automatically cast to
BOOLEAN in the pivot (with FALSE as default). All others remain STRING
(with NULL as default).
Intermediate Table: nexus_event_dimensions_unioned
The EAV canonical form. One row per dimension per event. This is what source models feed into and what the core table pivots from.
| Column | Type | Required | Description |
|---|---|---|---|
event_dimension_id |
STRING | ✅ | Unique ID (evt_dim_ prefix) |
event_id |
STRING | ✅ | FK to the event this dimension came from |
dimension_name |
STRING | ✅ | What property (the EAV discriminator) |
dimension_value |
STRING | ✅ | The value ('true'/'false' for booleans, or string) |
occurred_at |
TIMESTAMP | ✅ | When (denormalized from event) |
source |
STRING | ✅ | Source system |
You should not need to query this table directly — use the core table instead.
Extraction Pattern
Dimensions follow the standard four-layer architecture described in Recommended Source Model Structure.
Directory Structure
models/sources/{source_name}/
├── intermediate/
│ ├── {source}_invoice_events.sql ← existing
│ ├── {source}_invoice_measurements.sql ← existing
│ ├── {source}_invoice_dimensions.sql ← NEW
│ ├── {source}_person_identifiers.sql ← existing
│ └── {source}_group_identifiers.sql ← existing
├── {source}_events.sql ← existing (unions events)
├── {source}_entity_identifiers.sql ← existing (unions identifiers)
├── {source}_event_measurements.sql ← existing (unions measurements)
├── {source}_event_dimensions.sql ← NEW (unions dimensions)
Intermediate Dimension Models
Each intermediate dimension model:
- Reads from intermediate event models for
event_idlinkage and timing - Applies business logic to determine which events qualify for each dimension
- Produces one dimension row per qualifying event per dimension name
This mirrors how measurement models read from events and extract multiple measurement rows per event.
Example: stripe_invoice_dimensions.sql
{{ config(
materialized='table',
tags=['stripe', 'intermediate', 'dimensions']
) }}
-- Dimensions extracted from Stripe invoice events
-- is_revenue_earned: only on "invoice paid" events
-- invoice_id: on all invoice events (cross-source identifier)
WITH invoice_events AS (
SELECT * FROM {{ ref('stripe_invoice_events') }}
),
-- is_revenue_earned: tags invoice paid events as the canonical revenue event
revenue_earned_dimensions AS (
SELECT
{{ nexus.create_nexus_id('event_dimension', ['event_id', "'is_revenue_earned'"]) }}
as event_dimension_id,
event_id,
'is_revenue_earned' as dimension_name,
'true' as dimension_value,
occurred_at,
'stripe' as source
FROM invoice_events
WHERE event_name = 'invoice paid'
),
-- invoice_id: cross-source invoice identifier on all invoice events
invoice_id_dimensions AS (
SELECT
{{ nexus.create_nexus_id('event_dimension', ['event_id', "'invoice_id'"]) }}
as event_dimension_id,
event_id,
'invoice_id' as dimension_name,
invoice_id as dimension_value,
occurred_at,
'stripe' as source
FROM invoice_events
WHERE invoice_id IS NOT NULL
)
SELECT * FROM revenue_earned_dimensions
UNION ALL
SELECT * FROM invoice_id_dimensions
Source-Level Union: stripe_event_dimensions.sql
{{ config(
materialized='table',
tags=['nexus', 'dimensions', 'stripe']
) }}
-- Union all Stripe dimension types
{{ dbt_utils.union_relations(
relations=[
ref('stripe_invoice_dimensions')
]
) }}
ORDER BY occurred_at DESC
Querying Dimensions
The core nexus_event_dimensions table is pivoted — each dimension is a column.
Revenue with Dimension Filter
-- "What's my revenue this year?"
SELECT
EXTRACT(YEAR FROM m.occurred_at) AS year,
SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
WHERE d.is_revenue_earned
GROUP BY 1
ORDER BY 1
Revenue by Customer
-- "Revenue by customer"
SELECT
e.name AS customer,
SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
JOIN nexus_entity_participants p ON m.event_id = p.event_id
JOIN nexus_entities e ON p.entity_id = e.entity_id
WHERE d.is_revenue_earned
AND e.entity_type = 'group'
GROUP BY 1
ORDER BY 2 DESC
Discoverability
-- "What dimensions are available?"
SELECT * FROM nexus_event_dimensions_metadata
Where Business Logic Belongs
Semantic logic that requires data-level reasoning belongs in the transformation layer (dbt/nexus), not the semantic layer.
The engineer building the dimension model has maximum context: they are staring at the raw data, understanding edge cases, writing the joins. They know:
- "FreshBooks
payment receivedevents overlap withinvoice paid— 184 of 186 payments link to the same 181 invoices. Don't tag both." - "Two FreshBooks payments have no invoice — they're overpayments from 2019. Don't tag those either."
- "Upwork
deposit receivedis a bank withdrawal, not revenue."
The dimension is set once at build time, is tested, is version-controlled, and produces a clean boolean. Encoding this same logic in a Cube YAML file — far from the data, without the context of why certain events are excluded — is where mistakes happen.
The Boundary
| Operation type | Where it belongs | Example |
|---|---|---|
| Row-level data reasoning | dbt/nexus (transformation) | "Is this payment already covered by an invoice?" |
| Cross-source dimensions | dbt/nexus (transformation) | "This event is_revenue_earned" |
| Aggregation function | Semantic layer | Revenue = SUM(revenue) WHERE is_revenue_earned |
| Post-aggregation arithmetic | Semantic layer | profit = SUM(revenue) + SUM(expense) |
| Time intelligence | Semantic layer / query time | YoY, MTD, rolling averages |
Relationship to Measurements
Measurements and dimensions work together. Measurements label what the quantitative value is. Dimensions label which events should be aggregated for a given business concept.
erDiagram
EVENTS ||--o{ MEASUREMENTS : "what was the quantitative value"
EVENTS ||--o{ DIMENSIONS : "which events count for this concept"
EVENTS ||--o{ IDENTIFIERS : "who was involved"
EVENTS ||--o{ TRAITS : "what do we know about them"
EVENTS ||--o{ STATES : "what state changed"
EVENTS ||--o{ ATTRIBUTION : "what caused this"
EVENTS ||--o{ RELATIONSHIPS : "what connections exist"
The semantic layer defines metrics from these two facets:
Revenue = SUM(revenue) WHERE is_revenue_earnedExpenses = SUM(expense) WHERE is_expense_incurredProfit = Revenue + Expenses(expenses are negative)
Nexus handles the hard part — cross-source extraction, labeling, and per-event business logic. The metric layer becomes mechanical.
Configuration
Configure dimension sources in dbt_project.yml:
vars:
nexus:
sources:
stripe:
enabled: true
events: true
measurements: true
dimensions: true
entities: ['person', 'group']
freshbooks:
enabled: true
events: true
measurements: true
dimensions: true
entities: ['person', 'group']
upwork:
enabled: true
events: true
measurements: true
dimensions: true
entities: ['person', 'group']
The nexus package automatically unions all enabled source dimensions into
nexus_event_dimensions.
Dimension Catalog (YAML)
Dimensions are discovered automatically from the warehouse — every distinct
dimension_name in nexus_event_dimensions_unioned appears in the metadata
table. But you can enrich dimensions with qualitative information by adding
entries to dbt_project.yml under vars.nexus.dimensions:
vars:
nexus:
dimensions:
- name: page
label: Page
description: >
The page URL or path associated with the event (web or app surfaces).
aliases: [url, path]
tags: [web, navigation]
example_questions:
- Which pages get the most traffic?
- name: is_revenue_earned
label: Revenue earned
description: >
Whether the event counts as earned revenue for marketplace reporting.
tags: [financial, marketplace]
example_questions:
- How much revenue was earned last month?
- name: campaign_name
description: >
Human-readable advertising campaign name.
Catalog Schema
| Field | Required | Type | Description |
|---|---|---|---|
name |
yes | string | Must match the dimension_name in the EAV table |
label |
no | string | Human-readable display name (auto-generated from name if omitted) |
description |
no | string | What the dimension means and how it's derived |
aliases |
no | string[] | Alternative names people use for this dimension |
tags |
no | string[] | Organizational categories for filtering and grouping |
example_questions |
no | string[] | Business questions this dimension helps answer |
YAML entries are optional. Dimensions without YAML entries still appear in the
metadata table with warehouse-derived statistics — they just lack descriptions,
aliases, and tags. Conversely, a YAML entry for a dimension not yet materialized
in the warehouse appears with is_in_data = false, which is useful for
documenting planned dimensions ahead of implementation.
Metadata
The nexus_event_dimensions_metadata model combines warehouse statistics with
YAML-authored catalog fields into a single queryable table. It is rebuilt on
every dbt build and serves as the LLM-queryable catalog of what dimensions
exist, which sources contribute them, and what values they contain.
Discover Available Dimensions
SELECT * FROM nexus_event_dimensions_metadata
Metadata Table Schema
One row per (dimension_name, source) pair. Dimensions contributed by multiple
sources produce multiple rows.
| Column | Type | Description |
|---|---|---|
dimension_name |
STRING | Machine-readable identifier (matches dimension_name in the EAV) |
source |
STRING | Source system that contributed data (NULL for YAML-only entries) |
dimension_type |
STRING | boolean (for is_ prefix) or string |
label |
STRING | Human-readable display name (from YAML or auto-generated) |
description |
STRING | Plain-English explanation (from YAML) |
aliases |
STRING | Comma-separated alternative names (from YAML) |
tags |
STRING | Comma-separated organizational categories (from YAML) |
example_questions |
STRING | Semicolon-separated business questions (from YAML) |
example_value_1 |
STRING | Most frequent distinct value for this dimension/source pair |
example_value_2 |
STRING | Second most frequent value |
example_value_3 |
STRING | Third most frequent value |
first_seen_at |
TIMESTAMP | Earliest occurred_at for this dimension/source |
last_seen_at |
TIMESTAMP | Latest occurred_at for this dimension/source |
occurrence_count |
INTEGER | Total rows tagged with this dimension/source |
distinct_values |
INTEGER | Count of distinct dimension_values |
is_in_data |
BOOLEAN | true if the dimension exists in the warehouse, false if YAML-only |
LLM Context
An LLM writing SQL queries the dimension catalog alongside metric and measurement metadata:
SELECT
dimension_name,
dimension_type,
description,
example_value_1,
example_value_2,
example_value_3
FROM nexus_event_dimensions_metadata
WHERE source IS NOT NULL
ORDER BY dimension_name, source
The example values help an LLM understand what a dimension looks like in
practice without querying the underlying data — e.g., seeing
example_value_1 = 'dtc' on a business_unit dimension tells the LLM exactly
what filter values to use.
Browsing Dimensions in the Semantic Layer UI
The Semantic Layer page (/data-dictionary/semantic-layer, Dimensions tab)
provides a browsable interface over this metadata table. Each dimension shows
its sources, type, description, and a truncated example value in the table. A
peek pane expands to show all example values, per-source statistics, and the
YAML-authored fields.
Related Documentation
- Event Measurements — Quantitative labeling (the companion facet to dimensions)
- Metrics — Canonical business calculations built from measurements and dimensions
- Architecture Overview — Core tables ERD and key joins
- Pipeline Structure — Sources, Intermediate, and Core stages
- Recommended Source Model Structure — Four-layer extraction architecture