Event Dimensions

Cross-source categorical analysis through the dimensions facet — tag events with boolean and string properties using the standard EAV pattern.

Event dimensions bring categorical business concepts — is_revenue_earned, is_expense_incurred, is_cash_received, invoice_id — into the nexus compressed schema, making them queryable across sources without source-specific knowledge.

Dimensions are the mirror image of measurements. While measurements label quantitative values (revenue = $100), dimensions label categorical properties of events (is_revenue_earned = true, invoice_id = 'in_abc123'). Together, they close the gap between raw event data and metric-ready analysis.


The Problem

Measurements solved cross-source labeling for quantitative values. Asking "What was my revenue?" no longer requires knowing that Stripe stores amounts in cents or that Upwork calls them "Fixed-price" transactions — the revenue measurement column handles that.

But a metric is not just a measurement. To compute "Total Revenue," an analyst needs two pieces of information:

  1. Which measurement column?revenue (measurements solve this)
  2. Which events to aggregate over? — ??? (nothing currently solves this cross-source)

Without a cross-source dimension, the analyst must know that for Stripe it's event_name = 'invoice paid', for Upwork it's event_name = 'payment received' (but not deposit received), and for FreshBooks it's event_name = 'invoice paid' (but not payment received, because that overlaps with invoice paid for 184 of 186 payments). That is source-specific knowledge — the exact thing measurements exist to eliminate.

Dimensions close this gap.


The Solution: Dimensions as a Facet

Dimensions follow the same faceted extraction pattern as measurements, identifiers, traits, states, and attribution. The intermediate EAV table stores:

  • The entity is the event (event_id)
  • The attribute is the dimension name (dimension_name)
  • The value is the categorical payload (dimension_value)

A single event can produce multiple dimension rows, each tagging a different cross-source property:

Source Record: Stripe Invoice #in_abc123 (paid)
  → dimension: dimension_name = 'is_revenue_earned',  dimension_value = 'true'
  → dimension: dimension_name = 'invoice_id',         dimension_value = 'in_abc123'

Source Record: Stripe Invoice #in_abc123 (created)
  → dimension: dimension_name = 'invoice_id',         dimension_value = 'in_abc123'
  → (no is_revenue_earned row — creating an invoice is not earning revenue)

Source Record: Upwork Transaction #789 (Fixed-price payment)
  → dimension: dimension_name = 'is_revenue_earned',  dimension_value = 'true'

Source Record: Upwork Transaction #790 (Withdrawal)
  → (no is_revenue_earned row — deposits are not revenue)

The dimension row is only present when the condition is true. Absence means false. The core table pivots these into boolean columns with FALSE as the default.


Dimensions vs States vs Events

These three concepts are often confused. Each serves a different purpose:

Events are things that happened. event_name = 'invoice paid' already tells you what occurred. If you can answer a question by filtering on event_name, you do not need a dimension.

States track entity lifecycle over time. "The subscription went from active to canceled." States are about entities (persons, groups, subscriptions), not events. Nexus already has nexus_entity_states for this.

Dimensions are cross-source business concepts applied to events. They exist when:

  • The concept spans multiple sources or event types — you cannot determine the answer from event_name alone
  • It requires data-level reasoning — the engineer needs to examine the actual data to make the determination (e.g., "this payment is already covered by an invoice paid event, don't tag it")
  • It's a property of the event, not a lifecycle state of an entity

Dimensions are concepts, not traditional source-level columns. The same concept can be derived completely differently depending on the source, the event type, or even the time period. The dimension model for each source encodes that reasoning independently; the pivoted result is a single, clean column that analysts use without source-specific knowledge.

Example: Cross-Source Derivation

is_revenue_earned is the quintessential dimension: Stripe invoice paid, Upwork payment received, and FreshBooks invoice paid are all different event_name values across different sources that all mean "revenue was earned." The dimension unifies them.

A business unit dimension like business_unit ('dtc' or 'partner') might apply to nearly all sources — but for timx it's derived from client_id, for GA4 from the property ID or URL pattern, and for Invoca from phone number routing. Each source's dimension model encodes its own derivation. The analyst just sees business_unit = 'dtc'.

Example: Schema Drift Over Time

Dimensions also absorb schema drift within a single source. A call_type dimension ('sales' or 'service') might be derived from a keypress signal in historical data (2023–2025), but from an AI-managed classification field after a phone system migration. The dimension model encodes the temporal boundary once:

CASE
    WHEN occurred_at < '2025-03-01' THEN
        CASE WHEN keypress = '1' THEN 'sales' ELSE 'service' END
    WHEN ai_call_type IS NOT NULL THEN ai_call_type
    ELSE 'unknown'
END as dimension_value

Downstream consumers see a stable call_type column regardless of era. The schema change is invisible outside the dimension extraction model.

What Is Not a Dimension

A concept like is_deal_closed is not a dimension — it's either just event filtering (WHERE event_name = 'deal closed') or a state (entity lifecycle).


Naming Convention: is_{noun}_{past_participle}

Boolean dimensions follow the pattern is_{business_concept}_{qualifying_action}:

  • The noun is the business concept (often a measurement name): revenue, expense, cash, lead
  • The past participle is the qualifying condition — why this event is canonical for that concept: earned, incurred, received, created
  • The is_ prefix marks the column as boolean and enables auto-casting in the pivot
Dimension What it says Noun Participle
is_revenue_earned Revenue was earned on this event revenue earned
is_cash_received Cash hit the bank on this event cash received
is_expense_incurred An expense was taken on expense incurred
is_tax_deductible This expense qualifies for tax deduction tax deductible
is_lead_created A lead was created (system of record) lead created

The convention reads like natural English in a WHERE clause: WHERE d.is_revenue_earned. An LLM seeing that column name knows exactly what to filter without documentation.

Non-boolean dimensions (string-valued like invoice_id) use plain descriptive names without the is_ prefix.

Enabling Cross-Source Timing Analysis

The naming convention naturally enables comparative analysis between different business events. For example, is_revenue_earned (Stripe invoice paid) and is_cash_received (QuickBooks deposit) tag the same revenue at two different moments — when the business earned it vs when cash settled in the bank:

-- "Compare when revenue was earned vs when cash was received"
SELECT
    DATE_TRUNC(m.occurred_at, MONTH) AS month,
    SUM(CASE WHEN d.is_revenue_earned THEN m.revenue END) AS revenue_earned,
    SUM(CASE WHEN d.is_cash_received THEN m.revenue END) AS cash_received
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
GROUP BY 1
ORDER BY 1

The column names frame the analysis without any documentation or source-specific knowledge.

Which Concepts Deserve a Dimension?

Not every business concept needs a dimension. The test:

Does the concept span multiple sources, and does the business have a clear answer to "which events represent this concept?"

  • revenue: Yes. Multiple sources emit revenue events (Stripe, Upwork, FreshBooks). The business knows which events mean "revenue was earned." → is_revenue_earned
  • expense: Yes. Expenses come from FreshBooks, QuickBooks, bank feeds. The business knows which events are real expenses vs transfers. → is_expense_incurred
  • lead: Yes. GA4 fires a lead event from the website, Salesforce creates the CRM record. The business picks the system of record. → is_lead_created
  • num_nights: No. "How many nights?" is inherently contextual. There is no cross-source "nights were realized" event. The analyst should specify the event type directly.

The list of standard dimensions should be short — truly cross-source business concepts only.


Schema

Core Table: nexus_event_dimensions

The analyst-facing table. One row per event, one column per dimension. Dimension columns are discovered dynamically at compile time — new dimensions from new sources appear automatically on the next dbt build.

Column Type Required Description
event_id STRING FK to the event
occurred_at TIMESTAMP When (denormalized from event)
source STRING Source system
{dimension_name} BOOLEAN or STRING One column per dimension. is_ prefix → BOOLEAN (default FALSE). Others → STRING (default NULL).

The event_id linkage connects dimensions to the full nexus entity graph through Events → Participants → Entities, enabling queries like "Revenue by customer where is_revenue_earned" without any source-specific joins.

Type convention: Dimensions prefixed with is_ are automatically cast to BOOLEAN in the pivot (with FALSE as default). All others remain STRING (with NULL as default).

Intermediate Table: nexus_event_dimensions_unioned

The EAV canonical form. One row per dimension per event. This is what source models feed into and what the core table pivots from.

Column Type Required Description
event_dimension_id STRING Unique ID (evt_dim_ prefix)
event_id STRING FK to the event this dimension came from
dimension_name STRING What property (the EAV discriminator)
dimension_value STRING The value ('true'/'false' for booleans, or string)
occurred_at TIMESTAMP When (denormalized from event)
source STRING Source system

You should not need to query this table directly — use the core table instead.


Extraction Pattern

Dimensions follow the standard four-layer architecture described in Recommended Source Model Structure.

Directory Structure

models/sources/{source_name}/
├── intermediate/
│   ├── {source}_invoice_events.sql             ← existing
│   ├── {source}_invoice_measurements.sql       ← existing
│   ├── {source}_invoice_dimensions.sql         ← NEW
│   ├── {source}_person_identifiers.sql         ← existing
│   └── {source}_group_identifiers.sql          ← existing
├── {source}_events.sql                         ← existing (unions events)
├── {source}_entity_identifiers.sql             ← existing (unions identifiers)
├── {source}_event_measurements.sql             ← existing (unions measurements)
├── {source}_event_dimensions.sql               ← NEW (unions dimensions)

Intermediate Dimension Models

Each intermediate dimension model:

  1. Reads from intermediate event models for event_id linkage and timing
  2. Applies business logic to determine which events qualify for each dimension
  3. Produces one dimension row per qualifying event per dimension name

This mirrors how measurement models read from events and extract multiple measurement rows per event.

Example: stripe_invoice_dimensions.sql

{{ config(
    materialized='table',
    tags=['stripe', 'intermediate', 'dimensions']
) }}

-- Dimensions extracted from Stripe invoice events
-- is_revenue_earned: only on "invoice paid" events
-- invoice_id: on all invoice events (cross-source identifier)

WITH invoice_events AS (
    SELECT * FROM {{ ref('stripe_invoice_events') }}
),

-- is_revenue_earned: tags invoice paid events as the canonical revenue event
revenue_earned_dimensions AS (
    SELECT
        {{ nexus.create_nexus_id('event_dimension', ['event_id', "'is_revenue_earned'"]) }}
            as event_dimension_id,
        event_id,
        'is_revenue_earned' as dimension_name,
        'true' as dimension_value,
        occurred_at,
        'stripe' as source
    FROM invoice_events
    WHERE event_name = 'invoice paid'
),

-- invoice_id: cross-source invoice identifier on all invoice events
invoice_id_dimensions AS (
    SELECT
        {{ nexus.create_nexus_id('event_dimension', ['event_id', "'invoice_id'"]) }}
            as event_dimension_id,
        event_id,
        'invoice_id' as dimension_name,
        invoice_id as dimension_value,
        occurred_at,
        'stripe' as source
    FROM invoice_events
    WHERE invoice_id IS NOT NULL
)

SELECT * FROM revenue_earned_dimensions
UNION ALL
SELECT * FROM invoice_id_dimensions

Source-Level Union: stripe_event_dimensions.sql

{{ config(
    materialized='table',
    tags=['nexus', 'dimensions', 'stripe']
) }}

-- Union all Stripe dimension types
{{ dbt_utils.union_relations(
    relations=[
        ref('stripe_invoice_dimensions')
    ]
) }}

ORDER BY occurred_at DESC

Querying Dimensions

The core nexus_event_dimensions table is pivoted — each dimension is a column.

Revenue with Dimension Filter

-- "What's my revenue this year?"
SELECT
    EXTRACT(YEAR FROM m.occurred_at) AS year,
    SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
WHERE d.is_revenue_earned
GROUP BY 1
ORDER BY 1

Revenue by Customer

-- "Revenue by customer"
SELECT
    e.name AS customer,
    SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
JOIN nexus_entity_participants p ON m.event_id = p.event_id
JOIN nexus_entities e ON p.entity_id = e.entity_id
WHERE d.is_revenue_earned
  AND e.entity_type = 'group'
GROUP BY 1
ORDER BY 2 DESC

Discoverability

-- "What dimensions are available?"
SELECT * FROM nexus_event_dimensions_metadata

Where Business Logic Belongs

Semantic logic that requires data-level reasoning belongs in the transformation layer (dbt/nexus), not the semantic layer.

The engineer building the dimension model has maximum context: they are staring at the raw data, understanding edge cases, writing the joins. They know:

  • "FreshBooks payment received events overlap with invoice paid — 184 of 186 payments link to the same 181 invoices. Don't tag both."
  • "Two FreshBooks payments have no invoice — they're overpayments from 2019. Don't tag those either."
  • "Upwork deposit received is a bank withdrawal, not revenue."

The dimension is set once at build time, is tested, is version-controlled, and produces a clean boolean. Encoding this same logic in a Cube YAML file — far from the data, without the context of why certain events are excluded — is where mistakes happen.

The Boundary

Operation type Where it belongs Example
Row-level data reasoning dbt/nexus (transformation) "Is this payment already covered by an invoice?"
Cross-source dimensions dbt/nexus (transformation) "This event is_revenue_earned"
Aggregation function Semantic layer Revenue = SUM(revenue) WHERE is_revenue_earned
Post-aggregation arithmetic Semantic layer profit = SUM(revenue) + SUM(expense)
Time intelligence Semantic layer / query time YoY, MTD, rolling averages

Relationship to Measurements

Measurements and dimensions work together. Measurements label what the quantitative value is. Dimensions label which events should be aggregated for a given business concept.

erDiagram
    EVENTS ||--o{ MEASUREMENTS : "what was the quantitative value"
    EVENTS ||--o{ DIMENSIONS : "which events count for this concept"
    EVENTS ||--o{ IDENTIFIERS : "who was involved"
    EVENTS ||--o{ TRAITS : "what do we know about them"
    EVENTS ||--o{ STATES : "what state changed"
    EVENTS ||--o{ ATTRIBUTION : "what caused this"
    EVENTS ||--o{ RELATIONSHIPS : "what connections exist"

The semantic layer defines metrics from these two facets:

  • Revenue = SUM(revenue) WHERE is_revenue_earned
  • Expenses = SUM(expense) WHERE is_expense_incurred
  • Profit = Revenue + Expenses (expenses are negative)

Nexus handles the hard part — cross-source extraction, labeling, and per-event business logic. The metric layer becomes mechanical.


Configuration

Configure dimension sources in dbt_project.yml:

vars:
  nexus:
    sources:
      stripe:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: ['person', 'group']
      freshbooks:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: ['person', 'group']
      upwork:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: ['person', 'group']

The nexus package automatically unions all enabled source dimensions into nexus_event_dimensions.


Dimension Catalog (YAML)

Dimensions are discovered automatically from the warehouse — every distinct dimension_name in nexus_event_dimensions_unioned appears in the metadata table. But you can enrich dimensions with qualitative information by adding entries to dbt_project.yml under vars.nexus.dimensions:

vars:
  nexus:
    dimensions:
      - name: page
        label: Page
        description: >
          The page URL or path associated with the event (web or app surfaces).
        aliases: [url, path]
        tags: [web, navigation]
        example_questions:
          - Which pages get the most traffic?

      - name: is_revenue_earned
        label: Revenue earned
        description: >
          Whether the event counts as earned revenue for marketplace reporting.
        tags: [financial, marketplace]
        example_questions:
          - How much revenue was earned last month?

      - name: campaign_name
        description: >
          Human-readable advertising campaign name.

Catalog Schema

Field Required Type Description
name yes string Must match the dimension_name in the EAV table
label no string Human-readable display name (auto-generated from name if omitted)
description no string What the dimension means and how it's derived
aliases no string[] Alternative names people use for this dimension
tags no string[] Organizational categories for filtering and grouping
example_questions no string[] Business questions this dimension helps answer

YAML entries are optional. Dimensions without YAML entries still appear in the metadata table with warehouse-derived statistics — they just lack descriptions, aliases, and tags. Conversely, a YAML entry for a dimension not yet materialized in the warehouse appears with is_in_data = false, which is useful for documenting planned dimensions ahead of implementation.


Metadata

The nexus_event_dimensions_metadata model combines warehouse statistics with YAML-authored catalog fields into a single queryable table. It is rebuilt on every dbt build and serves as the LLM-queryable catalog of what dimensions exist, which sources contribute them, and what values they contain.

Discover Available Dimensions

SELECT * FROM nexus_event_dimensions_metadata

Metadata Table Schema

One row per (dimension_name, source) pair. Dimensions contributed by multiple sources produce multiple rows.

Column Type Description
dimension_name STRING Machine-readable identifier (matches dimension_name in the EAV)
source STRING Source system that contributed data (NULL for YAML-only entries)
dimension_type STRING boolean (for is_ prefix) or string
label STRING Human-readable display name (from YAML or auto-generated)
description STRING Plain-English explanation (from YAML)
aliases STRING Comma-separated alternative names (from YAML)
tags STRING Comma-separated organizational categories (from YAML)
example_questions STRING Semicolon-separated business questions (from YAML)
example_value_1 STRING Most frequent distinct value for this dimension/source pair
example_value_2 STRING Second most frequent value
example_value_3 STRING Third most frequent value
first_seen_at TIMESTAMP Earliest occurred_at for this dimension/source
last_seen_at TIMESTAMP Latest occurred_at for this dimension/source
occurrence_count INTEGER Total rows tagged with this dimension/source
distinct_values INTEGER Count of distinct dimension_values
is_in_data BOOLEAN true if the dimension exists in the warehouse, false if YAML-only

LLM Context

An LLM writing SQL queries the dimension catalog alongside metric and measurement metadata:

SELECT
    dimension_name,
    dimension_type,
    description,
    example_value_1,
    example_value_2,
    example_value_3
FROM nexus_event_dimensions_metadata
WHERE source IS NOT NULL
ORDER BY dimension_name, source

The example values help an LLM understand what a dimension looks like in practice without querying the underlying data — e.g., seeing example_value_1 = 'dtc' on a business_unit dimension tells the LLM exactly what filter values to use.

Browsing Dimensions in the Semantic Layer UI

The Semantic Layer page (/data-dictionary/semantic-layer, Dimensions tab) provides a browsable interface over this metadata table. Each dimension shows its sources, type, description, and a truncated example value in the table. A peek pane expands to show all example values, per-source statistics, and the YAML-authored fields.