Nexus Data

Event dimensions bring categorical business concepts — is_revenue_earned, is_expense_incurred, is_cash_received, invoice_id — into the nexus compressed schema, making them queryable across sources without source-specific knowledge.

Dimensions are the mirror image of measurements. While measurements label quantitative values (revenue = $100), dimensions label categorical properties of events (is_revenue_earned = true, invoice_id = 'in_abc123'). Together, they close the gap between raw event data and metric-ready analysis.

The Problem

Measurements solved cross-source labeling for quantitative values. Asking "What was my revenue?" no longer requires knowing that Stripe stores amounts in cents or that Upwork calls them "Fixed-price" transactions — the revenue measurement column handles that.

But a metric is not just a measurement. To compute "Total Revenue," an analyst needs two pieces of information:

Which measurement column? — revenue (measurements solve this)
Which events to aggregate over? — ??? (nothing currently solves this cross-source)

Without a cross-source dimension, the analyst must know that for Stripe it's event_name = 'invoice paid', for Upwork it's event_name = 'payment received' (but not deposit received), and for FreshBooks it's event_name = 'invoice paid' (but not payment received, because that overlaps with invoice paid for 184 of 186 payments). That is source-specific knowledge — the exact thing measurements exist to eliminate.

Dimensions close this gap.

Dimensions follow the same faceted extraction pattern as measurements, identifiers, traits, states, and attribution. The intermediate EAV table stores:

The entity is the event (event_id)
The attribute is the dimension name (dimension_name)
The value is the categorical payload (dimension_value)

A single event can produce multiple dimension rows, each tagging a different cross-source property:

Source Record: Stripe Invoice #in_abc123 (paid)
  → dimension: dimension_name = 'is_revenue_earned',  dimension_value = 'true'
  → dimension: dimension_name = 'invoice_id',         dimension_value = 'in_abc123'

Source Record: Stripe Invoice #in_abc123 (created)
  → dimension: dimension_name = 'invoice_id',         dimension_value = 'in_abc123'
  → (no is_revenue_earned row — creating an invoice is not earning revenue)

Source Record: Upwork Transaction #789 (Fixed-price payment)
  → dimension: dimension_name = 'is_revenue_earned',  dimension_value = 'true'

Source Record: Upwork Transaction #790 (Withdrawal)
  → (no is_revenue_earned row — deposits are not revenue)

The dimension row is only present when the condition is true. Absence means false. The core table pivots these into boolean columns with FALSE as the default.

Dimensions vs States vs Events

These three concepts are often confused. Each serves a different purpose:

Events are things that happened. event_name = 'invoice paid' already tells you what occurred. If you can answer a question by filtering on event_name, you do not need a dimension.

States track entity lifecycle over time. "The subscription went from active to canceled." States are about entities (persons, groups, subscriptions), not events. Nexus already has nexus_entity_states for this.

Dimensions are cross-source business concepts applied to events. They exist when:

The concept spans multiple sources or event types — you cannot determine the answer from event_name alone
It requires data-level reasoning — the engineer needs to examine the actual data to make the determination (e.g., "this payment is already covered by an invoice paid event, don't tag it")
It's a property of the event, not a lifecycle state of an entity

Dimensions are concepts, not traditional source-level columns. The same concept can be derived completely differently depending on the source, the event type, or even the time period. The dimension model for each source encodes that reasoning independently; the pivoted result is a single, clean column that analysts use without source-specific knowledge.

Example: Cross-Source Derivation

is_revenue_earned is the quintessential dimension: Stripe invoice paid, Upwork payment received, and FreshBooks invoice paid are all different event_name values across different sources that all mean "revenue was earned." The dimension unifies them.

A business unit dimension like business_unit ('dtc' or 'partner') might apply to nearly all sources — but for timx it's derived from client_id, for GA4 from the property ID or URL pattern, and for Invoca from phone number routing. Each source's dimension model encodes its own derivation. The analyst just sees business_unit = 'dtc'.

Example: Schema Drift Over Time

Dimensions also absorb schema drift within a single source. A call_type dimension ('sales' or 'service') might be derived from a keypress signal in historical data (2023–2025), but from an AI-managed classification field after a phone system migration. The dimension model encodes the temporal boundary once:

CASE
    WHEN occurred_at < '2025-03-01' THEN
        CASE WHEN keypress = '1' THEN 'sales' ELSE 'service' END
    WHEN ai_call_type IS NOT NULL THEN ai_call_type
    ELSE 'unknown'
END as dimension_value

Downstream consumers see a stable call_type column regardless of era. The schema change is invisible outside the dimension extraction model.

What Is Not a Dimension

A concept like is_deal_closed is not a dimension — it's either just event filtering (WHERE event_name = 'deal closed') or a state (entity lifecycle).

Naming Convention: `is_{noun}_{past_participle}`

Boolean dimensions follow the pattern is_{business_concept}_{qualifying_action}:

The noun is the business concept (often a measurement name): revenue, expense, cash, lead
The past participle is the qualifying condition — why this event is canonical for that concept: earned, incurred, received, created
The is_ prefix marks the column as boolean and enables auto-casting in the pivot

Dimension	What it says	Noun	Participle
`is_revenue_earned`	Revenue was earned on this event	revenue	earned
`is_cash_received`	Cash hit the bank on this event	cash	received
`is_expense_incurred`	An expense was taken on	expense	incurred
`is_tax_deductible`	This expense qualifies for tax deduction	tax	deductible
`is_lead_created`	A lead was created (system of record)	lead	created

The convention reads like natural English in a WHERE clause: WHERE d.is_revenue_earned. An LLM seeing that column name knows exactly what to filter without documentation.

Non-boolean dimensions (string-valued like invoice_id) use plain descriptive names without the is_ prefix.

Enabling Cross-Source Timing Analysis

The naming convention naturally enables comparative analysis between different business events. For example, is_revenue_earned (Stripe invoice paid) and is_cash_received (QuickBooks deposit) tag the same revenue at two different moments — when the business earned it vs when cash settled in the bank:

-- "Compare when revenue was earned vs when cash was received"
SELECT
    DATE_TRUNC(m.occurred_at, MONTH) AS month,
    SUM(CASE WHEN d.is_revenue_earned THEN m.revenue END) AS revenue_earned,
    SUM(CASE WHEN d.is_cash_received THEN m.revenue END) AS cash_received
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
GROUP BY 1
ORDER BY 1

The column names frame the analysis without any documentation or source-specific knowledge.

Which Concepts Deserve a Dimension?

Not every business concept needs a dimension. The test:

Does the concept span multiple sources, and does the business have a clear answer to "which events represent this concept?"

revenue: Yes. Multiple sources emit revenue events (Stripe, Upwork, FreshBooks). The business knows which events mean "revenue was earned." → is_revenue_earned
expense: Yes. Expenses come from FreshBooks, QuickBooks, bank feeds. The business knows which events are real expenses vs transfers. → is_expense_incurred
lead: Yes. GA4 fires a lead event from the website, Salesforce creates the CRM record. The business picks the system of record. → is_lead_created
num_nights: No. "How many nights?" is inherently contextual. There is no cross-source "nights were realized" event. The analyst should specify the event type directly.

The list of standard dimensions should be short — truly cross-source business concepts only.

Schema

Core Table: `nexus_event_dimensions`

The analyst-facing table. One row per event, one column per dimension. Dimension columns are discovered dynamically at compile time — new dimensions from new sources appear automatically on the next dbt build.

Column	Type	Required	Description
`event_id`	STRING	✅	FK to the event
`occurred_at`	TIMESTAMP	✅	When (denormalized from event)
`source`	STRING	✅	Source system
`{dimension_name}`	BOOLEAN or STRING	❌	One column per dimension. `is_` prefix → BOOLEAN (default FALSE). Others → STRING (default NULL).

The event_id linkage connects dimensions to the full nexus entity graph through Events → Participants → Entities, enabling queries like "Revenue by customer where is_revenue_earned" without any source-specific joins.

Type convention: Dimensions prefixed with is_ are automatically cast to BOOLEAN in the pivot (with FALSE as default). All others remain STRING (with NULL as default).

Intermediate Table: `nexus_event_dimensions_unioned`

The EAV canonical form. One row per dimension per event. This is what source models feed into and what the core table pivots from.

Column	Type	Required	Description
`event_dimension_id`	STRING	✅	Unique ID (`evt_dim_` prefix)
`event_id`	STRING	✅	FK to the event this dimension came from
`dimension_name`	STRING	✅	What property (the EAV discriminator)
`dimension_value`	STRING	✅	The value (`'true'`/`'false'` for booleans, or string)
`occurred_at`	TIMESTAMP	✅	When (denormalized from event)
`source`	STRING	✅	Source system

You should not need to query this table directly — use the core table instead.

Extraction Pattern

Dimensions follow the standard four-layer architecture described in Recommended Source Model Structure.

Directory Structure

models/sources/{source_name}/
├── intermediate/
│   ├── {source}_invoice_events.sql             ← existing
│   ├── {source}_invoice_measurements.sql       ← existing
│   ├── {source}_invoice_dimensions.sql         ← NEW
│   ├── {source}_person_identifiers.sql         ← existing
│   └── {source}_group_identifiers.sql          ← existing
├── {source}_events.sql                         ← existing (unions events)
├── {source}_entity_identifiers.sql             ← existing (unions identifiers)
├── {source}_event_measurements.sql             ← existing (unions measurements)
├── {source}_event_dimensions.sql               ← NEW (unions dimensions)

Intermediate Dimension Models

Each intermediate dimension model:

Reads from intermediate event models for event_id linkage and timing
Applies business logic to determine which events qualify for each dimension
Produces one dimension row per qualifying event per dimension name

This mirrors how measurement models read from events and extract multiple measurement rows per event.

Example: `stripe_invoice_dimensions.sql`

{{ config(
    materialized='table',
    tags=['stripe', 'intermediate', 'dimensions']
) }}

-- Dimensions extracted from Stripe invoice events
-- is_revenue_earned: only on "invoice paid" events
-- invoice_id: on all invoice events (cross-source identifier)

WITH invoice_events AS (
    SELECT * FROM {{ ref('stripe_invoice_events') }}
),

-- is_revenue_earned: tags invoice paid events as the canonical revenue event
revenue_earned_dimensions AS (
    SELECT
        {{ nexus.create_nexus_id('event_dimension', ['event_id', "'is_revenue_earned'"]) }}
            as event_dimension_id,
        event_id,
        'is_revenue_earned' as dimension_name,
        'true' as dimension_value,
        occurred_at,
        'stripe' as source
    FROM invoice_events
    WHERE event_name = 'invoice paid'
),

-- invoice_id: cross-source invoice identifier on all invoice events
invoice_id_dimensions AS (
    SELECT
        {{ nexus.create_nexus_id('event_dimension', ['event_id', "'invoice_id'"]) }}
            as event_dimension_id,
        event_id,
        'invoice_id' as dimension_name,
        invoice_id as dimension_value,
        occurred_at,
        'stripe' as source
    FROM invoice_events
    WHERE invoice_id IS NOT NULL
)

SELECT * FROM revenue_earned_dimensions
UNION ALL
SELECT * FROM invoice_id_dimensions

Source-Level Union: `stripe_event_dimensions.sql`

{{ config(
    materialized='table',
    tags=['nexus', 'dimensions', 'stripe']
) }}

-- Union all Stripe dimension types
{{ dbt_utils.union_relations(
    relations=[
        ref('stripe_invoice_dimensions')
    ]
) }}

ORDER BY occurred_at DESC

Querying Dimensions

The core nexus_event_dimensions table is pivoted — each dimension is a column.

Revenue with Dimension Filter

-- "What's my revenue this year?"
SELECT
    EXTRACT(YEAR FROM m.occurred_at) AS year,
    SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
WHERE d.is_revenue_earned
GROUP BY 1
ORDER BY 1

Revenue by Customer

-- "Revenue by customer"
SELECT
    e.name AS customer,
    SUM(m.revenue) AS total_revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
JOIN nexus_entity_participants p ON m.event_id = p.event_id
JOIN nexus_entities e ON p.entity_id = e.entity_id
WHERE d.is_revenue_earned
  AND e.entity_type = 'group'
GROUP BY 1
ORDER BY 2 DESC

Discoverability

-- "What dimensions are available?"
SELECT * FROM nexus_event_dimensions_metadata

Where Business Logic Belongs

Semantic logic that requires data-level reasoning belongs in the transformation layer (dbt/nexus), not the semantic layer.

The engineer building the dimension model has maximum context: they are staring at the raw data, understanding edge cases, writing the joins. They know:

"FreshBooks payment received events overlap with invoice paid — 184 of 186 payments link to the same 181 invoices. Don't tag both."
"Two FreshBooks payments have no invoice — they're overpayments from 2019. Don't tag those either."
"Upwork deposit received is a bank withdrawal, not revenue."

The dimension is set once at build time, is tested, is version-controlled, and produces a clean boolean. Encoding this same logic in a Cube YAML file — far from the data, without the context of why certain events are excluded — is where mistakes happen.

The Boundary

Operation type	Where it belongs	Example
Row-level data reasoning	dbt/nexus (transformation)	"Is this payment already covered by an invoice?"
Cross-source dimensions	dbt/nexus (transformation)	"This event `is_revenue_earned`"
Aggregation function	Semantic layer	`Revenue = SUM(revenue) WHERE is_revenue_earned`
Post-aggregation arithmetic	Semantic layer	`profit = SUM(revenue) + SUM(expense)`
Time intelligence	Semantic layer / query time	YoY, MTD, rolling averages

Relationship to Measurements

Measurements and dimensions work together. Measurements label what the quantitative value is. Dimensions label which events should be aggregated for a given business concept.

erDiagram
    EVENTS ||--o{ MEASUREMENTS : "what was the quantitative value"
    EVENTS ||--o{ DIMENSIONS : "which events count for this concept"
    EVENTS ||--o{ IDENTIFIERS : "who was involved"
    EVENTS ||--o{ TRAITS : "what do we know about them"
    EVENTS ||--o{ STATES : "what state changed"
    EVENTS ||--o{ ATTRIBUTION : "what caused this"
    EVENTS ||--o{ RELATIONSHIPS : "what connections exist"

The semantic layer defines metrics from these two facets:

Revenue = SUM(revenue) WHERE is_revenue_earned
Expenses = SUM(expense) WHERE is_expense_incurred
Profit = Revenue + Expenses (expenses are negative)

Nexus handles the hard part — cross-source extraction, labeling, and per-event business logic. The metric layer becomes mechanical.

Configuration

Configure dimension sources in dbt_project.yml:

vars:
  nexus:
    sources:
      stripe:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: ['person', 'group']
      freshbooks:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: ['person', 'group']
      upwork:
        enabled: true
        events: true
        measurements: true
        dimensions: true
        entities: ['person', 'group']

The nexus package automatically unions all enabled source dimensions into nexus_event_dimensions.

Dimension Catalog (YAML)

Dimensions are discovered automatically from the warehouse — every distinct dimension_name in nexus_event_dimensions_unioned appears in the metadata table. But you can enrich dimensions with qualitative information by adding entries to dbt_project.yml under vars.nexus.dimensions:

vars:
  nexus:
    dimensions:
      - name: page
        label: Page
        description: >
          The page URL or path associated with the event (web or app surfaces).
        aliases: [url, path]
        tags: [web, navigation]
        example_questions:
          - Which pages get the most traffic?

      - name: is_revenue_earned
        label: Revenue earned
        description: >
          Whether the event counts as earned revenue for marketplace reporting.
        tags: [financial, marketplace]
        example_questions:
          - How much revenue was earned last month?

      - name: campaign_name
        description: >
          Human-readable advertising campaign name.

Catalog Schema

Field	Required	Type	Description
`name`	yes	string	Must match the `dimension_name` in the EAV table
`label`	no	string	Human-readable display name (auto-generated from name if omitted)
`description`	no	string	What the dimension means and how it's derived
`aliases`	no	string[]	Alternative names people use for this dimension
`tags`	no	string[]	Organizational categories for filtering and grouping
`example_questions`	no	string[]	Business questions this dimension helps answer

YAML entries are optional. Dimensions without YAML entries still appear in the metadata table with warehouse-derived statistics — they just lack descriptions, aliases, and tags. Conversely, a YAML entry for a dimension not yet materialized in the warehouse appears with is_in_data = false, which is useful for documenting planned dimensions ahead of implementation.

Metadata

The nexus_event_dimensions_metadata model combines warehouse statistics with YAML-authored catalog fields into a single queryable table. It is rebuilt on every dbt build and serves as the LLM-queryable catalog of what dimensions exist, which sources contribute them, and what values they contain.

Discover Available Dimensions

SELECT * FROM nexus_event_dimensions_metadata

Metadata Table Schema

One row per (dimension_name, source) pair. Dimensions contributed by multiple sources produce multiple rows.

Column	Type	Description
`dimension_name`	STRING	Machine-readable identifier (matches `dimension_name` in the EAV)
`source`	STRING	Source system that contributed data (NULL for YAML-only entries)
`dimension_type`	STRING	`boolean` (for `is_` prefix) or `string`
`label`	STRING	Human-readable display name (from YAML or auto-generated)
`description`	STRING	Plain-English explanation (from YAML)
`aliases`	STRING	Comma-separated alternative names (from YAML)
`tags`	STRING	Comma-separated organizational categories (from YAML)
`example_questions`	STRING	Semicolon-separated business questions (from YAML)
`example_value_1`	STRING	Most frequent distinct value for this dimension/source pair
`example_value_2`	STRING	Second most frequent value
`example_value_3`	STRING	Third most frequent value
`first_seen_at`	TIMESTAMP	Earliest `occurred_at` for this dimension/source
`last_seen_at`	TIMESTAMP	Latest `occurred_at` for this dimension/source
`occurrence_count`	INTEGER	Total rows tagged with this dimension/source
`distinct_values`	INTEGER	Count of distinct `dimension_value`s
`is_in_data`	BOOLEAN	`true` if the dimension exists in the warehouse, `false` if YAML-only

LLM Context

An LLM writing SQL queries the dimension catalog alongside metric and measurement metadata:

SELECT
    dimension_name,
    dimension_type,
    description,
    example_value_1,
    example_value_2,
    example_value_3
FROM nexus_event_dimensions_metadata
WHERE source IS NOT NULL
ORDER BY dimension_name, source

The example values help an LLM understand what a dimension looks like in practice without querying the underlying data — e.g., seeing example_value_1 = 'dtc' on a business_unit dimension tells the LLM exactly what filter values to use.

Browsing Dimensions in the Semantic Layer UI

The Semantic Layer page (/data-dictionary/semantic-layer, Dimensions tab) provides a browsable interface over this metadata table. Each dimension shows its sources, type, description, and a truncated example value in the table. A peek pane expands to show all example values, per-source statistics, and the YAML-authored fields.

Event Measurements — Quantitative labeling (the companion facet to dimensions)
Metrics — Canonical business calculations built from measurements and dimensions
Architecture Overview — Core tables ERD and key joins
Pipeline Structure — Sources, Intermediate, and Core stages
Recommended Source Model Structure — Four-layer extraction architecture

Nexus Data

Event Dimensions

The Problem

The Solution: Dimensions as a Facet

Dimensions vs States vs Events

Example: Cross-Source Derivation

Example: Schema Drift Over Time

What Is Not a Dimension

Naming Convention: `is_{noun}_{past_participle}`

Enabling Cross-Source Timing Analysis

Which Concepts Deserve a Dimension?

Schema

Core Table: `nexus_event_dimensions`

Intermediate Table: `nexus_event_dimensions_unioned`

Extraction Pattern

Directory Structure

Intermediate Dimension Models

Example: `stripe_invoice_dimensions.sql`

Source-Level Union: `stripe_event_dimensions.sql`

Querying Dimensions

Revenue with Dimension Filter

Revenue by Customer

Discoverability

Where Business Logic Belongs

The Boundary

Relationship to Measurements

Configuration

Dimension Catalog (YAML)

Catalog Schema

Metadata

Discover Available Dimensions

Metadata Table Schema

LLM Context

Browsing Dimensions in the Semantic Layer UI

The Problem

The Solution: Dimensions as a Facet

Dimensions vs States vs Events

Example: Cross-Source Derivation

Example: Schema Drift Over Time

What Is Not a Dimension

Naming Convention: is_{noun}_{past_participle}

Enabling Cross-Source Timing Analysis

Which Concepts Deserve a Dimension?

Schema

Core Table: nexus_event_dimensions

Intermediate Table: nexus_event_dimensions_unioned

Extraction Pattern

Directory Structure

Intermediate Dimension Models

Example: stripe_invoice_dimensions.sql

Source-Level Union: stripe_event_dimensions.sql

Querying Dimensions

Revenue with Dimension Filter

Revenue by Customer

Discoverability

Where Business Logic Belongs

The Boundary

Relationship to Measurements

Configuration

Dimension Catalog (YAML)

Catalog Schema

Metadata

Discover Available Dimensions

Metadata Table Schema

LLM Context

Browsing Dimensions in the Semantic Layer UI

Related Documentation

Naming Convention: `is_{noun}_{past_participle}`

Core Table: `nexus_event_dimensions`

Intermediate Table: `nexus_event_dimensions_unioned`

Example: `stripe_invoice_dimensions.sql`

Source-Level Union: `stripe_event_dimensions.sql`