Nexus Data

Metrics are named business calculations — Revenue, GBV, Take Rate, Conversion Rate — defined once in configuration and compiled into a queryable nexus_metrics_metadata table. For the rationale behind this approach, see Metric Definitions Should be Maintained in DBT.

A metric definition serves three purposes:

Qualitative information for LLMs and analysts — descriptions, aliases, what questions a metric answers
Lineage — which tables and columns a metric depends on, how it's derived
Precise SQL — the exact expression needed to compute the metric, ensuring consistency without reinventing the wheel

The definitions are structured and precise enough to deterministically derive tool-specific schemas (see Generating Semantic Layer Schemas), and rich enough to inform an LLM writing SQL directly.

Schema

Field	Required	Type	Description
`name`	yes	string	Machine-readable identifier
`label`	no	string	Human-readable display name
`aliases`	no	string[]	Alternative names people use for this metric
`type`	yes	enum	`measure`, `count`, or `derived`
`tables`	conditional	string[]	All tables needed (required for `measure`/`count`)
`metric_sql`	yes	string	Column expression (`measure`/`count`) or formula (`derived`)
`filter`	no	string[]	Fully qualified predicates, ANDed together
`format`	no	enum	`currency`, `integer`, `percentage`, `decimal`
`unit`	no	string	`USD`, `EUR`, `degrees_fahrenheit`, etc.
`polarity`	no	enum	`positive` (higher is better) or `negative` (lower is better)
`precision`	no	integer	Decimal places to display
`tags`	no	string[]	Organizational categories
`description`	yes	string	What it means and how it's derived
`example_questions`	no	string[]	Business questions this metric answers

Naming Conventions

name: snake_case, lowercase. This is the machine-readable identifier used in code, SQL references, and derived metric formulas. Examples: revenue, gbv, take_rate, converting_persons, ad_spend.
label: Title Case, human-readable. Used in dashboards, documentation, and LLM responses. Examples: Revenue, Gross Booking Value, Take Rate.
aliases: Lowercase, natural language. How people actually refer to the metric in conversation. Examples: total revenue, gross bookings.
tags: Lowercase, single words or short phrases. Organizational categories for filtering and grouping. Examples: financial, marketplace, acquisition.

Metrics are grouped by their primary model — the table that provides join context and serves as the query anchor:

vars:
  nexus:
    metrics:
      <model_name>:
        - { name, type, tables, metric_sql, filter, description, ... }

Metric Types

`measure` — Numeric Columns

A numeric column that the consumer aggregates. The definition specifies what to aggregate and what to filter on; the consumer chooses the aggregation function (SUM, AVG, MEDIAN, MIN, MAX).

nexus_event_measurements:
  - name: revenue
    label: Revenue
    aliases: [total revenue, earned revenue, platform revenue]
    type: measure
    tables: [nexus_event_measurements, nexus_event_dimensions]
    metric_sql: nexus_event_measurements.revenue
    filter:
      - nexus_event_dimensions.is_revenue_earned = true
    format: currency
    unit: USD
    polarity: positive
    precision: 2
    tags: [financial, marketplace]
    description: >
      Platform revenue on paid bookings.
    example_questions:
      - What was our revenue last month?
      - How much did we earn?
      - What is average revenue per booking?

An LLM asked "what was average revenue per booking?" reads the definition and writes AVG(m.revenue) ... WHERE d.is_revenue_earned. An analyst sees the canonical column, filter, and table dependencies without guessing.

`count` — Countable Columns

A column to count or count-distinct. The aggregation is inherently COUNT — but the resulting count is a number that can be further aggregated (max daily count, median weekly count, etc.).

nexus_events:
  - name: converting_persons
    label: Converting Persons
    type: count
    tables: [nexus_events, nexus_entity_participants, nexus_event_dimensions]
    metric_sql: DISTINCT nexus_entity_participants.entity_id
    filter:
      - nexus_event_dimensions.is_conversion = true
      - nexus_entity_participants.entity_type = "person"
    polarity: positive
    tags: [acquisition]
    description: >
      Unique persons who had a conversion event.

  - name: total_persons
    label: Total Persons
    type: count
    tables: [nexus_events, nexus_entity_participants]
    metric_sql: DISTINCT nexus_entity_participants.entity_id
    filter:
      - nexus_entity_participants.entity_type = "person"
    tags: [acquisition]
    description: >
      Total unique persons across all events.

`derived` — Formulas

A formula referencing other metrics by name, or concrete SQL with aggregations baked in. The formula IS the metric — the consumer can meta-aggregate the result (e.g. median daily conversion rate) but cannot swap internal aggregation functions.

nexus_events:
  - name: conversion_rate
    label: Conversion Rate
    type: derived
    metric_sql: converting_persons / NULLIF(total_persons, 0)
    format: percentage
    polarity: positive
    precision: 4
    tags: [acquisition]
    description: >
      Fraction of persons who converted.
    example_questions:
      - What percentage of visitors convert?
      - What is our conversion rate by channel?

nexus_event_measurements:
  - name: take_rate
    label: Take Rate
    type: derived
    tables: [nexus_event_measurements, nexus_event_dimensions]
    metric_sql: >
      (SUM(nexus_event_measurements.service_fee)
      + SUM(nexus_event_measurements.koala_commission))
      / NULLIF(SUM(nexus_event_measurements.gross_merchandise_value), 0)
    filter:
      - nexus_event_dimensions.is_revenue_earned = true
    format: percentage
    polarity: positive
    precision: 4
    tags: [financial, marketplace]
    description: >
      Platform take rate as a fraction of GBV.
    example_questions:
      - What percentage of GBV does the platform keep?
      - How has our take rate changed over time?

Derived metrics that reference other metrics by name (like conversion_rate) solve the filter scoping problem: each component carries its own filters, so the numerator and denominator can have different filter conditions without ambiguity.

Design Principles

Fully Qualified References

Every column reference in metric_sql and filter includes its table name. Zero ambiguity about what comes from where:

metric_sql: nexus_event_measurements.revenue          # not just "revenue"
filter:
  - nexus_event_dimensions.is_revenue_earned = true    # not just "is_revenue_earned"

Tables Declare Dependencies, Not Join Paths

tables says "you need these tables to compute this metric." The consumer knows how to join them — from nexus schema knowledge, semantic layer join definitions, or the nexus metadata tables. The metric definition doesn't repeat join logic.

Filters Are Predicates, Not WHERE Clauses

Each filter entry is a precise, fully qualified boolean expression. Multiple entries are ANDed. OR conditions go inside a single entry or — better — into a dimension that encodes the business logic at build time.

Complexity Belongs in the Transformation Layer

If a metric filter is getting complex (OR across sources, temporal boundaries, data-level reasoning), push that complexity into a dimension model in dbt. The dimension model encodes the logic once, the metric filter becomes dimension.is_x = true. Both live in the same dbt project — one PR creates the dimension and uses it in the metric.

Metrics Grouped by Primary Model

The model key determines:

Query anchor — where an LLM starts the FROM clause
Organization — related metrics live together
Schema generation — which semantic layer object the metric belongs to

Metrics Must Live on Event-Layer Models

Metrics must be defined on tables that participate in the event layer — tables with event_id and occurred_at columns. This includes nexus core tables (nexus_events, nexus_event_measurements, nexus_event_dimensions, nexus_entity_participants) and source event tables (koala_app_events, stripe_events, etc.).

Do not define metrics on normalized or snapshot models (e.g. koala_app_listings, koala_app_users). The semantic layer's query engine assumes every metric can be:

Joined via event_id — normalized models don't have one
Filtered by occurred_at — normalized models don't have a time axis
Grouped by day — meaningless without occurred_at

Metrics on non-event tables will appear in the metadata catalog but cannot produce sparklines, trend charts, or period-over-period comparisons.

If you need a metric over data that currently lives in a normalized table, the fix is upstream: create an event model in the source's intermediate/ layer that transforms the normalized records into Nexus-compatible events with event_id and occurred_at, then define the metric on that event table.

# Good — metric on an event-layer table
nexus_events:
  - name: active_listings
    type: count
    tables: [nexus_events, koala_app_events]
    metric_sql: DISTINCT koala_app_events.listing_id
    filter:
      - nexus_events.event_name = 'listing approved'
      - koala_app_events.status = 'active'

# Bad — metric on a normalized snapshot table
koala_app_listings:
  - name: active_listings
    type: count
    tables: [koala_app_listings]
    metric_sql: koala_app_listings.listing_id
    # No event_id, no occurred_at — breaks time-series queries

Metrics are not limited to the nexus compressed schema, however. Source-specific event tables are valid targets:

stripe_events:
  - name: stripe_fees
    label: Stripe Processing Fees
    type: measure
    tables: [stripe_events]
    metric_sql: stripe_events.fee
    format: currency
    unit: USD
    polarity: negative
    precision: 2
    tags: [financial]
    description: >
      Stripe processing fees across all payment events.
    example_questions:
      - How much are we paying Stripe?
      - What are our payment processing costs?

Why Not a Traditional Semantic Layer?

Traditional semantic layers do several things:

Define which tables exist and how they join
Declare dimensions with types
Declare metrics with aggregation rules
Translate business questions into deterministic SQL
Expose an API, pre-aggregations, documentation, etc.

Nexus already handles (1) and (2) through its core tables, facets, and metadata. The join paths are fixed by the nexus schema. The metadata tables catalog every available measurement, dimension, trait, and identifier.

Nexus metrics handle (3) — canonical metric declarations with enough precision to derive (4) and (5) deterministically.

LLMs Change the Equation

The traditional semantic layer's primary value is (4): translating business questions into SQL through a constrained query API. LLMs change this fundamentally — they can write SQL directly given good context.

The shift is from constraining outputs (a query API that only allows valid queries) to informing outputs (a rich schema and metric catalog that helps an LLM write correct SQL).

Nexus metrics are informing: here's the canonical definition, adapt as needed. An LLM can compute MEDIAN(daily_conversion_rate) — a nested aggregation that many semantic layers can't express — because the metric definition gave it everything it needs.

The Right Boundary

Responsibility	Nexus	Query layer
Cross-source extraction and labeling	measurements, dimensions	—
Join paths and entity graph	core schema	—
Discoverability catalog	metadata tables	—
Canonical metric definitions	`nexus_metrics_metadata`	—
Deterministic schema generation	metric definitions	semantic layer tools
SQL generation from questions	—	LLM, semantic layer, analyst
Time intelligence (YoY, MTD)	—	LLM, semantic layer, analyst
Pre-aggregation / caching	—	semantic layer, BI tool
Visualization	—	dashboard, artifact

Configuration

Metrics are defined in dbt_project.yml under vars.nexus.metrics, grouped by primary model:

vars:
  nexus:
    metrics:
      nexus_event_measurements:
        - name: revenue
          label: Revenue
          aliases: [total revenue, earned revenue]
          type: measure
          tables: [nexus_event_measurements, nexus_event_dimensions]
          metric_sql: nexus_event_measurements.revenue
          filter:
            - nexus_event_dimensions.is_revenue_earned = true
          format: currency
          unit: USD
          polarity: positive
          precision: 2
          tags: [financial]
          description: >
            Platform revenue on paid bookings.
          example_questions:
            - What was our revenue last month?
            - How much did we earn?

        - name: gbv
          label: Gross Booking Value
          aliases: [gross bookings, total bookings value]
          type: measure
          tables: [nexus_event_measurements, nexus_event_dimensions]
          metric_sql: nexus_event_measurements.gross_merchandise_value
          filter:
            - nexus_event_dimensions.is_revenue_earned = true
          format: currency
          unit: USD
          polarity: positive
          precision: 2
          tags: [financial, marketplace]
          description: >
            Total gross merchandise value on paid booking events.
            Includes guest fees, owner payout, and platform commission.
          example_questions:
            - What was total GBV this quarter?
            - How much merchandise value did we process?

        - name: take_rate
          label: Take Rate
          type: derived
          tables: [nexus_event_measurements, nexus_event_dimensions]
          metric_sql: >
            (SUM(nexus_event_measurements.service_fee)
            + SUM(nexus_event_measurements.koala_commission))
            / NULLIF(SUM(nexus_event_measurements.gross_merchandise_value), 0)
          filter:
            - nexus_event_dimensions.is_revenue_earned = true
          format: percentage
          polarity: positive
          precision: 4
          tags: [financial, marketplace]
          description: >
            Platform take rate as a fraction of GBV.
          example_questions:
            - What percentage of GBV does the platform keep?

        - name: ad_spend
          label: Ad Spend
          type: measure
          tables: [nexus_event_measurements]
          metric_sql: nexus_event_measurements.spend
          format: currency
          unit: USD
          polarity: negative
          precision: 2
          tags: [marketing]
          description: >
            Total advertising spend across Facebook Ads and Google Ads.

      nexus_events:
        - name: converting_persons
          label: Converting Persons
          type: count
          tables: [nexus_events, nexus_entity_participants, nexus_event_dimensions]
          metric_sql: DISTINCT nexus_entity_participants.entity_id
          filter:
            - nexus_event_dimensions.is_conversion = true
            - nexus_entity_participants.entity_type = "person"
          polarity: positive
          tags: [acquisition]
          description: >
            Unique persons who had a conversion event.

        - name: total_persons
          label: Total Persons
          type: count
          tables: [nexus_events, nexus_entity_participants]
          metric_sql: DISTINCT nexus_entity_participants.entity_id
          filter:
            - nexus_entity_participants.entity_type = "person"
          tags: [acquisition]
          description: >
            Total unique persons across all events.

        - name: conversion_rate
          label: Conversion Rate
          type: derived
          metric_sql: converting_persons / NULLIF(total_persons, 0)
          format: percentage
          polarity: positive
          precision: 4
          tags: [acquisition]
          description: >
            Fraction of persons who converted.
          example_questions:
            - What percentage of visitors convert?

      stripe_events:
        - name: stripe_fees
          label: Stripe Processing Fees
          type: measure
          tables: [stripe_events]
          metric_sql: stripe_events.fee
          format: currency
          unit: USD
          polarity: negative
          precision: 2
          tags: [financial]
          description: >
            Stripe processing fees across all payment events.

When dbt's vars.yml feature ships (v1.12+), metrics can be extracted to their own file with zero changes to the model.

The Model: `nexus_metrics_metadata`

The nexus_metrics_metadata model reads metric definitions from vars and compiles them into a queryable table. The model iterates over the model-grouped structure and flattens it into rows:

{{ config(materialized='table') }}

{% set all_metrics = var('nexus', {}).get('metrics', {}) %}

{% set has_metrics = all_metrics | length > 0 %}

{% if has_metrics %}

    {% set ns = namespace(first=true) %}
    {% for model_name, metrics_list in all_metrics.items() %}
        {% for metric in metrics_list %}
            {% if not ns.first %}UNION ALL{% endif %}
            {% set ns.first = false %}
            SELECT
                '{{ model_name }}' AS model,
                '{{ metric.name }}' AS metric_name,
                '{{ metric.get("label", metric.name) }}' AS label,
                {{ "'" ~ metric.aliases | join(", ") ~ "'" if metric.get('aliases') else 'NULL' }} AS aliases,
                '{{ metric.type }}' AS metric_type,
                {{ "'" ~ (metric.tables | join(", ")) ~ "'" if metric.get('tables') else 'NULL' }} AS tables,
                '{{ metric.metric_sql | trim }}' AS metric_sql,
                {{ "'" ~ (metric.filter | join(" AND ")) ~ "'" if metric.get('filter') else 'NULL' }} AS filter,
                {{ "'" ~ metric.format ~ "'" if metric.get('format') else 'NULL' }} AS format,
                {{ "'" ~ metric.unit ~ "'" if metric.get('unit') else 'NULL' }} AS unit,
                {{ "'" ~ metric.polarity ~ "'" if metric.get('polarity') else 'NULL' }} AS polarity,
                {{ metric.precision if metric.get('precision') is not none else 'NULL' }} AS precision,
                {{ "'" ~ (metric.tags | join(", ")) ~ "'" if metric.get('tags') else 'NULL' }} AS tags,
                '{{ metric.get("description", "") | trim }}' AS description,
                {{ "'" ~ (metric.example_questions | join("; ")) ~ "'" if metric.get('example_questions') else 'NULL' }} AS example_questions
        {% endfor %}
    {% endfor %}

{% else %}

    SELECT
        CAST(NULL AS VARCHAR) AS model,
        CAST(NULL AS VARCHAR) AS metric_name,
        CAST(NULL AS VARCHAR) AS label,
        CAST(NULL AS VARCHAR) AS aliases,
        CAST(NULL AS VARCHAR) AS metric_type,
        CAST(NULL AS VARCHAR) AS tables,
        CAST(NULL AS VARCHAR) AS metric_sql,
        CAST(NULL AS VARCHAR) AS filter,
        CAST(NULL AS VARCHAR) AS format,
        CAST(NULL AS VARCHAR) AS unit,
        CAST(NULL AS VARCHAR) AS polarity,
        CAST(NULL AS INTEGER) AS precision,
        CAST(NULL AS VARCHAR) AS tags,
        CAST(NULL AS VARCHAR) AS description,
        CAST(NULL AS VARCHAR) AS example_questions
    WHERE 1 = 0

{% endif %}

Metadata Table Schema

Column	Type	Description
`model`	STRING	Primary model (query anchor and grouping)
`metric_name`	STRING	Machine-readable identifier
`label`	STRING	Human-readable display name
`aliases`	STRING	Comma-separated alternative names
`metric_type`	STRING	`measure`, `count`, or `derived`
`tables`	STRING	Comma-separated table dependencies
`metric_sql`	STRING	Column expression or formula
`filter`	STRING	Predicates ANDed together
`format`	STRING	Display format (`currency`, `percentage`, etc.)
`unit`	STRING	Unit of measurement
`polarity`	STRING	`positive` or `negative`
`precision`	INTEGER	Decimal places
`tags`	STRING	Comma-separated organizational categories
`description`	STRING	Plain-English explanation
`example_questions`	STRING	Semicolon-separated business questions this metric answers

Querying Metrics Metadata

Discover Available Metrics

SELECT * FROM nexus_metrics_metadata

LLM Context

An LLM writing SQL queries the metrics catalog alongside measurement and dimension metadata:

SELECT
    model,
    metric_name,
    metric_type,
    metric_sql,
    filter,
    description,
    example_questions
FROM nexus_metrics_metadata
ORDER BY model, metric_name

Given the revenue metric definition, an LLM constructs:

SELECT
    DATE_TRUNC('month', m.occurred_at) AS month,
    SUM(m.revenue) AS revenue
FROM nexus_event_measurements m
JOIN nexus_event_dimensions d ON m.event_id = d.event_id
WHERE d.is_revenue_earned = true
GROUP BY 1
ORDER BY 1

The metric definition provided the column, the filter, and the table dependencies. The LLM assembled them into a query with time granularity, grouping, and ordering appropriate for the question.

Human Documentation

Metric descriptions, labels, and aliases serve double duty. A dashboard showing a "Revenue" metric card populates a tooltip from the description:

Revenue: Platform revenue on paid bookings.

The aliases field means an LLM recognizes "total revenue", "earned revenue", and "platform revenue" as the same metric. And example_questions helps an LLM match a user's natural language question to the right metric.

Metrics reference event-layer tables — nexus core tables, source event tables, or any model with event_id and occurred_at.

Source Data
  → Events (what happened)
    → Measurements (quantitative values)
    → Dimensions (categorical filters)
    → Source event tables (source-specific columns)
  → Metrics (canonical business calculations over event-layer tables)

Measurements and dimensions are in the data — rows in tables, computed at build time from source records. Metrics are about the data — definitions that describe how to query the data. This is why measurements and dimensions are core nexus tables, while metrics are a metadata table compiled from configuration.

Normalized and snapshot tables (e.g. Fivetran-synced application tables) are useful as dimension sources and reference data, but should not be metric hosts. If a business calculation needs data from a normalized table, transform that data into events upstream so the metric can be time-series compatible.

When Not to Define a Metric

Not every business metric needs a definition. Many metrics follow a standard pattern that's derivable from the event name alone — an LLM or analyst already knows how to write these queries given the nexus schema.

Pageviews — count of events with a specific event name:

SELECT COUNT(nexus_events.event_id)
FROM nexus_events
WHERE nexus_events.event_name = 'pageview'

Unique pageviewers — distinct persons who triggered the event:

SELECT COUNT(DISTINCT nexus_entity_participants.entity_id)
FROM nexus_events
LEFT JOIN nexus_entity_participants
  ON nexus_events.event_id = nexus_entity_participants.event_id
WHERE nexus_events.event_name = 'pageview'
  AND nexus_entity_participants.entity_type = 'person'

These are general patterns — COUNT(events) and COUNT(DISTINCT entities) by event name — that apply to any event type. Defining a metric for each one would be noise. The event metadata already tells an LLM everything it needs.

Define a metric when:

The calculation isn't obvious from the schema (e.g., take rate involves specific columns and division)
It requires specific filters beyond event name (e.g., revenue requires is_revenue_earned = true)
It's a derived formula combining other metrics (e.g., conversion rate)
You want to attach qualitative metadata — description, example questions, polarity — that wouldn't be discoverable otherwise
Consistency matters and you don't want different people writing different SQL for the same concept

Skip a metric when the query is a predictable pattern over event_name that anyone familiar with the schema could write.

Adding Metrics to a Client Project

Define metrics in dbt_project.yml under vars.nexus.metrics, grouped by primary model
Create the model at models/nexus/nexus_metrics_metadata.sql (copy the model template above)
Run dbt build — the model compiles metric definitions into a queryable table

The metrics model has no upstream dependencies on other nexus tables — it reads purely from configuration. It can be built independently and updated by editing YAML.

Generating Semantic Layer Schemas

The metric definitions are structured and precise enough to deterministically generate schemas for semantic layer tools like Cube or MetricFlow. A measure type maps to a tool-native measure with consumer-chosen aggregation; a count type maps to a count-distinct measure; a derived type maps to a calculated metric referencing its components. The model key determines which semantic layer object (cube, semantic model, etc.) the metric belongs to, and tables declares the required joins.

If a metric can't be expressed precisely enough to generate a schema deterministically, it's a signal that the definition needs refinement — decompose into components, or push complexity into a dimension.

Event Measurements — Quantitative observations (the raw values metrics can aggregate)
Event Dimensions — Categorical filters (the event selectors metrics can use)
Metadata Tables — Other auto-generated catalogs
Architecture Overview — Core tables ERD, key joins, and source-specific event fields