Core Event Schema Boundary
Design principle for what belongs on nexus_events versus facet tables (dimensions, measurements) and source event tables.
Principle
Every column on nexus_events must be required. If a field cannot be guaranteed
on every event from every source, it does not belong on the core event table —
it belongs in a facet table (dimensions, measurements, identifiers, traits,
attribution) or on the source event table.
There are no optional columns on nexus_events.
Why
The core event table is the nexus contract. Downstream consumers — output models, semantic layer generators, LLM agents — need to know exactly what they can rely on without checking for nulls or reading source-specific documentation. A column that is "usually there" is worse than a column that is always there or a column that lives in a well-defined facet table with metadata.
Optional columns create ambiguity:
- Is it null because the source doesn't have this concept, or because of a bug?
- Should the semantic layer expose it as a dimension? It's not in the metadata pipeline.
- Does an LLM know to check for it? It's not in the facet catalog.
Required columns eliminate these questions. Facet tables with EAV metadata answer them systematically for everything else.
Current Required Fields
| Field | Type | Description |
|---|---|---|
| event_id | STRING | Unique nexus event identifier |
| occurred_at | TIMESTAMP | Business timestamp |
| event_type | STRING | Event category |
| event_name | STRING | Specific event action |
| source | STRING | Source system name |
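
A minimal sketch of how this contract could be checked, assuming events arrive as plain Python dictionaries. The five field names come from the table above; the helper name `missing_core_fields`, the STRING → `str` and TIMESTAMP → `datetime` mapping, and the sample rows are hypothetical and not part of nexus.

```python
from datetime import datetime, timezone

# The five required core fields and their Python-side types.
# Mapping STRING -> str and TIMESTAMP -> datetime is an assumption of this sketch.
CORE_FIELDS = {
    "event_id": str,
    "occurred_at": datetime,
    "event_type": str,
    "event_name": str,
    "source": str,
}


def missing_core_fields(event: dict) -> list[str]:
    """Return the core fields that are absent, null, or of the wrong type."""
    return [
        name
        for name, expected_type in CORE_FIELDS.items()
        if not isinstance(event.get(name), expected_type)
    ]


complete = {
    "event_id": "evt_001",
    "occurred_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    "event_type": "order",
    "event_name": "order_placed",
    "source": "shop",
}
incomplete = {**complete, "source": None}

print(missing_core_fields(complete))    # []
print(missing_core_fields(incomplete))  # ['source']
```

Because there are no optional columns, the check needs no per-source exceptions: any non-empty result is a contract violation.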
Fields to Evaluate
The following fields are currently on or near the core event schema and need to be evaluated against the "must be required" rule:
| Field | Current Status | Evaluation |
|---|---|---|
| event_description | Optional | Could be required (soft: warning-level test). Every event can produce a human-readable description. |
| significance | Optional | Candidate to move to measurements or dimensions. |
| _ingested_at | Optional | Operational metadata. Could be required with warning-level test. |
| _processed_at | Optional | Operational metadata. Could be required with warning-level test. |
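
One way to read "required with a warning-level test" is that a missing value is reported but does not fail the build. The sketch below illustrates that split in plain Python. Grouping the three candidates as `SOFT_FIELDS` is an assumption drawn from the evaluation table, and `check_event` and its return shape are hypothetical.

```python
HARD_FIELDS = ("event_id", "occurred_at", "event_type", "event_name", "source")
# Candidates from the evaluation table; treating them as warning-level is an assumption.
SOFT_FIELDS = ("event_description", "_ingested_at", "_processed_at")


def check_event(event: dict) -> dict:
    """Split null or missing fields into errors (hard) and warnings (soft)."""
    errors = [f for f in HARD_FIELDS if event.get(f) is None]
    warnings = [f for f in SOFT_FIELDS if event.get(f) is None]
    # Only hard failures block the event; warnings are surfaced for follow-up.
    return {"errors": errors, "warnings": warnings, "passed": not errors}


result = check_event({
    "event_id": "evt_001",
    "occurred_at": "2024-01-15T09:30:00Z",
    "event_type": "order",
    "event_name": "order_placed",
    "source": "shop",
    "event_description": "Order placed",
    "_ingested_at": None,
})
print(result)
# {'errors': [], 'warnings': ['_ingested_at', '_processed_at'], 'passed': True}
```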
Where Non-Core Data Belongs
| Data type | Home | Example |
|---|---|---|
| Quantitative values | Measurements (EAV → pivot) | revenue, annual_premium_price |
| Cross-source categorical tags | Dimensions (EAV → pivot) | is_revenue_earned, source_record_id |
| Person/group identifiers | Identifiers (EAV) | email, phone, customer_id |
| Entity properties | Traits (EAV) | first_name, city, plan_type |
| Touchpoint click IDs | Attribution models | fbclid, gclid |
| Source-specific fields | Source event tables | contract_number, appointment_status |

source_record_id is a good example of why this boundary matters. It is the
source system's primary business identifier for the event record (contract
number, invoice ID, order ID). It is broadly useful but not universal: GA4
pageviews, synthetic events, and session boundaries have no meaningful source
record ID. That makes it ineligible for the core event table and a natural fit
for dimensions, where absence means null in the pivot without ambiguity.
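
To make "absence means null in the pivot" concrete, here is a small sketch of an EAV → pivot step in plain Python. The row shape `(event_id, dimension_name, dimension_value)`, the sample values, and the function `pivot_dimensions` are assumptions for illustration; the actual nexus models may be structured differently.

```python
# Hypothetical EAV dimension rows: (event_id, dimension_name, dimension_value).
eav_rows = [
    ("evt_001", "source_record_id", "INV-1001"),
    ("evt_001", "is_revenue_earned", "true"),
    ("evt_002", "is_revenue_earned", "false"),  # no source record for this event
]
# evt_003 has no dimension rows at all (for example, a synthetic event).
event_ids = ["evt_001", "evt_002", "evt_003"]


def pivot_dimensions(event_ids, eav_rows):
    """Pivot EAV rows to one row per event_id; missing attributes become None."""
    columns = sorted({name for _, name, _ in eav_rows})
    pivot = {event_id: dict.fromkeys(columns) for event_id in event_ids}
    for event_id, name, value in eav_rows:
        pivot[event_id][name] = value
    return pivot


for event_id, row in pivot_dimensions(event_ids, eav_rows).items():
    print(event_id, row)
```

Every event gets exactly one pivot row with the same columns, so a missing source_record_id shows up as `None` rather than as a missing row or a source-specific special case.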
The Join Tradeoff
Cleaner separation of concerns means more joins. This is an intentional choice:
- nexus_events alone answers "what happened?": event counts, timelines, source breakdowns. That covers a large class of questions.
- Adding "how much?" requires joining measurements.
- Adding "which business concept?" requires joining dimensions.
- Adding "who was involved?" requires joining participants and entities.
Each join adds a well-defined facet with its own metadata, tests, and semantic layer discoverability. The alternative — a wide events table with every possible column — trades discoverability and contract stability for fewer joins.
These Joins Are Cheap
The pivoted facet tables (nexus_event_dimensions, nexus_event_measurements)
have one row per event_id, the same grain as nexus_events. Joining them is
a 1:1 join on event_id, which columnar engines (Snowflake, BigQuery,
Databricks) typically execute as a hash join. There is no cardinality fan-out
and no row explosion, so the query-time cost of these joins is usually
negligible.
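
The grain claim can be checked directly: a 1:1 join on event_id must return exactly as many rows as nexus_events. The sketch below uses Python's built-in sqlite3 as a stand-in engine (it is not a columnar warehouse, so it says nothing about performance), with simplified, hypothetical table definitions and sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nexus_events (
        event_id TEXT PRIMARY KEY,
        event_name TEXT NOT NULL
    );
    -- Pivoted facet table: one row per event_id, same grain as nexus_events.
    CREATE TABLE nexus_event_measurements (
        event_id TEXT PRIMARY KEY,
        revenue REAL
    );
    INSERT INTO nexus_events VALUES
        ('evt_001', 'order_placed'),
        ('evt_002', 'page_view'),
        ('evt_003', 'order_placed');
    INSERT INTO nexus_event_measurements VALUES
        ('evt_001', 120.0),
        ('evt_002', NULL),
        ('evt_003', 80.0);
""")

events_count = conn.execute("SELECT COUNT(*) FROM nexus_events").fetchone()[0]
joined_count = conn.execute("""
    SELECT COUNT(*)
    FROM nexus_events e
    LEFT JOIN nexus_event_measurements m USING (event_id)
""").fetchone()[0]
total_revenue = conn.execute("""
    SELECT SUM(m.revenue)
    FROM nexus_events e
    LEFT JOIN nexus_event_measurements m USING (event_id)
""").fetchone()[0]

print(events_count, joined_count, total_revenue)  # 3 3 200.0
```

Equal row counts before and after the join are what "no fan-out" means in practice; a row-count comparison like this is a cheap test to keep on each pivoted facet table.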
A wide events table does avoid these joins, but at a real cost: it loses contract stability, metadata discoverability, and the ability to add new facets without altering the core schema.
Impact on Semantic Layer Generation
The facet pipeline (EAV → union → pivot → metadata) is the mechanism nexus uses
for auto-discovery. A field on nexus_events is not self-describing — the
generator must be taught about it explicitly. A field in a facet table is
automatically cataloged in the metadata table and available for semantic layer
generation without configuration.
This is the strongest argument for keeping nexus_events narrow: everything
outside it flows through a pipeline that makes it discoverable.
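
As a rough illustration of why facet fields are self-describing, the sketch below derives a catalog purely from unioned EAV rows, with no per-field configuration. The row shape `(facet, source, name, value)`, the sample data, and `build_catalog` are assumptions; the real nexus metadata table may carry different columns.

```python
from collections import defaultdict

# Hypothetical unioned EAV rows: (facet, source, name, value).
facet_rows = [
    ("dimension", "billing", "is_revenue_earned", "true"),
    ("dimension", "billing", "source_record_id", "INV-1001"),
    ("dimension", "crm", "source_record_id", "C-77"),
    ("measurement", "billing", "revenue", 120.0),
]


def build_catalog(rows):
    """Derive one metadata entry per (facet, name) from the EAV rows alone."""
    sources = defaultdict(set)
    for facet, source, name, _value in rows:
        sources[(facet, name)].add(source)
    return [
        {"facet": facet, "name": name, "sources": sorted(srcs)}
        for (facet, name), srcs in sorted(sources.items())
    ]


for entry in build_catalog(facet_rows):
    print(entry)
```

A new dimension or measurement appears in the catalog as soon as a source emits rows for it, whereas a new column on nexus_events would have to be described to the generator by hand.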
Related
- Event Schema Quick Reference — current field reference
- Event Dimensions — the EAV dimension pattern
- Event Measurements — the EAV measurement pattern