Nexus Data

Principle

Every column on nexus_events must be required. If a field cannot be guaranteed on every event from every source, it does not belong on the core event table — it belongs in a facet table (dimensions, measurements, identifiers, traits, attribution) or on the source event table.

There are no optional columns on nexus_events.

Why

The core event table is the nexus contract. Downstream consumers — output models, semantic layer generators, LLM agents — need to know exactly what they can rely on without checking for nulls or reading source-specific documentation. A column that is "usually there" is worse than a column that is always there or a column that lives in a well-defined facet table with metadata.

Optional columns create ambiguity:

Is it null because the source doesn't have this concept, or because of a bug?
Should the semantic layer expose it as a dimension? It's not in the metadata pipeline.
Does an LLM know to check for it? It's not in the facet catalog.

Required columns eliminate these questions. Facet tables with EAV metadata answer them systematically for everything else.

Current Required Fields

Field	Type	Description
`event_id`	STRING	Unique nexus event identifier
`occurred_at`	TIMESTAMP	Business timestamp
`event_type`	STRING	Event category
`event_name`	STRING	Specific event action
`source`	STRING	Source system name
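
One way to make this contract enforceable is to declare every core column NOT NULL and to test the built table for nulls. The sketch below uses Python's sqlite3 module as a stand-in for the warehouse. The DDL, the `find_null_counts` helper, and the sample rows are illustrative assumptions, not the project's actual implementation.

```python
import sqlite3

# The five required fields listed above. SQLite types stand in for the
# warehouse types (STRING -> TEXT, TIMESTAMP -> ISO 8601 TEXT).
REQUIRED_COLUMNS = ["event_id", "occurred_at", "event_type", "event_name", "source"]

DDL = """
CREATE TABLE nexus_events (
    event_id    TEXT NOT NULL PRIMARY KEY,
    occurred_at TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    event_name  TEXT NOT NULL,
    source      TEXT NOT NULL
)
"""

def find_null_counts(conn, table, columns):
    """Return {column: null_count} for every column that contains nulls."""
    failures = {}
    for col in columns:
        # Identifiers come from a fixed list here, never from user input.
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()
        if count:
            failures[col] = count
    return failures

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(
    "INSERT INTO nexus_events VALUES (?, ?, ?, ?, ?)",
    ("evt_1", "2024-01-01T00:00:00Z", "order", "order_placed", "shop"),
)

# A row missing a required field is rejected by the constraint itself.
try:
    conn.execute(
        "INSERT INTO nexus_events VALUES (?, ?, ?, ?, ?)",
        ("evt_2", "2024-01-02T00:00:00Z", "order", None, "shop"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

null_failures = find_null_counts(conn, "nexus_events", REQUIRED_COLUMNS)
```

Many warehouses do not enforce NOT NULL at write time, so the `find_null_counts` style of check is the part that would carry over; the constraint is shown only to make the intent explicit.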

Fields to Evaluate

The following fields are currently on or near the core event schema and need to be evaluated against the "must be required" rule:

Field	Current Status	Evaluation
`event_description`	Optional	Could be required as a soft requirement (warning-level test). Every event can produce a human-readable description.
`significance`	Optional	Candidate to move to measurements or dimensions.
`_ingested_at`	Optional	Operational metadata. Could be required with warning-level test.
`_processed_at`	Optional	Operational metadata. Could be required with warning-level test.
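
The difference between a hard requirement and a warning-level one can be sketched as a severity attached to each field rule. This is a hypothetical illustration: `FieldRule`, `RULES`, and `check_event` are invented names, and treating the three candidate fields as soft-required assumes the evaluation above lands that way.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldRule:
    name: str
    severity: str  # "error" fails the build, "warn" only reports

# Core fields are hard requirements. The candidates from the evaluation
# table are modeled as warnings (an assumption, pending that evaluation).
RULES = [
    FieldRule("event_id", "error"),
    FieldRule("occurred_at", "error"),
    FieldRule("event_type", "error"),
    FieldRule("event_name", "error"),
    FieldRule("source", "error"),
    FieldRule("event_description", "warn"),
    FieldRule("_ingested_at", "warn"),
    FieldRule("_processed_at", "warn"),
]

def check_event(event, rules=RULES):
    """Split missing fields into (errors, warnings) by rule severity."""
    errors, warnings = [], []
    for rule in rules:
        if event.get(rule.name) is None:
            target = errors if rule.severity == "error" else warnings
            target.append(rule.name)
    return errors, warnings

sample_event = {
    "event_id": "evt_1",
    "occurred_at": "2024-01-01T00:00:00Z",
    "event_type": "order",
    "event_name": "order_placed",
    "source": "shop",
    "_ingested_at": "2024-01-01T00:05:00Z",
    "_processed_at": "2024-01-01T00:06:00Z",
}
errors, warnings = check_event(sample_event)
```

Here the sample event passes the hard checks and produces a single warning for the missing `event_description`.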

Where Non-Core Data Belongs

Data type	Home	Example
Quantitative values	Measurements (EAV → pivot)	`revenue`, `annual_premium_price`
Cross-source categorical tags	Dimensions (EAV → pivot)	`is_revenue_earned`, `source_record_id`
Person/group identifiers	Identifiers (EAV)	`email`, `phone`, `customer_id`
Entity properties	Traits (EAV)	`first_name`, `city`, `plan_type`
Touchpoint click IDs	Attribution models	`fbclid`, `gclid`
Source-specific fields	Source event tables	`contract_number`, `appointment_status`

source_record_id is a good example of why this boundary matters. It is the source system's primary business identifier for the event record (contract number, invoice ID, order ID). It is broadly useful but not universal: GA4 pageviews, synthetic events, and session boundaries have no meaningful source record ID. That makes it ineligible for the core event table and a natural fit for dimensions, where absence means null in the pivot without ambiguity.
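
To make "absence means null in the pivot" concrete, here is a minimal sketch of an EAV dimension table pivoted to one row per event. The table name `dimensions_eav`, its column names, and the sample values are assumptions for illustration; sqlite3 stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nexus_events (event_id TEXT PRIMARY KEY, event_name TEXT NOT NULL);
CREATE TABLE dimensions_eav (
    event_id TEXT, dimension_name TEXT, dimension_value TEXT
);
INSERT INTO nexus_events VALUES ('evt_1', 'invoice_paid'), ('evt_2', 'page_view');
-- Only the invoice event carries dimensions; the pageview has none.
INSERT INTO dimensions_eav VALUES
    ('evt_1', 'source_record_id', 'INV-1001'),
    ('evt_1', 'is_revenue_earned', 'true');
""")

# Conditional aggregation turns attribute rows into columns. The LEFT JOIN
# keeps events that have no dimension rows, so their columns come out NULL.
PIVOT_SQL = """
SELECT e.event_id,
       MAX(CASE WHEN d.dimension_name = 'source_record_id'
                THEN d.dimension_value END) AS source_record_id,
       MAX(CASE WHEN d.dimension_name = 'is_revenue_earned'
                THEN d.dimension_value END) AS is_revenue_earned
FROM nexus_events e
LEFT JOIN dimensions_eav d ON d.event_id = e.event_id
GROUP BY e.event_id
ORDER BY e.event_id
"""
pivot_rows = conn.execute(PIVOT_SQL).fetchall()
```

The pageview row comes back with nulls in both dimension columns, which here means "this event has no such dimension" rather than "a required field is missing".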

The Join Tradeoff

Cleaner separation of concerns means more joins. This is an intentional choice:

nexus_events alone answers "what happened?" — event counts, timelines, source breakdowns. That covers a large class of questions.
Adding "how much?" requires joining measurements.
Adding "which business concept?" requires joining dimensions.
Adding "who was involved?" requires joining participants and entities.

Each join adds a well-defined facet with its own metadata, tests, and semantic layer discoverability. The alternative — a wide events table with every possible column — trades discoverability and contract stability for fewer joins.
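
The progression from "what happened?" to "how much?" might look like the following sketch. The column layout of `nexus_event_measurements` and the sample rows are assumptions; sqlite3 stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nexus_events (
    event_id TEXT PRIMARY KEY, occurred_at TEXT NOT NULL,
    event_type TEXT NOT NULL, event_name TEXT NOT NULL, source TEXT NOT NULL
);
-- Pivoted measurements: one row per event, NULL where a measure is absent.
CREATE TABLE nexus_event_measurements (event_id TEXT PRIMARY KEY, revenue REAL);
INSERT INTO nexus_events VALUES
    ('evt_1', '2024-01-01T09:00:00Z', 'order', 'order_placed', 'shop'),
    ('evt_2', '2024-01-01T10:00:00Z', 'order', 'order_placed', 'shop'),
    ('evt_3', '2024-01-01T11:00:00Z', 'web', 'page_view', 'ga4');
INSERT INTO nexus_event_measurements VALUES
    ('evt_1', 120.0), ('evt_2', 80.0), ('evt_3', NULL);
""")

# "What happened?" needs only the core table.
counts_by_source = conn.execute(
    "SELECT source, COUNT(*) FROM nexus_events GROUP BY source ORDER BY source"
).fetchall()

# "How much?" adds exactly one join, to the pivoted measurements table.
revenue_by_source = conn.execute("""
    SELECT e.source, SUM(m.revenue)
    FROM nexus_events e
    JOIN nexus_event_measurements m ON m.event_id = e.event_id
    GROUP BY e.source
    ORDER BY e.source
""").fetchall()
```

The first query never touches a facet table; the second pays for one join and gets a measure whose meaning is defined by the measurements facet rather than by the core schema.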

These Joins Are Cheap

The pivoted facet tables (nexus_event_dimensions, nexus_event_measurements) have one row per event_id, the same grain as nexus_events. Joining them is a 1:1 join on the primary key, which columnar engines (Snowflake, BigQuery, Databricks) typically execute as a hash join. There is no cardinality fan-out and no row explosion, so the query-time cost of these joins is small compared with scanning the tables themselves.
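
The 1:1 claim is easy to test: a join at the same grain must return exactly as many rows as the core table. The helper below is a hypothetical check, not part of nexus; the table contents are made-up sample data and sqlite3 stands in for the warehouse.

```python
import sqlite3

def join_preserves_grain(conn, core="nexus_events", facet="nexus_event_dimensions"):
    """True when LEFT JOINing the facet onto the core by event_id adds no rows."""
    (core_rows,) = conn.execute(f"SELECT COUNT(*) FROM {core}").fetchone()
    (joined_rows,) = conn.execute(
        f"SELECT COUNT(*) FROM {core} e "
        f"LEFT JOIN {facet} f ON f.event_id = e.event_id"
    ).fetchone()
    return joined_rows == core_rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nexus_events (event_id TEXT PRIMARY KEY);
-- No uniqueness constraint here, so the check has something to catch.
CREATE TABLE nexus_event_dimensions (event_id TEXT, source_record_id TEXT);
INSERT INTO nexus_events VALUES ('evt_1'), ('evt_2');
INSERT INTO nexus_event_dimensions VALUES ('evt_1', 'INV-1001'), ('evt_2', NULL);
""")
ok_before = join_preserves_grain(conn)

# A duplicated event_id in the facet table breaks the 1:1 assumption.
conn.execute("INSERT INTO nexus_event_dimensions VALUES ('evt_1', 'INV-1002')")
ok_after = join_preserves_grain(conn)
```

A uniqueness test on event_id in each pivoted table gives the same guarantee up front, which is what keeps the "no fan-out" property true as the tables evolve.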

A wide events table does avoid joins, but its real cost is the loss of contract stability, metadata discoverability, and the ability to add new facets without altering the core schema.

Impact on Semantic Layer Generation

The facet pipeline (EAV → union → pivot → metadata) is the mechanism nexus uses for auto-discovery. A field on nexus_events is not self-describing — the generator must be taught about it explicitly. A field in a facet table is automatically cataloged in the metadata table and available for semantic layer generation without configuration.

This is the strongest argument for keeping nexus_events narrow: everything outside it flows through a pipeline that makes it discoverable.
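
A rough sketch of why facet fields are discoverable without configuration: when the catalog is derived from the EAV rows themselves, a new attribute name appears in it as soon as any source emits it. The table names, the catalog columns, and `build_catalog` are assumptions for illustration; the actual nexus metadata table may be shaped differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimensions_eav (event_id TEXT, name TEXT, value TEXT);
CREATE TABLE measurements_eav (event_id TEXT, name TEXT, value REAL);
INSERT INTO dimensions_eav VALUES ('evt_1', 'is_revenue_earned', 'true');
INSERT INTO measurements_eav VALUES ('evt_1', 'revenue', 120.0);
""")

# The catalog is a query over the facet rows: nothing is registered by hand.
CATALOG_SQL = """
SELECT 'dimension' AS facet, name, COUNT(*) AS event_count
FROM dimensions_eav GROUP BY name
UNION ALL
SELECT 'measurement' AS facet, name, COUNT(*) AS event_count
FROM measurements_eav GROUP BY name
ORDER BY facet, name
"""

def build_catalog(conn):
    """Return (facet, name, event_count) rows describing every known field."""
    return conn.execute(CATALOG_SQL).fetchall()

catalog_before = build_catalog(conn)

# A source starts emitting a new dimension; no generator config changes.
conn.execute(
    "INSERT INTO dimensions_eav VALUES ('evt_2', 'source_record_id', 'INV-1001')"
)
catalog_after = build_catalog(conn)
```

A column added directly to nexus_events has no equivalent path: something outside the data has to tell the generator that the column exists and what it means.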

Event Schema Quick Reference — current field reference
Event Dimensions — the EAV dimension pattern
Event Measurements — the EAV measurement pattern