Materializations

How dbt persists each model — view, table, incremental, ephemeral — and how to choose between them.

Learning Objectives

By the end of this lesson, you will be able to:

  • Describe the four built-in dbt materializations
  • Pick the right materialization for each layer of the Doe family project
  • Configure materialization at the project, directory, and model level
  • Recognize when to reach for incremental (and when not to)

What materialization means

A model is a select statement. The materialization decides what dbt does with that select — what physical object (or no object) it creates in the warehouse.

Materialization What dbt creates
view A view: create or replace view ... as <select>
table A table, rebuilt from scratch every run
incremental A table, but only new/changed rows are inserted on each run
ephemeral No object at all — inlined as a CTE into downstream models

The first two are the workhorses. The third is for scale. The fourth is a niche tool.


view — light, always fresh, never stored

A view is just a saved SQL definition. Querying it re-executes the underlying select against whatever the source data currently is.

Use when:

  • The model is cheap to compute on the fly
  • You want the freshest possible data every time someone queries
  • You're in a staging layer where each select is light cleanup

Don't use when:

  • The query is expensive (every read pays the full cost)
  • Downstream models read from it repeatedly in the same run
{{ config(materialized='view') }}

select
    member,
    email,
    role
from {{ ref('family_members') }}
where email is not null

table — heavy, fast to read, rebuilt every run

A table is the materialized result of your select, stored to disk. dbt drops and recreates it every dbt run.

Use when:

  • Downstream models or queries read from it often
  • The select is expensive enough that you want to pay the cost once per run instead of per query
  • You need real warehouse statistics for the optimizer

Don't use when:

  • The model is small and trivial (view is enough)
  • You're rebuilding billions of rows every run (look at incremental)
{{ config(materialized='table') }}

select
    entity_id,
    display_name,
    last_interacted_at
from {{ ref('nexus_entities') }}
where entity_type = 'person'

For the Doe family, every marts/ model is a table. They're queried constantly (the family contacts list, the Christmas card list) and the underlying data isn't huge.


incremental — append-only at scale

incremental is a table that dbt builds in two modes:

  • Full refresh (first run, or --full-refresh): build from scratch
  • Incremental (subsequent runs): only insert rows that are new since the last build

For the Doe family this rarely matters — their data is small. But once you have 10M+ rows in a source, full-refreshing on every run becomes painful. incremental is how production nexus deployments keep nexus_events build times manageable.

A minimal incremental model:

{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    occurred_at,
    event_type,
    source
from {{ ref('all_events') }}

{% if is_incremental() %}
    where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}

The {% if is_incremental() %} block runs only on incremental builds. On a full refresh it's skipped and dbt builds the full table.

We don't dive deeper here — the dbt incremental docs go into all the modes (append, merge, delete+insert) and the trade-offs. For now, just know that incremental exists and what it's for.


ephemeral — no object, inlined as a CTE

An ephemeral model never lands in the warehouse. Instead, dbt inlines its select as a CTE wherever it's ref()'d from.

Use when:

  • You want to factor out a snippet for readability without paying for another object
  • A model is referenced once and only once

Don't use when:

  • Multiple downstream models read it (you'll re-execute the CTE multiple times)
  • The select is heavy (each downstream pays the cost)

In practice you'll reach for ephemeral rarely. Mention it once, recognize it when you see it, move on.


Setting materialization

Materialization can be set at three levels — most specific wins.

1. Project level (dbt_project.yml)

Sets the default for entire directory subtrees:

models:
  doe_family:
    sources:
      +materialized: view
    staging:
      +materialized: view
    marts:
      +materialized: table

The + prefix is dbt's "this is a config" syntax.

2. Directory _properties.yml / inline YAML

Less common — you can override at a subdirectory level by adding a nested key.

3. Model file ({{ config(...) }})

Highest precedence — set right inside the model:

{{ config(materialized='incremental', unique_key='event_id') }}

select ...

Use this for one-off overrides; let the project-level config carry the defaults.


The Doe family's materialization strategy

Layer Default Why
sources/ view Trivial cleanup; freshness over speed
staging/ view Mostly thin reshape; cheap to recompute
marts/ table Read constantly by humans and downstream queries
(special) incremental Promoted on a case-by-case basis once a table gets large

This is a sensible default for any nexus project at personal-to-small-team scale. Most production deployments follow the same pattern.


Hands-On Exercise

  1. Create models/marts/family_member_count.sql:

    {{ config(materialized='view') }}
    
    select count(*) as n_family_members
    from {{ ref('family_members') }}
    
  2. Run dbt build -s family_member_count. Inspect the result in BigQuery — note that it's a view, not a table.

  3. Change the config to table:

    {{ config(materialized='table') }}
    
  4. Run dbt build -s family_member_count again. BigQuery now shows it as a table. dbt dropped and recreated it.

  5. Remove the inline {{ config(...) }} entirely. In dbt_project.yml, confirm that models.doe_family.marts.+materialized is set to table (or set it). Rebuild and confirm it's still a table — the project-level default takes over.

  6. (Optional, advanced) Convert a model to incremental with a unique_key and an is_incremental() filter. Run it twice and inspect the row count after each run.


Summary

Concept Key takeaway
view Saved SQL — freshest possible, pay on every read
table Materialized result — fast reads, rebuilt each run
incremental Table built additively — for large or expensive models
ephemeral Inlined as a CTE — niche, use rarely
Where to set Project-level for defaults, inline {{ config() }} for overrides
Doe family default view for sources/staging, table for marts

Next Lesson

You can shape what each model does (ref(), source()) and how it's persisted (materialization). The third leg of the stool is Jinja — how dbt generates the SQL it actually sends to BigQuery. Head to 2.5 Jinja and programmatic SQL.