Creating models, refs, and lineage
A model is a .sql file. ref() and source() build the dependency graph. dbt uses that graph (the DAG) to run models in the right order.
Learning Objectives
By the end of this lesson, you will be able to:
- Write a dbt model as a single
selectstatement in a.sqlfile - Use
ref()to depend on another model - Use
source()to depend on a raw table outside dbt - Read and reason about a dbt DAG (directed acyclic graph)
- Use selectors (
+,-) to run subsets of the graph
This lesson is the conceptual heart of dbt. Everything else in this module (materializations, Jinja, macros, packages) builds on the mental model you build here.
A model is a select statement
In dbt, a model is a single .sql file in models/. It contains
one select statement that describes the rows the model should
contain. Nothing else — no create table, no insert, no DDL.
For example, models/staging/active_family_members.sql:
select
member,
email,
role
from {{ ref('family_members') }}
where role in ('parent', 'child')
When you run dbt build, dbt wraps this select in the right DDL for
your warehouse and materialization (create or replace table ... as select ... for BigQuery + table), executes it, and stores the
result. You write the what; dbt handles the how.
That's the single biggest mental shift coming from raw SQL scripts: you stop writing DDL. dbt manages object creation, dependencies, schema names, and rebuilds. You describe the data you want.
ref() — the function that builds the DAG
When one model needs data from another, never hard-code the table
name. Use ref():
-- ✅ Right
select * from {{ ref('family_members') }}
-- ❌ Wrong
select * from doe_family_dev.family_members
ref() does two things:
- At compile time, dbt replaces
{{ ref('family_members') }}with the fully qualified table name for the current target (dev vs prod, your schema vs a teammate's). - At parse time, dbt records that this model depends on
family_members. That dependency is what builds the DAG.
If you run dbt build and family_members is out of date, dbt will
rebuild it first before rebuilding the model that refs it. You never
have to think about ordering.
source() — for raw tables outside dbt
dbt doesn't build the raw tables landed by os-nexus (or Fivetran, or
your own scripts). Those exist before dbt runs. You reference them with
source() after declaring them in a YAML file.
Declare in models/sources/gmail/_sources.yml:
version: 2
sources:
- name: gmail
schema: gmail
tables:
- name: messages
- name: recipients
Reference from a model with source('gmail', 'messages'):
-- models/staging/gmail/stg_gmail_messages.sql
select
message_id,
sender_email,
sent_at,
subject
from {{ source('gmail', 'messages') }}
source() plays the same role as ref(): it resolves to a real table
name and records the dependency for the DAG. The difference is that
dbt doesn't try to build sources — they're inputs to the pipeline.
The rule of thumb: everything inside models/ is a ref();
everything outside dbt is a source().
The DAG
The collection of every ref() and source() call across your
project forms a directed acyclic graph — a tree of dependencies
that flows in one direction (acyclic = no loops).
gmail.messages ─────→ stg_gmail_messages ─────→ gmail_events ─────┐
│
gmail.recipients ───→ stg_gmail_recipients ───→ gmail_entity_id ─┤
├──→ nexus_events
google_calendar.events ──→ google_calendar_events ─────────────────┤
│
notion.tasks ────→ stg_notion_tasks ───→ notion_events ───────────┘
dbt walks this graph every time you run. For each model:
- Resolve its
ref()s andsource()s - Sort the whole graph topologically
- Build models in dependency order
- Parallelize where the graph allows (independent branches)
You don't have to specify run order anywhere — it falls out of the
ref() calls.
Viewing the DAG
In dbt Cloud
The Lineage tab in the IDE renders the DAG visually. Click a model to highlight its upstream and downstream dependencies.
Locally with dbt Power User
The Lineage panel (added by the extension) shows the same view. Click a model in the panel to navigate to its file. Click the upstream/downstream icons to focus subgraphs.
From the CLI
dbt docs generate
dbt docs serve
Opens a browser-based docs site with full lineage browsing.
The DAG is more than a pretty picture. It's how you reason about:
- Impact analysis — change a model, see what downstream models break
- Debugging — a stale upstream is the most common cause of downstream errors
- Performance — long, narrow chains are slow; wide, parallel branches run fast
Selecting subsets of the DAG
You almost never want to run every model. Use the -s (select) flag
to target subsets:
# Run one model
dbt build -s family_members
# Run that model and everything downstream
dbt build -s family_members+
# Run that model and everything upstream (sources, parents, grandparents)
dbt build -s +family_members
# Both — full dependency cone around it
dbt build -s +family_members+
# Everything tagged 'nexus'
dbt build -s tag:nexus
# Multiple selectors (union)
dbt build -s tag:nexus tag:marts
# Intersection
dbt build -s tag:nexus,tag:gmail
The + is read as "and everything attached to this side". This is the
single most useful tool in dbt for iterating quickly during
development.
A complete two-model example
Create models/staging/family_emails.sql:
select
member,
email
from {{ ref('family_members') }}
where email is not null
Create models/marts/parent_emails.sql:
select email
from {{ ref('family_emails') }}
where member in (
select member from {{ ref('family_members') }} where role = 'parent'
)
The DAG:
family_members (seed)
├──→ family_emails
│ │
│ └──→ parent_emails
│ ▲
│ │
└────────────────────┘
Build everything downstream of family_members:
dbt build -s family_members+
dbt builds family_members (if it's a seed it just loads CSV), then
family_emails, then parent_emails. You wrote zero ordering logic.
Hands-On Exercise
-
Create a model
models/staging/active_family_members.sqlthat selects onlyrole in ('parent', 'child')from{{ ref('family_members') }}. -
Create a model
models/marts/family_email_domains.sqlthat selectsmember, split(email, '@')[offset(1)] as domainfrom{{ ref('active_family_members') }}. -
Build only the new models:
dbt build -s active_family_members+ -
Open the lineage view (dbt Cloud or Power User) and confirm the shape:
family_members → active_family_members → family_email_domains. -
Make a tiny edit to
active_family_members.sql. Then run:dbt build -s state:modified+dbt rebuilds the changed model and everything downstream of it. That's impact-analysis-as-a-command.
Summary
| Concept | Key takeaway |
|---|---|
| Model | A .sql file with one select statement; becomes a table or view |
ref() |
Reference another model — resolves to a real table name + records a dep |
source() |
Reference a raw table outside dbt; declared in a YAML file |
| DAG | The graph dbt builds from every ref() / source() call |
| Run ordering | Falls out of the DAG automatically — you don't specify it |
| Selectors | -s model+ / +model / tag:foo / state:modified+ for targeted runs |
Next Lesson
ref() and the DAG explain what gets built and in what order. The
next thing to learn is how to structure the SQL inside a model — and
in dbt that almost always means CTEs. Head to
2.3 CTEs — the building blocks of dbt models.