Introduction to dbt
What dbt is, why a family would want it, and how it fits into the Doe family's personal data warehouse.
Learning Objectives
By the end of this lesson, you will be able to:
- Explain what dbt is and what problem it solves
- Describe how dbt fits between a data warehouse and downstream consumers
- Articulate why the Doe family chose dbt over hand-written SQL scripts
- Recognize the two setup paths covered in the rest of this module
What is dbt?
dbt ("data build tool") is a framework for writing, testing, and deploying SQL transformations against a data warehouse. It is the T in ELT — once your data is loaded (the E and L), dbt is what turns it into the clean, modeled tables your team actually queries.
You write .sql files that describe what each table or view should
contain. dbt figures out the dependency order, compiles each file into
real SQL, runs it against your warehouse, and persists the result.
             ┌──────────────────────────┐
Raw data  →  │ dbt (T)                  │  →  Clean models
             │ • models                 │
             │ • tests                  │
             │ • lineage                │
             └──────────────────────────┘
You also get, for free:
- A dependency graph — dbt knows which models depend on which, and builds them in the right order.
- Tests — uniqueness, not-null, custom assertions, all declared next to the models they cover.
- Documentation — auto-generated from your model definitions.
- Version control — your transformations live in Git like any other code.
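The dependency ordering described above can be illustrated with a small Python sketch. It is not how dbt is implemented internally; it only shows the idea that `ref()` calls inside model SQL are enough to derive a safe build order. The model names and SQL bodies are hypothetical, and a regular expression stands in for dbt's real Jinja rendering.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models keyed by name. In a real project each entry
# would be a .sql file in the models/ directory.
MODELS = {
    "stg_gmail_messages": "select * from raw_gmail.messages",
    "stg_calendar_events": "select * from raw_calendar.events",
    "contacts": """
        select sender_email as email from {{ ref('stg_gmail_messages') }}
        union distinct
        select attendee_email as email from {{ ref('stg_calendar_events') }}
    """,
    "christmas_card_list": "select * from {{ ref('contacts') }} where email is not null",
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of every model this SQL refers to via ref()."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after everything it depends on."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)
```

The two staging models have no dependencies, so they may appear in either order; `contacts` always follows both of them, and `christmas_card_list` always comes last.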
Why the Doe family chose dbt
Jane could have written a folder of .sql scripts and a bash file that
runs them in order. She'd hit the same problems every analytics team has
ever hit:
| Problem with raw scripts | What dbt gives you |
|---|---|
| "What runs before what?" | Automatic dependency resolution |
| "Did the email model produce zero rows?" | Declarative tests |
| "What changed last week and why?" | Git history of every model |
| "Can we try a change without breaking the dashboards?" | Dev/prod environments built in |
| "How do I share what a column means?" | Auto-generated docs |
| "I want to reuse this snippet in five models." | Jinja macros + the ref() function |
Most importantly: dbt-nexus is distributed as a dbt package. If you want to use nexus, you need dbt. That alone settles it for the Doe family.
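To make the "Declarative tests" row in the table above more concrete: a dbt data test boils down to a query that selects the rows violating an assertion, and the test passes when that query returns nothing. The sketch below imitates a not-null and a uniqueness check in Python. SQLite stands in for BigQuery so the example is self-contained, and the `contacts` table and its rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table contacts (email text, name text)")
conn.executemany(
    "insert into contacts values (?, ?)",
    [
        ("jane@example.com", "Jane"),
        ("john@example.com", "John"),
        ("john@example.com", "Johnny"),  # duplicate email
        (None, "Unknown"),               # missing email
    ],
)


def failing_rows_not_null(table: str, column: str) -> list[tuple]:
    """Rows where the column is null; an empty list means the check passes."""
    return conn.execute(f"select * from {table} where {column} is null").fetchall()


def failing_rows_unique(table: str, column: str) -> list[tuple]:
    """Values that occur more than once; an empty list means the check passes."""
    return conn.execute(
        f"select {column}, count(*) from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    ).fetchall()


print(failing_rows_not_null("contacts", "email"))
print(failing_rows_unique("contacts", "email"))
```

Both checks report failures here: one row has no email and one email appears twice. In dbt you would declare such checks next to the model rather than write the queries by hand.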
How dbt fits into the family's stack
Gmail           ─┐
Google Calendar ─┼─→ BigQuery (raw) ─→ dbt + dbt-nexus ─→ BigQuery (modeled) ─→ Family contacts list
Notion          ─┘                                                              Christmas card list
                                                                                …whatever else Jane builds
The Doe family uses BigQuery as the warehouse. The raw data from Gmail, Calendar, and Notion gets landed in BigQuery by some ingestion path (we deliberately don't pick one — see the syllabus). dbt then takes those raw tables and, with help from dbt-nexus, produces the modeled tables the family actually uses.
Two ways to run dbt
After the BigQuery prerequisite in lesson 1.2, lessons 1.3 and 1.4 cover two distinct setup paths:
| Path | Lesson | Good when |
|---|---|---|
| dbt Cloud (browser IDE) | 1.3 | You want zero local setup and a free hosted environment |
| dbt locally (VS Code + ext) | 1.4 | You prefer your own editor, terminal, and full control |
You only need to pick one for the rest of the course. The Doe family has Jane working locally in VS Code (she likes her keybindings) and John in dbt Cloud (he doesn't want to install Python). Either is fine.
Key terms you'll see throughout the course
| Term | Definition |
|---|---|
| Model | A .sql file in your dbt project. Each model becomes a table or view in the warehouse. |
| ref() | Jinja function that references another model. Builds the dependency graph automatically. |
| source() | Jinja function that references a raw table outside dbt's models. |
| Materialization | How dbt persists a model: view, table, incremental, or ephemeral. |
| Target | A named environment (e.g., dev, prod) with its own warehouse credentials and dataset. |
| Jinja | The templating language dbt uses to make SQL programmable. |
| Package | A reusable bundle of dbt models, macros, and tests. dbt-nexus is a package. |
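The terms ref(), source(), and Target fit together at compile time: the same model text resolves to different fully qualified tables depending on which target is active. The Python sketch below imitates that step. It is a simplified illustration, not dbt's real compiler: regular expressions replace Jinja, and the project name `doe-family`, the datasets `dbt_jane`, `analytics`, and `raw_gmail`, and the `SOURCES` mapping are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A named environment. Project and dataset values are hypothetical."""
    name: str
    project: str
    dataset: str


# Hypothetical source declarations: (source name, table name) -> raw dataset.
SOURCES = {("gmail", "messages"): "raw_gmail"}

REF = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")
SOURCE = re.compile(r"\{\{\s*source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}")


def compile_sql(sql: str, target: Target) -> str:
    """Replace ref() and source() calls with fully qualified table names."""

    def render_ref(match: re.Match) -> str:
        # A ref() points at another model, built in the target's own dataset.
        return f"`{target.project}.{target.dataset}.{match.group(1)}`"

    def render_source(match: re.Match) -> str:
        # A source() points at a raw table, wherever ingestion landed it.
        dataset = SOURCES[(match.group(1), match.group(2))]
        return f"`{target.project}.{dataset}.{match.group(2)}`"

    return SOURCE.sub(render_source, REF.sub(render_ref, sql))


dev = Target(name="dev", project="doe-family", dataset="dbt_jane")
prod = Target(name="prod", project="doe-family", dataset="analytics")

model = (
    "select * from {{ source('gmail', 'messages') }} as m "
    "join {{ ref('contacts') }} as c on m.sender = c.email"
)
print(compile_sql(model, dev))
print(compile_sql(model, prod))
```

The raw Gmail table resolves to the same place under both targets, while `contacts` resolves to the dev dataset in one case and the prod dataset in the other. That separation is what lets a change be tried in dev without touching the tables the dashboards read.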
Hands-On Exercise
You won't run any code in this lesson, but take 10 minutes to look around:
- Open the official dbt docs and skim the "What is dbt?" page.
- Browse the dbt Hub, the package directory. dbt-nexus isn't on the Hub (it installs from Git), but dbt_utils is. Bookmark it; we'll use it in Modules 2 and 3.
- Decide which setup path you want to follow: cloud (1.3) or local (1.4). You can change your mind later; both produce the same models.
Summary
| Concept | Key takeaway |
|---|---|
| dbt's role | The "T" in ELT — turns raw warehouse data into modeled, tested tables |
| Why dbt | Dependency order, tests, docs, environments, version control — all built in |
| Doe family's stack | Gmail/Calendar/Notion → BigQuery → dbt + dbt-nexus → modeled tables |
| Two setup paths | dbt Cloud (browser) or local (VS Code) — pick one for the rest of the course |
| dbt-nexus | A dbt package; you'll install it in Module 3 |
Next Lesson
First a one-time infrastructure step: 1.2 Prerequisite: Setting up BigQuery. Then pick your dbt path — Cloud (1.3) or local (1.4).