Introduction to dbt

What dbt is, why a family would want it, and how it fits into the Doe family's personal data warehouse.

Learning Objectives

By the end of this lesson, you will be able to:

  • Explain what dbt is and what problem it solves
  • Describe how dbt fits between a data warehouse and downstream consumers
  • Articulate why the Doe family chose dbt over hand-written SQL scripts
  • Recognize the two setup paths covered in the rest of this module

What is dbt?

dbt ("data build tool") is a framework for writing, testing, and deploying SQL transformations against a data warehouse. It is the T in ELT — once your data is loaded (the E and L), dbt is what turns it into the clean, modeled tables your team actually queries.

You write .sql files that describe what each table or view should contain. dbt figures out the dependency order, compiles each file into real SQL, runs it against your warehouse, and persists the result.
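To make "figures out the dependency order" concrete, here is a toy sketch in Python. It is not how dbt is implemented; it only illustrates the idea that ref() calls inside model files are enough to derive a build order. The model names and SQL bodies are hypothetical, invented for this example.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: model name -> SQL body. Names are illustrative only.
MODELS = {
    "christmas_card_list": "select * from {{ ref('family_contacts') }}",
    "family_contacts": """
        select email from {{ ref('stg_gmail_messages') }}
        union distinct
        select attendee_email as email from {{ ref('stg_calendar_events') }}
    """,
    "stg_gmail_messages": "select * from {{ source('gmail', 'messages') }}",
    "stg_calendar_events": "select * from {{ source('calendar', 'events') }}",
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of all models referenced via ref() in a SQL body."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after the models it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in build_order(MODELS):
        print(name)
```

Even though christmas_card_list is listed first in the dictionary, it is ordered after family_contacts, which in turn comes after both staging models. Nobody wrote that order down; it falls out of the ref() calls.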

              ┌──────────────────────────┐
   Raw data → │ dbt (T)                  │ → Clean models
              │  • models                │
              │  • tests                 │
              │  • lineage               │
              └──────────────────────────┘

You also get, for free:

  • A dependency graph — dbt knows which models depend on which, and builds them in the right order.
  • Tests — uniqueness, not-null, custom assertions, all declared next to the models they cover.
  • Documentation — auto-generated from your model definitions.
  • Version control — your transformations live in Git like any other code.
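The tests bullet can be illustrated the same way. A dbt test passes when the check behind it finds zero failing rows. The Python below is a toy sketch of that idea only, not dbt's implementation; the rows, column names, and function names are all hypothetical. In a real project you would declare the checks next to the model rather than write them by hand.

```python
from collections import Counter

# Hypothetical rows from a modeled contacts table.
ROWS = [
    {"contact_id": 1, "email": "jane@example.com"},
    {"contact_id": 2, "email": "john@example.com"},
    {"contact_id": 2, "email": None},
]


def not_null_failures(rows, column):
    """Return the rows where `column` is missing."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the values of `column` that appear more than once."""
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def run_tests(rows, declared):
    """Run the declared checks; `declared` maps a column to check names."""
    checks = {"not_null": not_null_failures, "unique": unique_failures}
    results = {}
    for column, names in declared.items():
        for name in names:
            failures = checks[name](rows, column)
            # A check passes only when it finds nothing wrong.
            results[f"{name}:{column}"] = "pass" if not failures else "fail"
    return results


if __name__ == "__main__":
    declared = {"contact_id": ["unique", "not_null"], "email": ["not_null"]}
    for test_name, status in run_tests(ROWS, declared).items():
        print(test_name, status)
```

The point of the design is that the declaration (which columns must be unique or non-null) is separate from the logic that checks it, so adding a test to a new column is a one-line change.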

Why the Doe family chose dbt

Jane could have written a folder of .sql scripts and a bash file that runs them in order. She'd hit the same problems every analytics team has ever hit:

Problem with raw scripts                                What dbt gives you
------------------------------------------------------  ---------------------------------
"What runs before what?"                                Automatic dependency resolution
"Did the email model produce zero rows?"                Declarative tests
"What changed last week and why?"                       Git history of every model
"Can we try a change without breaking the dashboards?"  Dev/prod environments built in
"How do I share what a column means?"                   Auto-generated docs
"I want to reuse this snippet in five models."          Jinja macros + the ref() function

Most importantly: dbt-nexus is distributed as a dbt package. If you want to use dbt-nexus, you need dbt. That alone settles it for the Doe family.


How dbt fits into the family's stack

   Gmail            ─┐
   Google Calendar  ─┼─→  BigQuery (raw)  ─→  dbt + dbt-nexus  ─→  BigQuery (modeled)  ─→  Family contacts list
   Notion           ─┘                                                                     Christmas card list
                                                                                            …whatever else Jane builds

The Doe family uses BigQuery as the warehouse. The raw data from Gmail, Calendar, and Notion is loaded into BigQuery by an ingestion path (we deliberately don't pick one; see the syllabus). dbt then takes those raw tables and, with help from dbt-nexus, produces the modeled tables the family actually uses.


Two ways to run dbt

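After the BigQuery prerequisite in Lesson 1.2, Lessons 1.3 and 1.4 cover two distinct setup paths:

Path                                Lesson  Good when
----------------------------------  ------  ------------------------------------------------------
dbt Cloud (browser IDE)             1.3     You want zero local setup and a free hosted environment
dbt locally (VS Code + extension)   1.4     You prefer your own editor, terminal, and full control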

You only need to pick one for the rest of the course. The Doe family has Jane working locally in VS Code (she likes her keybindings) and John in dbt Cloud (he doesn't want to install Python). Either is fine.


Key terms you'll see throughout the course

Term             Definition
---------------  ---------------------------------------------------------------------------------------
Model            A .sql file in your dbt project. Each model becomes a table or view in the warehouse.
ref()            Jinja function that references another model. Builds the dependency graph automatically.
source()         Jinja function that references a raw table outside dbt's models.
Materialization  How dbt persists a model: view, table, incremental, or ephemeral.
Target           A named environment (e.g., dev, prod) with its own warehouse credentials and dataset.
Jinja            The templating language dbt uses to make SQL programmable.
Package          A reusable bundle of dbt models, macros, and tests. dbt-nexus is a package.
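To see how ref() and targets work together, here is a toy Python sketch of the "compile" step: the same model file turns into different SQL depending on which target is active. This is a simplified illustration, not dbt's actual compiler, and dbt's real output quotes identifiers differently. The project name, dataset names, and model name are hypothetical.

```python
import re

# Hypothetical targets; the project and dataset names are made up.
TARGETS = {
    "dev": {"project": "doe-family", "dataset": "dbt_jane"},
    "prod": {"project": "doe-family", "dataset": "analytics"},
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# A hypothetical model body, as it would be written in a .sql file.
MODEL_SQL = "select * from {{ ref('family_contacts') }}"


def compile_sql(sql: str, target_name: str) -> str:
    """Replace each ref() call with a fully qualified table name."""
    target = TARGETS[target_name]

    def resolve(match: re.Match) -> str:
        model_name = match.group(1)
        return f"`{target['project']}.{target['dataset']}.{model_name}`"

    return REF_PATTERN.sub(resolve, sql)


if __name__ == "__main__":
    print(compile_sql(MODEL_SQL, "dev"))
    print(compile_sql(MODEL_SQL, "prod"))
```

Because the model only ever says ref('family_contacts'), Jane can build into her own dev dataset without touching the tables the dashboards read from.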

Hands-On Exercise

You won't run any code in this lesson, but take 10 minutes to look around:

  1. Open the official dbt docs and skim the "What is dbt?" page.
  2. Browse the dbt Hub, the package directory. dbt-nexus isn't on the Hub (it installs from Git), but dbt_utils is. Bookmark it; we'll use it in Modules 2 and 3.
  3. Decide which setup path you want to follow: cloud (1.3) or local (1.4). You can change your mind later — both produce the same models.

Summary

Concept             Key takeaway
------------------  ----------------------------------------------------------------------------
dbt's role          The "T" in ELT: turns raw warehouse data into modeled, tested tables
Why dbt             Dependency order, tests, docs, environments, version control, all built in
Doe family's stack  Gmail/Calendar/Notion → BigQuery → dbt + dbt-nexus → modeled tables
Two setup paths     dbt Cloud (browser) or local (VS Code); pick one for the rest of the course
dbt-nexus           A dbt package; you'll install it in Module 3

Next Lesson

First, a one-time infrastructure step: 1.2 Prerequisite: Setting up BigQuery. Then pick your dbt path: Cloud (1.3) or local (1.4).