Jinja and programmatic SQL
Jinja is what makes dbt SQL programmable — control flow, loops, compile-time SQL generation. The compiled SQL is the truth.
Learning Objectives
By the end of this lesson, you will be able to:
- Read the three Jinja delimiters:
{{ }},{% %},{# #} - Use
if,for, andsetto generate SQL at compile time - Inspect the compiled SQL to debug what Jinja produced
- Recognize three patterns Jinja unlocks that raw SQL can't express cleanly
Why dbt uses a templating language
Raw SQL has two limitations dbt needs to work around:
-
You can't reference another model's name without hardcoding it. Different environments (dev, prod, your teammate's schema) need different fully-qualified names. →
{{ ref('model_name') }}solves this. -
You can't write SQL that generates SQL. Want to
selectall 47 columns from a table except 3? Want aunion allover 20 tables that share a schema? Raw SQL has no way to express either without typing the whole thing out by hand.
Jinja — the templating language dbt embeds in .sql files — handles
both. dbt compiles your Jinja-templated SQL into pure SQL, then
sends that pure SQL to BigQuery. Jinja is the layer where SQL gets
programmable.
The three Jinja delimiters
Every Jinja construct uses one of three delimiter pairs:
| Delimiter | Purpose | Example |
|---|---|---|
{{ ... }} |
Expression | select * from {{ ref('family_members') }} |
{% ... %} |
Statement / logic | {% if var('include_kids') %}...{% endif %} |
{# ... #} |
Comment | {# this is not in the compiled SQL #} |
Expressions evaluate to a string and get spliced into the SQL. Statements drive control flow but emit nothing on their own. Comments disappear at compile time.
Variables and set
You can declare a Jinja-level variable in the middle of a model:
{% set kid_roles = ['child', 'teen'] %}
select *
from {{ ref('family_members') }}
where role in ({{ "'" ~ kid_roles | join("', '") ~ "'" }})
(That last line concatenates the list into 'child', 'teen' at
compile time.)
You can also pull variables from dbt_project.yml:
vars:
active_year: 2026
where year = {{ var('active_year') }}
And override at the command line:
dbt run --vars '{active_year: 2025}'
if — conditional SQL
select
member,
email
{% if var('include_roles', false) %}
, role
{% endif %}
from {{ ref('family_members') }}
If include_roles is true at compile time, the model includes the
role column. If false, it doesn't. One model, two shapes — the
choice is made when dbt compiles, not when BigQuery runs.
if is also great for environment branching using target.name:
select *
from {{ ref('nexus_events') }}
{% if target.name == 'dev' %}
where occurred_at >= timestamp_sub(current_timestamp(), interval 30 day)
{% endif %}
In dev you limit to 30 days for fast builds; in prod you process the full history. No code change at deploy time.
for — looping over a list
This is where Jinja really shines. Say you have five Doe family emails and want to count messages received by each one in a single model:
{% set family_emails = [
'jane@doefamily.example',
'john@doefamily.example',
'jack@doefamily.example',
'joe@doefamily.example',
'julie@doefamily.example'
] %}
select
{% for email in family_emails %}
countif(recipient_email = '{{ email }}') as msgs_to_{{ email.split('@')[0] }}
{%- if not loop.last %},{% endif %}
{% endfor %}
from {{ source('gmail', 'recipients') }}
That compiles to:
select
countif(recipient_email = 'jane@doefamily.example') as msgs_to_jane,
countif(recipient_email = 'john@doefamily.example') as msgs_to_john,
countif(recipient_email = 'jack@doefamily.example') as msgs_to_jack,
countif(recipient_email = 'joe@doefamily.example') as msgs_to_joe,
countif(recipient_email = 'julie@doefamily.example') as msgs_to_julie
from `doe-family-dwh`.`gmail`.`recipients`
One Jinja loop, five SQL expressions. Now imagine the list comes from
a family_members seed via a compile-time run_query() call (an
advanced trick): you've just built a model that adapts as you add and
remove family members, with zero hand-edits.
Three patterns Jinja unlocks
1. Star-except
Raw SQL has no way to say "all columns except these three". Jinja can,
via dbt_utils.star:
select {{ dbt_utils.star(from=ref('gmail_events'), except=['internal_id', 'raw_payload']) }}
from {{ ref('gmail_events') }}
At compile time, dbt queries the warehouse for the column list, then emits everything except the excluded ones. Pure SQL would force you to enumerate every kept column by hand.
2. Dynamic UNION ALL
Got 30 tables in os_nexus.event_* and you want to union them?
{% set event_tables = ['event_signup', 'event_login', 'event_purchase', ...] %}
{% for t in event_tables %}
select *, '{{ t }}' as source_table from {{ source('os_nexus', t) }}
{% if not loop.last %} union all {% endif %}
{% endfor %}
One model, N tables unioned, easy to maintain.
3. Conditional environment behavior
You already saw if target.name == 'dev'. The same idea drives
nexus.limit_by_timestamp() (in Module 2) — a macro that injects a
date filter only in dev, keeping local builds fast.
The compiled SQL is the truth
When something doesn't behave the way you expect, read the compiled
SQL. It's the actual SQL dbt sent to BigQuery, with all Jinja
resolved. dbt writes it to target/compiled/<project>/... after every
run.
In dbt Cloud: open a model → click the Compiled tab.
Locally with dbt Power User: right-click a model → View Compiled SQL.
From the CLI:
dbt compile -s my_model
cat target/compiled/doe_family/models/marts/my_model.sql
Always read the compiled SQL when debugging Jinja. The error messages from BigQuery are about the compiled output, not what you wrote.
A note on whitespace
Jinja preserves whitespace by default, which can produce ugly compiled SQL. Two helpers:
{%-and-%}strip whitespace on that side of the statement{{-and-}}do the same for expressions
select
{% for col in cols -%}
{{ col }}{% if not loop.last %},{% endif %}
{% endfor %}
Compiles to cleaner SQL than without the -. Useful when reading
compiled output during debugging.
Hands-On Exercise
-
Create
models/marts/family_email_summary.sqlusing a Jinjaforloop to count messages received per family member. Use thefamily_membersseed to drive the loop (hint: a hardcoded list is fine for this exercise — using a seed dynamically is a stretch goal). -
Run
dbt compile -s family_email_summary. Open the compiled SQL and check it generated onecountif(...)per family member. -
Add a top-of-file
{% set debug = true %}. Wrap a no-op comment in{% if debug %}{% endif %}. Confirm in the compiled SQL that the comment is/isn't there depending on the value. -
(Stretch) Add
{% if target.name == 'dev' %} limit 100 {% endif %}to one of your existing staging models. Run in dev — confirm the limit. Switch target to anything else — confirm the limit disappears.
Summary
| Concept | Key takeaway |
|---|---|
| Jinja | Templating language dbt uses to make SQL programmable |
| Three delimiters | {{ }} expression, {% %} statement, {# #} comment |
| Compile time | Jinja runs before the SQL is sent to BigQuery — it generates the SQL |
| Patterns | Star-except, dynamic unions, environment-conditional filters |
| Compiled SQL | The actual SQL sent to BigQuery; always read it when Jinja confuses you |
| Whitespace control | {%- ... -%} to strip noise in the compiled output |
Next Lesson
Once your project starts repeating the same Jinja snippets in three places, factor them into reusable units called macros — and once someone else has already written the macro you need, install their package. Head to 2.6 Macros and packages.