You Don't Need Recursive CTEs in dbt
If you''re using a recursive CTE in dbt to traverse a bounded graph, you can replace it with a Jinja loop that renders the same algorithm as a chain of regular CTEs. The result is plain SQL that doesn''t hit Snowflake''s recursive-operator memory limit.
If you're using WITH RECURSIVE in dbt to walk a graph up to a fixed
depth, you can replace it with a Jinja for loop that unrolls the
traversal into a chain of regular CTEs. The rendered SQL is plain hash
joins. Same algorithm, no recursive primitive. This is what we did to
fix an out-of-memory failure in our identity resolution macro, but the
pattern applies to any bounded-depth graph traversal in dbt.
The standard recursive CTE
SQL is a declarative language. You describe what you want; the engine
figures out execution. There's no for loop, no while, no mutable
state. Graph traversal — visit every node reachable from a starting
point — is fundamentally iterative, and there's no way to express it
declaratively in a single query.
WITH RECURSIVE is the SQL standard's solution. It lets you reference
a CTE from inside its own definition, and the engine repeatedly
re-evaluates the recursive case until no new rows are produced (or a
depth limit is hit). The standard pattern looks like this:
WITH RECURSIVE walk AS (
SELECT node, node AS reachable, 0 AS depth
FROM graph
UNION ALL
SELECT w.node, e.target, w.depth + 1
FROM walk w
JOIN edges e ON w.reachable = e.source
WHERE w.depth < 3
)
SELECT * FROM walk;
This is the textbook answer. Every SQL tutorial that covers graph traversal lands here. It exists because pure SQL has no other way to express iteration.
The Jinja-unrolled replacement
dbt isn't pure SQL. Jinja runs before SQL hits the warehouse. If your
depth limit is a value Jinja can see at render time — which it is in
any depth-bounded traversal, because the limit is typically a var()
in dbt_project.yml — you can unroll the recursion at compile time
into a chain of regular CTEs:
{% set max_depth = var('max_depth', 3) %}
WITH depth_0 AS (
SELECT node, node AS reachable FROM graph
)
{% for level in range(1, max_depth + 1) %}
, depth_{{ level }} AS (
SELECT * FROM depth_{{ level - 1 }}
UNION
SELECT d.node, e.target
FROM depth_{{ level - 1 }} d
JOIN edges e ON d.reachable = e.source
)
{% endfor %}
SELECT * FROM depth_{{ max_depth }};
With max_depth = 3, Jinja renders four CTEs (depth_0 through
depth_3). Each one extends reachability by one more hop and uses
UNION to dedup. Cycles are absorbed naturally — a walk that revisits
a node produces a duplicate (start, reachable) row, which UNION
drops.
The output is identical to the recursive version. Any node reachable
from a start node via a walk of length ≤ N is also reachable via a
simple-path prefix of that walk, which is ≤ N. So both versions produce
the same set of (start, reachable) pairs and the same downstream
results.
The out-of-memory problem this solves
The recursive operator on Snowflake holds its intermediate state in memory and cannot spill to disk. On large graphs, the frontier of recursive rows plus any per-row tracking state (path strings, cycle detection, etc.) grows past the per-query memory budget and the query aborts with:
100298 (57014): Recursive Join ran out of memory.
Please re-run this query on a larger warehouse.
A bigger warehouse buys headroom but doesn't change the failure mode. Eventually the data grows further and the recursive operator runs out of memory again.
The Jinja-unrolled version doesn't use the recursive operator. Each level is a regular CTE that the planner executes as a hash join. Hash joins on Snowflake (and BigQuery) spill to local and remote storage when they run out of memory. They get slower under memory pressure; they don't abort. The out-of-memory failure mode is structurally impossible.
BigQuery has a different set of structural restrictions on
WITH RECURSIVE — it must be at the top level of a WITH clause,
can't be combined with non-recursive CTEs via UNION ALL, and can't
reference upstream CTEs in its recursive body. The unrolled form is
plain SQL, so none of those restrictions apply there either.
The depth bound that the recursive version required is the same bound the unrolled version uses. The pattern works wherever the depth is known at Jinja-compile time, which covers the typical dbt cases: identity resolution, hierarchical rollups, time-window walks, any bounded DAG traversal.