← Back to blog

You Don't Need Recursive CTEs in dbt

If you''re using a recursive CTE in dbt to traverse a bounded graph, you can replace it with a Jinja loop that renders the same algorithm as a chain of regular CTEs. The result is plain SQL that doesn''t hit Snowflake''s recursive-operator memory limit.

Kevin McLaughlin
4 min read

If you're using WITH RECURSIVE in dbt to walk a graph up to a fixed depth, you can replace it with a Jinja for loop that unrolls the traversal into a chain of regular CTEs. The rendered SQL is plain hash joins. Same algorithm, no recursive primitive. This is what we did to fix an out-of-memory failure in our identity resolution macro, but the pattern applies to any bounded-depth graph traversal in dbt.

The standard recursive CTE

SQL is a declarative language. You describe what you want; the engine figures out execution. There's no for loop, no while, no mutable state. Graph traversal — visit every node reachable from a starting point — is fundamentally iterative, and there's no way to express it declaratively in a single query.

WITH RECURSIVE is the SQL standard's solution. It lets you reference a CTE from inside its own definition, and the engine repeatedly re-evaluates the recursive case until no new rows are produced (or a depth limit is hit). The standard pattern looks like this:

WITH RECURSIVE walk AS (
  SELECT node, node AS reachable, 0 AS depth
  FROM graph

  UNION ALL

  SELECT w.node, e.target, w.depth + 1
  FROM walk w
  JOIN edges e ON w.reachable = e.source
  WHERE w.depth < 3
)
SELECT * FROM walk;

This is the textbook answer. Every SQL tutorial that covers graph traversal lands here. It exists because pure SQL has no other way to express iteration.

The Jinja-unrolled replacement

dbt isn't pure SQL. Jinja runs before SQL hits the warehouse. If your depth limit is a value Jinja can see at render time — which it is in any depth-bounded traversal, because the limit is typically a var() in dbt_project.yml — you can unroll the recursion at compile time into a chain of regular CTEs:

{% set max_depth = var('max_depth', 3) %}

WITH depth_0 AS (
  SELECT node, node AS reachable FROM graph
)
{% for level in range(1, max_depth + 1) %}
, depth_{{ level }} AS (
  SELECT * FROM depth_{{ level - 1 }}
  UNION
  SELECT d.node, e.target
  FROM depth_{{ level - 1 }} d
  JOIN edges e ON d.reachable = e.source
)
{% endfor %}

SELECT * FROM depth_{{ max_depth }};

With max_depth = 3, Jinja renders four CTEs (depth_0 through depth_3). Each one extends reachability by one more hop and uses UNION to dedup. Cycles are absorbed naturally — a walk that revisits a node produces a duplicate (start, reachable) row, which UNION drops.

The output is identical to the recursive version. Any node reachable from a start node via a walk of length ≤ N is also reachable via a simple-path prefix of that walk, which is ≤ N. So both versions produce the same set of (start, reachable) pairs and the same downstream results.

The out-of-memory problem this solves

The recursive operator on Snowflake holds its intermediate state in memory and cannot spill to disk. On large graphs, the frontier of recursive rows plus any per-row tracking state (path strings, cycle detection, etc.) grows past the per-query memory budget and the query aborts with:

100298 (57014): Recursive Join ran out of memory.
Please re-run this query on a larger warehouse.

A bigger warehouse buys headroom but doesn't change the failure mode. Eventually the data grows further and the recursive operator runs out of memory again.

The Jinja-unrolled version doesn't use the recursive operator. Each level is a regular CTE that the planner executes as a hash join. Hash joins on Snowflake (and BigQuery) spill to local and remote storage when they run out of memory. They get slower under memory pressure; they don't abort. The out-of-memory failure mode is structurally impossible.

BigQuery has a different set of structural restrictions on WITH RECURSIVE — it must be at the top level of a WITH clause, can't be combined with non-recursive CTEs via UNION ALL, and can't reference upstream CTEs in its recursive body. The unrolled form is plain SQL, so none of those restrictions apply there either.

The depth bound that the recursive version required is the same bound the unrolled version uses. The pattern works wherever the depth is known at Jinja-compile time, which covers the typical dbt cases: identity resolution, hierarchical rollups, time-window walks, any bounded DAG traversal.