Nexus Data

Learning Objectives

By the end of this lesson, you will have:

Walked through three worked dbt examples using the Doe family seed
Stood up a complete dbt project end to end via the path you chose
Built a real model that reads warehouse data, joins to a seed, and writes a result table
Committed and pushed your work via Git

Worked example 1 — count the family

The simplest possible model. Create models/marts/family_count.sql:

select count(*) as family_size
from {{ ref('family_members') }}

Build it:

dbt build -s family_count

Confirm in BigQuery:

select * from `doe-family-dwh`.`doe_family_dev`.`family_count`;

You should see one row: family_size = 5.

Worked example 2 — list the kids' emails

A slightly less trivial model. Create models/marts/kid_emails.sql:

select
  member as name,
  email
from {{ ref('family_members') }}
where role = 'child'
order by name

Build it:

dbt build -s kid_emails

You should see Jack, Joe, and Julie with their emails.

Worked example 3 — join the seed with a public BigQuery dataset

BigQuery ships a generous catalog of public datasets you can query for free. Let's join the family seed with one of them.

We'll use bigquery-public-data.usa_names.usa_1910_2013 — a list of US baby names by year. For each Doe kid, count how many babies were given that name in 2010.

First, register the public dataset as a dbt source. Create models/sources/usa_names/_sources.yml:

version: 2

sources:
  - name: usa_names
    database: bigquery-public-data
    schema: usa_names
    tables:
      - name: usa_1910_2013

Then create models/marts/kid_name_popularity.sql:

with kids as (
    select member as name
    from {{ ref('family_members') }}
    where role = 'child'
),

names_2010 as (
    select
        name,
        sum(number) as births_2010
    from {{ source('usa_names', 'usa_1910_2013') }}
    where year = 2010
    group by name
)

select
    k.name,
    coalesce(n.births_2010, 0) as births_2010
from kids k
left join names_2010 n
    on lower(k.name) = lower(n.name)
order by births_2010 desc

Build it:

dbt build -s kid_name_popularity

You've now joined hand-curated seed data with a real public warehouse table — the same pattern you'll use in Module 3 to join the family_members seed with resolved entities from Gmail / Calendar / Notion.

Consolidated practicum

Time to put everything in Module 2 together.

Goal: build a model that reads from a public BigQuery dataset, joins it with the family_members seed, applies at least one of the concepts from Module 2 (a ref() chain, a non-default materialization, a Jinja loop, or a macro), and push the result to GitHub through a PR.

Step 1 — confirm Module 1 is done

Cloud path: complete the assignment in 1.3 Setting up dbt Cloud
Local path: complete the assignment in 1.4 Setting up dbt locally with VS Code
Git repo created and connected: 1.5 Git and GitHub

You should be able to run dbt debug and see "All checks passed!"

Step 2 — seed + project structure

The seed from 2.1 and the layered layout from 2.7 should both be in place.

Step 3 — build something real

Suggested options:

Easy mode: rebuild kid_name_popularity from example 3 in your own project.
Harder: pick a different public BigQuery dataset (e.g., bigquery-public-data.austin_311, noaa_gsod, openaq) and join it with family_members in a way that's meaningful to you.
Concept-driven: require yourself to use at least one of the Module 2 concepts. Examples:
- A macro from 2.6 that emits a case when
- A Jinja for loop from 2.5
- A non-default materialization from 2.4
- A multi-CTE model in the standard layout from 2.3
Hardest: add a custom test (uniqueness, not-null, or accepted_values) on your seed and on your mart. See the dbt docs on tests.

Step 4 — version control

If you're going the local route:

git checkout -b m2-practicum
git add models/ seeds/ dbt_project.yml macros/
git commit -m "Module 2 practicum: kid name popularity model with macro"
git push -u origin m2-practicum

Then open a PR on GitHub and merge it.

Cloud users: commit through the dbt Cloud IDE, open the PR from your managed repo or from GitHub if you've wired one up.

Step 5 — verify

In BigQuery, query the resulting table. Confirm it contains what you expect.

Self-check

You're ready for Module 3 if you can answer yes to all of these:

I can run dbt debug and see "All checks passed!"
I have a project structured with sources/, staging/, and marts/
I've loaded a seed file with dbt seed
I've built a model that joins seed data with a source() reference
I've used at least one of: a non-default materialization, a Jinja construct, or a macro
My project is in Git with at least one merged PR

If any of those is "no", revisit the relevant lesson before continuing.

Summary

What you built	Why it matters in Module 3
`family_count`	Simplest possible model — proves your environment works
`kid_emails`	Filtering and ordering — basic mart-level model patterns
`kid_name_popularity`	Joining seed data to a warehouse source — the exact shape you'll use against resolved entities

Next Lesson

You're ready. Head to 3.1 Prerequisite: Ingesting source data with os-nexus — where real data starts flowing into the warehouse.