Prerequisite — Setting up BigQuery

Create a GCP project, enable BigQuery, and provision a service account that both dbt and the ingestion pipeline will use. Datasets get created automatically.

This is a one-time infrastructure prerequisite for Module 1. You'll need a BigQuery project before you can connect dbt to anything in lessons 1.3 / 1.4.

Learning Objectives

By the end of this lesson, you will have:

  • A Google Cloud project with the BigQuery API enabled
  • A service account with the right roles to read, write, and create BigQuery datasets
  • A downloaded JSON key for that service account

You don't need to create datasets by hand — dbt creates its own output dataset on its first run, and the os-nexus ingestion pipeline (Module 3's prereq lesson) creates per-source datasets on its first sync.

This entire lesson is one-time setup. You won't touch most of it again after today.


Why BigQuery?

The Doe family runs on BigQuery because:

  • It has a generous free tier (1 TB of query processing and 10 GB of storage per month at no charge)
  • It's the warehouse most dbt-nexus clients use, so all the templates and examples in this course align with it
  • The os-nexus ingestion pipeline (Module 3's prereq lesson) writes directly to BigQuery

If you want to use a different warehouse (Snowflake, Redshift, etc.), dbt-nexus supports them — but you'll have to translate the BigQuery specifics in this course as you go. For this course we assume BigQuery.


Step 1: Create a Google Cloud project

  1. Go to console.cloud.google.com
  2. Click the project picker (top bar) → New Project
  3. Name it doe-family-dwh (or your own equivalent)
  4. Pick or create a billing account — BigQuery's free tier still requires a billing account to be attached, but you won't be charged for personal-scale usage
  5. Click Create

Make sure the new project is selected before continuing.
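
If you'd rather script this step, the console clicks above map roughly onto the gcloud CLI. The sketch below assumes gcloud is installed and you are logged in with your own Google account. BILLING_ACCOUNT_ID is a placeholder you must replace, and create_project is a hypothetical helper name, not something from the course materials. Project IDs are globally unique, so doe-family-dwh may already be taken and you may need to add a suffix.

```shell
# Assumed values: replace with your own. BILLING_ACCOUNT_ID is a placeholder.
PROJECT_ID="doe-family-dwh"
BILLING_ACCOUNT_ID="XXXXXX-XXXXXX-XXXXXX"

# Hypothetical helper: create the project, attach billing, then select it
# as the default project for later gcloud commands.
create_project() {
  gcloud projects create "$PROJECT_ID" --name="$PROJECT_ID" &&
  gcloud billing projects link "$PROJECT_ID" --billing-account="$BILLING_ACCOUNT_ID" &&
  gcloud config set project "$PROJECT_ID"
}
```

Calling create_project runs the three commands in order and stops at the first one that fails.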


Step 2: Enable the BigQuery API

  1. In the console, search "BigQuery API" in the top search bar
  2. Click Enable if it isn't already

(BigQuery is usually enabled by default on new projects, but it's worth confirming.)
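
The same check can be sketched from the command line. This assumes gcloud is installed and authenticated; the function names enable_bigquery_api and bigquery_api_enabled are hypothetical helpers introduced here for illustration.

```shell
# Assumed project ID: replace with your own.
PROJECT_ID="doe-family-dwh"

# Enable the BigQuery API (harmless if it is already enabled).
enable_bigquery_api() {
  gcloud services enable bigquery.googleapis.com --project="$PROJECT_ID"
}

# Succeeds only if bigquery.googleapis.com appears among the enabled services.
bigquery_api_enabled() {
  gcloud services list --enabled --project="$PROJECT_ID" \
    --format="value(config.name)" | grep -qxF "bigquery.googleapis.com"
}
```

A typical use would be: bigquery_api_enabled || enable_bigquery_api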


Step 3: Datasets get created automatically

You don't need to create any BigQuery datasets by hand. The pipelines do it for you:

Dataset          | Created by
-----------------|-------------------------------------
gmail            | os-nexus, on the first Gmail sync
google_calendar  | os-nexus, on the first Calendar sync
notion           | os-nexus, on the first Notion sync
doe_family_dev   | dbt, on the first dbt run after setup

Source datasets are named after the source system — no raw_ prefix. This is the canonical nexus convention; check clients/slide-rule-tech/ for an example.

What this means for you in this lesson: the service account you create in the next step needs the permission to create datasets, not just read and write them. We grant that role explicitly below.


Step 4: Create a service account

dbt (and the os-nexus ingestion pipeline) will authenticate as a service account — a non-human Google identity that owns a key.

  1. In the console, go to IAM & Admin → Service Accounts
  2. Click Create Service Account
  3. Name: nexus-pipeline (or similar)
  4. Description: "Service account for dbt + os-nexus ingestion"
  5. Click Create and continue

Grant the service account these roles:

Role                  | Why
----------------------|--------------------------------------------------
BigQuery Job User     | Run queries and load jobs
BigQuery Data Editor  | Read and write tables
BigQuery Data Owner   | Create the source and output datasets on first run

Click Continue, then Done.
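
For reference, the service account and its three role grants can also be sketched with gcloud. This assumes the project ID and account name used in this lesson; the role IDs are the standard identifiers for the roles named in the table above, and create_service_account and grant_roles are hypothetical helper names.

```shell
# Assumed values from this lesson: adjust to your own project and naming.
PROJECT_ID="doe-family-dwh"
SA_NAME="nexus-pipeline"
# User-created service accounts get an email of this form.
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the service account itself.
create_service_account() {
  gcloud iam service-accounts create "$SA_NAME" \
    --project="$PROJECT_ID" \
    --display-name="$SA_NAME" \
    --description="Service account for dbt + os-nexus ingestion"
}

# Grant the three project-level BigQuery roles, stopping on the first failure.
grant_roles() {
  for role in roles/bigquery.jobUser roles/bigquery.dataEditor roles/bigquery.dataOwner; do
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" \
      --role="$role" || return 1
  done
}
```

Run create_service_account first, then grant_roles; each role is bound separately because add-iam-policy-binding takes one role per call.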


Step 5: Create and download a JSON key

  1. In the service accounts list, click your new nexus-pipeline account
  2. Open the Keys tab → Add key → Create new key
  3. Choose JSON → Create
  4. The browser downloads a file like doe-family-dwh-a1b2c3d4.json

Move it somewhere safe and easy to reference, e.g.:

mkdir -p ~/.gcp
mv ~/Downloads/doe-family-dwh-a1b2c3d4.json ~/.gcp/nexus-pipeline.json
chmod 600 ~/.gcp/nexus-pipeline.json

Never commit this key to Git. Your .gitignore in lesson 1.5 will exclude common credential paths, but the safest move is to keep the key file completely outside any repo.
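
As a quick sanity check on the advice above, here is a small sketch that tests whether a key file looks like a service account key, is readable only by its owner, and sits outside any Git repo. The three function names are hypothetical helpers written for this lesson, and the default path assumes the location used in the commands above.

```shell
# Assumed default location: override KEY_FILE if you stored the key elsewhere.
KEY_FILE="${KEY_FILE:-$HOME/.gcp/nexus-pipeline.json}"

# Succeeds if the file declares itself a service account key.
looks_like_sa_key() {
  grep -q '"type": *"service_account"' "$1"
}

# Succeeds if the permission string is exactly owner read/write (mode 600).
is_owner_only() {
  [ "$(ls -l "$1" | cut -c1-10)" = "-rw-------" ]
}

# Succeeds if the file's directory or any parent directory contains .git.
# If the directory cannot be read, this reports "not in a repo" (returns 1).
inside_git_repo() {
  dir="$(cd "$(dirname "$1")" && pwd)" || return 1
  while :; do
    [ -e "$dir/.git" ] && return 0
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}
```

For example: looks_like_sa_key "$KEY_FILE" && is_owner_only "$KEY_FILE" && ! inside_git_repo "$KEY_FILE" && echo "key looks safe"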


Step 6: Verify

You can confirm the service account works by authenticating as it locally and listing whatever's in your project (it'll be empty for now, which is fine; the pipelines populate it later):

gcloud auth activate-service-account --key-file ~/.gcp/nexus-pipeline.json
bq ls --project_id=doe-family-dwh

If the command runs without an authentication error, you're set. Empty output is expected at this stage — datasets show up once dbt does its first build (later in this module) and once os-nexus runs its first sync (Module 3's prereq lesson).
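
If you want an explicit pass/fail message rather than reading the raw output, the check can be wrapped in a small function. This is only a sketch: verify_service_account is a hypothetical helper name, and the project ID and key path are the ones assumed throughout this lesson.

```shell
# Assumed values from this lesson: adjust to your own project and key path.
PROJECT_ID="doe-family-dwh"
KEY_FILE="$HOME/.gcp/nexus-pipeline.json"

# Authenticate as the service account, then try to list datasets.
# Empty output from bq ls still counts as success; only a non-zero exit fails.
verify_service_account() {
  gcloud auth activate-service-account --key-file="$KEY_FILE" || return 1
  if bq ls --project_id="$PROJECT_ID" >/dev/null; then
    echo "OK: service account can list datasets in $PROJECT_ID"
  else
    echo "FAILED: check the key path, project ID, and granted roles" >&2
    return 1
  fi
}
```

Note that activating the service account makes it the active gcloud account. To switch back to your own login afterwards, run gcloud config set account with your own email address.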


Hands-On Exercise

  1. Complete steps 1–5 in your own GCP project (use whatever project name you prefer — adjust the rest of the course accordingly).

  2. Save the JSON key somewhere safe and outside any Git repo.

  3. Run bq ls --project_id=<your-project> as the service account and confirm the command runs cleanly (empty output is fine — datasets come later).


Summary

Concept             | Key takeaway
--------------------|---------------------------------------------------------------------------
BigQuery free tier  | 1 TB query / 10 GB storage per month at no charge
GCP project         | One project owns datasets, the API, service accounts, and billing
Datasets            | Created automatically: gmail, google_calendar, notion by os-nexus; doe_family_dev by dbt
Naming convention   | Source datasets named after the source system, no raw_ prefix
Service account     | A non-human identity dbt + the ingestion pipeline both authenticate as
JSON key            | The credential: keep it safe, never commit to Git

Next Lesson

You've got a BigQuery project and a service account key. Now point dbt at it by following one of the next two lessons. You only need one; both produce the same warehouse output.