Nexus Data

This is a one-time infrastructure prerequisite for Module 1. You'll need a BigQuery project before you can connect dbt to anything in lessons 1.3 / 1.4.

Learning Objectives

By the end of this lesson, you will have:

A Google Cloud project with the BigQuery API enabled
A service account with the right roles to read, write, and create BigQuery datasets
A downloaded JSON key for that service account

You don't need to create datasets by hand — dbt creates its own output dataset on its first run, and the os-nexus ingestion pipeline (Module 3's prereq lesson) creates per-source datasets on its first sync.

This entire lesson is one-time setup. You won't touch most of it again after today.

Why BigQuery?

The Doe family runs on BigQuery because:

It has a generous free tier (1 TB of query processing and 10 GB of storage per month at no charge)
It's the warehouse most dbt-nexus clients use, so all the templates and examples in this course align with it
The os-nexus ingestion pipeline (set up in Module 3's prereq lesson) writes directly to BigQuery

If you want to use a different warehouse (Snowflake, Redshift, etc.), dbt-nexus supports them — but you'll have to translate the BigQuery specifics in this course as you go. For this course we assume BigQuery.

Step 1: Create a Google Cloud project

Go to console.cloud.google.com
Click the project picker (top bar) → New Project
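Name it doe-family-dwh (or your own equivalent). Check the project ID shown under the name: IDs are globally unique and can't be changed later, so the console may append a suffix, and it's the ID (not the name) that bq and dbt use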
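Pick or create a billing account and attach it. Without one, the project runs in the BigQuery sandbox, which has extra limits (for example, tables expire after 60 days); with one attached, you still won't be charged for personal-scale usage within the free tier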
Click Create

Make sure the new project is selected before continuing.
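If you'd rather script Step 1, the sketch below does the same thing with the gcloud CLI. It assumes gcloud is installed and you've logged in as yourself with `gcloud auth login`; `valid_project_id` and `create_project` are hypothetical helper names, not part of any Google tool. The ID check encodes Google's project ID rules (6 to 30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen), which is handy if you pick your own name.

```shell
# Project IDs: 6-30 chars, lowercase letters/digits/hyphens,
# must start with a letter and must not end with a hyphen.
valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Usage: create_project PROJECT_ID BILLING_ACCOUNT_ID
# (hypothetical helper; list your billing accounts with `gcloud billing accounts list`)
create_project() {
  local project_id="$1" billing_account="$2"
  if ! valid_project_id "$project_id"; then
    echo "invalid project ID: $project_id" >&2
    return 1
  fi
  gcloud projects create "$project_id" || return 1
  # Attach the billing account (the console's "Pick or create a billing account").
  gcloud billing projects link "$project_id" --billing-account="$billing_account" || return 1
  # Equivalent of selecting the project: make it gcloud's default.
  gcloud config set project "$project_id"
}
```

For example, `create_project doe-family-dwh <your-billing-account-id>`.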

Step 2: Enable the BigQuery API

In the console, search "BigQuery API" in the top search bar
Click Enable if it isn't already

(BigQuery is usually enabled by default on new projects, but it's worth confirming.)
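Step 2 also has a one-line CLI equivalent. A minimal sketch, assuming gcloud is installed and logged in with rights on the project; `enable_bigquery_api` is a hypothetical wrapper name:

```shell
# Usage: enable_bigquery_api PROJECT_ID  (hypothetical helper)
# Enables the BigQuery API on the given project.
enable_bigquery_api() {
  gcloud services enable bigquery.googleapis.com --project="$1"
}
```

Run it as `enable_bigquery_api doe-family-dwh`. Enabling a service that's already on is a no-op, so it's safe even when BigQuery was enabled by default.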

Step 3: Datasets get created automatically

You don't need to create any BigQuery datasets by hand. The pipelines do it for you:

Dataset	Created by
`gmail`	os-nexus, on the first Gmail sync
`google_calendar`	os-nexus, on the first Calendar sync
`notion`	os-nexus, on the first Notion sync
`doe_family_dev`	dbt, on the first `dbt run` after setup

Source datasets are named after the source system — no raw_ prefix. This is the canonical nexus convention; check clients/slide-rule-tech/ for an example.

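What this means for you in this lesson: the service account you create in the next step needs permission to create datasets, not just read and write tables. BigQuery Data Editor, granted at the project level, already includes that permission; the role table below also grants BigQuery Data Owner, which additionally lets the account update, delete, and manage access to datasets.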

Step 4: Create a service account

dbt (and the os-nexus ingestion pipeline) will authenticate as a service account: a non-human Google identity that, in this setup, signs in with a JSON key.

In the console, go to IAM & Admin → Service Accounts
Click Create Service Account
Name: nexus-pipeline (or similar)
Description: "Service account for dbt + os-nexus ingestion"
Click Create and continue

Grant the service account these roles:

Role	Why
BigQuery Job User	Run queries and load jobs
BigQuery Data Editor	Read and write tables; at the project level, also create the source and output datasets on first run
BigQuery Data Owner	Full control of those datasets: update, delete, and manage access

Click Continue, then Done.
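The console steps above map onto gcloud commands: one to create the account, then one IAM binding per role. A sketch, assuming gcloud is installed and your own account can create service accounts and edit the project's IAM policy; `sa_email` and `create_pipeline_sa` are hypothetical helper names. `roles/bigquery.jobUser`, `roles/bigquery.dataEditor`, and `roles/bigquery.dataOwner` are the role IDs behind the console names in the table.

```shell
# Service account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Usage: create_pipeline_sa PROJECT_ID [SA_NAME]  (hypothetical helper)
create_pipeline_sa() {
  local project_id="$1" name="${2:-nexus-pipeline}"
  local email role
  email="$(sa_email "$name" "$project_id")"

  gcloud iam service-accounts create "$name" \
    --project="$project_id" \
    --display-name="$name" \
    --description="Service account for dbt + os-nexus ingestion" || return 1

  # Job User, Data Editor, Data Owner: the three roles from the table.
  for role in roles/bigquery.jobUser roles/bigquery.dataEditor roles/bigquery.dataOwner; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:$email" \
      --role="$role" || return 1
  done
}
```

Call it as `create_pipeline_sa doe-family-dwh`. A new service account can take a few seconds to become visible to IAM, so if a binding fails with a not-found error right after creation, wait briefly and re-run the `gcloud projects add-iam-policy-binding` commands.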

Step 5: Create and download a JSON key

In the service accounts list, click your new nexus-pipeline account
Open the Keys tab → Add key → Create new key
Choose JSON → Create
The browser downloads a file like doe-family-dwh-a1b2c3d4.json

Move it somewhere safe and easy to reference, e.g.:

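```shell
mkdir -p ~/.gcp
# Replace the filename with the one your browser actually downloaded.
mv ~/Downloads/doe-family-dwh-a1b2c3d4.json ~/.gcp/nexus-pipeline.json
# Owner-only read/write.
chmod 600 ~/.gcp/nexus-pipeline.json
```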

Never commit this key to Git. Your .gitignore in lesson 1.5 will exclude common credential paths, but the safest move is to keep the key file completely outside any repo.
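From the terminal, you can write the key straight into `~/.gcp` instead of downloading and moving it, then check that it's private and not inside a Git work tree. A sketch, assuming gcloud and Git are installed and the service account from Step 4 exists; `create_pipeline_key` and `check_key_safety` are hypothetical helper names, and the mode check expects exactly 600, matching the `chmod` above.

```shell
# Usage: create_pipeline_key PROJECT_ID [SA_NAME] [KEY_PATH]  (hypothetical helper)
create_pipeline_key() {
  local project_id="$1" name="${2:-nexus-pipeline}"
  local key_path="${3:-$HOME/.gcp/nexus-pipeline.json}"
  mkdir -p "$(dirname "$key_path")" || return 1
  gcloud iam service-accounts keys create "$key_path" \
    --iam-account="$name@$project_id.iam.gserviceaccount.com" || return 1
  chmod 600 "$key_path"
}

# Usage: check_key_safety KEY_PATH  (hypothetical helper)
# Fails unless the key's mode is exactly 600 and it sits outside any Git work tree.
check_key_safety() {
  local key_path="$1"
  if [ -z "$(find "$key_path" -prune -perm 600 2>/dev/null)" ]; then
    echo "expected mode 600 on $key_path" >&2
    return 1
  fi
  if git -C "$(dirname "$key_path")" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "$key_path is inside a Git repository; move it out" >&2
    return 1
  fi
}
```

For example: `create_pipeline_key doe-family-dwh && check_key_safety ~/.gcp/nexus-pipeline.json`.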

Step 6: Verify

You can confirm the service account works by authenticating as it locally and listing whatever's in your project (it'll be empty for now; that's fine, the pipelines populate it later):

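```shell
# Authenticate gcloud (and bq) as the service account using its key.
gcloud auth activate-service-account --key-file ~/.gcp/nexus-pipeline.json
# List datasets in the project; empty output is expected for now.
bq ls --project_id=doe-family-dwh
```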

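If the command runs without an authentication error, you're set. Empty output is expected at this stage: datasets show up once dbt does its first build (later in this module) and once os-nexus runs its first sync (Module 3's prereq lesson). Note that `gcloud auth activate-service-account` also makes the service account gcloud's active account; switch back to your own login afterwards with `gcloud config set account <your-email>`.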
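`bq ls` proves the key authenticates, but not that the account can create datasets, which both pipelines need on their first run. A stricter check, sketched under the assumption that the service account is still gcloud's active account, is to create a throwaway dataset and delete it straight away. `check_dataset_create` and the dataset name `nexus_permission_check` are hypothetical; any unused name works.

```shell
# Usage: check_dataset_create PROJECT_ID  (hypothetical helper)
check_dataset_create() {
  local dataset="$1:nexus_permission_check"
  # Fails with a permission error if the account can't create datasets.
  bq mk --dataset "$dataset" || return 1
  # -r removes any tables too, -f skips the confirmation prompt, -d targets a dataset.
  bq rm -r -f -d "$dataset"
}
```

Run `check_dataset_create doe-family-dwh`; a permission error from `bq mk` means the Step 4 role grants didn't take effect.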

Hands-On Exercise

Complete steps 1–5 in your own GCP project (use whatever project name you prefer — adjust the rest of the course accordingly).
Save the JSON key somewhere safe and outside any Git repo.
Run bq ls --project_id=<your-project> as the service account and confirm the command runs cleanly (empty output is fine — datasets come later).

Summary

Concept	Key takeaway
BigQuery free tier	1 TB query / 10 GB storage per month at no charge
GCP project	One project owns datasets, the API, service accounts, and billing
Datasets	Created automatically — `gmail`, `google_calendar`, `notion` by os-nexus; `doe_family_dev` by dbt
Naming convention	Source datasets named after the source system, no `raw_` prefix
Service account	A non-human identity dbt + the ingestion pipeline both authenticate as
JSON key	The credential — keep it safe, never commit to Git

Next Lesson

You've got a BigQuery project and a service account key. Now point dbt at it — pick one of the next two lessons:

1.3 Setting up dbt Cloud (browser-based)
1.4 Setting up dbt locally with VS Code (terminal + editor)

You only need one. Both produce the same warehouse output.