Prerequisite — Setting up BigQuery
Create a GCP project, enable BigQuery, and provision a service account that both dbt and the ingestion pipeline will use. Datasets get created automatically.
This is a one-time infrastructure prerequisite for Module 1. You'll need a BigQuery project before you can connect dbt to anything in lessons 1.3 / 1.4.
Learning Objectives
By the end of this lesson, you will have:
- A Google Cloud project with the BigQuery API enabled
- A service account with the right roles to read, write, and create BigQuery datasets
- A downloaded JSON key for that service account
You don't need to create datasets by hand — dbt creates its own output dataset on its first run, and the os-nexus ingestion pipeline (Module 3's prereq lesson) creates per-source datasets on its first sync.
This entire lesson is one-time setup. You won't touch most of it again after today.
Why BigQuery?
The Doe family runs on BigQuery because:
- It has a generous free tier (1 TiB of query processing and 10 GiB of storage per month at no charge)
- It's the warehouse most dbt-nexus clients use, so all the templates and examples in this course align with it
- The os-nexus ingestion pipeline (Module 3's prereq lesson) writes directly to BigQuery
If you want to use a different warehouse (Snowflake, Redshift, etc.), dbt-nexus supports them — but you'll have to translate the BigQuery specifics in this course as you go. For this course we assume BigQuery.
Step 1: Create a Google Cloud project
- Go to console.cloud.google.com
- Click the project picker (top bar) → New Project
- Name it doe-family-dwh (or your own equivalent)
- Pick or create a billing account. BigQuery's free tier still requires a billing account to be attached, but you won't be charged for personal-scale usage
- Click Create
Make sure the new project is selected before continuing.
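If you prefer the terminal, steps like these can be sketched with the gcloud CLI. This is a minimal sketch, assuming gcloud is installed and you are logged in with your own Google account. The billing account ID is a placeholder (list yours with `gcloud billing accounts list`), and the `run` helper is a hypothetical convenience that only prints each command unless you set DRY_RUN=0.

```shell
# Assumed values; substitute your own project ID and billing account.
PROJECT_ID="doe-family-dwh"
BILLING_ACCOUNT_ID="000000-000000-000000"   # placeholder, not a real account

# Hypothetical helper: print each command for review by default,
# execute it only when DRY_RUN=0. Printed output does not preserve quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create the project, attach billing, and make it the active project.
run gcloud projects create "$PROJECT_ID" --name="Doe Family DWH"
run gcloud billing projects link "$PROJECT_ID" --billing-account="$BILLING_ACCOUNT_ID"
run gcloud config set project "$PROJECT_ID"
```

Project IDs are globally unique, so the create command may reject a name that someone else already holds. In that case pick a variant and use it consistently for the rest of the course.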
Step 2: Enable the BigQuery API
- In the console, search "BigQuery API" in the top search bar
- Click Enable if it isn't already
(BigQuery is usually enabled by default on new projects, but it's worth confirming.)
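The same check can be done from the command line. A small sketch, assuming the gcloud CLI and the example project ID used in this lesson; the `run` helper is a hypothetical wrapper that prints commands unless DRY_RUN=0 is set.

```shell
PROJECT_ID="doe-family-dwh"   # assumed; use your own project ID

# Hypothetical helper: print by default, execute when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Enabling an already-enabled API is harmless, so this is safe to repeat.
run gcloud services enable bigquery.googleapis.com --project="$PROJECT_ID"

# List enabled services; bigquery.googleapis.com should appear in the output.
run gcloud services list --enabled --project="$PROJECT_ID"
```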
Step 3: Datasets get created automatically
You don't need to create any BigQuery datasets by hand. The pipelines do it for you:
| Dataset | Created by |
|---|---|
| gmail | os-nexus, on the first Gmail sync |
| google_calendar | os-nexus, on the first Calendar sync |
| notion | os-nexus, on the first Notion sync |
| doe_family_dev | dbt, on the first dbt run after setup |
Source datasets are named after the source system, with no raw_ prefix. This is the canonical nexus convention; check clients/slide-rule-tech/ for an example.
What this means for you in this lesson: the service account you create in the next step needs permission to create datasets, not just to read and write them. We grant that role explicitly below.
Step 4: Create a service account
dbt (and the os-nexus ingestion pipeline) will authenticate as a service account — a non-human Google identity that owns a key.
- In the console, go to IAM & Admin → Service Accounts
- Click Create Service Account
- Name: nexus-pipeline (or similar)
- Description: "Service account for dbt + os-nexus ingestion"
- Click Create and continue
Grant the service account these roles:
| Role | Why |
|---|---|
| BigQuery Job User | Run queries and load jobs |
| BigQuery Data Editor | Read and write tables |
| BigQuery Data Owner | Create the source and output datasets on first run |
Click Continue, then Done.
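For reference, the service account and its three roles can also be set up with gcloud. This is a sketch under assumptions: the project ID and account name match the examples in this lesson, the role IDs are the standard identifiers for the roles in the table above, and `run` is a hypothetical helper that prints commands unless DRY_RUN=0.

```shell
PROJECT_ID="doe-family-dwh"   # assumed; use your own project ID
SA_NAME="nexus-pipeline"
# Service account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Hypothetical helper: print by default, execute when DRY_RUN=0.
# Printed output does not preserve quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run gcloud iam service-accounts create "$SA_NAME" \
  --project="$PROJECT_ID" \
  --display-name="$SA_NAME" \
  --description="Service account for dbt + os-nexus ingestion"

# Job User, Data Editor, Data Owner: one project-level binding per role.
for role in roles/bigquery.jobUser roles/bigquery.dataEditor roles/bigquery.dataOwner; do
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="$role"
done
```

Each binding is granted at the project level, which is what lets the pipelines create new datasets rather than only write to existing ones.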
Step 5: Create and download a JSON key
- In the service accounts list, click your new nexus-pipeline account
- Open the Keys tab → Add key → Create new key
- Choose JSON → Create
- The browser downloads a file like doe-family-dwh-a1b2c3d4.json
Move it somewhere safe and easy to reference, e.g.:
mkdir -p ~/.gcp
mv ~/Downloads/doe-family-dwh-a1b2c3d4.json ~/.gcp/nexus-pipeline.json
chmod 600 ~/.gcp/nexus-pipeline.json
Never commit this key to Git. Your .gitignore in lesson 1.5 will
exclude common credential paths, but the safest move is to keep the
key file completely outside any repo.
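A quick local sanity check on the key file can catch the common mistakes before dbt ever sees it. The function below is a hypothetical helper, not part of gcloud or dbt; it assumes the key lives at ~/.gcp/nexus-pipeline.json unless KEY_FILE says otherwise.

```shell
KEY_FILE="${KEY_FILE:-$HOME/.gcp/nexus-pipeline.json}"

# Hypothetical helper: checks existence, key type, and file permissions.
check_key_file() {
  f="$1"
  if [ ! -f "$f" ]; then
    echo "missing key file: $f"; return 1
  fi
  # Service account keys carry "type": "service_account" in their JSON.
  if ! grep -q '"type": *"service_account"' "$f"; then
    echo "not a service account key: $f"; return 1
  fi
  # find prints the path only when the mode is exactly 600 (owner read/write).
  if [ -z "$(find "$f" -prune -perm 600)" ]; then
    echo "permissions too open, run: chmod 600 $f"; return 1
  fi
  # Warn (on stderr, without failing) if the key sits inside a Git work tree.
  if command -v git >/dev/null 2>&1 &&
     [ "$(git -C "$(dirname "$f")" rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]; then
    echo "warning: $f is inside a Git repository" >&2
  fi
  echo "key file looks OK: $f"
}

check_key_file "$KEY_FILE" || echo "fix the issue above before using the key"
```

The Git check is only a warning because an ignored file inside a repo is not automatically a leak, but moving the key outside the repo removes the risk entirely.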
Step 6: Verify
You can confirm the service account works by authenticating as it locally and listing whatever's in your project (it'll be empty for now, which is fine; the pipelines populate it later):
gcloud auth activate-service-account --key-file ~/.gcp/nexus-pipeline.json
bq ls --project_id=doe-family-dwh
If the command runs without an authentication error, you're set. Empty output is expected at this stage — datasets show up once dbt does its first build (later in this module) and once os-nexus runs its first sync (Module 3's prereq lesson).
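Listing datasets only exercises read access. A slightly broader check, sketched below, also runs a trivial query so that the BigQuery Job User role is exercised too. The `verify_access` function is a hypothetical wrapper; it assumes the example project ID and key path from this lesson and that gcloud and bq are on your PATH.

```shell
KEY_FILE="${KEY_FILE:-$HOME/.gcp/nexus-pipeline.json}"
PROJECT_ID="${PROJECT_ID:-doe-family-dwh}"

# Hypothetical helper: authenticate as the service account, then list and query.
verify_access() {
  command -v gcloud >/dev/null 2>&1 || { echo "gcloud not found"; return 1; }
  command -v bq >/dev/null 2>&1 || { echo "bq not found"; return 1; }
  [ -f "$KEY_FILE" ] || { echo "key file not found: $KEY_FILE"; return 1; }

  gcloud auth activate-service-account --key-file="$KEY_FILE" || return 1
  # Listing datasets; empty output is expected on a fresh project.
  bq --project_id="$PROJECT_ID" ls || return 1
  # A constant query needs permission to create jobs but reads no tables.
  bq --project_id="$PROJECT_ID" query --use_legacy_sql=false 'SELECT 1 AS ok' || return 1
  echo "service account access looks good"
}

verify_access || echo "verification did not complete; see the message above"
```

Activating the service account makes it the active gcloud account. `gcloud auth list` shows which account is active, and `gcloud config set account <your-email>` switches back to your own.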
Hands-On Exercise
- Complete steps 1–5 in your own GCP project (use whatever project name you prefer, and adjust the rest of the course accordingly).
- Save the JSON key somewhere safe and outside any Git repo.
- Run bq ls --project_id=<your-project> as the service account and confirm the command runs cleanly (empty output is fine; datasets come later).
Summary
| Concept | Key takeaway |
|---|---|
| BigQuery free tier | 1 TiB of query processing / 10 GiB of storage per month at no charge |
| GCP project | One project owns datasets, the API, service accounts, and billing |
| Datasets | Created automatically — gmail, google_calendar, notion by os-nexus; doe_family_dev by dbt |
| Naming convention | Source datasets named after the source system, no raw_ prefix |
| Service account | A non-human identity dbt + the ingestion pipeline both authenticate as |
| JSON key | The credential — keep it safe, never commit to Git |
Next Lesson
You've got a BigQuery project and a service account key. Now point dbt at it — pick one of the next two lessons:
- 1.3 Setting up dbt Cloud (browser-based)
- 1.4 Setting up dbt locally with VS Code (terminal + editor)
You only need one. Both produce the same warehouse output.