Nexus Data

Learning Objectives

By the end of this lesson, you will be able to:

Articulate why analytics code belongs in version control
Initialize a Git repository in your dbt project
Use the core Git commands: status, add, commit, push, pull, branch, checkout
Push your project to a new GitHub repository
Follow the branch → commit → PR → review → merge workflow
Write a dbt-tuned .gitignore and explain what it excludes

Why version control matters for analytics

A spreadsheet has a single timeline: the current state. SQL scripts in a folder are the same — whoever saves last wins, and there's no history of what changed or why.

Git turns your dbt project into a branching, reviewable, undoable history:

Without Git	With Git
"Who changed this model and broke production?"	`git log` + `git blame` show every change
"Can we revert that bad migration?"	`git revert` adds a new commit that undoes it, leaving the history intact
"I want to try something risky."	`git checkout -b experiment` — separate branch
"We need another set of eyes on this."	Pull request — review with comments inline
"Did the production deploy include yesterday's fix?"	The commit hash on `main` tells you

For the Doe family this matters as soon as Jane and John are both editing the project — but it matters even when there's only one author, because Jane one year from now doesn't remember what Jane today was thinking.
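
To make the right-hand column of the table concrete, here is a minimal sketch run in a throwaway repository. The file name `orders.sql` and its contents are hypothetical, chosen only for illustration; the identity is set repo-locally so the demo runs even before you configure Git globally.

```shell
# Throwaway repo; orders.sql and its contents are hypothetical.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name "Jane Doe"                 # repo-local identity for the demo
git config user.email "jane@doefamily.example"

echo "select 1 as id" > orders.sql
git add orders.sql
git commit -q -m "Add orders model"

echo "select 1 / 0 as id" > orders.sql          # the change that "broke production"
git add orders.sql
git commit -q -m "Change orders calculation"

git log --oneline -- orders.sql                 # every commit that touched the file
git blame orders.sql                            # which commit and author last changed each line

# Undo the bad commit by adding a new commit that reverses it
git revert --no-edit HEAD
cat orders.sql                                  # the original query is back
```

Note that the bad commit is still in the log after the revert; nothing is erased, which is what makes the history trustworthy.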

Local vs Cloud users — both need Git

Local users (1.3): you'll run git commands in your terminal, push to GitHub, and pull updates the same way.
dbt Cloud users (1.2): dbt Cloud uses Git too. The managed repository you set up is a Git repo; the IDE commits to it through buttons. You can swap to a GitHub-backed repo at any time. Everything in this lesson conceptually applies — the only difference is you'll click buttons instead of typing commands.

The rest of the lesson is written for the command line. Cloud users should still read it through; the same concepts (branches, commits, PRs, merges) appear in the IDE.

Install Git

Most systems already have Git. Check with:

git --version

If not, install it:

# macOS
brew install git

# Linux (Debian/Ubuntu)
sudo apt install git

# Windows
# Download from https://git-scm.com/download/win

Configure your identity (used on every commit):

git config --global user.name "Jane Doe"
git config --global user.email "jane@doefamily.example"
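
Git reads settings from several scopes, and a value set inside a repository overrides the global one. As a small sketch, assuming a throwaway repository and a hypothetical second email address, you can check which value is in effect and which file it comes from:

```shell
# Throwaway repo; the address below is hypothetical.
cd "$(mktemp -d)"
git init -q

# Without --global, the setting applies to this repository only
git config user.email "jane.work@doefamily.example"

git config --get user.email                  # the value Git will use for commits made here
git config --show-origin --get user.email    # also prints the file the value was read from
```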

Initialize a Git repo in your dbt project

From the root of your dbt project (the doe_family/ directory from lesson 1.3):

cd doe_family
git init

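You now have a .git/ directory. That's the entire history database; everything else in the folder is your working copy. Depending on your Git version and settings, the first branch is named main or master. This lesson assumes main; the git branch -M main step under "Connect your local repo" renames it if needed.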
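
If you would rather name the first branch at creation time, the sketch below shows one way, assuming Git 2.28 or newer (the `-b` option does not exist in older versions). It uses a temporary directory as a stand-in for doe_family/.

```shell
# Temporary directory standing in for doe_family/
cd "$(mktemp -d)"

git init -q -b main     # -b sets the initial branch name (assumes Git 2.28+)
ls -A                   # the only new entry is .git
git status              # reports the branch name and that there are no commits yet
```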

The dbt `.gitignore`

Before your first commit, exclude files that shouldn't be in version control. Create .gitignore at the project root with:

# Generated by dbt (build artifacts, installed packages, logs); safe to recreate
target/
dbt_packages/
logs/

# Local profiles — contains credentials, never commit
profiles.yml

# Python venv (if you keep it inside the project)
.venv/

# OS noise
.DS_Store
Thumbs.db

# Editor noise
.vscode/
.idea/

Why each entry matters:

Path	Why excluded
`target/`	Compiled SQL and run artifacts; regenerated every `dbt run`
`dbt_packages/`	Third-party package code installed by `dbt deps`; restored on demand, so there is no need to track it
`logs/`	Run logs; noisy and grow forever
`profiles.yml`	Contains warehouse credentials — never commit
`.venv/`	Your local Python virtualenv

If you keep profiles.yml at ~/.dbt/profiles.yml (the default location), it already lives outside your repo and the .gitignore entry is just a safety net. If you keep it inside the project, the ignore entry is critical.
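
Before the first commit, it can be worth confirming that the ignore rules behave as expected. The following is a sketch in a throwaway repository with a shortened copy of the .gitignore; the file names under target/ and models/ are placeholders.

```shell
# Throwaway repo with a shortened .gitignore; file names are placeholders.
cd "$(mktemp -d)"
git init -q
printf '%s\n' 'target/' 'dbt_packages/' 'logs/' 'profiles.yml' '.venv/' > .gitignore
mkdir -p target models
touch target/manifest.json profiles.yml models/jane_says_hi.sql

# -v prints the .gitignore line that matches each ignored path
git check-ignore -v target/manifest.json profiles.yml

# Exit status is non-zero when a path is NOT ignored
git check-ignore -q models/jane_says_hi.sql || echo "models/jane_says_hi.sql is not ignored"

# Ignored paths are marked with !! in the short status
git status --short --ignored
```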

The core Git workflow

`git status`

Shows what's changed and what's staged. Run it constantly.

git status

`git add`

Stages changes for the next commit. Stage specific files:

git add models/example/jane_says_hi.sql .gitignore

Avoid git add . and git add -A: both stage every change that isn't ignored, which makes it easy to accidentally commit credentials or large files.
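
If something does get staged by mistake, it can be removed from the staging area before committing. This sketch assumes Git 2.23 or newer (for `git restore`) and uses a throwaway repository with a made-up profiles.yml:

```shell
# Throwaway repo; the profiles.yml content is a made-up placeholder.
cd "$(mktemp -d)"
git init -q
git config user.name "Jane Doe"                 # repo-local identity for the demo
git config user.email "jane@doefamily.example"

echo "select 'hi' as greeting" > jane_says_hi.sql
git add jane_says_hi.sql
git commit -q -m "Add greeting model"

echo "select 'hello' as greeting" > jane_says_hi.sql
echo "password: not-a-real-secret" > profiles.yml
git add jane_says_hi.sql profiles.yml           # oops: profiles.yml is staged too

git diff --staged --name-only                   # review what the next commit would contain
git restore --staged profiles.yml               # unstage it; the file on disk is untouched
git diff --staged --name-only                   # only jane_says_hi.sql remains staged
```

Running `git diff --staged` without `--name-only` shows the full staged diff, which is a useful last look before `git commit`.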

`git commit`

Records the staged changes as a new commit:

git commit -m "Initial dbt project with example model"

Commit messages should explain why, not just what. The diff already shows what changed.
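
One way to leave room for the "why" is to pass `-m` twice: the first becomes the subject line and the second becomes the body. The model name and the scenario below are hypothetical.

```shell
# Throwaway repo; stg_events.sql and the scenario are hypothetical.
cd "$(mktemp -d)"
git init -q
git config user.name "Jane Doe"                 # repo-local identity for the demo
git config user.email "jane@doefamily.example"

echo "select distinct * from raw_events" > stg_events.sql
git add stg_events.sql

# First -m: short subject (what). Second -m: body (why).
git commit -q \
  -m "Deduplicate events in staging" \
  -m "The upstream export can repeat rows after a retry, which inflated daily counts. Deduplicating here keeps downstream models consistent."

git log -1 --format='%s'    # subject line only
git log -1 --format='%b'    # body only
```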

`git log`

Shows the commit history:

git log --oneline -10

`git branch` and `git checkout`

A branch is a parallel line of work. Create one for any non-trivial change:

git checkout -b add-gmail-source
# ...make changes, commit...
git checkout main          # back to the main branch

Branches let you experiment without disturbing main.
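
Newer versions of Git also offer `git switch`, which covers only the branch-changing part of `git checkout`. The sketch below assumes Git 2.28 or newer and a throwaway repository; either command family works for this lesson.

```shell
# Throwaway repo for trying out branch commands.
cd "$(mktemp -d)"
git init -q -b main                             # assumes Git 2.28+
git config user.name "Jane Doe"                 # repo-local identity for the demo
git config user.email "jane@doefamily.example"
git commit -q --allow-empty -m "Initial commit" # a branch needs at least one commit to branch from

git switch -c add-gmail-source   # same effect as: git checkout -b add-gmail-source
git branch --show-current        # name of the branch you are on
git switch main                  # same effect as: git checkout main
git branch --list                # all local branches; the current one is starred
```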

Pushing to GitHub

Create a GitHub repo

Go to github.com and sign in
Click + → New repository
Name it doe-family-dbt
Leave it empty — don't initialize with a README; you already have files
Click Create

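GitHub gives you an SSH URL like git@github.com:janedoe/doe-family-dbt.git. It works only if you have an SSH key registered with GitHub; otherwise use the HTTPS form, https://github.com/janedoe/doe-family-dbt.git.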

Connect your local repo

git remote add origin git@github.com:janedoe/doe-family-dbt.git
git branch -M main
git push -u origin main

The -u flag sets the upstream so future git push and git pull commands know where to go.
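
To see what `-u` actually records without touching GitHub, the same three commands can be run against a local bare repository. This is only a sketch: the bare repo is a stand-in for the GitHub remote, and the one-line dbt_project.yml is a placeholder.

```shell
# A local bare repository stands in for GitHub; file contents are placeholders.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/doe_family"
cd "$work/doe_family"
git config user.name "Jane Doe"                 # repo-local identity for the demo
git config user.email "jane@doefamily.example"

echo "name: doe_family" > dbt_project.yml
git add dbt_project.yml
git commit -q -m "Initial dbt project"

git remote add origin "$work/remote.git"
git branch -M main
git push -q -u origin main

git remote -v                                   # where origin points
git rev-parse --abbrev-ref 'main@{upstream}'    # the upstream that -u recorded
git status -sb                                  # first line shows local and upstream branch
```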

The branch → PR → merge workflow

For any meaningful change to a project on GitHub:

# 1. Start a feature branch
git checkout -b add-gmail-source

# 2. Make your changes, commit as you go
# (edit files...)
git add models/sources/gmail/gmail_events.sql
git commit -m "Add gmail events source model"

# 3. Push the branch to GitHub
git push -u origin add-gmail-source

# 4. Open a pull request on GitHub
#    The push output includes a link for opening one; or visit your repo and click "Compare & pull request"

# 5. Reviewers comment, you push fixes to the same branch
git add models/sources/gmail/gmail_events.sql
git commit -m "Address review feedback on gmail events model"
git push

# 6. Once approved, merge via the GitHub UI

# 7. Locally, get back to main and pull the merged change
git checkout main
git pull
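
Once a branch is merged, it is usually deleted so the branch list stays short. The sketch below rehearses the whole cycle offline under two assumptions: a local bare repository stands in for GitHub, and a local `git merge` stands in for clicking "Merge" in the GitHub UI. The model file is a placeholder.

```shell
# Bare repo = stand-in for GitHub; local merge = stand-in for the Merge button.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q -b main "$work/doe_family"          # assumes Git 2.28+
cd "$work/doe_family"
git config user.name "Jane Doe"                 # repo-local identity for the demo
git config user.email "jane@doefamily.example"
git commit -q --allow-empty -m "Initial commit"
git remote add origin "$work/remote.git"
git push -q -u origin main

# Feature branch with one commit, pushed to the remote
git checkout -q -b add-gmail-source
echo "select 1 as event_id" > gmail_events.sql  # placeholder model
git add gmail_events.sql
git commit -q -m "Add gmail events source model"
git push -q -u origin add-gmail-source

# Stand-in for merging the PR on GitHub
git checkout -q main
git merge -q --no-ff -m "Merge add-gmail-source" add-gmail-source
git push -q origin main

# Cleanup: delete the merged branch locally and on the remote
git branch -d add-gmail-source                  # -d refuses to delete unmerged work
git push -q origin --delete add-gmail-source
git branch --all                                # only main and origin/main remain
```

In the real workflow GitHub offers a "Delete branch" button after merging, so only the local `git branch -d` step is needed on your machine.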

Why use PRs even when you're a solo Doe-family analyst:

A review checkpoint — you read your own diff one more time before it lands. Catches surprising amounts of nonsense.
A natural reflection point — write a PR description that explains what you did and why. Future-you will thank present-you.
CI hooks — once you set up automated tests, every PR runs them.

Hands-On Exercise

Run git init in your dbt project.
Create the .gitignore above.

Stage and commit your initial files:

git add .gitignore models/ seeds/ dbt_project.yml
git commit -m "Initial dbt project"

Create an empty doe-family-dbt repo on GitHub.

Add the remote and push:

git remote add origin <your-repo-url>
git branch -M main
git push -u origin main

Create a feature branch, add a trivial model, commit, push it, and open a pull request against main.
Merge the PR via the GitHub UI. Then git checkout main && git pull to bring the merge back to your machine.

Summary

Concept	Key takeaway
Why version control	History, reviewability, undo, branching, collaboration
`.gitignore` for dbt	Always exclude `target/`, `dbt_packages/`, `logs/`, `profiles.yml`
Daily commands	`status`, `add`, `commit`, `push`, `pull`, `branch`, `checkout`
Branching	One branch per change; keep `main` deployable
Pull requests	Self-review + collaboration + CI hook point
Cloud users	Same concepts, different UI — the dbt Cloud IDE wraps Git in buttons

Next Lesson

Project on disk, project in version control — Module 1 is done. Time to start learning what dbt actually does. Head to Module 2 — dbt Basics and the first lesson: 2.1 Seeding the Doe family data.