Git and GitHub
Why analytics code needs version control, the core git workflow, pushing to GitHub, and a dbt-tuned .gitignore.
Learning Objectives
By the end of this lesson, you will be able to:
- Articulate why analytics code belongs in version control
- Initialize a Git repository in your dbt project
- Use the core Git commands: status, add, commit, push, pull, branch, checkout
- Push your project to a new GitHub repository
- Follow the branch → commit → PR → review → merge workflow
- Write a dbt-tuned .gitignore and explain what it excludes
Why version control matters for analytics
A spreadsheet has a single timeline: the current state. SQL scripts in a folder are the same — whoever saves last wins, and there's no history of what changed or why.
Git turns your dbt project into a branching, reviewable, undoable history:
| Without Git | With Git |
|---|---|
| "Who changed this model and broke production?" | git log + git blame show every change |
| "Can we revert that bad migration?" | git revert adds a new commit that undoes it, leaving the history intact |
| "I want to try something risky." | git checkout -b experiment — separate branch |
| "We need another set of eyes on this." | Pull request — review with comments inline |
| "Did the production deploy include yesterday's fix?" | The commit hash on main tells you |
For the Doe family this matters as soon as Jane and John are both editing the project — but it matters even when there's only one author, because Jane one year from now doesn't remember what Jane today was thinking.
Local vs Cloud users — both need Git
- Local users (1.3): you'll run git commands in your terminal, push to GitHub, and pull updates the same way.
- dbt Cloud users (1.2): dbt Cloud uses Git too. The managed repository you set up is a Git repo; the IDE commits to it through buttons. You can swap to a GitHub-backed repo at any time. Everything in this lesson applies conceptually; the only difference is that you'll click buttons instead of typing commands.
The rest of the lesson is written for the command line. Cloud users should still read it through; the same concepts (branches, commits, PRs, merges) appear in the IDE.
Install Git
Most systems have Git already:
git --version
If not, install it:
# macOS
brew install git
# Linux (Debian/Ubuntu)
sudo apt install git
# Windows
# Download from https://git-scm.com/download/win
Configure your identity (used on every commit):
git config --global user.name "Jane Doe"
git config --global user.email "jane@doefamily.example"
Initialize a Git repo in your dbt project
From the root of your dbt project (the doe_family/ directory from
lesson 1.3):
cd doe_family
git init
You now have a .git/ directory. That's the entire history database —
everything else in the folder is your working copy.
The dbt .gitignore
Before your first commit, exclude files that shouldn't be in version
control. Create .gitignore at the project root with:
# dbt build artifacts — regenerated on every run
target/
dbt_packages/
logs/
# Local profiles — contains credentials, never commit
profiles.yml
# Python venv (if you keep it inside the project)
.venv/
# OS noise
.DS_Store
Thumbs.db
# Editor noise
.vscode/
.idea/
Why each entry matters:
| Path | Why excluded |
|---|---|
| target/ | Compiled SQL and run artifacts; regenerated on every dbt run |
| dbt_packages/ | Installed via dbt deps; restored on demand, so there is no need to track third-party package code |
| logs/ | Run logs; noisy and grow forever |
| profiles.yml | Contains warehouse credentials; never commit |
| .venv/ | Your local Python virtualenv |
If you put profiles.yml in ~/.dbt/profiles.yml (the default), it
already lives outside your repo and .gitignore is just belt-and-suspenders.
If you've put it inside the project, the ignore is critical.
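One way to confirm the ignore rules behave as intended is git check-ignore, which reports the rule that matches a path. The sketch below runs in a throwaway directory with stand-in files (the file contents and the models/example.sql path are placeholders, not part of the Doe family project), so it is safe to try without touching your real repo.

```shell
# Throwaway sandbox so nothing touches your real project
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

# A trimmed copy of the ignore rules above
printf '%s\n' 'target/' 'dbt_packages/' 'logs/' 'profiles.yml' > .gitignore

# Stand-in files: two that must stay out of Git, one that should be tracked
mkdir -p target models
echo 'placeholder credentials' > profiles.yml
echo 'select 1' > target/compiled.sql
echo 'select 1 as id' > models/example.sql

# check-ignore -v prints the .gitignore rule that matches each ignored path
git check-ignore -v profiles.yml target/compiled.sql

# Ignored files never appear as untracked; only .gitignore and models/ do
git status --short
```

Note that .gitignore only affects untracked files. If profiles.yml was committed before the rule existed, git rm --cached profiles.yml stops tracking it while leaving the file on disk, but the old commits still contain it, so any credentials in it should be treated as exposed and rotated.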
The core Git workflow
git status
Shows what's changed and what's staged. Run it constantly.
git status
git add
Stages changes for the next commit. Stage specific files:
git add models/example/jane_says_hi.sql .gitignore
Avoid git add . or git add -A — they're easy ways to accidentally
commit credentials or large files.
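Even when staging files by name, it helps to look at what is staged before committing. The following is a minimal sketch in a throwaway repo; the file names model_a.sql, model_b.sql, and notes.txt are made up for illustration, and git restore assumes Git 2.23 or newer.

```shell
# Throwaway sandbox repo with a local identity for the demo commits
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"

# Baseline commit so there is history to compare against
echo 'select 1 as id' > model_a.sql
git add model_a.sql
git commit --quiet -m "Add model_a"

# Two new files get staged, but only one belongs in the commit
echo 'select 2 as id' > model_b.sql
echo 'scratch notes' > notes.txt
git add model_b.sql notes.txt

# List exactly what the next commit would contain
git diff --staged --name-only

# Unstage the file added by mistake; it stays on disk, just not in the commit
git restore --staged notes.txt
git diff --staged --name-only
```

Dropping --name-only shows the full staged diff, which is the same content a reviewer would later see in a pull request.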
git commit
Records the staged changes as a new commit:
git commit -m "Initial dbt project with example model"
Commit messages should explain why, not just what. The diff already shows what changed.
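A short subject line rarely has room for the why. One option is to pass -m twice: the first value becomes the subject and the second becomes the body. The sketch below uses a throwaway repo, and the orders.sql model and the commit text are hypothetical examples rather than anything from the Doe family project.

```shell
# Throwaway sandbox repo with a local identity for the demo commit
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"

# Hypothetical model change
echo 'select 1 as order_id' > orders.sql
git add orders.sql

# First -m is the subject (what); second -m is the body (why)
git commit --quiet \
  -m "Exclude cancelled orders from revenue" \
  -m "Finance reports revenue net of cancellations, so the model filters them out to match."

# Show the subject, a blank line, then the body of the latest commit
git log -1 --format='%s%n%n%b'
```

git log --oneline shows only the subject, so the subject should still make sense on its own.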
git log
Shows the commit history:
git log --oneline -10
git branch and git checkout
A branch is a parallel line of work. Create one for any non-trivial change:
git checkout -b add-gmail-source
# ...make changes, commit...
git checkout main # back to the main branch
Branches let you experiment without disturbing main.
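To see that isolation in practice, the sketch below commits a file on a feature branch and then switches back to main, where the file is absent. It runs in a throwaway repo with placeholder file names, and it uses git switch, which newer Git versions (2.23+) offer as an alternative to git checkout for changing branches; git checkout -b and git checkout work the same way here.

```shell
# Throwaway sandbox repo with one commit on main
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"
echo 'select 1 as id' > model_a.sql
git add model_a.sql
git commit --quiet -m "Add model_a"
git branch -M main

# Create the branch and move onto it (equivalent to git checkout -b)
git switch -c add-gmail-source
echo 'select 1 as event_id' > gmail_events.sql
git add gmail_events.sql
git commit --quiet -m "Add gmail events source model"

# Back on main, the new file is not in the working copy
git switch main
ls

# Commits that exist on the feature branch but not on main
git log --oneline main..add-gmail-source
```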
Pushing to GitHub
Create a GitHub repo
- Go to github.com and sign in
- Click + → New repository
- Name it doe-family-dbt
- Leave it empty: don't initialize with a README; you already have files
- Click Create
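GitHub gives you a URL like git@github.com:janedoe/doe-family-dbt.git. This is the SSH form, which only works once an SSH key is registered with your GitHub account; the HTTPS form of the same repo is https://github.com/janedoe/doe-family-dbt.git.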
Connect your local repo
git remote add origin git@github.com:janedoe/doe-family-dbt.git
git branch -M main
git push -u origin main
The -u flag sets the upstream so future git push and git pull
commands know where to go.
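The remote and upstream settings can be inspected after the push. The sketch below rehearses the same three commands against a local bare repository, which stands in for GitHub so the example runs offline; the temporary paths and the one-line dbt_project.yml are assumptions for the demo.

```shell
# A local bare repo plays the role of the empty GitHub repository
work="$(mktemp -d)"
remote="$work/doe-family-dbt.git"
git init --quiet --bare "$remote"

# A minimal project with one commit
mkdir "$work/doe_family"
cd "$work/doe_family"
git init --quiet
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"
echo "name: 'doe_family'" > dbt_project.yml
git add dbt_project.yml
git commit --quiet -m "Initial dbt project"

# Same three commands as above, pointed at the stand-in remote
git remote add origin "$remote"
git branch -M main
git push --quiet -u origin main

# Where does origin point, and which remote branch does main track?
git remote -v
git rev-parse --abbrev-ref 'main@{upstream}'   # prints origin/main
```

If the remote URL is ever wrong, git remote set-url origin <new-url> replaces it without re-adding the remote.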
The branch → PR → merge workflow
For any meaningful change to a project on GitHub:
# 1. Start a feature branch
git checkout -b add-gmail-source
# 2. Make your changes, commit as you go
# (edit files...)
git add models/sources/gmail/gmail_events.sql
git commit -m "Add gmail events source model"
# 3. Push the branch to GitHub
git push -u origin add-gmail-source
# 4. Open a pull request on GitHub
# The output of git push includes a link for opening one; or visit your repo and click "Compare & pull request"
# 5. Reviewers comment, you push fixes to the same branch
git add models/sources/gmail/gmail_events.sql
git commit -m "Add tests for gmail events"
git push
# 6. Once approved, merge via the GitHub UI
# 7. Locally, get back to main and pull the merged change
git checkout main
git pull
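After the pull, the feature branch has done its job and can be deleted locally. The sketch below rehearses the whole loop offline under some loud assumptions: a local bare repo stands in for GitHub, and a second clone (named merge-button here) imitates the merge that the GitHub UI would perform in step 6. All paths and file contents are placeholders.

```shell
# Stand-in for GitHub: a local bare repository
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"

# Jane's working copy, with main pushed
git init --quiet "$work/jane"
cd "$work/jane"
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"
echo "name: 'doe_family'" > dbt_project.yml
git add dbt_project.yml
git commit --quiet -m "Initial dbt project"
git branch -M main
git remote add origin "$work/remote.git"
git push --quiet -u origin main

# Steps 1-3: feature branch, commit, push
git checkout --quiet -b add-gmail-source
echo 'select 1 as event_id' > gmail_events.sql
git add gmail_events.sql
git commit --quiet -m "Add gmail events source model"
git push --quiet -u origin add-gmail-source

# Step 6 stand-in: a second clone imitates GitHub's merge button
git clone --quiet --branch main "$work/remote.git" "$work/merge-button"
git -C "$work/merge-button" -c user.name="Reviewer" -c user.email="reviewer@example.com" \
    merge --quiet --no-ff -m "Merge add-gmail-source" origin/add-gmail-source
git -C "$work/merge-button" push --quiet origin main

# Step 7: back on main, pull the merge, then delete the finished branch
git checkout --quiet main
git pull --quiet
git branch -d add-gmail-source
git log --oneline --graph -5
```

git branch -d refuses to delete a branch whose commits have not been merged, which makes it a safe default for cleanup.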
Why use PRs even when you're a solo Doe-family analyst:
- A review checkpoint — you read your own diff one more time before it lands. Catches surprising amounts of nonsense.
- A natural reflection point — write a PR description that explains what you did and why. Future-you will thank present-you.
- CI hooks — once you set up automated tests, every PR runs them.
Hands-On Exercise
- Run git init in your dbt project.
- Create the .gitignore above.
- Stage and commit your initial files:
    git add .gitignore models/ seeds/ dbt_project.yml
    git commit -m "Initial dbt project"
- Create an empty doe-family-dbt repo on GitHub.
- Add the remote, make sure the branch is named main, and push:
    git remote add origin <your-repo-url>
    git branch -M main
    git push -u origin main
- Create a feature branch, add a trivial model, commit, push it, and open a pull request against main.
- Merge the PR via the GitHub UI. Then run git checkout main && git pull to bring the merge back to your machine.
Summary
| Concept | Key takeaway |
|---|---|
| Why version control | History, reviewability, undo, branching, collaboration |
| .gitignore for dbt | Always exclude target/, dbt_packages/, logs/, profiles.yml |
| Daily commands | status, add, commit, push, pull, branch, checkout |
| Branching | One branch per change; keep main deployable |
| Pull requests | Self-review + collaboration + CI hook point |
| Cloud users | Same concepts, different UI — the dbt Cloud IDE wraps Git in buttons |
Next Lesson
Project on disk, project in version control — Module 1 is done. Time to start learning what dbt actually does. Head to Module 2 — dbt Basics and the first lesson: 2.1 Seeding the Doe family data.