Git and GitHub

Why analytics code needs version control, the core git workflow, pushing to GitHub, and a dbt-tuned .gitignore.

Learning Objectives

By the end of this lesson, you will be able to:

  • Articulate why analytics code belongs in version control
  • Initialize a Git repository in your dbt project
  • Use the core Git commands: status, add, commit, push, pull, branch, checkout
  • Push your project to a new GitHub repository
  • Follow the branch → commit → PR → review → merge workflow
  • Write a dbt-tuned .gitignore and explain what it excludes

Why version control matters for analytics

A spreadsheet has a single timeline: the current state. SQL scripts in a folder are the same — whoever saves last wins, and there's no history of what changed or why.

Git turns your dbt project into a branching, reviewable, undoable history:

Without Git | With Git
"Who changed this model and broke production?" | git log and git blame show every change and who made it
"Can we revert that bad migration?" | git revert adds a new commit that undoes it
"I want to try something risky." | git checkout -b experiment puts the work on a separate branch
"We need another set of eyes on this." | Pull request: review with inline comments
"Did the production deploy include yesterday's fix?" | The commit hash on main tells you

For the Doe family this matters as soon as Jane and John are both editing the project — but it matters even when there's only one author, because Jane one year from now doesn't remember what Jane today was thinking.
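
To make the "undo" row concrete, here is a minimal sketch of git revert in a throwaway repository. The file name model.sql and its contents are placeholders invented for the example, not part of the Doe family project.

```shell
# Throwaway repo so nothing touches your real project
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"

# Commit 1: a good model. Commit 2: a change we will regret.
echo "select 1 as id" > model.sql        # hypothetical model
git add model.sql
git commit -q -m "Add model"
echo "select 1 as idd" > model.sql
git add model.sql
git commit -q -m "Rename column (bad idea)"

# Undo the bad commit by adding a new commit that reverses it
git revert --no-edit HEAD

# The file matches its first version again; history keeps all three commits
restored=$(cat model.sql)
commit_count=$(git rev-list --count HEAD)
git log --oneline
```

Nothing is erased: the bad commit stays in the log next to the commit that reverted it, which is what makes the history auditable.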


Local vs Cloud users — both need Git

  • Local users (1.3): you'll run git commands in your terminal, push to GitHub, and pull updates the same way.
  • dbt Cloud users (1.2): dbt Cloud uses Git too. The managed repository you set up is a Git repo; the IDE commits to it through buttons. You can swap to a GitHub-backed repo at any time. Everything in this lesson conceptually applies — the only difference is you'll click buttons instead of typing commands.

The rest of the lesson is written for the command line. Cloud users should still read it through; the same concepts (branches, commits, PRs, merges) appear in the IDE.


Install Git

Many systems already have Git installed. Check with:

git --version

If not, install it:

# macOS
brew install git

# Linux (Debian/Ubuntu)
sudo apt install git

# Windows
# Download from https://git-scm.com/download/win

Configure your identity (used on every commit):

git config --global user.name "Jane Doe"
git config --global user.email "jane@doefamily.example"
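
To confirm what Git will stamp on your commits, you can read the settings back. The sketch below uses a scratch repository and sets the identity without --global, which scopes it to that one repository; this is an illustration of the lookup, not a required step.

```shell
# Scratch repo used only to demonstrate the lookup
repo=$(mktemp -d)
git init -q "$repo"

# Without --global, the identity applies inside this repo only
git -C "$repo" config user.name "Jane Doe"
git -C "$repo" config user.email "jane@doefamily.example"

# Read back what Git will record on commits made in this repo
name=$(git -C "$repo" config --get user.name)
email=$(git -C "$repo" config --get user.email)
echo "$name <$email>"
```

In your real project, running git config --get user.name from the project root shows whichever value is in effect there.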

Initialize a Git repo in your dbt project

From the root of your dbt project (the doe_family/ directory from lesson 1.3):

cd doe_family
git init

You now have a .git/ directory. That's the entire history database — everything else in the folder is your working copy.
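
A quick way to check that the directory really is a repository is to ask Git where its history lives. This sketch runs in a temporary stand-in for doe_family/ so it is safe to try anywhere.

```shell
# Temporary stand-in for your real project directory
project=$(mktemp -d)/doe_family
mkdir -p "$project"
cd "$project"
git init -q

# .git/ holds the history; everything else is the working copy
git_dir=$(git rev-parse --git-dir)
inside=$(git rev-parse --is-inside-work-tree)
echo "git dir: $git_dir, inside work tree: $inside"
```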


The dbt .gitignore

Before your first commit, exclude files that shouldn't be in version control. Create .gitignore at the project root with:

# dbt build artifacts, installed packages, and logs (all reproducible)
target/
dbt_packages/
logs/

# Local profiles contain credentials; never commit them
profiles.yml

# Python venv (if you keep it inside the project)
.venv/

# OS noise
.DS_Store
Thumbs.db

# Editor noise
.vscode/
.idea/

Why each entry matters:

Path | Why excluded
target/ | Compiled SQL and run artifacts; regenerated by every dbt run
dbt_packages/ | Installed by dbt deps from packages.yml; restorable on demand, so there is no need to track third-party code
logs/ | Run logs; noisy and grow without bound
profiles.yml | Contains warehouse credentials; never commit
.venv/ | Your local Python virtualenv; machine-specific and rebuildable

If you put profiles.yml in ~/.dbt/profiles.yml (the default), it already lives outside your repo and .gitignore is just belt-and-suspenders. If you've put it inside the project, the ignore is critical.
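
Before the first commit, it is worth checking that the ignore rules match what you expect. A minimal sketch using git check-ignore, run in a scratch repository with a trimmed copy of the .gitignore above; the file names under target/ and models/ are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# A trimmed copy of the .gitignore from this lesson
printf '%s\n' 'target/' 'dbt_packages/' 'logs/' 'profiles.yml' '.venv/' > .gitignore

# Hypothetical files standing in for real project contents
mkdir -p target models
touch target/manifest.json profiles.yml models/jane_says_hi.sql

# check-ignore exits 0 when a path is ignored and 1 when it is not
git check-ignore -q target/manifest.json && ignored_target=yes || ignored_target=no
git check-ignore -q profiles.yml && ignored_profiles=yes || ignored_profiles=no
git check-ignore -q models/jane_says_hi.sql && ignored_model=yes || ignored_model=no
echo "target: $ignored_target, profiles: $ignored_profiles, model: $ignored_model"

# -v also reports which .gitignore line matched each path
git check-ignore -v target/manifest.json profiles.yml
```

One caveat: .gitignore only affects untracked files. If profiles.yml was committed earlier, git rm --cached profiles.yml stops tracking it, but the credentials remain in the existing history and should be rotated.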


The core Git workflow

git status

Shows what's changed and what's staged. Run it constantly.

git status

git add

Stages changes for the next commit. Stage specific files:

git add models/example/jane_says_hi.sql .gitignore

Avoid git add . or git add -A — they're easy ways to accidentally commit credentials or large files.
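
Whichever way you stage, you can list exactly what will go into the next commit before you make it. A small sketch in a scratch repository; notes.txt is a made-up file that should stay out of the commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two new files, but only one belongs in the next commit
mkdir -p models
echo "select 'hi' as greeting" > models/jane_says_hi.sql
echo "scratch notes" > notes.txt        # hypothetical file we do not want to commit
git add models/jane_says_hi.sql

# List exactly what is staged
staged=$(git diff --cached --name-only)
echo "$staged"

# Short status: 'A' marks a staged addition, '??' marks an untracked file
git status --short
```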

git commit

Records the staged changes as a new commit:

git commit -m "Initial dbt project with example model"

Commit messages should explain why, not just what. The diff already shows what changed.
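
One way to leave room for the "why" is to give the commit a short subject plus a body. Passing -m twice does this from the command line; the sketch below uses a scratch repository and an invented message.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"

echo "select 1 as id" > model.sql       # hypothetical model
git add model.sql

# The first -m is the subject line; the second becomes the body
git commit -q \
  -m "Add placeholder model" \
  -m "Gives the project something to compile before real sources exist."

# Read the two parts back separately
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "$subject"
echo "$body"
```

The subject is what git log --oneline displays, so keeping it short and putting the reasoning in the body tends to work well.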

git log

Shows the commit history:

git log --oneline -10
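
The log can also be narrowed to a single file, which is usually what you want when one model starts misbehaving. A sketch with two placeholder models in a scratch repository:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"

# Two commits touching different (hypothetical) files
echo "select 1 as id" > orders.sql
git add orders.sql
git commit -q -m "Add orders model"
echo "select 2 as id" > customers.sql
git add customers.sql
git commit -q -m "Add customers model"

# History limited to one file: only the commit that touched it is listed
orders_history=$(git log --format=%s -- orders.sql)
echo "$orders_history"

# Who last changed each line of that file, and in which commit
git blame orders.sql
```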

git branch and git checkout

A branch is a parallel line of work. Create one for any non-trivial change:

git checkout -b add-gmail-source
# ...make changes, commit...
git checkout main          # back to the main branch (called master on some Git setups)

Branches let you experiment without disturbing main.
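
The isolation is easy to see in a scratch repository: a commit made on the feature branch does not appear on main until the two are merged. File names here are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"
echo "select 1 as id" > model.sql       # hypothetical model
git add model.sql
git commit -q -m "Add model"
git branch -M main                      # make sure the first branch is called main

# Create and switch to a feature branch, then commit there
git checkout -q -b add-gmail-source
echo "select 2 as id" > gmail.sql       # hypothetical model
git add gmail.sql
git commit -q -m "Add gmail model"

# Back on main, the feature commit and its file are absent
git checkout -q main
current=$(git rev-parse --abbrev-ref HEAD)
main_commits=$(git rev-list --count main)
feature_commits=$(git rev-list --count add-gmail-source)
echo "on $current: main has $main_commits commit(s), feature has $feature_commits"
git branch                              # lists both branches; '*' marks the current one
```

On Git 2.23 and later, git switch -c add-gmail-source and git switch main do the same job as the two checkout commands.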


Pushing to GitHub

Create a GitHub repo

  1. Go to github.com and sign in
  2. Click + → New repository
  3. Name it doe-family-dbt
  4. Leave it empty — don't initialize with a README; you already have files
  5. Click Create

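GitHub gives you an SSH URL like git@github.com:janedoe/doe-family-dbt.git. The SSH form requires an SSH key registered with your GitHub account; the HTTPS form, https://github.com/janedoe/doe-family-dbt.git, works in the same commands.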

Connect your local repo

git remote add origin git@github.com:janedoe/doe-family-dbt.git
git branch -M main
git push -u origin main

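git branch -M main renames your current branch to main, which matches GitHub's default branch name. The -u flag sets the upstream so future git push and git pull commands know where to go.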
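
To check the connection afterwards, you can ask Git where origin points and which remote branch main tracks. The sketch below substitutes a local bare repository for GitHub so it runs offline; that substitution is an assumption of the example, and with a real GitHub URL the commands are the same.

```shell
# A local bare repository stands in for GitHub
remote=$(mktemp -d)
git init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"
echo "select 1 as id" > model.sql       # hypothetical model
git add model.sql
git commit -q -m "Add model"

# The same three steps, with a local path in place of the GitHub URL
git remote add origin "$remote"
git branch -M main
git push -q -u origin main

# Confirm where origin points and which remote branch main now tracks
git remote -v
upstream=$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')
echo "$upstream"
```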


The branch → PR → merge workflow

For any meaningful change to a project on GitHub:

# 1. Start a feature branch
git checkout -b add-gmail-source

# 2. Make your changes, commit as you go
# (edit files...)
git add models/sources/gmail/gmail_events.sql
git commit -m "Add gmail events source model"

# 3. Push the branch to GitHub
git push -u origin add-gmail-source

# 4. Open a pull request on GitHub
#    The push output includes a link for opening the PR; or visit your repo and click "Compare & pull request"

# 5. Reviewers comment, you push fixes to the same branch
git add models/sources/gmail/gmail_events.sql
git commit -m "Add tests for gmail events"
git push

# 6. Once approved, merge via the GitHub UI

# 7. Locally, get back to main and pull the merged change
git checkout main
git pull
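
The whole loop can be rehearsed offline. In this sketch a local bare repository stands in for GitHub and a second clone plays the part of the merge button; both are assumptions made so the example runs without a network, and the file names and John's email address are placeholders.

```shell
# Local bare repository standing in for GitHub
remote=$(mktemp -d)
git init -q --bare "$remote"

# Your working repository with one commit on main
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@doefamily.example"
echo "select 1 as id" > model.sql          # hypothetical model
git add model.sql
git commit -q -m "Add model"
git branch -M main
git remote add origin "$remote"
git push -q -u origin main

# Steps 1-3: feature branch, commit, push
git checkout -q -b add-gmail-source
echo "select 2 as id" > gmail_events.sql   # hypothetical model
git add gmail_events.sql
git commit -q -m "Add gmail events source model"
git push -q -u origin add-gmail-source

# Stand-in for step 6 (the GitHub merge button): a second clone merges and pushes
reviewer=$(mktemp -d)
git clone -q -b main "$remote" "$reviewer"
git -C "$reviewer" config user.name "John Doe"
git -C "$reviewer" config user.email "john@doefamily.example"
git -C "$reviewer" merge -q --no-ff -m "Merge add-gmail-source" origin/add-gmail-source
git -C "$reviewer" push -q origin main

# Step 7: back on main, pull the merged change
git checkout -q main
before=$(git rev-list --count main)
git pull -q
after=$(git rev-list --count main)
echo "main went from $before to $after commits"
```

After the pull, main holds the original commit, the feature commit, and the merge commit, and gmail_events.sql is present in the working copy.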

Why use PRs even when you're a solo Doe-family analyst:

  • A review checkpoint — you read your own diff one more time before it lands. Catches surprising amounts of nonsense.
  • A natural reflection point — write a PR description that explains what you did and why. Future-you will thank present-you.
  • CI hooks — once you set up automated tests, every PR runs them.

Hands-On Exercise

  1. Run git init in your dbt project.

  2. Create the .gitignore above.

  3. Stage and commit your initial files:

    git add .gitignore models/ seeds/ dbt_project.yml
    git commit -m "Initial dbt project"
    
  4. Create an empty doe-family-dbt repo on GitHub.

  5. Add the remote and push:

    git remote add origin <your-repo-url>
    git branch -M main
    git push -u origin main
    
  6. Create a feature branch, add a trivial model, commit, push it, and open a pull request against main.

  7. Merge the PR via the GitHub UI. Then git checkout main && git pull to bring the merge back to your machine.


Summary

Concept | Key takeaway
Why version control | History, reviewability, undo, branching, collaboration
.gitignore for dbt | Always exclude target/, dbt_packages/, logs/, profiles.yml
Daily commands | status, add, commit, push, pull, branch, checkout
Branching | One branch per change; keep main deployable
Pull requests | Self-review + collaboration + CI hook point
Cloud users | Same concepts, different UI: the dbt Cloud IDE wraps Git in buttons

Next Lesson

Project on disk, project in version control — Module 1 is done. Time to start learning what dbt actually does. Head to Module 2 — dbt Basics and the first lesson: 2.1 Seeding the Doe family data.