Measuring impact

How to A/B query cost objectively — using BigQuery dry-run estimates, the analysis HTTP API's `stats` field, and the `warehouse_optimization` toggle as a controlled comparison.

The hardest part of warehouse perf work is measuring it. BigQuery's free 24-hour result cache plus the CURRENT_* daily-anchor rewrite mean repeated wall-clock numbers can swing 100× between consecutive runs of the same query — and most of that swing has nothing to do with the change you're evaluating.

This page covers the deterministic measurement paths.

1. dryRun — the deterministic primary

BigQuery's dryRun: true job config plans the query and returns the scan-bytes estimate without executing it. No rows, no slot time, no cache interaction. Two consecutive dry-runs of the same query will return identical bytes regardless of cache state — it's a deterministic prediction of the cold-cache scan cost.

Nexus exposes this through the analysis HTTP API:

curl -X POST https://hidden-guanaco-95.convex.site/analysis/query \
  -H "Authorization: Bearer $API_KEY" \
  --data-raw '{
    "warehouseId": "<id>",
    "query": "SELECT COUNT(*) FROM nexus_events WHERE occurred_at >= TIMESTAMP \"2026-03-01\"",
    "dryRun": true
  }'
# → { "data": [], "stats": { "totalBytesProcessed": 1069984, ... } }

What dry-run does measure

  • Partition prune. A query with occurred_at >= X AND occurred_at < Y vs. the same query without the date predicate will show dramatically different totalBytesProcessed. This isolates the partition-prune benefit cleanly.
  • Column-prune. SELECT event_id vs. SELECT * shows the column- read cost difference for the same row set.
  • Join elimination (with PK/FK constraints — see below).

What dry-run does NOT measure

  • Cluster prune. BigQuery's dry-run estimator is conservative and doesn't include block-pruning effects from clustering. Cluster prune happens at execution time using block-level statistics that the planner doesn't have at plan time. Use a real run + the query plan (see below).
  • Slot time / wall clock. No execution, no compute. If your question is "how long will this take to render?", a dry-run won't answer it.

2. Real-run stats — slot time + cache hit + jobId

The same analysis API returns BigQuery job statistics on every real run too:

curl -X POST https://hidden-guanaco-95.convex.site/analysis/query \
  -H "Authorization: Bearer $API_KEY" \
  --data-raw '{"warehouseId":"<id>","query":"<sql>"}'
# → {
#     "data": [...],
#     "stats": {
#       "totalBytesProcessed": 1234567,
#       "totalSlotMs": 4321,
#       "cacheHit": false,
#       "jobId": "599b307d-..."
#     }
#   }

If cacheHit: true, all the byte/slot numbers will be zero or close to it — the result was served from BigQuery's 24h result cache. To force a cold run for measurement, prepend a per-run cache-busting comment:

-- bench: 2026-06-03T12:34:56Z run-id-1
SELECT ... FROM nexus_events WHERE ...

Cache key includes the SQL text, so any unique comment defeats the cache without changing the query plan.

jobId can be looked up in INFORMATION_SCHEMA.JOBS_BY_USER for the full per-stage query plan (see below).

3. Controlled A/B via the feature toggle

For changes whose effect doesn't show in dry-run (clustering, informational constraints applied at execution time), the cleanest measurement is the feature toggle:

# In your client's dbt_project.yml
vars:
  nexus:
    warehouse_optimization:
      enabled: false   # disable for this measurement

Then dbt run --select nexus_events nexus_event_dimensions ... to rebuild without the optimizations, run your benchmark queries, capture stats, set enabled: true (or remove the override), rebuild, re-benchmark.

Yes, this rebuilds the substrate twice. It's the only way to get a true control vs. treatment comparison in the same warehouse, same dataset, same data — every other variable held constant.

4. The BigQuery query plan (when dry-run isn't enough)

For deep questions like "is cluster prune actually firing?" or "which join stage is slowest?", read the per-stage query plan from INFORMATION_SCHEMA.JOBS_BY_USER:

SELECT
  job_id,
  query,
  total_bytes_processed,
  total_slot_ms,
  cache_hit,
  query_info.query_plan AS plan
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_id = '599b307d-...'   -- from stats.jobId

The query_plan array has one entry per stage with records_read, records_written, shuffle_output_bytes, and a human-readable name ("Read events", "Hash Join", etc.). For cluster prune verification, look at the read stage's records_read — if it's much smaller than the table's total row count for a query with a clustered-column predicate, the prune fired.

JOBS_BY_USER requires elevated IAM in some projects. If you don't have access, the BigQuery web console's "Execution details" tab on any completed job shows the same plan tree visually.

5. Sanity-checking what you observe

A few things that look like build-time misses but actually aren't:

  • Query has no occurred_at filter. Partition prune cannot fire. The build-time partition is irrelevant; rewrite the query to scope by date.
  • COUNT(*) over a partition. BigQuery treats this as metadata-only and reports 0 bytes regardless of partition or cluster setup. The 0 isn't a perf win — it's the operation never needing data to begin with.
  • Cluster column used in WHERE but baseline already small. Adding a column predicate forces BigQuery to read that column, which can increase reported bytes for a metadata-light baseline. Compare against a query that already reads the same columns.
  • Result served from cache. cacheHit: true makes byte numbers near zero. Always cache-bust for cold measurements.

Reference

The dbt-nexus optimizations on the read path: