Measuring impact
How to A/B query cost objectively — using BigQuery dry-run estimates, the analysis HTTP API's `stats` field, and the `warehouse_optimization` toggle as a controlled comparison.
The hardest part of warehouse perf work is measuring it. BigQuery's free 24-hour result cache plus the CURRENT_* daily-anchor rewrite mean repeated wall-clock numbers can swing 100× between consecutive runs of the same query — and most of that swing has nothing to do with the change you're evaluating.
This page covers the deterministic measurement paths.
1. dryRun — the deterministic primary
BigQuery's dryRun: true job config plans the query and returns the
scan-bytes estimate without executing it. No rows, no slot time,
no cache interaction. Two consecutive dry-runs of the same query
will return identical bytes regardless of cache state — it's a
deterministic prediction of the cold-cache scan cost.
Nexus exposes this through the analysis HTTP API:
curl -X POST https://hidden-guanaco-95.convex.site/analysis/query \
-H "Authorization: Bearer $API_KEY" \
--data-raw '{
"warehouseId": "<id>",
"query": "SELECT COUNT(*) FROM nexus_events WHERE occurred_at >= TIMESTAMP \"2026-03-01\"",
"dryRun": true
}'
# → { "data": [], "stats": { "totalBytesProcessed": 1069984, ... } }
What dry-run does measure
- Partition prune. A query with
occurred_at >= X AND occurred_at < Yvs. the same query without the date predicate will show dramatically differenttotalBytesProcessed. This isolates the partition-prune benefit cleanly. - Column-prune.
SELECT event_idvs.SELECT *shows the column- read cost difference for the same row set. - Join elimination (with PK/FK constraints — see below).
What dry-run does NOT measure
- Cluster prune. BigQuery's dry-run estimator is conservative and doesn't include block-pruning effects from clustering. Cluster prune happens at execution time using block-level statistics that the planner doesn't have at plan time. Use a real run + the query plan (see below).
- Slot time / wall clock. No execution, no compute. If your question is "how long will this take to render?", a dry-run won't answer it.
2. Real-run stats — slot time + cache hit + jobId
The same analysis API returns BigQuery job statistics on every real run too:
curl -X POST https://hidden-guanaco-95.convex.site/analysis/query \
-H "Authorization: Bearer $API_KEY" \
--data-raw '{"warehouseId":"<id>","query":"<sql>"}'
# → {
# "data": [...],
# "stats": {
# "totalBytesProcessed": 1234567,
# "totalSlotMs": 4321,
# "cacheHit": false,
# "jobId": "599b307d-..."
# }
# }
If cacheHit: true, all the byte/slot numbers will be zero or close
to it — the result was served from BigQuery's 24h result cache. To
force a cold run for measurement, prepend a per-run cache-busting
comment:
-- bench: 2026-06-03T12:34:56Z run-id-1
SELECT ... FROM nexus_events WHERE ...
Cache key includes the SQL text, so any unique comment defeats the cache without changing the query plan.
jobId can be looked up in INFORMATION_SCHEMA.JOBS_BY_USER for the
full per-stage query plan (see below).
3. Controlled A/B via the feature toggle
For changes whose effect doesn't show in dry-run (clustering, informational constraints applied at execution time), the cleanest measurement is the feature toggle:
# In your client's dbt_project.yml
vars:
nexus:
warehouse_optimization:
enabled: false # disable for this measurement
Then dbt run --select nexus_events nexus_event_dimensions ... to
rebuild without the optimizations, run your benchmark queries,
capture stats, set enabled: true (or remove the override), rebuild,
re-benchmark.
Yes, this rebuilds the substrate twice. It's the only way to get a true control vs. treatment comparison in the same warehouse, same dataset, same data — every other variable held constant.
4. The BigQuery query plan (when dry-run isn't enough)
For deep questions like "is cluster prune actually firing?" or
"which join stage is slowest?", read the per-stage query plan from
INFORMATION_SCHEMA.JOBS_BY_USER:
SELECT
job_id,
query,
total_bytes_processed,
total_slot_ms,
cache_hit,
query_info.query_plan AS plan
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_id = '599b307d-...' -- from stats.jobId
The query_plan array has one entry per stage with records_read,
records_written, shuffle_output_bytes, and a human-readable
name ("Read events", "Hash Join", etc.). For cluster prune
verification, look at the read stage's records_read — if it's much
smaller than the table's total row count for a query with a
clustered-column predicate, the prune fired.
JOBS_BY_USER requires elevated IAM in some projects. If you don't
have access, the BigQuery web console's "Execution details" tab on
any completed job shows the same plan tree visually.
5. Sanity-checking what you observe
A few things that look like build-time misses but actually aren't:
- Query has no
occurred_atfilter. Partition prune cannot fire. The build-time partition is irrelevant; rewrite the query to scope by date. COUNT(*)over a partition. BigQuery treats this as metadata-only and reports 0 bytes regardless of partition or cluster setup. The 0 isn't a perf win — it's the operation never needing data to begin with.- Cluster column used in
WHEREbut baseline already small. Adding a column predicate forces BigQuery to read that column, which can increase reported bytes for a metadata-light baseline. Compare against a query that already reads the same columns. - Result served from cache.
cacheHit: truemakes byte numbers near zero. Always cache-bust for cold measurements.
Reference
The dbt-nexus optimizations on the read path:
- BigQuery optimizations — what gets applied and the toggle to control it.