
How We Handled Sensitive Salesforce Data Without Pulling It Into the Warehouse

Kevin McLaughlin
4 min read

The Problem

A financial services client wanted a report card that showed advisors which client records were complete in Salesforce and which were missing data, including Social Security Numbers.

But for security reasons, they didn't want Social Security Numbers pulled into the data warehouse.

So we needed a way to check whether an SSN existed on a Salesforce record without ever fetching the value through the normal ingestion path we use for other fields.

Fortunately, they were already running on our custom ingestion pipeline for Salesforce data, which meant we could record whether a contact had an SSN on file without the actual value ever reaching the warehouse.

Here's how we did it.

How We Solved It

Because this client runs on our custom ingestion pipeline — which syncs Salesforce data into BigQuery and feeds dbt models — we had full control over what gets fetched and how it gets stored.

We separated sensitive fields from regular fields in our pipeline configuration. Regular fields (name, email, phone, account info) get fetched normally. Sensitive fields like Social Security Number go through a different process.

For each sensitive field, the pipeline runs a separate query in SOQL, Salesforce's query language, that asks only one thing: "Which records have this field populated?"

SELECT Id FROM Contact WHERE Social_Security_Number__c != null

This returns a list of record IDs — nothing else. The actual SSN value is never requested, never returned, and never touches our infrastructure.
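To make that concrete, here's a rough sketch of the two fetch paths. The object, field names, and helper functions are illustrative, not our actual pipeline code:

```python
# Illustrative only: how a pipeline might build different SOQL for
# regular fields (values synced) vs. sensitive fields (existence only).

REGULAR_FIELDS = ["Name", "Email", "Phone", "AccountId"]
SENSITIVE_FIELDS = ["Social_Security_Number__c"]  # hypothetical custom field

def build_value_query(fields):
    """Normal sync: fetch the actual field values."""
    return f"SELECT Id, {', '.join(fields)} FROM Contact"

def build_existence_query(field):
    """Sensitive sync: ask only which records have the field populated."""
    return f"SELECT Id FROM Contact WHERE {field} != null"
```

The sensitive field never appears in the value query, so even a bug elsewhere in the sync can't accidentally fetch it.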

The pipeline then adds a simple boolean, has_social_security (true or false), to each record before writing it to the warehouse.
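A minimal sketch of that merge step, with hypothetical names, looks something like this:

```python
def flag_sensitive_field(records, populated_ids, flag_name="has_social_security"):
    """Attach a boolean flag per record; the sensitive value itself
    is never present anywhere in this data."""
    for record in records:
        record[flag_name] = record["Id"] in populated_ids
    return records
```

The populated_ids set comes straight from the existence-only SOQL query, so the only thing written downstream is a true/false flag.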

The result: the advisor report card can show "62% of core contacts have an SSN on file" and drill down to the specific clients who are missing one. The advisor goes into Salesforce directly to fix it. The SSN itself never left Salesforce.
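Once the flag exists in the warehouse, the coverage number is trivial to compute. A sketch (the function name is ours for illustration):

```python
def ssn_coverage(records):
    """Whole-number percentage of records with the SSN flag set."""
    if not records:
        return 0
    have = sum(1 for r in records if r.get("has_social_security"))
    return round(100 * have / len(records))
```

In practice this lives in a dbt model rather than application code, but the logic is the same.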

Why This Matters

The security benefit isn't about encryption or access controls on the warehouse side. It's simpler than that: if the data never leaves Salesforce, it can't leak from your pipeline.

It can't show up in proxy logs. It can't end up in a warehouse backup. It can't be exposed through a misconfigured dashboard. It can't be included in a data export someone runs for a one-off analysis. The attack surface is zero because the data simply isn't there.

For a financial services company handling SSNs, that distinction — between "we have it but it's secured" and "we never had it" — matters a lot.

Why We Build Custom Ingestion Pipelines

We shipped this feature in a couple of hours. We added the field to a sensitiveFields config, the pipeline handler picked it up, and the boolean flowed through to the dbt model and into the report card.
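For a sense of scale, the change amounted to one config entry. The shape below is a hypothetical sketch; the actual keys in our pipeline differ:

```python
# Hypothetical pipeline config: sensitive fields map to the boolean
# flag written to the warehouse instead of being synced as values.
PIPELINE_CONFIG = {
    "object": "Contact",
    "fields": ["Name", "Email", "Phone"],
    "sensitiveFields": {
        "Social_Security_Number__c": "has_social_security",
    },
}
```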

If this client were using an off-the-shelf ETL tool like Fivetran or Airbyte, this would have been a different story. Those tools sync field values — you get the full value or you don't get the field at all. There's no built-in option to say "check if this field has a value, but don't actually sync the value." You'd have to either sync the SSN and try to restrict it downstream, skip the field entirely and lose the ability to report on it, or build some workaround outside the tool.

That's the advantage of a custom ingestion pipeline. When a client has a new requirement — especially one with compliance implications — we're not fighting the limitations of an off-the-shelf tool or filing a feature request. We just build it.

Reach out to us if you're dealing with sensitive Salesforce data in your pipeline and want to talk through your options.