How to build a Sheets ↔ HubSpot bidirectional sync in n8n with audit logs + drift detection

Most “we synced our CRM” tutorials I’ve seen stop at “fetch from A, push to B, done.” That works for a demo. It does not survive 6 months of two people editing the same record from different directions.

This is the architecture I use for production Sheets ↔ HubSpot syncs — the kind where missing a HubSpot status change means a renewal goes unnoticed. Posting it because every fortnight someone in this forum asks the same question, and answers tend to skip the “what happens when the two sides disagree” part.

The shape of the problem

You have a master customer list in Google Sheets that ops/marketing edit freely. You have HubSpot where sales lives.
Without discipline:

  • Leads added in Sheets never make it to HubSpot
  • HubSpot stage changes never propagate back to Sheets
  • When the two diverge, no one knows who’s right
  • Renewals get missed because ops is working from a stale view

High-level architecture

One n8n workflow, schedule-triggered every 5 minutes:

Cron(5min)
├─► Poll Sheets for rows changed since last_run_ts
├─► Poll HubSpot contacts for modified-since last_run_ts
├─► Merge + dedup (lower(email) + normalize(phone))
├─► Resolve conflicts: last-modified-wins, with audit reason
├─► For each merged change:
│ 1. Write audit row to Postgres FIRST
│ 2. Push to opposite system (Sheets or HubSpot)
│ 3. Update audit row with API response
└─► Update last_run_ts

Separate weekly workflow:

Cron(weekly)
└─► Fetch ALL contacts from both sources
└─► Compare end-to-end
└─► If drift > 2% → alert

Step 1 — Poll both sides for changes only

For Sheets, use a last_modified timestamp column. Filter the trigger node to LAST_MODIFIED > {{$json["last_run_ts"]}}. Don’t pull the whole sheet every run — Sheets has aggressive rate limits and you’ll hit them at any real scale.

For HubSpot, the Contacts API supports lastmodifieddate as a filter. Use that — don’t pull-everything-and-diff.

Store last_run_ts in Postgres (or n8n’s static data store), never in workflow variables. Workflow restarts wipe variables.

Step 2 — Dedup function

Two records with JOHN@acme.com and john@acme.com are the same person. So are +1 (555) 123-4567 and +15551234567. Dedup BEFORE the write is non-negotiable.

function normalize(record) {
  return {
    ...record,
    _key: (record.email || '').toLowerCase().trim() + '|' +
          (record.phone || '').replace(/[^\d+]/g, '')
  };
}

Merge by _key. If two records hit the same key but have different last names — don't guess. Flag for human review
(write to a dedup_review table). Soft-fail beats wrong-merge.

Step 3 — Conflict resolution: last-modified-wins + audit reason

If both sides changed the same field since last sync, take the more recent one. But — and this is what most tutorials
skip — write an audit row explaining WHY:

INSERT INTO sync_audit (
  contact_id, source, target, field_changed,
  old_value, new_value, reason, decided_at
) VALUES (
  '123', 'sheets', 'hubspot', 'lifecycle_stage',
  'lead', 'customer', 'sheets_modified_2hr_after_hubspot',
  NOW()
);

When someone asks 3 weeks later "why did this field flip?", you have the answer.

Step 4 — Audit-before-write (the discipline that matters)

Every intended change goes to the audit log BEFORE the API call. Then the API call runs. Then update the audit row with the API response.

Why this order:

- API call fails → audit row exists, marked pending. You can replay.
- API call succeeds but response lost (network drop) → audit row shows pending, you reconcile on next run.
- API call succeeds → audit row updated with success + the actual response payload.

This pattern eliminates "did this write actually happen?" forever.

Step 5 — Drift detector (separate weekly workflow)

Even with the above, drift sneaks in (manual HubSpot edits during n8n's polling window, race conditions). A separate
weekly job:

1. Fetches ALL contacts from both sides
2. Joins on _key
3. Counts mismatches
4. If mismatch rate > 2% → Telegram/Slack alert

This is your canary. If it fires, the sync logic itself has a bug — don't trust last-modified-wins until you've
debugged.

Gotchas I hit on the Sheets node specifically

A few that ate me a full day each:

- Case sensitivity: the Sheets node filter on a header named Email will NOT match a column named email. Match column names exactly.
- Editor schema-wipe: opening the Sheets node UI after a working run sometimes wipes the resolved schema. Re-execute the upstream node before saving.
- Deceptive success: true: the Sheets node returns success: true even when 0 rows are written, if the sheet is owned by a different Google account than the OAuth credential. Always verify row count, not just status.
- Drive API enablement: enabling the Sheets API in Google Cloud Console is not enough — the Drive API must also be enabled on the same project. Cryptic 403s if not.

Outcomes from one client (6 months in)

- ~2 hrs/week of manual data entry: gone
- Drift "constant" → 0
- Renewal miss-rate: 1/quarter → 0
- Weekly drift report became a trust-building artifact ("look, the sync is provably working")

If you want the workflow JSON

I'm not posting the production JSON — the audit-table schema is client-specific and I'd rather you understand the pieces than copy-paste a black box. Happy to walk anyone through the wiring in a reply or DM.

If you want this kind of thing built end-to-end for your team (production discipline, audit logs, drift detection, fixed pricing): noorflows.com

— Syed