Help debugging n8n + Airtable candidate/job matching workflow (embeddings + must-have gates)


Goal
I’m building a candidate–job matching workflow in n8n. Airtable is the source of truth for both candidates and job openings. The pipeline:

  1. Build a textual summary of the job

  2. Get a job embedding (Google Gemini text-embedding-004)

  3. Pre-filter candidates by cosine similarity against their stored embeddings

  4. Fetch detailed candidate↔skill and candidate↔language rows

  5. Score with gates (MustHave/NiceToHave, languages, etc.), then write matches to an Airtable “Matching” table

The candidate embeddings are produced in a separate n8n workflow and are working fine.
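
For reference, the pre-filter in step 3 boils down to something like the sketch below. It is a simplified stand-in, not the exact node code: the names cosineSimilarity and rankByCosine and the { id, embedding } candidate shape are assumptions made for illustration.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 for mismatched lengths or zero-norm vectors instead of NaN.
function cosineSimilarity(a, b) {
  if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length || a.length === 0) {
    return 0;
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate ({ id, embedding }, a hypothetical shape) against the
// job vector and keep the k best, highest score first.
function rankByCosine(jobVector, candidates, k) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(jobVector, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Returning 0 for unusable vectors keeps a malformed candidate at the bottom of the ranking rather than aborting the whole run.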

Stack / versions

  • n8n Cloud 1.109.2

  • Airtable node v2.1, Code node v2, HTTP Request v4

  • Embeddings: Google Gemini text-embedding-004

  • DB: Airtable (multiple linked tables + lookups)


What works

  • Job text → Gemini embedding

  • Candidate listing with {Statut}='Published' and non-empty “Embedding Text (candidate)”

  • Separate embedding generation workflow for candidates

What’s failing

I’m not getting any final matches out of the scoring step. Typical output:

[
  {
    "top": [],
    "toCreate": [],
    "_diag": {
      "jobId": "recHAwZDxnT1WvG9Q",
      "cands": 0,
      "reqSkills": 0,
      "reqLangs": 0,
      "cSkillsRows": 0,
      "cLangRows": 0,
      "topK": 10,
      "results": 0,
      "top": 0
    }
  }
]

I’ve also hit a few errors while iterating:

  1. Airtable filter formula

    • The formula for filtering records is invalid: Unknown field names: candidate record ids
      (Fixed by referencing the correct lookup field name and chunking ORs.)
  2. Item access

    • No data found for item-index: "1" on a node that expected $item(0) / single execution. (Happens when a downstream node evaluates an expression per-item while the upstream only emitted a single item.)
  3. Code node control flow

    • SyntaxError: Illegal continue statement: no surrounding iteration statement in a Code node (“Rank by cosine”). The continue sat inside a try/catch that was not lexically enclosed by a for loop, so the script failed at parse time.
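
To make the third error concrete: in JavaScript, continue is only legal when it is lexically inside a loop. Inside a forEach/map callback, or in a script body that is not wrapped in a loop, it fails at parse time. The sketch below shows a shape that is accepted, plus a small reproduction of the error. The helper name parseValidEmbeddings and the { id, embedding } row shape are hypothetical.

```javascript
// Hypothetical helper: keep rows whose `embedding` string parses to a non-empty array.
function parseValidEmbeddings(rows) {
  const out = [];
  for (const row of rows) {
    let vec;
    try {
      vec = JSON.parse(row.embedding);
    } catch (err) {
      continue; // legal: the try/catch is lexically inside the for...of loop
    }
    if (!Array.isArray(vec) || vec.length === 0) continue;
    out.push({ id: row.id, vec });
  }
  return out;
}

// Reproduction of the error: `continue` inside a callback has no enclosing loop.
// new Function only parses the body here; the function is never called.
let syntaxErrorName = null;
try {
  new Function("items", "items.forEach((it) => { if (!it) continue; });");
} catch (err) {
  syntaxErrorName = err.name; // a SyntaxError raised while parsing
}
```

Inside a callback the equivalent of continue is an early return, and the same is presumably true for code that n8n runs once per item, since there is no loop written around it in the script itself.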

Workflow outline (high level)

  • Webhook → Set Inputs (job_id, top_k, retrieval_k)

  • Airtable — Get Job (fetch job record)

  • Airtable — List Job↔Skill and List Job↔Langues (linked rows)

  • Build Job Text → HTTP — Gemini Embedding (job)

  • Airtable — List Candidates (with embedding) (published + non-empty embedding)

  • Code — Rank by cosine (score + build vector ID list + global candidate filter)

  • Code — Build chunked formulas (split OR formula into ≤30-ID chunks for Airtable)

  • Airtable — List Candidates (filtered) + List Candidate↔Skill + List Candidate↔Language

  • Merge → Function — Score & Collect TopK → (optional paraphrase) → Build Upserts → Create in Airtable
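
As an illustration of the “Build chunked formulas” step, here is a minimal sketch of how the OR formulas could be assembled. The function names are made up for this post, and the ID pattern (rec followed by 14 alphanumeric characters) is an assumption based on the record IDs in my base; validating against it also keeps stray quotes out of the formula.

```javascript
// Split an array into consecutive chunks of at most `size` elements.
function chunk(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
  return out;
}

// Build one Airtable formula string per chunk of record IDs.
// `lookupField` is the lookup field name on the junction table (an assumption).
function buildChunkedFormulas(ids, lookupField, size = 30) {
  // Assumed ID shape: "rec" + 14 alphanumeric characters. Invalid values are dropped.
  const valid = ids.filter((id) => typeof id === "string" && /^rec[A-Za-z0-9]{14}$/.test(id));
  const unique = [...new Set(valid)];
  return chunk(unique, size).map((group) => {
    const terms = group.map((id) => `FIND('${id}', ARRAYJOIN({${lookupField}}))>0`);
    // A single term does not need an OR() wrapper.
    return terms.length === 1 ? terms[0] : `OR(${terms.join(", ")})`;
  });
}
```

In a Code node, each formula could then be returned as its own item, for example { json: { filterFormula } }, so that the downstream Airtable node runs once per chunk.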


What I suspect and where I’d love guidance

  1. Per-item vs run-once semantics in Code nodes

    • Best practice for iterating over $items() and safely early-skipping invalid inputs (instead of continue)?

    • Reliable pattern to emit one item with a { filterFormula, vector_ids } payload for downstream Airtable search?

  2. Chunked Airtable filtering

    • What is the recommended way to safely construct filterByFormula with large OR(...) lists? I currently chunk by ~30 IDs and match against a candidate lookup field such as Candidate Record IDs, using FIND(id, ARRAYJOIN({Candidate Record IDs}))>0 for each ID.

    • Any better pattern you prefer for linking “Candidats ↔ compétences/langues” rows back to candidate IDs?

  3. Merging multi-run outputs

    • Patterns for reading all runs of upstream Airtable list nodes (e.g., getAllFromNode + expandRecords) to avoid losing items across batches?
  4. Defensive field access

    • Sanity checks/fallbacks to avoid undefined when a field or lookup is missing (especially with multilingual field labels).
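
For point 4, this is roughly the kind of fallback helper I have in mind. It is only a sketch: pickField and asStringArray are hypothetical names, and the assumption that lookup fields arrive as arrays reflects what I see in my own base.

```javascript
// Return the first non-empty value found under any of the given field labels,
// e.g. pickField(fields, ["Status", "Statut"]). Falls back to `fallback`.
function pickField(fields, labels, fallback = null) {
  for (const label of labels) {
    const value = fields?.[label];
    if (value === undefined || value === null) continue;
    if (typeof value === "string" && value.trim() === "") continue;
    if (Array.isArray(value) && value.length === 0) continue;
    return value;
  }
  return fallback;
}

// Normalise a lookup value (assumed to be an array, a scalar, or missing)
// into a flat array of strings, so callers never touch undefined.
function asStringArray(value) {
  if (value === undefined || value === null) return [];
  return (Array.isArray(value) ? value : [value]).map(String);
}
```

Note that 0 and false are treated as real values here, so a numeric field that is legitimately zero is not replaced by the fallback.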

What I can share

I can provide:

  • Full (current) workflow JSON

  • The exact code in these nodes:

    • Code — Rank by cosine

    • Code — Build chunked formulas

    • Function — Score & Collect TopK

  • Raw filterByFormula strings used in the 3 Airtable list nodes

  • Screenshots of node configs

  • Minimal JSON samples (1 job, 2–3 candidates whose Embedding Text (candidate) field holds the stringified vector, a few Job↔Skill/Job↔Lang rows)

  • Full error logs/stack traces


Questions to the community

  • Do you have a canonical snippet for a Code node that:

    • Iterates candidates from $items(),

    • Robustly parses a JSON string embedding field,

    • Emits a single { filterFormula, vector_ids } item for the next Airtable search,

    • And never uses continue/break in a way that conflicts with n8n’s per-item execution?

  • Is the chunked OR filter approach with ARRAYJOIN({Candidate Record IDs}) the best way to keep formulas short and map back to candidate rows in the junction tables?

  • Any recommended Merge strategy to reliably combine:

    • List Candidates (filtered) (possibly multiple runs due to chunking),

    • List Candidate↔Skill,

    • List Candidate↔Language,
      so the scoring node can see all rows?

  • Debug tips you use for these patterns (e.g., forcing alwaysOutputData, using $item(0) vs $json, or a utility to inspect runs)?
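
To make the Merge question more concrete, the join I am trying to achieve before scoring looks roughly like the sketch below. The function names and the flat row shape (a candidate with an id, junction rows carrying the lookup field) are assumptions for illustration, not my actual node code.

```javascript
// Group junction-table rows by the candidate record ID(s) found in `lookupField`.
// Rows without a usable lookup value are ignored.
function groupByCandidate(rows, lookupField) {
  const byCandidate = new Map();
  for (const row of rows) {
    const raw = row?.[lookupField];
    const ids = Array.isArray(raw) ? raw : raw ? [raw] : [];
    for (const id of ids) {
      if (!byCandidate.has(id)) byCandidate.set(id, []);
      byCandidate.get(id).push(row);
    }
  }
  return byCandidate;
}

// Attach skill and language rows to each candidate so that a single scoring
// pass sees everything, regardless of which chunk or run a row came from.
function joinCandidateRows(candidates, skillRows, langRows, lookupField) {
  const skills = groupByCandidate(skillRows, lookupField);
  const langs = groupByCandidate(langRows, lookupField);
  return candidates.map((c) => ({
    ...c,
    skills: skills.get(c.id) ?? [],
    languages: langs.get(c.id) ?? [],
  }));
}
```

The open part of my question is how best to collect the three input arrays across all runs of the chunked Airtable nodes; once they are flat arrays, a join like this would not depend on item pairing in a Merge node.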


If it helps, I can post the current snippets and a minimal dataset. Thanks a lot for any pointers or best practices!
