Architectural Advice: Migrating a complex AI Medical Citation Pipeline to n8n

Hi everyone,

I am migrating a high-precision medical research workflow from Opal to n8n. The primary goal is to extract clinical claims from a manuscript, find peer-reviewed references via AI agents, verify statistical data, and re-integrate everything into a final document.

The Technical Challenge: The workflow processes up to 30 claims at once. To prevent AI timeouts and ensure accuracy, I’ve split the process into several stages:

  1. Extraction: Scans a draft and produces a strictly numbered list of 30 claims (including [None] for empty slots).

  2. Parallel Search: Six separate AI Agents handle specific ranges (1-5, 6-10, 11-15, etc.) to keep the context window manageable and avoid tool-call crashes.

  3. Validation: Dual-stage verification for statistical consistency (p values, aHR, etc.) and source credibility.

  4. Judicial Gate: Deduplicates references (using DOIs) and maps original claims to unique citation IDs.
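
As a rough illustration of the Judicial Gate step, the deduplication and mapping could look something like the sketch below. The input shape (`index`, `doi`) and the function names are assumptions made for this example, not part of the original Opal workflow.

```javascript
// Hypothetical input shape: one entry per claim slot; doi may be null.
function normalizeDoi(doi) {
  // DOIs are case-insensitive, so lowercase them and strip any resolver prefix.
  return doi.trim().toLowerCase().replace(/^https?:\/\/(dx\.)?doi\.org\//, '');
}

function dedupeByDoi(results) {
  const citationIdByDoi = new Map(); // normalized DOI -> citation ID
  const references = [];             // unique references in first-seen order
  const claimToCitation = {};        // claim index -> citation ID (or null)

  for (const r of results) {
    if (!r.doi) {
      claimToCitation[r.index] = null; // nothing to cite for this slot
      continue;
    }
    const key = normalizeDoi(r.doi);
    if (!citationIdByDoi.has(key)) {
      const id = references.length + 1;
      citationIdByDoi.set(key, id);
      references.push({ id, doi: key });
    }
    claimToCitation[r.index] = citationIdByDoi.get(key);
  }
  return { references, claimToCitation };
}

const gate = dedupeByDoi([
  { index: 1, doi: '10.1000/abc' },
  { index: 2, doi: 'https://doi.org/10.1000/ABC' }, // same work, different spelling
  { index: 3, doi: null },
  { index: 4, doi: '10.1000/xyz' },
]);
console.log(gate.references, gate.claimToCitation);
```

Citation IDs are assigned in first-seen order, so claims that share a DOI end up pointing at the same number.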

My questions for the n8n community:

  1. Index Preservation: What is the most reliable way to pass a 30-item array through parallel AI Agent nodes and merge them back while strictly maintaining the 1-30 order?

  2. Conditional Execution: How can I prevent an AI Agent from calling a tool (e.g., Google Search) when it receives a [None] input for a specific index?

  3. Document Assembly: Any tips on re-assembling the final text into a Google Doc while preserving specific formatting (like italics for statistical markers)?

I have attached a condensed version of my Opal JSON below (redundant search nodes removed for brevity). I would love to hear which n8n nodes (Split in Batches, Loop, or Merge) you would recommend for this architecture.


{
  "title": "Reference finder (Condensed)",
  "nodes": [
    { "id": "User_Input", "metadata": { "title": "User_Draft_Input" } },
    { "id": "Identify_Claims", "metadata": { "title": "Identify_Citation_Points (30 lines max)" } },
    { "id": "Search_Node_1_5", "metadata": { "title": "Search Group 1 (1-5)" }, "configuration": { "generation-mode": "agent" } },
    { "id": "Search_Node_Template", "metadata": { "title": "Search Group Template (Repeated for 6-30)" } },
    { "id": "Verifier_A", "metadata": { "title": "Fact-Checker" } },
    { "id": "Judicial_Gate", "metadata": { "title": "Deduplication & Mapping" } },
    { "id": "RIS_Export", "metadata": { "title": "Bibliography RIS Export" } }
  ],
  "edges": [
    { "from": "Identify_Claims", "to": "Search_Nodes" },
    { "from": "Search_Nodes", "to": "Verifiers" },
    { "from": "Judicial_Gate", "to": "Final_Assembly" }
  ]
}

Hey! Looks like your post got cut off mid-sentence — can you repost the full description of your pipeline stages and what specific part you’re stuck on with the n8n migration?

Hi Achamm, thanks for the incredibly fast response!

Sorry about the cut-off. Here is the full description of my pipeline and where I’m specifically struggling with the migration:

Full Pipeline Stages:

  1. Extraction (Identify_Citation_Points): Scans a medical manuscript to find up to 30 clinical/statistical claims. It outputs a strictly numbered 30-line list. If there are fewer than 30, it pads the list with [None].

  2. Query Generation: Converts each claim into an optimized search query.

  3. Parallel Search Nodes: Because processing 30 items in one agent often leads to timeouts or tool-call errors, I split them into 6 parallel AI Agent nodes (Group 1: 1-5, Group 2: 6-10, etc.). Each agent uses a Google Search tool.

  4. Multi-Stage Verification:

    • Verifier A (Fact-Checker): Compares the draft’s statistical data (p value, beta, aHR) against the search results.

    • Verifier B (Format Auditor): Checks journal credibility and DOI integrity.

  5. Final Judicial Gate: Resolves conflicts between verifiers and deduplicates references based on DOI.

  6. Final Assembly: Re-inserts the verified citation numbers into the original manuscript.
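
To make the Extraction output easier to carry through the later stages, the numbered list could be parsed into indexed items right away. This is only a sketch: it assumes each line looks like `N. claim text`, and the `parseClaimList` name and `isEmpty` field are made up for illustration.

```javascript
// Parse "N. text" lines into indexed items; "[None]" marks an empty slot.
function parseClaimList(text, expected = 30) {
  const items = [];
  for (const line of text.split('\n')) {
    const m = line.match(/^\s*(\d+)[.)]\s*(.*)$/);
    if (!m) continue; // ignore lines that are not numbered
    const claim = m[2].trim();
    items.push({ index: Number(m[1]), claim, isEmpty: claim === '[None]' });
  }
  // Fail loudly if the model skipped or repeated a number.
  items.forEach((item, i) => {
    if (item.index !== i + 1) {
      throw new Error(`Expected index ${i + 1}, got ${item.index}`);
    }
  });
  if (items.length !== expected) {
    throw new Error(`Expected ${expected} lines, got ${items.length}`);
  }
  return items;
}

// Small two-line example; the real list would use the default of 30.
const sample = parseClaimList('1. Drug X reduced mortality (aHR 0.82)\n2. [None]', 2);
console.log(sample);
```

Validating the numbering at this point means a malformed list stops the run early instead of silently shifting every later citation.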

Where I’m stuck:

  • Parallel Sync & Merge: How do I trigger these 6 parallel Search Agents and merge their outputs back into a single, perfectly ordered 1-30 array for the Verification stage?

  • Conditional Tool Calling: How can I tell an AI Agent node to “skip” the search tool if the input for a specific index is [None]?

  • Formatting Preservation: I need to maintain strict Lancet/NEJM formatting (italics for statistical markers like p values, and absolutely no bolding or LaTeX) during the JSON-to-Document transition.
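
For the merge question, one possible approach is to let every item carry its own `index` field, append the six group outputs in whatever order they finish, and sort afterwards. The sketch below assumes that field exists on every result; `mergeInOrder` is a hypothetical helper, not an n8n built-in.

```javascript
// Flatten the group outputs, sort by index, and verify slots 1..expected are all present.
function mergeInOrder(groupOutputs, expected = 30) {
  const merged = groupOutputs.flat().sort((a, b) => a.index - b.index);
  if (merged.length !== expected) {
    throw new Error(`Expected ${expected} items, got ${merged.length}`);
  }
  merged.forEach((item, i) => {
    if (item.index !== i + 1) {
      throw new Error(`Slot ${i + 1} is missing or duplicated (found index ${item.index})`);
    }
  });
  return merged;
}

// Groups arrive out of order here on purpose.
const ordered = mergeInOrder(
  [
    [{ index: 3 }, { index: 4 }],
    [{ index: 1 }, { index: 2 }],
  ],
  4
);
console.log(ordered.map((r) => r.index));
```

Because the order is rebuilt from the data itself, it no longer matters which branch finishes first.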

I’m attaching a screenshot of my current Opal workflow to help visualize the logic. Any guidance on the best n8n node architecture for this would be a lifesaver!

Hi Benjamin, thank you for the expert insights!

The index-tagging strategy is brilliant and exactly what I needed to ensure data integrity for the 30-item mapping. Regarding the Google Docs API: since strict italics for statistical markers (p value, beta, etc.) are a non-negotiable requirement for this medical journal (Lancet style), I will take your advice and use the HTTP Request node with batchUpdate.

Could you provide a brief example of how that batchUpdate payload should look for inserting text with specific italics? I want to make sure I don’t break the formatting during the final assembly.

Also, for the IF node logic, would you recommend routing the [None] inputs to a separate ‘Wait/Merge’ path to keep the array length consistent at 30?
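
To picture that routing, here is a minimal sketch of splitting [None] slots onto a bypass path and rejoining them afterwards so the array stays at full length. The field names (`reference`, `skipped`) and both helper functions are assumptions for illustration; in a workflow the split would correspond to the two outputs of an IF node.

```javascript
const EMPTY = '[None]';

// Separate real claims from empty slots; empty slots become ready-made placeholders.
function routeClaims(items) {
  const toSearch = [];
  const bypass = [];
  for (const item of items) {
    if (item.claim.trim() === EMPTY) {
      bypass.push({ index: item.index, claim: EMPTY, reference: null, skipped: true });
    } else {
      toSearch.push(item);
    }
  }
  return { toSearch, bypass };
}

// Put both paths back together in index order.
function rejoin(searched, bypass) {
  return [...searched, ...bypass].sort((a, b) => a.index - b.index);
}

const input = [
  { index: 1, claim: 'Claim A' },
  { index: 2, claim: '[None]' },
  { index: 3, claim: 'Claim B' },
];
const { toSearch, bypass } = routeClaims(input);
// Stand-in for the agent step: only real claims ever reach it.
const searched = toSearch.map((item) => ({ ...item, reference: 'pending', skipped: false }));
const rejoined = rejoin(searched, bypass);
console.log(rejoined);
```

Since the empty slots never reach the agent, no prompt-level instruction is needed to stop it from calling the search tool for them.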

For the italics, you’d send a POST to https://docs.googleapis.com/v1/documents/{{docId}}:batchUpdate from an HTTP Request node. The payload looks something like this:

{
  "requests": [
    {
      "updateTextStyle": {
        "range": {
          "startIndex": 10,
          "endIndex": 17
        },
        "textStyle": {
          "italic": true
        },
        "fields": "italic"
      }
    }
  ]
}

You just need to track the start/end index of each p value or aHR marker when you assemble the doc text (endIndex is exclusive), then loop through them and fire off one batchUpdate with all the ranges at once.
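
As a small sketch of that last step, a list of tracked ranges can be turned into a single request body like this. The `buildItalicRequests` name and the example ranges are hypothetical; the request shape mirrors the payload above.

```javascript
// Turn [{ startIndex, endIndex }, ...] into one batchUpdate body.
function buildItalicRequests(ranges) {
  return {
    requests: ranges.map(({ startIndex, endIndex }) => ({
      updateTextStyle: {
        range: { startIndex, endIndex },
        textStyle: { italic: true },
        fields: 'italic', // only touch the italic property
      },
    })),
  };
}

// Example ranges only; real values come from the assembly step.
const payload = buildItalicRequests([
  { startIndex: 17, endIndex: 20 },
  { startIndex: 27, endIndex: 28 },
]);
console.log(JSON.stringify(payload, null, 2));
```

The resulting object can be sent as the JSON body of the HTTP Request node.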

Hi @Benjamin_Behrens, the index tagging strategy sounds like the perfect solution for my data integrity concerns.

Could you provide a small Code node snippet for adding that numeric index field to each claim? I want to make sure I’m doing it the ‘n8n-native’ way before splitting the items.

Also, regarding the 30 parallel agents: I’m a bit worried about the workflow looking like ‘spaghetti’ if I draw 30 separate lines. Is there a more elegant way to handle this batch of 30 while still allowing the agents to run in parallel?
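
For reference, a tagging step along these lines might work as a starting point. It is a sketch that assumes the claims arrive as ordinary n8n items (objects with a `json` property); the `index` and `group` field names and the `tagAndGroup` helper are made up for this example.

```javascript
// Add a 1-based index and a group number (1-6 for 30 items in groups of 5) to each item.
function tagAndGroup(items, groupSize = 5) {
  return items.map((item, i) => ({
    json: {
      ...item.json,
      index: i + 1,
      group: Math.floor(i / groupSize) + 1,
    },
  }));
}

// Inside a Code node set to run once for all items, this would be:
//   return tagAndGroup($input.all());

// Stand-alone demo with 30 dummy claims.
const demo = tagAndGroup(
  Array.from({ length: 30 }, (_, i) => ({ json: { claim: `claim ${i + 1}` } }))
);
console.log(demo[0].json, demo[29].json);
```

With a `group` field on every item, the grouping lives in the data rather than in the wiring, which may reduce the number of hand-drawn branches.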

Reply to achamm:

Hi @achamm, thank you for the batchUpdate example! This is exactly what I need for Lancet-style formatting.

My biggest hurdle now is the character position tracking. Since the final manuscript is assembled from 30 dynamic slots, calculating the exact startIndex and endIndex for every p value or aHR marker seems daunting.

Do you have a recommended logic or a Code node snippet that can calculate these positions as the text is being concatenated? I need to fire off one clean batchUpdate at the end to ensure all statistical markers are correctly italicized.
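
One way to approach the position tracking is to record each marker's range at the moment its segment is appended, using the running length of the text as the offset. The sketch below rests on several assumptions: the text is inserted at index 1 of an otherwise empty document body, the `MARKER` pattern is a placeholder to be replaced with the journal's actual list of markers, and `assemble` is a hypothetical helper.

```javascript
// Hypothetical marker pattern; extend it to match the journal's style guide.
const MARKER = /\b(?:p|aHR)\b/g;

// Concatenate segments and collect document ranges for every marker found.
function assemble(segments, bodyStart = 1) {
  let text = '';
  const ranges = [];
  for (const segment of segments) {
    // Document index where this segment will begin.
    const offset = bodyStart + text.length;
    for (const m of segment.matchAll(MARKER)) {
      ranges.push({
        startIndex: offset + m.index,
        endIndex: offset + m.index + m[0].length, // exclusive
      });
    }
    text += segment;
  }
  return { text, ranges };
}

const doc = assemble(['Mortality fell (aHR 0.82, ', 'p = 0.03).']);
console.log(doc.text);
console.log(doc.ranges);
```

JavaScript string lengths are counted in UTF-16 code units, which is also how the Docs API measures indexes, so the offsets should line up as long as the same string is what gets inserted.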