How to best handle long-running HTTP Requests and large JSON payloads from ScholarAPI.net in a RAG workflow?

Hi n8n Community,

I am currently building an automated research ingestion pipeline using n8n to power an academic RAG (Retrieval-Augmented Generation) system.

The Workflow Goal:

The automation workflow triggers when a new research topic is logged. It calls an academic data infrastructure engine called ScholarAPI.net to pull full-text academic data and highly detailed citation metadata. Once the JSON payload is retrieved, n8n passes the data to an embedding node and pushes it into a Vector Store.

The Challenge:

When making the HTTP Request node call to ScholarAPI.net, the JSON response containing the full text of multiple scientific papers can be quite massive (sometimes several megabytes of clean, structured text per batch).

I am running into two specific architectural questions to keep the workflow optimized:

  1. Handling Timeouts / Execution Limits: For large batches of full-text data, the upstream API processing might take a few moments. What is the best practice in n8n for setting up resilient retry profiles or extending timeouts on the HTTP Request node so the workflow doesn’t fail prematurely?

  2. Memory/Data Splitting: Processing huge nested JSON payloads directly in a single execution thread causes heavy memory usage. Should I use the “Item Lists” node to split the incoming text arrays as soon as they arrive from ScholarAPI.net, or is a custom Code Node (JavaScript/Python) more efficient for chunking text strings before sending them to vector embeddings?

Would appreciate any advice or workflow templates from anyone who has built large-scale text ingestion/scraping pipelines inside n8n!

Thanks in advance.

@Asgef_Sha for timeouts, the HTTP Request node has a Timeout field under Options. bump it for the slow calls, and turn on Retry On Fail in Settings with Max Tries and Wait Between Tries. one gotcha: retry only fires if On Error is set to Stop Workflow. set it to a Continue option and n8n ignores the retry counts.
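For anyone who wants to see the timeout-plus-retry idea outside the node UI, here is a minimal Python sketch. It is not how n8n implements Retry On Fail. The names `call_with_retries` and `fetch_json` are made up for illustration, and `max_tries` / `wait_seconds` only mirror the Max Tries and Wait Between Tries settings. Note that `urllib` takes its timeout in seconds, while the n8n Timeout field is in milliseconds.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_tries: int = 3,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_tries times, pausing wait_seconds between attempts."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except Exception as exc:  # real code should catch only timeout/HTTP errors
            last_error = exc
            if attempt < max_tries:
                sleep(wait_seconds)
    assert last_error is not None
    raise last_error


def fetch_json(url: str, timeout_seconds: float = 120.0) -> Any:
    """Fetch a URL and parse the body as JSON, with an explicit timeout."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
        return json.load(resp)
```

Usage would look like `call_with_retries(lambda: fetch_json(url), max_tries=3, wait_seconds=5)`, where `url` is whatever endpoint you are calling.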

for the payloads, don't hand-chunk with Item Lists or a Code node. n8n has a Recursive Character Text Splitter node for exactly this (Chunk Size + Chunk Overlap), feeding the Default Data Loader into your vector store. Split Out the papers array first so each one flows through on its own instead of holding the whole multi-MB blob in one execution.
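To make Chunk Size and Chunk Overlap concrete, here is a simplified sketch of what the two settings mean. It uses a plain fixed-size sliding window, not the recursive separator-aware logic of the actual node, and `chunk_text` is a hypothetical helper name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the overlap that keeps sentences cut at a boundary retrievable from either side.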

Welcome @Asgef_Sha! achamm’s advice on the text splitter is exactly right. One extra pattern for multi-paper batches: after getting the HTTP response, use a SplitInBatches node to process papers in groups of 5-10 at a time, then feed each batch through the text splitter and vector store insertion. This prevents a single large payload from sitting in memory during embedding, which can cause the execution to stall even with the timeout extended. Also worth setting the HTTP Request timeout to at least 120 seconds for ScholarAPI’s full-text calls, since their server-side aggregation can be slow on large queries.
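The batching idea itself is small enough to sketch in a few lines of Python. This only illustrates the grouping, not the SplitInBatches node's internals, and `in_batches` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def in_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


papers = [f"paper-{i}" for i in range(12)]  # stand-in for the papers array
for batch in in_batches(papers, batch_size=5):
    print(len(batch))  # prints 5, then 5, then 2
```

Because the helper is a generator, only one group is handed to the downstream step at a time.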

For this kind of RAG ingestion, I would not keep the whole thing in one long synchronous execution. Store the raw ScholarAPI response first, split the payload into manageable chunks, then process/embed in smaller steps with retries and checkpoints. Large JSON plus long HTTP calls is where boring reliability beats a clever all-in-one workflow. Are you able to save the raw response somewhere before the embedding step starts?

Good answers above already. The text splitter plus Split Out plus SplitInBatches covers the chunking, and OMGItsDerek is right that you want to persist the raw response before embedding. I would build on that last point, because for this kind of ingestion the architecture matters more than any single node setting.

A few things that have saved me on large text-ingestion pipelines:

  1. Split it into two workflows, not one. Workflow A just calls ScholarAPI and writes each paper to a table or object store with status “fetched”. Workflow B picks up rows that are “fetched”, chunks, embeds, and flips them to “embedded”. The expensive API pull and the slow embedding then fail independently. If embedding dies on paper 40 of 50, you never re-pull the batch, B just resumes the unfinished rows. That status column is your checkpoint.

  2. Make it idempotent. Key each paper by a stable id (DOI or ScholarAPI id) and check the store before embedding. Retries and reruns are inevitable on long jobs, and without a dedupe key you pay to embed the same paper twice and end up with duplicate vectors that pollute retrieval. Cheapest reliability win there is.

  3. Fix the size at the source if you can. If ScholarAPI supports paging or a per-paper fetch, pull smaller pages rather than one multi-mb batch. The most reliable way to not run out of memory on a huge JSON is to never hold it as a single item in the first place. Splitting Out early helps, smaller upstream calls help more.

  4. One small clarification on the retry point above: Retry On Fail will retry the node regardless. The On Error setting decides what happens after retries are exhausted: Stop Workflow halts, Continue passes the failed item downstream. For an ingestion job I keep retries on, On Error set to continue (using error output), and route failures to a dead-letter table so one bad paper does not sink the whole run.

  5. If you are self-hosted and payloads are genuinely large, two env levers help: raise Node’s heap with NODE_OPTIONS=--max-old-space-size, and turn down execution data retention so multi-mb runs do not bloat your database.

Net, the boring producer-consumer split with a status column and a dedupe key will get you further than tuning timeouts on one big execution. Happy to expand on any of these.
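As a rough illustration of the status column plus dedupe key, here is a minimal sketch using SQLite from Python. The table layout, the function names, and the `embed` callback are all assumptions for the example. A `failed` status with an `error` column stands in for a separate dead-letter table to keep it short.

```python
import sqlite3
from typing import Callable, Iterable, Tuple


def init_store(conn: sqlite3.Connection) -> None:
    """Create the checkpoint table; paper_id (e.g. a DOI) is the dedupe key."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS papers (
               paper_id  TEXT PRIMARY KEY,
               full_text TEXT NOT NULL,
               status    TEXT NOT NULL DEFAULT 'fetched',
               error     TEXT
           )"""
    )


def save_fetched(conn: sqlite3.Connection, papers: Iterable[Tuple[str, str]]) -> int:
    """Workflow A: store fetched papers, skipping ids already present."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO papers (paper_id, full_text) VALUES (?, ?)", papers
    )
    conn.commit()
    return conn.total_changes - before  # number of newly stored papers


def embed_pending(conn: sqlite3.Connection, embed: Callable[[str], None]) -> None:
    """Workflow B: embed every row still marked 'fetched', one commit per paper."""
    rows = conn.execute(
        "SELECT paper_id, full_text FROM papers WHERE status = 'fetched'"
    ).fetchall()
    for paper_id, full_text in rows:
        try:
            embed(full_text)
        except Exception as exc:  # park the failure instead of stopping the run
            conn.execute(
                "UPDATE papers SET status = 'failed', error = ? WHERE paper_id = ?",
                (str(exc), paper_id),
            )
        else:
            conn.execute(
                "UPDATE papers SET status = 'embedded' WHERE paper_id = ?",
                (paper_id,),
            )
        conn.commit()
```

Rerunning `embed_pending` after a crash only touches rows that are still `fetched`, so nothing is pulled or embedded twice.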

Hi 👋
I think I get what you mean — the issue here is not just timeout or batching, but the fact that n8n is handling a very large JSON payload inside a single execution context.

For cases like ScholarAPI returning huge structured text, the real bottleneck is usually execution memory + node chaining, not just HTTP limits.

I’ve seen similar behavior where splitting or “async-ing” the flow alone doesn’t fully solve it because the data is still held in memory during the execution lifecycle.

Curious if you’ve tried isolating the ingestion step completely from the transformation step (not just splitting inside the same workflow)?

Two concrete n8n settings to fix this: first, in the HTTP Request node > Options > Timeout, set it to 300000 (5 min) so the call doesn’t cut out mid-response on large payloads. Second, if ScholarAPI returns an array of papers, pipe the output through a Split In Batches node set to batch size 1 before the embedding step - this keeps only one paper in the execution context at a time instead of all of them. For fully isolating the memory scope like @Bella123 suggested, call a separate sub-workflow via Execute Workflow for the embed + store step. That way each paper’s text gets garbage-collected after the sub-workflow returns rather than accumulating in the parent execution.

I would split this into ingestion, normalization, and embedding rather than trying to make one big workflow handle everything at once.

For this kind of ScholarAPI → RAG path, the risky part is usually not just the long HTTP request. It is that a huge payload can fail at several different boundaries:

  1. fetch boundary
    The HTTP call may timeout or return too much data for one execution to comfortably hold.

  2. normalization boundary
    You need to decide what one “document” means before chunking: paper, abstract, section, citation block, or author metadata.

  3. chunking boundary
    The embedding step should receive predictable units, not raw API payloads.

  4. retry boundary
    If one record fails, you do not want to re-fetch and re-embed the whole result set.

The pattern I would use:

  • request a page/batch of records;
  • immediately write raw response metadata somewhere durable;
  • split each paper into a normalized item;
  • store an item id + processing state;
  • run chunking/embedding as a second workflow over pending items;
  • mark each item as fetched, normalized, chunked, embedded, failed, or skipped.

That gives you restartability. It also makes the RAG quality easier to debug because you can inspect which stage produced bad chunks.
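A small sketch of how those per-item states could be enforced in Python. The state names come from the list above, but the allowed transitions are my assumption (in particular, that a failed item is retried from `fetched`, since the raw data is already stored), and `advance` is a hypothetical helper.

```python
from enum import Enum
from typing import Dict, Set


class Stage(str, Enum):
    FETCHED = "fetched"
    NORMALIZED = "normalized"
    CHUNKED = "chunked"
    EMBEDDED = "embedded"
    FAILED = "failed"
    SKIPPED = "skipped"


# Assumed transition table; adjust to match your own pipeline.
ALLOWED: Dict[Stage, Set[Stage]] = {
    Stage.FETCHED: {Stage.NORMALIZED, Stage.FAILED, Stage.SKIPPED},
    Stage.NORMALIZED: {Stage.CHUNKED, Stage.FAILED, Stage.SKIPPED},
    Stage.CHUNKED: {Stage.EMBEDDED, Stage.FAILED},
    Stage.EMBEDDED: set(),           # terminal
    Stage.SKIPPED: set(),            # terminal
    Stage.FAILED: {Stage.FETCHED},   # assumption: retry restarts from stored raw data
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Rejecting illegal moves early makes it obvious when a workflow tries to embed something that was never chunked.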

One non-sensitive question: does ScholarAPI let you page or filter results by date/query, or are you receiving one large response that must be split after the request?