Best practices for parsing massive JSON payloads from ScholarAPI into HTTP Request nodes?

Hi community,
I am currently building an n8n workflow that automates academic research ingestion for a vector database. The goal is to pull scholarly articles and citation data based on specific search triggers.
Initially, I tried building an automation workflow using basic scrapers via the HTTP Request node, but dealing with rotating proxies, structural changes in HTML, and sudden CAPTCHAs made the workflow constantly fail and time out.
To fix this, I am switching to a structured data pipeline. I am testing a setup where I pull clean, structured JSON payloads directly into n8n from a service like ScholarAPI. However, the JSON arrays it returns can get quite large (deep article metadata, abstract logs, and PDF URLs).

Describe the problem/error/question

I wanted to ask if anyone has optimized a similar data ingestion workflow:

  1. Is it better to process the large JSON payload in a sub-workflow (Execute Workflow node) to keep memory usage down on the main n8n instance?

  2. What is your preferred way to loop through deeply nested arrays in n8n: relying on the built-in Loop node, or running a custom Code node (JavaScript) that maps the fields directly for the subsequent HTTP Request nodes?
Would love to hear some architectural tips from anyone running heavy data workflows here!


Hi @Stream_On, welcome!
I think the best approach is to split the payload into batches and hand each batch to a sub-workflow, so that the data for a batch is processed and then released when that sub-workflow run finishes. As long as the sub-workflow does the heavy lifting per batch and returns only its final output each time, the main workflow should stay stable. It is also good practice to set N8N_DEFAULT_BINARY_DATA_MODE=filesystem so that binary data is written to disk instead of piling up in memory and exhausting the instance.
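As a rough sketch of the batching idea, something like the following could sit in a Code node ahead of the Execute Workflow node. It is only an illustration: the helper names `slimArticle` and `toBatches`, the field names (`title`, `abstract`, `pdfUrl`), the `results` key, and the batch size of 50 are all assumptions, not the actual ScholarAPI response schema.

```javascript
// Sketch only: split a large array of articles into fixed-size batches and
// keep just the fields the downstream nodes need.
// ASSUMPTION: title / abstract / pdfUrl are placeholder field names, not the
// real ScholarAPI schema. Adjust them to the payload you actually receive.

function slimArticle(article) {
  // Dropping unused metadata early keeps each item small.
  return {
    title: article.title ?? null,
    abstract: article.abstract ?? null,
    pdfUrl: article.pdfUrl ?? null,
  };
}

function toBatches(articles, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < articles.length; i += batchSize) {
    batches.push(articles.slice(i, i + batchSize).map(slimArticle));
  }
  return batches;
}

// Inside an n8n Code node ("Run Once for All Items") the wiring would look
// roughly like this ("results" is a guess at the response key):
//   const articles = $input.first().json.results;
//   return toBatches(articles, 50).map((batch) => ({ json: { batch } }));
```

Each output item then carries one batch, so the sub-workflow only ever receives a slice of the payload rather than the whole array.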