How to provide large data to an AI Agent (via file?)

Describe the problem/error/question

I have a big list of posts and analyses that I would like to pass through an LLM. Since the post list and job list are long, I would like to upload them as files and then analyze the files using an agent, so that the initial prompt doesn't eat all the tokens and the response can be longer. Following some examples from community nodes and this forum, I made a workflow that uploads a file to Gemini and then uses the URL with a document tool to read the file. The issue is that the agent says it can't properly get the CSV from the URL, or that the CSV isn't enough because it apparently summarizes the content (does it?). In that case, how would you suggest building this flow?

What is the error message (if any)?

The issue I have is that the file is not always read. Let's assume I have 500 posts and split them into batches of 100 with a loop node. Sometimes the AI agent will do the analysis; sometimes it will say it can't read the files.

Please share your workflow (simplified)

Here I am passing only 2 posts with 1 job, but usually I have over 50 jobs and over a thousand posts.

Share the output returned by the last node

Sometimes it works, sometimes it does this. Yet it should always work or always fail; I don't get why the same prompt produces varying output.

I am unable to access the raw CSV content from the provided URLs directly through the DOCUMENT tool. The DOCUMENT tool provides a textual summary and analysis of the documents, not the raw data required for a programmatic per-post,

Information on your n8n setup

  • n8n version: 2.1.4
  • Database (default: SQLite): default
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default
  • Running n8n via (Docker, npm, n8n cloud, desktop app): cloud
  • Operating system: windows

An LLM has a limited context window. This limit applies regardless of whether the information is sent as text or a file. When you send a file in a chat tool like ChatGPT, it reads portions of that file. The file may also be stored in a vector database for retrieval using the RAG (Retrieval-Augmented Generation) method. That’s what the file analysis tool is doing. It’s reading portions of the big file and providing a summary.

So if the files are too big, you won’t be able to make the LLM read the whole thing.

Feed the file contents directly into the prompt as text. If the file is too large, find a way to shorten it by summarizing, removing extra characters, or reducing the amount of data.
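One common way to shrink the data before it reaches the prompt is to keep only the fields the analysis actually needs from each post object. A minimal sketch in plain JavaScript (in n8n this logic would live in a Code node; the field names `id`, `text`, and `likes` are placeholder assumptions for whatever the real post JSON contains):

```javascript
// Keep only the fields the LLM actually needs, dropping everything
// else from each post object. Field names here (id, text, likes)
// are placeholders for your real post schema.
function compactPosts(posts, fields = ["id", "text", "likes"]) {
  return posts.map((post) => {
    const slim = {};
    for (const key of fields) {
      if (key in post) slim[key] = post[key];
    }
    return slim;
  });
}

const posts = [
  { id: 1, text: "Hello", likes: 12, raw_html: "<div>…</div>", meta: {} },
  { id: 2, text: "World", likes: 3, raw_html: "<div>…</div>", meta: {} },
];

// The stringified compact version is much shorter than the original,
// so more posts fit in each prompt.
console.log(JSON.stringify(compactPosts(posts)));
```

Stripping bulky fields like raw HTML or nested metadata often cuts the token count by a large factor without losing anything the analysis needs.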

Thanks for the reply.

The issue is that the data is, as stated, a list of posts with post data in JSON format, and I need to analyze each post. So there is not really a way to shorten the data apart from batching. I could run the LLM once per post (so about 700 times), but this would take a lot of time. If I batch, though, I still need relatively small batches or it will overflow, which is why I am trying to find out whether there is an alternative.
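If per-post calls are too slow and fixed-count batches still overflow, a middle ground is to batch by an estimated token budget rather than by item count: pack posts into a batch until the serialized JSON approaches a limit, then start a new batch. A rough sketch, where the ~4-characters-per-token heuristic and the budget are assumptions you would tune for your model:

```javascript
// Pack posts into batches whose serialized size stays under a rough
// token budget, instead of using a fixed batch size. The charsPerToken
// ratio is a heuristic, not an exact tokenizer.
function batchByTokenBudget(posts, maxTokens = 8000, charsPerToken = 4) {
  const maxChars = maxTokens * charsPerToken;
  const batches = [];
  let current = [];
  let currentChars = 0;

  for (const post of posts) {
    const size = JSON.stringify(post).length;
    // Start a new batch when adding this post would blow the budget
    // (but never leave a batch empty, so oversized posts still go out).
    if (current.length > 0 && currentChars + size > maxChars) {
      batches.push(current);
      current = [];
      currentChars = 0;
    }
    current.push(post);
    currentChars += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch then becomes one LLM call, so long posts land in smaller batches and short posts in larger ones, keeping every call under the context window with far fewer calls than one per post.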