Workflow that gets a CSV of 2K rows and creates a summary with insights and actionable items

Describe the problem/error/question

I am trying to read a CSV file of about 2000 rows and pass those rows to an AI Agent block that acts as a data analyst: it should find patterns and trends in the data set I provide and create a summary with valuable insights and recommended action items.
The workflow works fine up to a maximum of about 300 rows, then it gives an error. I need to be able to pass thousands of rows and get a summary that covers all the details in the CSV. If I split the data into batches and run the AI on each one, I get multiple reports, while I need a full picture. So I need a way to overcome the fact that I am exceeding the maximum tokens allowed per request.

What is the error message (if any)?

400 status code (no body)

Please share your workflow


In the AI Agent I take the data from the previous node all at once using JSON.stringify($json.data). The input coming from the previous node is an aggregate with the following structure: "data": [ {"field1": value, "field2": value}, {"field1": value, "field2": value}, ... ]
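A rough back-of-the-envelope sketch of why this approach breaks past a few hundred rows: stringifying the whole aggregate produces a payload that quickly exceeds the model's context window. The field names and row content below are hypothetical stand-ins for the real CSV:

```javascript
// Hypothetical data shaped like the aggregate described above:
// { "data": [ {"field1": ..., "field2": ...}, ... ] }
const rows = Array.from({ length: 2000 }, (_, i) => ({
  field1: `ticket ${i}`,
  field2: "some longer free-text description of the issue...",
}));

const payload = JSON.stringify({ data: rows });

// Very rough rule of thumb: ~4 characters per token for English text.
const approxTokens = Math.ceil(payload.length / 4);
console.log(payload.length, approxTokens);
```

Even with these short placeholder rows the payload lands in the tens of thousands of tokens; real support tickets with long descriptions would be far larger, which is consistent with the request being rejected around the 300-row mark.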

Share the output returned by the last node

No output, only the error.

Information on your n8n setup

  • n8n version: 2.3.4 (self-hosted)
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Operating system: macOS

Could you please share the complete workflow code? (not only the screenshot)

On which node does the error occur?

One other point: your ‘Loop Over Items’ node does nothing in this workflow, as you use a ‘Nothing’ node after it. Depending on the specific action you want to perform on each item, you can delete the ‘Loop Over Items’ node and directly aggregate the items into one.

The error occurs only in the AI Agent node. Unfortunately, because of the nature of the business I work in, I am not able to share the workflow code. So maybe my question is more about how to set up a similar workflow for the purpose of analyzing thousands of rows (they are support tickets), rather than fixing the one I put together (which looks trivial and does not make good use of the loop). Ideally the loop could help analyze the rows (tickets) in batches of 50, for instance, but the result should be saved somewhere so it can contribute to the final summary. I hope that what I shared makes sense 😀

Hi @Carmela_Greco Welcome!
If your ticket data is going to be constant and is not going to change every day, please consider using a vector database: convert your sheet data into a vector store. That way there is no overhead of iterating over sheet items, no risk of hitting the AI model or AI Agent limits, and less chance of hallucinated output.

In your use case I would approach it like this:

  1. Use a Split in Batches node to keep your loop, and set the batch size to something manageable, like 20-50 rows.
  2. Inside the loop, have an AI Agent that just says “Summarize these 50 tickets. Extract key themes.” Do not ask for the final report here; just get the bullet points.
  3. Use a Code node or the loop node’s append feature to collect the summarized tickets from the previous node, not the original data.
  4. After the loop finishes, you’ll have around 40 small summaries instead of 2000 raw rows. Send that list to a second AI Agent to write the final report.

Just make sure your prompt inside the loop explicitly asks for “high-density information”, something like “list specific issues, counts”. Otherwise the final agent will just write a generic report, which you don’t want.

I think this architecture can scale to 10k+ rows easily.
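The merge step (point 3 above) can be sketched roughly like this. In n8n this would live in a Code node after the loop, where `items` holds the loop output; here the items are stubbed with sample data, and the `summary` field name is an assumption — use whatever field your batch-level AI Agent actually writes:

```javascript
// Stub of the items that would arrive from the loop in n8n; each item
// carries one batch-level summary under a hypothetical `summary` field.
const items = [
  { json: { summary: "Top themes: login failures (12), billing disputes (5)." } },
  { json: { summary: "Top themes: login failures (9), shipping delays (4)." } },
];

// Concatenate the per-batch summaries into one prompt for the final
// "report writer" agent, labeling each batch so counts stay traceable.
const finalPrompt = [
  "You are writing one consolidated report from batch summaries of support tickets.",
  "Merge counts, deduplicate themes, and highlight the top issues overall.",
  "",
  items.map((item, i) => `Batch ${i + 1}:\n${item.json.summary}`).join("\n\n"),
].join("\n");

console.log(finalPrompt);
```

In an actual Code node you would end with `return [{ json: { finalPrompt } }];` so the second AI Agent can read the prompt from its input.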

The vector store approach @Anshul_Namdev mentioned is solid if you’re doing repeated queries on the same data, but for what you’re describing (generating one summary report) I’d go with what @A_A4 is suggesting — the two-pass summarization pattern. You batch the data, get mini-summaries from each batch, then feed all those mini-summaries into a final AI call that synthesizes everything into one report. The key is that your batch summaries need to be dense enough to capture the important stuff but short enough that combining 40 of them still fits in context, so you’re basically compressing 2000 rows into maybe 40 paragraphs and then asking the AI to find patterns across those paragraphs. For support tickets specifically I’d have the first pass extract things like “ticket category counts, common complaints, sentiment breakdown, notable outliers” rather than just “summarize these tickets” because that gives you more structured data to work with in the final synthesis step.
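As a concrete (hypothetical) version of that structured first pass, the per-batch prompt could demand compact JSON rather than free-form prose — the exact fields below are illustrative, not prescribed:

```javascript
// Hypothetical first-pass prompt for one batch of tickets: asks the
// model for dense, countable metrics instead of a narrative summary.
// `{{tickets}}` is a placeholder where the batch rows would be injected.
const firstPassPrompt = `
Analyze the following 50 support tickets and return ONLY compact JSON:
{
  "category_counts": {"<category>": <count>},
  "common_complaints": ["<short phrase>"],
  "sentiment": {"positive": <n>, "neutral": <n>, "negative": <n>},
  "outliers": ["<one-line description>"]
}
Tickets:
{{tickets}}
`.trim();

console.log(firstPassPrompt);
```

Structured counts like these can be summed mechanically across batches, which makes the final synthesis step both cheaper in tokens and easier to validate.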

I separated the summarization into 2 different agents. I still get that error sometimes (maybe it depends on the AI model I am using for the node), but I was able to summarize at least 1200 rows. I am trying to improve the outcome from the first agent, making it metrics-oriented (just numbers and fewer words), so that it would be easier for the second agent to work on the result coming from there. I will let you know, but so far it was huge progress! Thanks!

Nice, 1200 rows is a big jump from 300. Making the first agent output just numbers and metrics is exactly the right move: the fewer tokens in those intermediate summaries, the more batches you can fit into that final synthesis call. If you’re still hitting the 400 error occasionally, you might also try switching to a model with a bigger context window for that second agent.

Hi everyone, and thanks for your help so far! I wanted to share the progress here. I was able to structure the metrics so that the final node summarizes numbers instead of starting from sub-summaries; this makes the final summary more reliable, and validation of the final numbers has been correct so far. I have only one issue. The models I’ve used so far are Llama Maverick 17B ones, because they’re the largest models I have available. I don’t like the outcome of the final node; summaries from other models (such as Claude) are better structured. When I try to switch to a smaller model, it fails, even if it’s taking only a few items with metrics (for instance 12 items with a nice and clean JSON structure). Why does that happen? Am I missing anything important here?

@Carmela_Greco Everything is mostly correct in your flow, but the AI model you are using is not as capable of consistently meeting targets. Consider using Gemini models from Google AI Studio, or, my preference, OpenAI’s GPT-4o, by far the best one for automation-related use cases. If you want a specific output format from the AI Agent, use the Output Parser in the AI Agent and give it a specific JSON output instruction there.
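To illustrate the Output Parser suggestion, one way to build the JSON instruction is to define an example object of the exact shape you want and paste its stringified form into the parser. The fields below are assumptions based on the metrics discussed in this thread — adapt them to your actual data:

```javascript
// Hypothetical target shape for the final report; every field name here
// is an assumption, not a required n8n schema.
const outputExample = {
  total_tickets: 0,
  category_counts: { login: 0, billing: 0 },
  top_issues: ["short phrase per issue"],
  recommended_actions: ["one concrete action per line"],
};

// Stringify with indentation so it can be pasted into the Output
// Parser's example/instruction field in the AI Agent node.
const instruction = JSON.stringify(outputExample, null, 2);
console.log(instruction);
```

Giving the parser an explicit shape like this also tends to help smaller models, since they no longer have to invent a report structure on their own.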