How to optimize AI Agent request token sizes to stay under rate limits?

Describe the problem/error/question

I am using the AI Agent node with the Claude API. One of my requests to Claude is 140k tokens, which is hitting the rate limit. I want to do two things to improve this.

  1. Can I remove the user prompt from every request? Below is a screenshot where you can see the sequential requests it makes; each request seems to include the user prompt, which I imagine is redundant and takes up valuable tokens. Since this session has memory, I think the user prompt (and even the system prompt) only needs to be sent once.
  2. Also, referencing the screenshot below, you can see I make four requests, and all passed except the last one. Is there a node I can use to batch the calls?

I basically make four requests, and the last two are above 140k tokens each. What options do I have?

What is the error message (if any)?

{
  "status": 429,
  "headers": {
    "anthropic-organization-id": "56f9697a-4d0c-4893-8667-9b9512dbd0b7",
    "anthropic-ratelimit-input-tokens-limit": "40000",
    "anthropic-ratelimit-input-tokens-remaining": "0",
    "anthropic-ratelimit-input-tokens-reset": "2025-06-04T17:47:28Z",
    "anthropic-ratelimit-output-tokens-limit": "8000",
    "anthropic-ratelimit-output-tokens-remaining": "8000",
    "anthropic-ratelimit-output-tokens-reset": "2025-06-04T17:45:41Z",
    "anthropic-ratelimit-requests-limit": "50",
    "anthropic-ratelimit-requests-remaining": "50",
    "anthropic-ratelimit-requests-reset": "2025-06-04T17:45:41Z",
    "anthropic-ratelimit-tokens-limit": "48000",
    "anthropic-ratelimit-tokens-remaining": "8000",
    "anthropic-ratelimit-tokens-reset": "2025-06-04T17:45:41Z",
    "cf-cache-status": "DYNAMIC",
    "cf-ray": "94a93ecf9804f09a-DFW",
    "connection": "keep-alive",
    "content-length": "529",
    "content-type": "application/json",
    "date": "Wed, 04 Jun 2025 17:45:41 GMT",
    "request-id": "req_011CPobeKF8LnxAZnGFUqakc",
    "retry-after": "52",
    "server": "cloudflare",
    "strict-transport-security": "max-age=31536000; includeSubDomains; preload",
    "via": "1.1 google",
    "x-robots-tag": "none",
    "x-should-retry": "true"
  },
  "request_id": "req_011CPobeKF8LnxAZnGFUqakc",
  "error": {
    "type": "error",
    "error": {
      "type": "rate_limit_error",
      "message": "This request would exceed the rate limit for your organization (56f9697a-4d0c-4893-8667-9b9512dbd0b7) of 40,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits. You can see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."
    }
  },
  "lc_error_code": "MODEL_RATE_LIMIT",
  "attemptNumber": 7,
  "retriesLeft": 0
}
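Note that the 429 response above includes a `retry-after` header (52 seconds here) and `x-should-retry: "true"`. As a rough sketch of spacing calls out instead of retrying immediately, something like the following could run in an n8n Code node; `sendRequest` is a hypothetical function standing in for whatever actually calls the Claude API:

```javascript
// Sketch: retry a request when it comes back 429, waiting for the
// number of seconds given in the `retry-after` response header.
// `sendRequest` is a hypothetical async function you supply that
// resolves to an object like { status, headers }.
async function withRetryAfter(sendRequest, maxAttempts = 3) {
  let res;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    res = await sendRequest();
    if (res.status !== 429 || attempt === maxAttempts) return res;
    // Fall back to 60s if the header is missing.
    const waitSeconds = Number(res.headers['retry-after'] ?? 60);
    await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
  }
  return res;
}
```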

Please share your workflow

(Select the nodes on your canvas and use the keyboard shortcuts CMD+C/CTRL+C and CMD+V/CTRL+V to copy and paste the workflow.)

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 1.94.1
  • Database (default: SQLite): SQLite
  • n8n EXECUTIONS_PROCESS setting (default: own, main): own
  • Running n8n via (Docker, npm, n8n cloud, desktop app): Docker
  • Operating system: MacOS

Hi @unaligned9094

I would start with limiting the tokens. Have you done that?

Also, if you share your workflow in a code block, we can try and look into a way to batch.

I looked into this, but I was afraid it would cut my input tokens. Or does this only cut the output?

Also, I’m on a self-hosted instance. The share button doesn’t work, as it asks me to join a plan.

Hi @unaligned9094

You don’t need to use the share button. Simply copy your nodes and paste them in here. Here’s a GIF showing how to do that.

pasteworkflow

@unaligned9094

In addition, the max tokens setting won’t truncate your input; it only limits the output. You could think about using a cheaper model to summarize the input to shrink it. Upload your workflow, and I’ll see if there is any way to help.
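Since max tokens only caps the output, shrinking the input has to happen before the request is sent. As a minimal sketch, a Code node could cap the input with the common ~4-characters-per-token approximation (this is a heuristic, not Anthropic's actual tokenizer, so leave yourself headroom):

```javascript
// Rough token estimate: ~4 characters per token is a common
// approximation; the real tokenizer will count differently.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Truncate the input text so its estimated token count fits
// a budget before sending it to the model.
function truncateToBudget(text, maxInputTokens) {
  if (estimateTokens(text) <= maxInputTokens) return text;
  return text.slice(0, maxInputTokens * 4);
}
```

Truncation is lossy, of course; summarizing with a cheaper model preserves more meaning, but this at least guarantees you stay under the input-tokens-per-minute limit.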


Here is my workflow.

OK, so it seems someone has hidden my post with my workflow. Waiting for approval…

@unaligned9094

Did you put your code in a code block, and did it render a workflow? Sometimes it gets stuck if you’ve just posted the JSON without rendering a workflow.

It did render a workflow, actually. I saw it in the right-hand window before I posted it.

@unaligned9094
Right now it says “deleted by author.” Try uploading it again.

Let me try again now that I got my trust up.

@unaligned9094

The transactions tool is loading all rows from the Google Sheet into the model. You’ll need to add a filter in the Google Sheets node.

The agent’s memory will store everything from the Google Sheet and keep referring back to it for every new call to Claude.

It is only loading the transactions from the year to date, as it passes a parameter of Year = YYYY, which filters the dataset. However, the result is still about 768 rows, which I feel is too much for the tool. I am now exploring cascading another agent to take care of the summarization, or maybe just calling the Claude API directly to do the analysis itself. How do I make cascading agents? Do you have to write different sub-workflows and chain them together?

@unaligned9094

That would be a good idea. 768 rows of data is a lot for a model. Maybe try using a separate, cheaper agent to summarize the 768 rows into a paragraph, then load that in.

Another approach would be to aggregate the Google Sheets data with a SQL query or pivot table. It’s likely that the model doesn’t need every single record. You could send it aggregates like “Total Spend” or “Spend by Type,” which would be a much smaller form.
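The aggregation idea above could also live in a Code node between the Google Sheets node and the agent. A minimal sketch, assuming hypothetical column names `Amount` and `Type` (use whatever your sheet actually calls them):

```javascript
// Sketch: collapse transaction rows into small aggregates before
// handing them to the model. `Amount` and `Type` are assumed
// column names, not known fields from the actual sheet.
function aggregateTransactions(rows) {
  const spendByType = {};
  let totalSpend = 0;
  for (const row of rows) {
    // Non-numeric amounts count as 0 rather than poisoning the total.
    const amount = Number(row.Amount) || 0;
    totalSpend += amount;
    const type = row.Type ?? 'Unknown';
    spendByType[type] = (spendByType[type] ?? 0) + amount;
  }
  return { totalSpend, spendByType };
}
```

Sending a few dozen aggregate numbers instead of 768 rows keeps every request comfortably under the 40,000 input-tokens-per-minute limit.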

If this post helps solve your issue, please mark it as the solution so others can benefit from it.

Hi @unaligned9094

Was this chat able to help you solve your issues? Do you need anything else from this post?

:heart:If any response helped you, please click the heart to show that it is useful
:white_check_mark:If any of the responses solved your issue, mark it as the solution to help the community

Yeah, perhaps. I am first going to try it with a different sub-agent, as I do not want to miss anything; there are just different angles I want to view this from, i.e. MoM, by category, by type, by joint vs. personal, etc. Therefore I wanted the model to do these different analyses. I think it works better to have a dedicated agent for the summary, with the root agent generating a message based on that summary.

@unaligned9094
Great. Hope that works for you. There’s a good amount of trial and error when creating good agents.

Indeed! The rate limiting is the worst part about it, but I may just have to build up higher usage in Claude to get onto a higher tier.