How to overcome the 4096-token output limit in GPT (or any LLM)? (Ideas, open talk...)

Hello there!
I just started playing with LLM nodes and ran into the 4096-token output limit, which is quite frustrating if you need to produce long-form content like articles.
I played around with the chunking feature in some of these nodes, but if I understood correctly, it is meant for processing large amounts of data on the input side, not for generating long output.

Then I came up with the idea of splitting the work across more than one LLM node… but maybe there’s a better way to do the same thing.

What do I mean by this? Pretty simple: put three basic LLM nodes in a row, each one writing its own part of the article but all working from the same input idea (a code sketch of the same chain follows the list below):

  1. Input → Write a pizza recipe
  2. First LLM → Write the introduction and the ingredients needed for *Input
  3. Second LLM → Write the second part of the article about *Input, continuing from: *First LLM output
  4. Third LLM → Write the conclusion for the article about *Input, given: *First LLM output + *Second LLM output
  5. Merge node.
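
For reference, here's a minimal sketch of that same chain outside of n8n, using the OpenAI Python client. The model name, `max_tokens` values, and the exact prompts are just illustrative assumptions, not something defined by the workflow above:

```python
# Minimal sketch of the split-prompt idea: three calls, each under the output cap,
# then a simple "merge" by concatenation. Model, max_tokens and prompts are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str, max_tokens: int = 1500) -> str:
    """One chat-completion call; each call stays well under the per-request output limit."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would work here
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


topic = "Write a pizza recipe"

# 1) Introduction and ingredients
part1 = ask(f"Write the introduction and ingredient list for an article: {topic}")

# 2) Main body, continuing from part 1 so the article stays coherent
part2 = ask(
    f"Write the second part of the article about '{topic}'. "
    f"Continue from this text without repeating it:\n\n{part1}"
)

# 3) Conclusion, given everything written so far
part3 = ask(
    f"Write the conclusion for the article about '{topic}'. "
    f"Here is the article so far:\n\n{part1}\n\n{part2}"
)

# 4) "Merge node": just concatenate the three parts
article = "\n\n".join([part1, part2, part3])
print(article)
```

The merge step is plain string concatenation, which is essentially what a Merge node would do at the end of the chain.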

Am I crazy, or does this sound like a good idea? :sweat_smile:

Hey @GBOL,

Can you share the workflow you are using so we can have a play? I think the output may change depending on the model you are using.

Hey Jon, thanks for your interest.
However, I ended up doing this outside of n8n, not because of the platform, but because of the unpredictable responses from GPT, which can cause more headaches than results, plus the cost of the API calls.
While the proposed solution would be interesting to implement, I preferred to do this specific task in a custom GPT because I need to adapt the responses as I go, so automating it is difficult at the moment.
I'll try to do the same using the Assistants API once I've better defined the flow I need.
But yeah, it’s definitely doable in n8n, of course :hugs:
