How do LangChain and n8n handle deterministic JSON output?

Dear all,
I have some questions about how LangChain and n8n handle deterministic JSON output with OpenAI and Mistral AI.

From what I understand, OpenAI used to take a JSON schema as an input parameter, but that is not the case anymore!? We now have to explicitly ask the LLM to output JSON.
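To illustrate what I mean, here is roughly the kind of call I am talking about (a sketch with the openai Python client v1; JSON mode only guarantees syntactically valid JSON, not conformance to a schema):

# Sketch: JSON mode guarantees well-formed JSON output, but the shape
# still has to be described in the prompt. Assumes the openai v1 client
# and a model that supports response_format.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "Reply in JSON with keys 'tags' (array of strings) "
                    "and 'isHomework' (boolean)."},
        {"role": "user", "content": "Is this message homework?"},
    ],
)
print(response.choices[0].message.content)  # valid JSON, schema not enforced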

But how can I make sure the output is deterministic?
How do LangChain and n8n handle this?
Should I use function calling instead?

Thanks

Hey @LucBerge

If you tick Require Specific Output Format and add a schema in the Structured Output Parser sub-node, for example like below:

{
  "tags": [ "string" ],
  "isHomework": "boolean"
}

This expected schema will then be appended to the prompt; you can check this by executing the node and going to the Logs tab.
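Under the hood this is LangChain's structured output parser; something like this Python sketch (n8n wires it up for you):

# Sketch of what the Structured Output Parser sub-node does:
# the schema becomes format instructions appended to the prompt.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

schemas = [
    ResponseSchema(name="tags", description="array of string tags"),
    ResponseSchema(name="isHomework", description="boolean flag"),
]
parser = StructuredOutputParser.from_response_schemas(schemas)

# This is the text that gets appended to your prompt -- the same thing
# you can see in the Logs tab after executing the node.
print(parser.get_format_instructions())

# After the model answers, the parser extracts and checks the JSON:
# result = parser.parse(llm_output)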

Hope that helps ;)

Hello,
OK! So this is not deterministic then!?

There is a small probability that the output does not match the requested format?!

Yes, that’s possible, especially with GPT-3.5; with GPT-4 I have not experienced it.
There is also the auto-fixing output parser sub-node. I have not used it yet, but if you need to be sure the execution will not fail, it might be worth a look.
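As far as I understand, that sub-node corresponds to LangChain's OutputFixingParser, which makes one extra LLM call to repair output the base parser rejects. A sketch (assuming the langchain and langchain-openai Python packages):

# Sketch: wrap a parser so that a parse failure triggers a second
# LLM call asking the model to fix its own malformed output.
from langchain.output_parsers import (
    OutputFixingParser, ResponseSchema, StructuredOutputParser,
)
from langchain_openai import ChatOpenAI

base_parser = StructuredOutputParser.from_response_schemas(
    [ResponseSchema(name="tags", description="array of string tags")]
)
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
)

# If the raw output is malformed, the fixer asks the LLM to repair it
# and parses again; this reduces failures but cannot remove them entirely.
# result = fixing_parser.parse('{"tags": ["a", "b"')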

You could also use Retry On Fail.

Let's say the model produces invalid JSON 1% of the time:

  • If I use the auto-fixing output parser, it lowers the probability without removing it: 0.01 * 0.01 = 0.0001
  • If I use Retry On Fail with max tries set to 3: 0.01^3 = 0.000001

It is not deterministic.
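To make the numbers concrete, here is the same arithmetic as a quick Python check (assuming each attempt fails independently with probability 1%):

p_fail = 0.01  # assumed probability of invalid JSON per attempt

# Auto-fixing parser: one repair attempt that itself fails 1% of the time
p_autofix_fail = p_fail * p_fail  # 0.0001

# Retry On Fail with max tries = 3: all three attempts must fail
p_retry_fail = p_fail ** 3  # 1e-06

# Over a large batch the residual risk is small but never zero
requests = 10_000
print(requests * p_retry_fail)  # expected failures: 0.01

Small, but not zero.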

What about using function calling to get a deterministic JSON output format?


The workflow I am planning will process up to 10,000 requests, and I cannot afford a single failure… If I increase the max tries setting, it could become expensive given the number of requests.

Maybe someone else will be able to help you better. I can only say that GPT-3.5 sometimes generates incorrect JSON. A good prompt makes a big difference, but an LLM is by design not deterministic, so I don't think you can guarantee that it returns correct JSON in 100% of cases. However, you could use the auto-fixing sub-node, which might get you there, but I don't have experience with it.

You mentioned function calling, so do you know about the AI Agent node, which can gather tools and decide by itself which one to use? I haven't used it much either, but maybe it could help you somehow.

Yes, I have already used it, but the way LangChain is integrated into n8n does not let me access the JSON output between the LLM and the actual tool call in n8n. I want access to the raw output, like:

{
  "aspects_and_sentiments": [
    {"aspect": "food", "sentiment": "positive"},
    {"aspect": "ambiance", "sentiment": "negative"},
    {"aspect": "waiter", "sentiment": "positive"},
    {"aspect": "pizza", "sentiment": "positive"},
    {"aspect": "burger", "sentiment": "positive"},
    {"aspect": "coke", "sentiment": "negative"},
    {"aspect": "drinks", "sentiment": "negative"}
  ]
}
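For reference, with the raw OpenAI client the function-calling request I have in mind looks roughly like this (a sketch; report_sentiments and its schema are just my example, and even function calling is not a hard schema guarantee):

# Sketch: function calling steers the model to emit arguments matching
# a JSON schema; the raw arguments string is the output I want to read.
# Assumes the openai Python client v1; the tool name/schema are made up.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "report_sentiments",
        "description": "Report aspect-level sentiments found in a review.",
        "parameters": {
            "type": "object",
            "properties": {
                "aspects_and_sentiments": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "aspect": {"type": "string"},
                            "sentiment": {"type": "string",
                                          "enum": ["positive", "negative"]},
                        },
                        "required": ["aspect", "sentiment"],
                    },
                },
            },
            "required": ["aspects_and_sentiments"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "The pizza was great but the coke was flat."}],
    tools=tools,
    # Force the model to call this specific function:
    tool_choice={"type": "function", "function": {"name": "report_sentiments"}},
)

raw = response.choices[0].message.tool_calls[0].function.arguments
print(json.loads(raw))  # the raw JSON between the LLM and the tool call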

See the full article: https://sauravmodak.medium.com/openai-functions-a-guide-to-getting-structured-and-deterministic-output-from-chatgpt-building-3a0ef802a616

