Guardrails node
The idea is:
Add an `info.reason` description to the LLM output when the Guardrails node flags a violation.
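To make the request concrete, here is a rough sketch of what the flagged output might look like with the proposed field. The `checks`/`flagged`/`score` structure and the example values are assumptions for illustration only, not the node's actual schema; only the `info.reason` name comes from this request.

```python
# Hypothetical current output of a Guardrails check (structure assumed):
current_output = {
    "checks": [
        {"name": "nsfw", "flagged": True, "score": 0.91},
    ],
}

# Proposed: include the LLM's own explanation alongside the flag.
proposed_output = {
    "checks": [
        {
            "name": "nsfw",
            "flagged": True,
            "score": 0.91,
            # New field: why the model flagged this message, in its own words
            "info": {"reason": "Message was interpreted as ... (model explanation)"},
        },
    ],
}

# A workflow could then log the reason for every flagged check:
for check in proposed_output["checks"]:
    if check["flagged"]:
        print(f"{check['name']}: {check['info']['reason']}")
```

With something like this, the reason could be routed to a log node or attached to evaluation results without any extra prompting.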
My use case:
I’m running evaluation messages through a complex workflow, and it can be difficult to determine exactly why some seemingly benign messages get flagged as NSFW or as jailbreak attempts. Being able to log the reasoning behind a flag would be a great help in troubleshooting and refining the guardrail prompts.
I think it would be beneficial to add this because:
Having the LLM return a reason when it flags a message as a guardrails violation would make it much easier to pinpoint the exact cause of the flag.