We can enhance the Agent nodes to include a feature for anonymizing sensitive information before interacting with an LLM (Large Language Model). This would provide built-in support for anonymization, ensuring that sensitive data such as phone numbers, names, or other identifiable information is never exposed to the model.
My use case:
I work with personal documents that need processing via an AI agent on n8n. Currently, I have to rely on LangChain code and external tools like Presidio to anonymize sensitive information. Integrating this functionality directly into the Agent node would streamline my workflow and ensure better privacy for consumers who rely on public LLM services.
I think it would be beneficial to add this because:
It enhances data privacy by minimizing the risk of sensitive information leakage.
Yes, I am willing to collaborate by developing and/or testing the feature. However, I may need assistance with the development process because I’m not a JS developer (I’m already trying to implement this feature, but the code will probably need some improvements).
Currently, no. I can’t use the Presidio API as it is, because its de-anonymization API requires the analyzer results as input, and that analysis isn’t effective once the encryption mechanism has been applied (the regular-expression-based rules can’t match encrypted values). And since n8n can’t currently execute LangChain code in Python, I’ve put that part aside while I create my own API for this use case.
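To illustrate the problem described above, here is a minimal sketch (not Presidio code; the hash digest is just a stand-in for its encrypt operator) showing why a regex-based rule can no longer locate a value after it has been encrypted:

```python
import hashlib
import re

# A recognizer-style regex for US phone numbers.
phone_pattern = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
original = "Call me at 212-555-0123"

# Before encryption, the pattern matches the raw value.
assert phone_pattern.search(original) is not None

# "Encrypt" the phone number (a truncated hash digest as a stand-in
# for a real encryption operator).
token = hashlib.sha256(b"212-555-0123").hexdigest()[:12]
encrypted = phone_pattern.sub(token, original)

# After encryption, the regex-based rule finds nothing to de-anonymize.
assert phone_pattern.search(encrypted) is None
```

This is why de-anonymization needs the analyzer results (or a stored mapping) from the original pass, rather than re-running pattern matching on the transformed text.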
When it’s all done, I’ll put a link here with the API + workflow.
Hi, I haven’t had much spare time to work on this, but I’m building a proxy to handle this case more easily (and to be compatible with other LLM tools). I’ll post here when it’s public.
I am also interested in this feature. As a cybersecurity engineer, I want to develop AI bots that can help us with investigations and access control reviews.
It would be useful if, at least for the inputs, outputs, and tools of the AI Agent node, you could provide a list of words to anonymize (mainly usernames, names, surnames, etc.). When an input or a tool produces sensitive data from this list, the data would be anonymized before being fed to the AI agent. Then, when the agent wants to produce an output or call a tool, any anonymized placeholders would be de-anonymized again, so that the final output and the tool inputs contain the real data.
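The reversible, word-list-based flow described above could be sketched roughly like this (illustrative only; the class and placeholder format are my own invention, not an n8n or Presidio API):

```python
import re

class WordListAnonymizer:
    """Replace listed sensitive words with stable placeholders, reversibly."""

    def __init__(self, sensitive_words):
        # Map each sensitive word to a placeholder and keep the reverse
        # mapping so the substitution can be undone on the agent's output.
        self.forward = {w: f"<ANON_{i}>" for i, w in enumerate(sensitive_words)}
        self.reverse = {v: k for k, v in self.forward.items()}

    def anonymize(self, text):
        # Applied to inputs and tool results before they reach the LLM.
        for word, placeholder in self.forward.items():
            text = re.sub(rf"\b{re.escape(word)}\b", placeholder, text)
        return text

    def deanonymize(self, text):
        # Applied to the agent's output and outgoing tool calls.
        for placeholder, word in self.reverse.items():
            text = text.replace(placeholder, word)
        return text

anon = WordListAnonymizer(["Alice", "Bob"])
masked = anon.anonymize("Alice emailed Bob about the review.")
# masked == "<ANON_0> emailed <ANON_1> about the review."
restored = anon.deanonymize(masked)
# restored == "Alice emailed Bob about the review."
```

A real implementation would need more care (overlapping words, case sensitivity, placeholders the LLM might mangle), but the round trip is the core idea.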