We can enhance the Agent nodes to include a feature for anonymizing sensitive information before interacting with an LLM (Large Language Model). This would provide built-in support for anonymization, ensuring that sensitive data such as phone numbers, names, or other identifiable information is never exposed to the model.
My use case:
I work with personal documents that need processing via an AI agent on n8n. Currently, I have to rely on LangChain code and external tools like Presidio to anonymize sensitive information. Integrating this functionality directly into the Agent node would streamline my workflow and ensure better privacy for consumers who rely on public LLM services.
I think it would be beneficial to add this because:
It enhances data privacy by minimizing the risk of sensitive information leakage.
Yes, I am willing to collaborate by developing and/or testing the feature. However, I may need assistance with the development process because I’m not a JS developer (I’m already trying to implement this feature, but the code will probably need some improvements).
Currently, no. I can’t use the Presidio API as it is, because its de-anonymization API requires the analyzer results as input, and that analysis isn’t effective once the encryption mechanism has been applied (the regular-expression-based rules can’t match encrypted values). And since n8n can’t currently execute LangChain code in Python, I’ve put that part aside while I create my own API for this use case.
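To illustrate the problem described above, here is a minimal sketch (not Presidio code; the hash digest is just a stand-in for its encrypt operator) showing why a regex-based rule can no longer locate a value after it has been encrypted:

```python
import hashlib
import re

# A recognizer-style regex for US phone numbers.
phone_pattern = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
original = "Call me at 212-555-0123"

# Before encryption, the pattern matches the raw value.
assert phone_pattern.search(original) is not None

# "Encrypt" the phone number (a truncated hash digest as a stand-in
# for a real encryption operator).
token = hashlib.sha256(b"212-555-0123").hexdigest()[:12]
encrypted = phone_pattern.sub(token, original)

# After encryption, the regex-based rule finds nothing to de-anonymize.
assert phone_pattern.search(encrypted) is None
```

This is why de-anonymization needs the analyzer results (or a stored mapping) from the original pass, rather than re-running pattern matching on the transformed text.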
When it’s all done, I’ll put a link here with the API + workflow.
Hi, I haven’t had much spare time to work on this, but I’m building a proxy to handle this case more easily (and to be compatible with other LLM tools). I’ll post here when it’s public.
I am also interested in this feature. As a cybersecurity engineer, I want to develop AI bots that can help us with investigations and access control reviews.
It would be useful if, at least for the inputs, outputs, and tools of the AI Agent node, you could provide a list of words to anonymize (mainly usernames, names, surnames, etc.). When an input or a tool produces sensitive data from this list, the data would be anonymized before being fed to the AI agent. Then, when the agent wants to produce an output or call a tool, any anonymized placeholders would be de-anonymized again, so that the final output and the tool inputs contain the real data.
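The reversible, word-list-based flow described above could be sketched roughly like this (illustrative only; the class and placeholder format are my own invention, not an n8n or Presidio API):

```python
import re

class WordListAnonymizer:
    """Replace listed sensitive words with stable placeholders, reversibly."""

    def __init__(self, sensitive_words):
        # Map each sensitive word to a placeholder and keep the reverse
        # mapping so the substitution can be undone on the agent's output.
        self.forward = {w: f"<ANON_{i}>" for i, w in enumerate(sensitive_words)}
        self.reverse = {v: k for k, v in self.forward.items()}

    def anonymize(self, text):
        # Applied to inputs and tool results before they reach the LLM.
        for word, placeholder in self.forward.items():
            text = re.sub(rf"\b{re.escape(word)}\b", placeholder, text)
        return text

    def deanonymize(self, text):
        # Applied to the agent's output and outgoing tool calls.
        for placeholder, word in self.reverse.items():
            text = text.replace(placeholder, word)
        return text

anon = WordListAnonymizer(["Alice", "Bob"])
masked = anon.anonymize("Alice emailed Bob about the review.")
# masked == "<ANON_0> emailed <ANON_1> about the review."
restored = anon.deanonymize(masked)
# restored == "Alice emailed Bob about the review."
```

A real implementation would need more care (overlapping words, case sensitivity, placeholders the LLM might mangle), but the round trip is the core idea.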