AI Security Shield: PII Redaction + Prompt Injection Test Harness

If you’re building AI workflows that process customer messages, form submissions, or any user-generated text, you’re one crafted input away from leaking data, exposing your system prompt, or letting someone hijack your agent.

This workflow gives you a working security layer you can test in 5 minutes.

What it does

  • Scans text for PII (emails, phone numbers) and redacts them with safe placeholders
  • Detects prompt injection attempts using keyword analysis, structural pattern matching, and heuristic scoring — no LLM call, no LLM latency, no per-scan AI cost
  • Includes 10 real adversarial attack tests (one from each category) with pass/fail results shown directly in the workflow output — no credentials needed to run
  • Returns a clear decision: allow, review, or block with reason codes

Set up in under 5 minutes — import the workflow, click “Execute Workflow,” and see the shield catch a sample attack. Then swap in your own test inputs.

Built for production, not demos

  • Deterministic checks — same input always produces the same decision
  • No external dependencies — runs entirely inside n8n Code nodes
  • No credentials required for the demo — just import and click
  • fail_closed by default — if something unexpected happens, it blocks rather than allows

10 attack categories tested:
direct injection, indirect injection, system prompt extraction, secret exfiltration, PII leakage, role impersonation, encoding obfuscation, hidden instructions, suspicious URLs, business scope bypass

Need more? The Pro version adds:

  • Output validation (catches your AI leaking data, instructions, or fabricating entities)
  • Hallucination risk flags (canary verification, source grounding, contradiction detection)
  • 66 adversarial attack tests with an auto-runner workflow and pass/fail report
  • Google Sheets dead-letter logging (audit trail you can share with clients)
  • Telegram alerting on blocked/suspicious inputs
  • Configurable policy: per-PII-type actions, injection thresholds, business scope rules
  • Client-facing security summary document
  • Threat model documentation
    AI Security Shield Pro — n8n Workflow Template

Lite: 10/10 built-in tests pass. Pro regression suite: 66/66 verified. No credentials in the Lite workflow file.

Download workflow JSON (GitHub Gist) FREE

1 Like

The review state is the most useful addition here - it’s easy to overlook but wiring that branch to a human-in-the-loop step (a Wait node + webhook resume, or just a Slack/Telegram notification with approve/deny buttons) turns this from a detection layer into a full moderation pipeline. One thing I’d also add: a rate-limit check at the entry point - repeated block decisions from the same user ID in a short window is a strong signal of active probing, worth logging separately from one-off blocks.

Two things worth adding to this pattern:

Audit log to Google Sheets. Pipe every block and review decision to a Sheets append row — timestamp, user ID, decision, matched pattern, raw score. Two reasons: (1) compliance audits want a tamper-evident record outside the app, and (2) after a week of real traffic you can tune your thresholds with actual data instead of guessing. The Google Sheets node makes this a 2-minute addition.

Allowlist bypass for trusted callers. If your workflow is also called by internal services or known integrations, add an early Check step before the PII/injection scanner: if the X-Internal-Token header matches a hashed value in an n8n Variable, skip to the main logic. This prevents your own tooling from getting blocked and keeps the security layer focused on untrusted input only.

Combined, the full flow becomes: trusted-caller check → PII redact → injection score → allow/review/block → audit log. Each step is a separate sub-workflow so you can test and update them independently.