I built a community node to extract text from legacy .doc files — n8n-nodes-word-extractor

I hit a wall building a document automation pipeline.

n8n handles PDFs. It handles .docx. But legacy .doc files? Nothing. No node. No workaround. Just a gap.

So instead of waiting, I built the node myself from scratch in TypeScript.

What I fought through:

  • A class name collision bug
  • CommonJS import issues
  • The n8n community node publishing pipeline, end to end

The result: n8n-nodes-word-extractor

Handles both .doc and .docx. Zero system dependencies. Works on Windows.
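
For anyone wondering how one input can cover both formats, here is a minimal sketch (not the node's actual code) of telling them apart by their first bytes. Legacy .doc files are OLE2 compound files and .docx files are ZIP archives, so each starts with a well-known signature. The names `detectWordFormat`, `startsWith`, and `WordFormat` are made up for this illustration.

```typescript
// Hypothetical helper for illustration; not taken from n8n-nodes-word-extractor.
type WordFormat = "doc" | "docx" | "unknown";

// OLE2 compound file signature: the container used by legacy .doc files.
const OLE2_SIGNATURE: number[] = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];
// ZIP local file header "PK\x03\x04": the container used by .docx files.
const ZIP_SIGNATURE: number[] = [0x50, 0x4b, 0x03, 0x04];

// True when `data` begins with every byte of `signature`, in order.
function startsWith(data: Uint8Array, signature: number[]): boolean {
  if (data.length < signature.length) return false;
  return signature.every((byte, i) => data[i] === byte);
}

// Classifies a file by its container signature only. Other OLE2 files (.xls, .ppt)
// and other ZIP files share these signatures, so this is a first check, not proof
// that the file is a Word document.
function detectWordFormat(data: Uint8Array): WordFormat {
  if (startsWith(data, OLE2_SIGNATURE)) return "doc";
  if (startsWith(data, ZIP_SIGNATURE)) return "docx";
  return "unknown";
}
```

A check like this lets a node pick the right parser up front and return a clear error for anything else, instead of trusting the file extension.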

How to install:
In n8n, go to Settings → Community Nodes → Install, then enter the package name: n8n-nodes-word-extractor

I built this while working on a real recruitment automation system. It’s a small thing, but it solves a real problem.

Happy to answer any questions or hear feedback from the community.

nice, .doc files are a surprisingly common roadblock in document automation. what was the class name collision about exactly, conflict with n8n’s own internals or something in the word parsing library? and does it handle password-protected .doc files, or does it bail out with a clear error at least?