Problem in extracting HTML from docx file using mammoth

TrinhNhatHuy · January 7, 2026, 2:24pm

Hello everyone,

I’m currently developing a workflow that extracts data from a .docx file. Previously, I was using Mammoth to convert the .docx file into HTML, and it worked well—I was able to successfully extract all the content from the file.

However, after updating n8n to version 2.0.3, I configured Mammoth as an environment dependency when running n8n. While it seems that Mammoth is now available within the workflow, the code that previously worked for extracting HTML from the .docx file no longer behaves as expected. Instead, it returns an error similar to the one shown in the image.

Has anyone encountered a similar issue after updating n8n? I would really appreciate any guidance or suggestions on how to resolve this.

Thank you very much for your time and support!

jeimuzu18 · January 7, 2026, 3:41pm

Sir, try adjusting your code:

1. Stick with Mammoth directly

JavaScript:

const mammoth = require("mammoth");

// Decode the base64-encoded binary payload into a Buffer
const binaryData = items[0].binary.data;
const buffer = Buffer.from(binaryData.data, "base64");

// convertToHtml returns HTML; extractRawText would return plain text only
return mammoth.convertToHtml({ buffer })
  .then(result => {
    return [{ json: { html: result.value } }];
  });

Use Buffer.from(..., "base64") instead of passing raw binary.
Make sure you’re accessing the correct binary property (items[0].binary.data may differ depending on your workflow setup).
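
To check which binary property is actually present before decoding, a small helper along these lines may be useful. This is only a sketch: `binaryPropertyToBuffer` and the `attachment_0` property name are hypothetical, and it assumes the payload is stored inline as base64 on the item. If your instance keeps binary data elsewhere, that assumption would not hold.

```javascript
// Hypothetical helper: decode one binary property of an n8n-style item into a Buffer.
// Assumes item.binary[propertyName].data holds the file content as a base64 string.
function binaryPropertyToBuffer(item, propertyName = "data") {
  const binary = item.binary || {};
  const entry = binary[propertyName];
  if (!entry) {
    // List what is available so a wrong property name is easy to spot.
    const available = Object.keys(binary).join(", ") || "(none)";
    throw new Error(`No binary property "${propertyName}". Available: ${available}`);
  }
  return Buffer.from(entry.data, "base64");
}

// Stand-in item so the sketch runs outside n8n.
const sampleItem = {
  json: {},
  binary: { attachment_0: { data: Buffer.from("hello").toString("base64") } },
};

const sampleBuffer = binaryPropertyToBuffer(sampleItem, "attachment_0");
console.log(sampleBuffer.toString("utf8")); // prints "hello"
```

Inside a Code node, the resulting buffer could then be passed to Mammoth as `{ buffer }`. Calling the helper with a property name that does not exist throws an error listing the names that do.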

2. Use a Community Node

n8n-nodes-docx-extractor: Converts DOCX → HTML, Text, or Markdown using Mammoth + Turndown.
@mazix/n8n-nodes-converter-documents: More robust, supports DOCX, ODT, TXT, PDF, XLSX, with error handling and sanitization.

These nodes are designed for n8n v2+ and avoid the manual buffer conversion issues.

3. Debug Encoding

Errors often stem from encoding mismatches (UTF-8 vs UTF-16).
If you see gibberish or corrupted output, explicitly set encoding when converting buffers.
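
One quick way to tell whether the buffer was decoded with the right encoding is to look at its first bytes. A .docx file is a ZIP archive, so a correctly decoded buffer should start with the ZIP signature `PK\x03\x04`. The sketch below is an assumption-laden illustration, not part of Mammoth or n8n: `looksLikeDocx` is a hypothetical helper, and the sample bytes are just a ZIP header, not a real document.

```javascript
// Hypothetical helper: true if the buffer starts with the ZIP local-file signature "PK\x03\x04".
function looksLikeDocx(buffer) {
  return buffer.length >= 4 &&
    buffer[0] === 0x50 && buffer[1] === 0x4b &&
    buffer[2] === 0x03 && buffer[3] === 0x04;
}

// Stand-in payload: the first bytes of a ZIP file, base64-encoded as n8n would hold them inline.
const zipHeader = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
const base64Payload = zipHeader.toString("base64");

// Decoding with the right encoding restores the signature; the wrong one does not.
const decodedCorrectly = Buffer.from(base64Payload, "base64");
const decodedWrongly = Buffer.from(base64Payload, "utf8");

console.log(looksLikeDocx(decodedCorrectly)); // true
console.log(looksLikeDocx(decodedWrongly));   // false
```

If the check fails on the real buffer, the problem is likely in how the binary data is being read rather than in the .docx file itself.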

TrinhNhatHuy · January 7, 2026, 4:35pm

@jeimuzu18 Thank you so much! I tried all of your approaches, but both the Docx Extractor node and the Convert File to Json node return the errors shown in the images. I also changed my Mammoth code as you recommended, but it still returns the same error. I think the main problem here is likely the .docx file format (I tried different .docx files, but nothing changed).
