Problem extracting HTML from a .docx file using Mammoth

Hello everyone,

I’m currently developing a workflow that extracts data from a .docx file. Previously, I was using Mammoth to convert the .docx file into HTML, and it worked well—I was able to successfully extract all the content from the file.

However, after updating n8n to version 2.0.3, I configured Mammoth as an environment dependency when running n8n. While it seems that Mammoth is now available within the workflow, the code that previously worked for extracting HTML from the .docx file no longer behaves as expected. Instead, it returns an error similar to the one shown in the image.

Has anyone encountered a similar issue after updating n8n? I would really appreciate any guidance or suggestions on how to resolve this.

Thank you very much for your time and support!

Try adjusting your code as follows:

1. If you’re sticking with Mammoth directly:

JavaScript:

const mammoth = require("mammoth");

// Decode the base64 payload of the "data" binary property into a Buffer
const binaryData = items[0].binary.data;
const buffer = Buffer.from(binaryData.data, "base64");

// convertToHtml returns HTML; extractRawText would return plain text only
return mammoth.convertToHtml({ buffer })
  .then(result => {
    return [{ json: { html: result.value } }];
  });

  • Use Buffer.from(..., "base64") instead of passing raw binary.

  • Make sure you’re accessing the correct binary property (items[0].binary.data may differ depending on your workflow setup).
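
As a rough sketch of the two points above, a small helper can look up the binary property by name and fail with a readable message when it is missing. The helper name getDocxBuffer and the sample item are made up for illustration. It also assumes the binary payload is stored inline as a base64 string, which may not hold if your n8n instance keeps binary data on disk.

```javascript
// Hypothetical helper: not part of n8n or Mammoth.
function getDocxBuffer(item, propertyName = "data") {
  const binary = item.binary || {};
  const entry = binary[propertyName];
  if (!entry) {
    // List the property names that do exist to make the mismatch obvious
    const available = Object.keys(binary).join(", ") || "(none)";
    throw new Error(
      `No binary property "${propertyName}". Available: ${available}`
    );
  }
  // Assumption: entry.data holds the file content as a base64 string
  return Buffer.from(entry.data, "base64");
}

// Usage with a made-up item shaped like an n8n item
const sampleItem = {
  json: {},
  binary: { attachment_0: { data: Buffer.from("hello").toString("base64") } },
};
const buffer = getDocxBuffer(sampleItem, "attachment_0");
console.log(buffer.toString("utf8")); // prints "hello"
```

If the property name is wrong, the error lists the names that are actually present, which is usually enough to spot a mismatch such as attachment_0 versus data.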

2. Use a Community Node

These nodes are designed for n8n v2+ and avoid the manual buffer conversion issues.

3. Debug Encoding

  • Errors often stem from encoding mismatches (UTF-8 vs UTF-16).

  • If you see gibberish or corrupted output, explicitly set encoding when converting buffers.
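
One way to check whether the decoded bytes are really a .docx before handing them to Mammoth: a .docx file is a ZIP archive, so it should start with the bytes PK\x03\x04. The sketch below is only a diagnostic. The function name looksLikeDocx and the placeholder string are made up, and passing the check does not guarantee that the file is a valid Word document.

```javascript
// Hypothetical diagnostic: not part of n8n or Mammoth.
// ZIP local file header signature: 0x50 0x4B 0x03 0x04 ("PK\x03\x04").
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

function looksLikeDocx(buffer) {
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length >= ZIP_SIGNATURE.length &&
    buffer.subarray(0, ZIP_SIGNATURE.length).equals(ZIP_SIGNATURE)
  );
}

// A buffer that starts with the ZIP signature passes the check
const zipLike = Buffer.concat([ZIP_SIGNATURE, Buffer.from("rest of archive")]);
console.log(looksLikeDocx(zipLike)); // true

// Decoding a string that is not the file content (made-up placeholder value)
// yields unrelated bytes, so the check fails
const notAFile = Buffer.from("filesystem-v2", "base64");
console.log(looksLikeDocx(notAFile)); // false
```

If the check fails on the buffer built in the Code node, the problem is more likely in how the binary data is read than in the .docx file itself.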

@jeimuzu18 Thank you so much! I have tried all of your approaches, but the Docx Extractor and the Convert File to Json nodes return the errors shown in the images. I have also tried your recommended changes to my Mammoth code, but it still returns the same error. I think the main problem is likely the .docx file format, although I tried several different .docx files and nothing changed.