Different solution for docx to XML conversion

bambury79 · March 21, 2025, 9:46am

Several solutions for extracting XML from docx files have been proposed here. After some experimentation, I wanted to share another easy way to do it.
The following code cuts out everything that is not part of the docx file's XML.
It works in two steps, using 2 code nodes:

Code node 1:

{{$json["data"].replace(/^[\s\S]*?(<w:document[\s\S]*)$/, '$1')}}

The output of this node now starts with <w:document but still ends with gibberish.

Code node 2:

{{$json.text.match(/<w:document[\s\S]*?(?=<\/w:document>)/)[0]+"</w:document>"}}

This cuts the output off at the closing </w:document> tag, which removes the gibberish at the end.
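
For anyone who prefers to do both steps in one place, here is a minimal sketch in plain JavaScript. It is not what I run in the workflow, just an illustration: the function name extractDocumentXml and the sample string are made up, and it assumes the input string already contains the readable <w:document>...</w:document> markup with other data before and after it. It also returns null instead of throwing when no match is found.

```javascript
// Hypothetical helper (name is an assumption, not part of n8n).
// Combines both steps: skip everything before the first <w:document,
// then stop at the first closing </w:document> tag.
function extractDocumentXml(raw) {
  // Lazy match from the first "<w:document" up to and including
  // the first "</w:document>" that follows it.
  const match = raw.match(/<w:document[\s\S]*?<\/w:document>/);
  // String.prototype.match returns null when nothing matches,
  // so guard against that instead of indexing blindly.
  return match ? match[0] : null;
}

// Made-up sample input: junk, then the XML, then more junk.
const sample =
  'PK junk<w:document xmlns:w="urn:x"><w:body><w:p/></w:body></w:document>more junk';

console.log(extractDocumentXml(sample));
```

The null check matters because the original expression in code node 2 fails with an error if the closing tag is missing from the input.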

If needed, you can use a third node to turn it into an XML binary. However, as you already have the XML code itself at that point, this is not required.

Not the most elegant solution, I agree, but maybe this helps somebody.
