Some solutions for extracting the XML from a .docx file have already been proposed here. After some experimentation, I wanted to share another fairly easy way to do it.
The following expressions cut out everything in the .docx data that is not part of its XML code.
It works through two nodes, each holding one expression:
Code node 1:
{{$json["data"].replace(/^[\s\S]*?(<w:document[\s\S]*)$/, '$1')}}
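The output of this node now starts with <w:document but still ends with gibberish from the rest of the file. Save the result in a field named text, because the second expression reads $json.text.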
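To see what the first expression does outside the workflow, here is a minimal sketch in plain JavaScript (Node.js). It assumes the {{ }} snippets are n8n expressions whose body is ordinary JavaScript. The function name stripBeforeDocument and the sample string are made up for illustration; the sample only imitates junk before and after the document element.

```javascript
// Step 1: drop everything before the first "<w:document".
// [\s\S] matches any character including newlines; the lazy *? stops at the
// first "<w:document", and the capture group keeps the rest of the string.
// Note: this also drops the <?xml ...?> declaration, which sits before the element.
// If the tag is missing, the regex does not match and the input comes back unchanged.
function stripBeforeDocument(data) {
  return data.replace(/^[\s\S]*?(<w:document[\s\S]*)$/, '$1');
}

// Made-up sample that imitates junk around the document element.
const sample =
  'JUNK-BEFORE<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' +
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">' +
  '<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>' +
  '</w:document>JUNK-AFTER';

const step1 = stripBeforeDocument(sample);
console.log(step1.startsWith('<w:document')); // true
console.log(step1.endsWith('JUNK-AFTER'));    // true: the tail is still there
```

This matches the behaviour described above: the head is gone, but the trailing junk survives until the second step.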
Code node 2:
{{$json.text.match(/<w:document[\s\S]*?(?=<\/w:document>)/)[0]+"</w:document>"}}
This keeps everything up to the closing </w:document> tag and removes the gibberish after it.
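The second step can be sketched the same way. The lazy [\s\S]*? stops at the first </w:document>, and because the lookahead (?=...) does not consume the closing tag, the expression has to append it again. The helper names are hypothetical, and the null check is an addition: the original expression would throw when nothing matches. The last function suggests that, under the same assumptions, both steps amount to a single match that includes the closing tag.

```javascript
// Step 2: keep everything up to (but not including) the first "</w:document>",
// then re-append the closing tag, because the lookahead (?=...) does not consume it.
function stripAfterDocument(text) {
  const match = text.match(/<w:document[\s\S]*?(?=<\/w:document>)/);
  if (match === null) {
    // The original expression would fail here with a TypeError (null[0]).
    throw new Error('No complete <w:document> element found');
  }
  return match[0] + '</w:document>';
}

// Both steps collapsed into one match that includes the closing tag.
// Returns null instead of throwing when there is no complete element.
function extractDocumentXml(data) {
  const match = data.match(/<w:document[\s\S]*?<\/w:document>/);
  return match === null ? null : match[0];
}

// Made-up input shaped like the output of step 1.
const afterStep1 =
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">' +
  '<w:body/></w:document>JUNK-AFTER';

// Both calls print the same <w:document ...>...</w:document> string.
console.log(stripAfterDocument(afterStep1));
console.log(extractDocumentXml('JUNK-BEFORE' + afterStep1));
```

As a single n8n expression, the one-step version would presumably read {{ $json.data.match(/<w:document[\s\S]*?<\/w:document>/)[0] }}, which, like the original, throws if there is no match.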
If needed, you can add a third node to turn the result into a binary XML file. However, since you already have the XML code itself at this point, that step is usually unnecessary.
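If you do want that third node, one option is a Code node that wraps the string as binary data. This sketch assumes n8n's documented item format, in which each binary property carries base64 data plus mimeType and fileName, and that Buffer is available in the Code node. The helper name toBinaryItem, the property key data, and the file name document.xml are illustrative choices, not part of the workflow above.

```javascript
// Hypothetical helper: wrap an XML string as an n8n-style item with binary data.
// Assumes the item shape binary.<key> = { data (base64), mimeType, fileName }.
// The key "data" and the default file name are illustrative choices.
function toBinaryItem(xml, fileName = 'document.xml') {
  return {
    json: {},
    binary: {
      data: {
        data: Buffer.from(xml, 'utf8').toString('base64'),
        mimeType: 'application/xml',
        fileName,
      },
    },
  };
}

const item = toBinaryItem('<w:document><w:body/></w:document>');
// Decoding the base64 payload gives back the original XML string.
console.log(Buffer.from(item.binary.data.data, 'base64').toString('utf8'));
```

In a Code node running once for each item, something like return toBinaryItem($json.text); might then produce one binary item per input, though that wiring is an assumption.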
I agree it is not the most elegant solution, but maybe it helps somebody.