Extract from XML - DOC & DOCX

Hello everyone!

I am using the Extract from XML node to get the text of a DOC or DOCX file.

The node runs without errors, but I am getting some strange text at the beginning and the end of the output.

Does anyone have any guidance on this?

Thanks…


Hello @AliFarahat,

Thank you for bringing this issue to the community's attention.

I can't promise a definitive answer, but I can help point to a path forward.
After looking into it, the main problem seems to be that the Extract from XML node is meant to parse XML files, not to convert DOC/DOCX files into XML. That is probably where the strange text comes from: the node is being handed a binary file rather than plain XML.
So I believe you would first need to convert the DOC file to DOCX (if you are using DOC).
Once you have the DOCX, you essentially have a ZIP archive of XML files, so you would need to unzip it and then parse the XML inside.
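
To make the "zipped XML" point concrete, here is a minimal Python sketch that checks whether a file's bytes are a ZIP archive and lists what is inside. The function name `list_docx_parts` is hypothetical and only for illustration. It runs outside n8n, just to show what the file contains.

```python
import io
import zipfile


def list_docx_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of the files stored inside a DOCX archive."""
    buffer = io.BytesIO(docx_bytes)
    # A DOCX is a ZIP container; a legacy DOC is not and fails this check.
    if not zipfile.is_zipfile(buffer):
        raise ValueError("not a ZIP archive, so probably not a DOCX file")
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()
```

For a typical DOCX the list includes `word/document.xml`, which holds the main body text.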

In a nutshell you will need to ensure the following steps:

  • Execute Command Node: Convert .doc to .docx (if necessary).
  • Read Binary File Node: Read the .docx or .doc file.
  • Execute Command Node: Unzip the .docx file to access XML content.
  • Read Binary File Node: Read the specific XML file within the unzipped content.
  • Extract from XML Node: Parse the XML to work with it in n8n.
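
The unzip and parse steps above can be sketched in Python as follows. This is only an illustration of the idea, not an n8n workflow: the function name `extract_docx_text` is hypothetical, and the sketch assumes the body text lives in `word/document.xml` as `w:p` paragraphs containing `w:t` text runs, which is the usual DOCX layout. It ignores tabs, line breaks inside a paragraph, headers, and footers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(docx_bytes: bytes) -> str:
    """Unzip a DOCX held in memory and return its text, one paragraph per line."""
    # Step 1: open the DOCX as a ZIP archive and read the main document part.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")

    # Step 2: parse the XML and join the text runs of each paragraph.
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Feeding the raw DOCX bytes to an XML parser directly, without the unzip step, is what produces unreadable output, because the parser then sees compressed binary data instead of markup.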

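This would require access to the files on your disk. To convert DOC to DOCX you would also need an external tool or command such as pandoc (see the Pandoc User's Guide). It is worth checking first that the tool accepts the legacy DOC format as input: as far as I know, pandoc reads DOCX but not DOC, in which case something like LibreOffice in headless mode may be needed instead.

Also, I am not sure whether the n8n Compression node would do the trick for unzipping the DOCX file. If it does not, you would need to use an OS command such as unzip.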
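
As a rough sketch of the conversion step, the Python below builds and runs a LibreOffice command line. This is an assumption on my part and not something confirmed for your setup: it uses LibreOffice in headless mode rather than pandoc, and it assumes the `soffice` binary is installed and on the PATH (on some systems it is called `libreoffice`). The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path: str, out_dir: str) -> list[str]:
    """Build the LibreOffice command line that converts a DOC file to DOCX."""
    # Assumption: "soffice" is the LibreOffice executable available on PATH.
    return ["soffice", "--headless", "--convert-to", "docx", "--outdir", out_dir, doc_path]


def convert_doc_to_docx(doc_path: str, out_dir: str) -> Path:
    """Run the conversion and return the expected path of the new DOCX file."""
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    # LibreOffice keeps the base name and swaps the extension.
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

In n8n the same command line could be passed to an Execute Command node, provided the tool is installed wherever n8n runs.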

I hope this gives some clarity and direction for you. If not, please share further insights so we can check and advise accordingly.

Happy n8n workflow design!
