I’m looking for a solution to translate .docx or .pdf files into another language using OpenAI, while preserving the original formatting (headings, tables, bold text, etc.).
The ideal workflow would take a document as input, pass the content to OpenAI for translation, and output a new file with the same structure and style — just translated.
If anyone has done something similar or has tips on how to handle rich-text input/output in n8n using OpenAI, I’d appreciate any pointers!
- n8n version: 1.88.0
- Database (default: SQLite): Not using a database
- n8n EXECUTIONS_PROCESS setting (default: own, main): main
- Running n8n via (Docker, npm, n8n cloud, desktop app): n8n cloud
- Operating system: Windows
Hello @Van_Tuan_Tran
Below is a suggested workflow that may help with your problem. Adjust it to your needs.
Import the workflow into n8n
Configure the OpenAI credentials and select a language model. A more capable model is more likely to handle the request successfully.
Activate the workflow
Access the generated web form
Upload the document and select the language
I hope this has helped in some way.
If this suggestion solved your problem, please mark my post as the solution (blue box with a check mark) and click the heart, so that the ongoing discussion does not distract others looking for the answer to the original question. Thank you.
Thanks @interss
However, I think this workflow only translates the content and creates a new file; it does not keep the original formatting (headings, tables, bold text, etc.).
I’m looking for a solution that can fully preserve the original structure and style after translation.
If you have any idea how to keep the formatting, I’d love to hear it! Thanks again!
Hi @Van_Tuan_Tran / @Interss,
This might be a crazy idea (there might be better ways) but what about this approach:
Step 1: Create a PDF “translation template”
- loop over the PDF document and find small chunks of text that each form a unit
- take the text, put it in a DB, assign it an id, and replace the original text with the id.
- save the PDF translation template somewhere
Step 2: Perform the translation
- go over all the ids in the DB and translate each text into the language you want.
- find the ids in the template pdf doc
- replace the ids with the text
- save the resulting pdf
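A minimal sketch of the two steps above, in Python. It assumes the PDF text has already been extracted into a list of chunks; the extraction and the write-back into the PDF are not shown. The `[[T0001]]` id format, the function names, and `fake_translate` (a stand-in for the OpenAI call) are all made up for illustration.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical id format: [[T0001]], [[T0002]], ...
ID_PATTERN = re.compile(r"\[\[T\d{4}\]\]")


def build_template(chunks: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Step 1: swap every non-empty chunk for an id and remember its text."""
    template: List[str] = []
    store: Dict[str, str] = {}  # stands in for the DB
    for chunk in chunks:
        if not chunk.strip():
            template.append(chunk)  # whitespace-only chunks stay as they are
            continue
        chunk_id = f"[[T{len(store) + 1:04d}]]"
        store[chunk_id] = chunk
        template.append(chunk_id)
    return template, store


def translate_store(store: Dict[str, str],
                    translate: Callable[[str], str]) -> Dict[str, str]:
    """Step 2a: translate the text behind every id."""
    return {chunk_id: translate(text) for chunk_id, text in store.items()}


def fill_template(template: List[str],
                  translations: Dict[str, str]) -> List[str]:
    """Step 2b: put the translated text back where the ids are."""
    def swap(match: "re.Match[str]") -> str:
        # Unknown ids are left in place so missing translations stay visible.
        return translations.get(match.group(0), match.group(0))
    return [ID_PATTERN.sub(swap, part) for part in template]


def fake_translate(text: str) -> str:
    """Placeholder for the real translation call (assumption, not an API)."""
    return f"<fr>{text}</fr>"


chunks = ["Quarterly report", " ", "Revenue grew 4%."]
template, store = build_template(chunks)
translated = fill_template(template, translate_store(store, fake_translate))
```

The template keeps only ids, so the layout information never goes through the model; only the plain text in `store` does.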
This method does not guarantee that the translated text will fit in the same space as the original, but I highly doubt any solution could guarantee that. Text rendered as graphics is also a corner case …
Another potential question: why a template and a DB? It saves a lot of time on retries, and it makes translating into multiple languages easier.
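To illustrate the retry and multi-language point, here is a rough sketch using Python's built-in `sqlite3`. The table layout, function names, and `fake_translate` (a stand-in for the OpenAI call) are assumptions for illustration only. Only chunks without a stored translation for the requested language get sent for translation.

```python
import sqlite3
from typing import Callable, Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two tables: source chunks and per-language translations."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id TEXT PRIMARY KEY, source TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        "chunk_id TEXT NOT NULL, lang TEXT NOT NULL, text TEXT NOT NULL, "
        "PRIMARY KEY (chunk_id, lang))"
    )
    return conn


def add_chunks(conn: sqlite3.Connection, chunks: Dict[str, str]) -> None:
    """Store source text by id; ids that already exist are left alone."""
    conn.executemany(
        "INSERT OR IGNORE INTO chunks (id, source) VALUES (?, ?)",
        list(chunks.items()),
    )
    conn.commit()


def translate_missing(conn: sqlite3.Connection, lang: str,
                      translate: Callable[[str, str], str]) -> int:
    """Translate only chunks that have no row for `lang` yet."""
    rows = conn.execute(
        "SELECT c.id, c.source FROM chunks c "
        "LEFT JOIN translations t ON t.chunk_id = c.id AND t.lang = ? "
        "WHERE t.chunk_id IS NULL ORDER BY c.id",
        (lang,),
    ).fetchall()
    for chunk_id, source in rows:
        conn.execute(
            "INSERT INTO translations (chunk_id, lang, text) VALUES (?, ?, ?)",
            (chunk_id, lang, translate(source, lang)),
        )
        conn.commit()  # commit per chunk so a failed run keeps finished work
    return len(rows)


def load_translations(conn: sqlite3.Connection, lang: str) -> Dict[str, str]:
    """Return {chunk_id: translated_text} for one language."""
    return dict(conn.execute(
        "SELECT chunk_id, text FROM translations WHERE lang = ?", (lang,)
    ).fetchall())


def fake_translate(text: str, lang: str) -> str:
    """Placeholder for the real translation call (assumption, not an API)."""
    return f"{lang}:{text}"
```

Calling `translate_missing` a second time for the same language does no work, and a new language reuses the same stored chunks and the same template.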
Hope that makes a bit of sense.
reg,
J.
Hi @jcuypers
Thank you so much for the idea — honestly, it makes a lot of sense and could be very useful! 
I really appreciate you taking the time to think this through and share it with me.
That said, I do think it could get very complicated to implement, especially when it comes to chunking text correctly, handling layout issues, and managing special cases like images or formatted blocks.
But overall, it’s a super smart approach to make translations much more manageable and scalable across multiple languages.
thank you @jcuypers, let me try this