Open PDF as Google Doc

Hi everyone!

For a long time I’ve been trying to open an existing PDF on Google Drive as a text document in Google Docs format. I tried configuring it with an HTTP Request node, changing the Content-Type, and creating a copy of the file in DOC format. I’ve asked all the smartest AI models multiple times (DeepSeek, OpenAI’s GPT models, Grok 3), but they keep talking about non-existent fields and checkboxes, so they couldn’t help.

Please help! :)

Thanks in advance!

![Pfdf|478x235](upload://5ZHY13h8w8sJc8HrFhvr2M7EeUZ.jpeg)


## Information on your n8n setup
- **n8n version:** 1.7
- **Database (default: SQLite):**
- **Running n8n via (Docker, npm, n8n cloud, desktop app):** npm
- **Operating system:** Windows 10

Looks like your PDF didn’t upload correctly in your first post.

This solution could work for you:

.
.
.


If this helped you, please mark my reply as the correct solution ✅ and give it a like ❤️
Have fun! :robot:

You can do it like this:
Download the PDF > Convert it to JSON > Create a new Google Doc > Update it using the ID of the created file.
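The "create" and "update" steps map onto two Google Docs API calls. Here is a minimal sketch of the request bodies, in case you want to send them from an HTTP Request node instead of the Google Docs node. `create_doc_request` and `insert_text_request` are hypothetical helper names, the text is assumed to come from the "Convert it to JSON" step, and authentication is assumed to be handled separately (e.g. by a Google credential on the node).

```python
import json

DOCS_API = "https://docs.googleapis.com/v1/documents"


def create_doc_request(title: str) -> dict:
    """documents.create: makes an empty Google Doc; the response includes its documentId."""
    return {"method": "POST", "url": DOCS_API, "body": json.dumps({"title": title})}


def insert_text_request(document_id: str, text: str) -> dict:
    """documents.batchUpdate: insert plain text at the start of the document body.

    Index 1 is the first writable position in the body of a new, empty document.
    """
    body = {"requests": [{"insertText": {"location": {"index": 1}, "text": text}}]}
    return {
        "method": "POST",
        "url": f"{DOCS_API}/{document_id}:batchUpdate",
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    # "NEW_DOC_ID" is a placeholder for the documentId returned by documents.create.
    print(insert_text_request("NEW_DOC_ID", "Extracted PDF text")["url"])
```

Because `insertText` inserts literal characters, any Markdown sent this way appears as raw Markdown rather than formatted text.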

Will it look good? Probably not, since the Google Docs node offers pretty limited formatting options.
I’ve tried using AI to convert the extracted text to Markdown, since Google Docs supports Markdown (you have to enable it first). However, the update doesn’t insert it as Markdown: you have to open the document, cut the text, and use ‘Paste from Markdown’.

Thanks for the answer! But that’s the first thing I tried, and instead of text I get “\n\n\n\n\n\n\n\n\n\n”. The file must be protected. Yet if I open the file on Google Drive, right-click it and choose “Open with Google Docs”, the text I need is there.
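As far as I know, Drive’s “Open with Google Docs” converts the PDF into a new Google Doc and runs OCR on it, which would explain why the text appears there but not in a plain extraction. The Drive API v3 equivalent is copying the file with the Google Docs MIME type as the target and then exporting the copy as plain text. A minimal sketch that only builds the two requests for an HTTP Request node; the helper names are hypothetical and authentication is assumed to be handled by a Google credential on the node.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode

DRIVE_FILES = "https://www.googleapis.com/drive/v3/files"
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"


def convert_pdf_request(file_id: str, ocr_language: Optional[str] = None) -> dict:
    """files.copy: copy the PDF and ask Drive to convert the copy to a Google Doc.

    Requesting the Google Docs mimeType for the copy is what triggers conversion.
    ocr_language is an optional ISO 639-1 language hint, e.g. "en".
    """
    url = f"{DRIVE_FILES}/{quote(file_id, safe='')}/copy"
    if ocr_language:
        url += "?" + urlencode({"ocrLanguage": ocr_language})
    return {
        "method": "POST",
        "url": url,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"mimeType": GOOGLE_DOC_MIME}),
    }


def export_text_request(doc_id: str) -> dict:
    """files.export: download the converted Google Doc as plain text."""
    query = urlencode({"mimeType": "text/plain"})
    return {"method": "GET", "url": f"{DRIVE_FILES}/{quote(doc_id, safe='')}/export?{query}"}


if __name__ == "__main__":
    # "YOUR_PDF_FILE_ID" is a placeholder for the Drive ID of the PDF.
    print(convert_pdf_request("YOUR_PDF_FILE_ID")["url"])
```

The `id` field in the copy response is the new Doc’s ID, which is what the export request needs.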

Not sure why this happens for you. I just tried it and it worked:

I try not to post untested suggestions :expressionless:
I also updated my previous comment to describe my attempt at formatting it, because, as you can see, it looks terrible.


Thanks for the answer! But here’s what I see in this node…

Are you sure the text in your PDF is actually text?
In many cases it is a picture.
If you can’t select it like this, it’s probably a scanned document or something similar.

In that case you need some kind of OCR, but I haven’t tested this with n8n.

I don’t care about formatting at all; the main thing is to get the text for analysis.

I think if it were a picture, it wouldn’t open in Google Docs as text.

If it is a scan or picture, you will get something like \n\n\n\n, which are the newlines between the pictures. For pictures you will have to run the file through OCR, but I haven’t done this with n8n yet.
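Before paying for OCR, one quick way to tell the two cases apart is to check whether the extracted text contains any visible characters at all. A minimal sketch in Python; the function name and the 20-character threshold are arbitrary assumptions, not something n8n provides.

```python
import re


def looks_like_scanned_pdf(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: treat the PDF as image-only if extraction yields almost no visible text.

    A scanned PDF typically extracts as little more than a run of newlines,
    so only non-whitespace characters are counted.
    """
    visible = re.sub(r"\s+", "", extracted_text)
    return len(visible) < min_chars


if __name__ == "__main__":
    print(looks_like_scanned_pdf("\n" * 10 + "."))
```

If it returns `True`, route the file to an OCR step; otherwise the extracted text can go straight to analysis.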

I set up recognition through the PDF.io API. It’s easy and simple, but it’s paid, so with large amounts of text the cost becomes noticeable.

Yep, services can pile up. I’m not sure what PDF.io charges for OCR,
but I can see n8n has an AWS Textract node. It’s pretty limited, but it can handle authentication for you while you send the actual request with an HTTP Request node.
AWS Textract has a free tier of 1,000 pages per month, and after that the prices are pretty reasonable.
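If you go the Textract route, the text still has to be pulled out of the response. A minimal sketch, assuming a DetectDocumentText-style response: a flat `Blocks` list in which PAGE, LINE and WORD blocks are mixed and LINE/WORD blocks carry a `Text` field. As far as I know, the synchronous call handles single-page documents, while multi-page PDFs go through the asynchronous StartDocumentTextDetection and GetDocumentTextDetection calls, whose results are paginated; this sketch handles one response at a time. `textract_lines` is a hypothetical helper name.

```python
def textract_lines(response: dict) -> str:
    """Join the LINE blocks of one Textract text-detection response into plain text.

    WORD blocks repeat the same text split into words, so keeping only LINE
    blocks avoids duplicating every word. Lines keep the order Textract returns.
    """
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"Blocks": [{"BlockType": "PAGE"}, {"BlockType": "LINE", "Text": "Hello"}]}
    print(textract_lines(sample))
```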

However, it introduces another service and another dependency, so even if the price is OK, it still has some downsides.
