I'd appreciate some help, please. I've tried a few options (kudos to Claude) and can't get this to work.
I am trying to send a PDF that arrived as an email attachment to CloudConvert using the HTTP Request node. The request node executes successfully, but CloudConvert is left in a status of “Waiting for file upload”.
I am using n8n Cloud version 1.65.2. I understand there is a community node for CloudConvert, but I'm keen to stay on n8n Cloud and support n8n with a subscription.
As a prior step to get familiar with the HTTP Request node, I saved the PDF to Google Drive and created an HTTP Request node to get the file from Google Drive, send it to CloudConvert, then save the resulting JPGs back to Google Drive. This works well. I therefore understand that a workaround would be to save the file first and then upload it to CloudConvert, but it seems a shame to create an additional step and a need for temporary file storage.
(I know that I can't use JSON to reference the file directly, thanks to this post.)
I have removed sensitive info from nodes pasted below.
—The HTTP Request node that gets the PDF from Google Drive - this works well:—
—The node that tries to send the attachment directly to CloudConvert, which doesn't work:—
Hey, so I think the issue you were facing is that CloudConvert's import/upload endpoint does not accept a payload; it generates a URL with which a user can upload a file via a browser. To pass a payload in directly, import/base64 accepts a base64-encoded file (10 MB limit; beyond that you have to use a cloud data store). Please see an example workflow below and hope this helps!
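For reference, here's roughly what that maps to outside of n8n: a minimal Python sketch, assuming CloudConvert's v2 jobs endpoint. The API key, file path and task names ("import-pdf", "convert-to-jpg", "export-result") are just placeholders; in n8n the equivalent JSON goes into the HTTP Request node's body, with the attachment's base64 data referenced via an expression.

```python
import base64
import requests

# Placeholder values: swap in your own API key and the binary data coming
# from the email-attachment node in n8n.
CLOUDCONVERT_API_KEY = "YOUR_API_KEY"
with open("attachment.pdf", "rb") as f:
    pdf_bytes = f.read()

# One job with three tasks: import/base64 -> convert -> export/url.
job = {
    "tasks": {
        "import-pdf": {
            "operation": "import/base64",
            "file": base64.b64encode(pdf_bytes).decode("ascii"),
            "filename": "attachment.pdf",
        },
        "convert-to-jpg": {
            "operation": "convert",
            "input": "import-pdf",
            "output_format": "jpg",
        },
        "export-result": {
            "operation": "export/url",
            "input": "convert-to-jpg",
        },
    },
}

resp = requests.post(
    "https://api.cloudconvert.com/v2/jobs",
    headers={"Authorization": f"Bearer {CLOUDCONVERT_API_KEY}"},
    json=job,
)
resp.raise_for_status()
print(resp.json()["data"]["id"])  # job id to poll until it reaches "finished"
```

The job is asynchronous, so after creating it you still poll it (or use a webhook) until it finishes, then fetch the export URL to download the JPGs.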
Oh right, one more thing! Converting to an image may not be necessary, as I believe you indicated in another thread that this was to feed into a chat model? Claude supports PDF parsing natively now; example workflow below.
Last note: giving LLMs images of things to read is generally less reliable than parsing the PDF with tools like Textract (by far the best in my testing, but quite complex to wrangle), Jina Reader (my favourite for most cases), LlamaIndex, Unstructured.io, etc., and passing that text to the LLM for inference. Just my 2c.
Many thanks @Bojan, this is very thorough and thoughtful of you. Thanks for checking my other posts to understand the further context.
When first designing the workflow I used Claude manually to perform OCR on the PDF and it couldn't extract the text I needed. So I fed it a screenshot (as a JPG) of the PDF and it worked perfectly, hence I am going down this road.
Thanks for your suggestions of other parsers - I will check these out. It did cross my mind that an LLM was not necessarily appropriate for this task. I am creating a prototype quickly to start getting some user feedback; I'll look to replace it with one of your suggestions as soon as I can after that.
Jina Reader looks interesting - if you have a node for Jina Reader that you'd be willing to share, I'd appreciate it, but I understand if you'd rather not.
> When first designing the workflow I used Claude manually to perform OCR on the PDF and it couldn't extract the text I needed. So I fed it a screenshot (as a JPG) of the PDF and it worked perfectly, hence I am going down this road.
The specific implementation I shared might be a little different from what you tested, as Anthropic introduced an enhanced PDF understanding mechanism, currently available with the beta flag you see in the headers of that request. Essentially, previously they would try to extract text from the PDF and feed just that to the LLM; now they automatically pass images of the PDF pages as well. Source: Anthropic's X post
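If it helps to see it outside of n8n, here's a minimal Python sketch of the kind of request involved. The API key, file name and prompt are placeholders, and the `anthropic-beta: pdfs-2024-09-25` value is the flag Anthropic documented for the PDF beta at the time; it may differ from the exact header in the workflow above.

```python
import base64
import requests

# Placeholder key and file path.
ANTHROPIC_API_KEY = "YOUR_API_KEY"
with open("statement.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "pdfs-2024-09-25",  # enables native PDF understanding
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": pdf_b64,
                        },
                    },
                    {"type": "text", "text": "Extract the text I need from this PDF."},
                ],
            }
        ],
    },
)
resp.raise_for_status()
print(resp.json()["content"][0]["text"])
```

In n8n this is simply an HTTP Request node with those headers and that JSON body, with the attachment's base64 data dropped in via an expression.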
> Jina Reader looks interesting - if you have a node for Jina Reader that you'd be willing to share, I'd appreciate it, but I understand if you'd rather not.
Of course, see below. I included two versions: one standard version that responds with an easy-to-work-with JSON, which is all you need for PDFs, and one that uses the stream option, which I've found to be necessary for many websites should anyone want to use Jina Reader for those. I additionally threw in some code to remove links from the text output, as I find they usually just pollute the text. Again, for PDFs that's probably not needed.
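For anyone reading along, the standard (non-streaming) version boils down to something like the Python sketch below. The target URL and API key are placeholders, and the link-stripping regexes are a rough equivalent of the clean-up code in the workflow, not the exact same snippet; the streaming variant is omitted to keep this short.

```python
import re
import requests

# Placeholders: Jina Reader is called by prefixing the target URL with
# https://r.jina.ai/ ; the key is optional but unauthenticated calls are rate-limited.
JINA_API_KEY = "YOUR_JINA_KEY"
target_url = "https://example.com/some-document.pdf"

resp = requests.get(
    f"https://r.jina.ai/{target_url}",
    headers={
        "Accept": "application/json",          # structured JSON instead of raw markdown
        "Authorization": f"Bearer {JINA_API_KEY}",
    },
)
resp.raise_for_status()
text = resp.json()["data"]["content"]

# Remove markdown links (keeping the link text), then drop bare URLs.
text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
text = re.sub(r"https?://\S+", "", text)
print(text[:500])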
Thanks @Bojan. Ah! That's why you had the beta flag in the header! I wanted to learn, so I created the node from scratch using both your example node and the Anthropic API documentation. I saw the beta parameter in your header but didn't understand its importance, so I didn't include it in my node. As a result the OCR didn't get the text I needed. I tried Jina Reader but I'm afraid that didn't get the text I needed either. So I went back to the JPG OCR flow I was working on.
Then (after a few more hours trying to get the JPG OCR flow working) I spotted your post above. I added in the beta parameter and, hey presto, it worked! I get the text I need with a much simpler flow - no messing around with file conversion and PDF and JPG storage! Thank you so much @Bojan!
I just completed my first month with n8n, so if you don't mind me asking, how do you know so much? Are there sources you'd recommend so that I can become as knowledgeable/skilled as you?
> I added in the beta parameter and, hey presto, it worked! I get the text I need with a much simpler flow - no messing around with file conversion and PDF and JPG storage!
Wonderful, glad to hear that helped!
> I just completed my first month with n8n, so if you don't mind me asking, how do you know so much? Are there sources you'd recommend so that I can become as knowledgeable/skilled as you?
Congratulations on your first month on n8n! Did you have previous experience with low-code tools? I've also only been working with n8n for a couple of months, but I have the advantage of a lot of experience with other platforms before this, plus my main work is advising other Automation & AI agencies, so I get a lot of exposure to different types of problems.
The #1 piece of advice I usually offer, and see others giving (who aren't selling courses), is to practice by building. Solving your own real-life problems is best, but sometimes re-creating template/example flows is worthwhile too. One thing I do is browse the forums looking for interesting problems (like yours) and try to solve them! It goes without saying that completing the official n8n courses and YouTube series, as you've already started, can really help.
Best of luck, the community is here to help when you need it!
Thanks for all the advice and time @Bojan. Wishing you all the best.
To answer your question: n8n is my first no-code platform. I spent about 8 months learning Python/machine learning before switching to this more practical approach. Other than that I have no coding background, though my career has been in gathering software requirements, so I am comfortable listening to people's problems and needs and structuring requirements so that a system can be built/improved/tested. It feels really good to be on the build side now!