I am working on an automated system (Roland Project) to convert PDF files to DOCX using the LlamaParse API. My workflow follows these steps:
HTTP Request (POST): Successfully uploads the PDF and receives a Job ID (Status: PENDING).
Wait Node: I have implemented a 60-second delay to ensure the file is processed.
HTTP Request (GET): Attempts to retrieve the converted file using the following URL: https://api.cloud.llamaindex.ai/api/parsing/job/{{ $node["HTTP Request"].json.id }}/result/docx
The Issue:
Even after the 60-second wait, the second node returns a 404 Error: “The resource you are requesting could not be found.”
Technical Setup:
Method: GET
Headers: Authorization: Bearer [My_API_Key]
Response Format: File
Send Body: OFF
I have confirmed that the Job ID is being passed correctly from the first node. Why is the resource not found despite the wait time and correct URL structure?
The 404 error occurs because LlamaParse does not support DOCX as an output format.
While LlamaParse can read DOCX files as input, its purpose is to convert documents into LLM-friendly formats like Markdown, JSON, or Text. It is not a direct “PDF-to-Word” conversion tool. The endpoint .../result/docx simply does not exist.
Why this is failing
Invalid Endpoint: You are requesting a resource (docx) that the API does not generate.
Supported Outputs: The valid output formats are markdown, json, text, xlsx (Excel), and pdf (reconstructed).
How to Fix It
Option A: Get the Valid Output (Markdown)
If your goal is to get data for an LLM, change your GET request URL to retrieve the Markdown or JSON result.
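As a minimal sketch of that URL change, the helper below builds the result URL and rejects formats that are not in an allow-list. buildResultUrl and SUPPORTED_FORMATS are hypothetical names, and the list only contains the formats that the replies in this thread agree on, so it should be checked against the current LlamaParse documentation.

```javascript
// Formats named in this thread; not verified against the live API.
const SUPPORTED_FORMATS = ["markdown", "text", "json", "pdf"];
const BASE_URL = "https://api.cloud.llamaindex.ai/api/parsing/job";

// Hypothetical helper: returns the result URL or throws for an unknown format.
function buildResultUrl(jobId, format) {
  if (!jobId) {
    throw new Error("Missing job ID");
  }
  if (!SUPPORTED_FORMATS.includes(format)) {
    throw new Error(
      `Unsupported result format "${format}"; expected one of: ${SUPPORTED_FORMATS.join(", ")}`
    );
  }
  return `${BASE_URL}/${encodeURIComponent(jobId)}/result/${format}`;
}
```

Calling buildResultUrl(jobId, "docx") throws immediately, which surfaces the mistake in the workflow instead of as a 404 from the API.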
Option B: Convert to DOCX in a Separate Step
If you specifically need a DOCX file for a user to edit, you must add a conversion step after LlamaParse, or switch tools.
Step 1 (LlamaParse): Get the result as Markdown (/result/markdown).
Step 2 (Conversion): Use a tool to convert that Markdown to DOCX.
Method: Pass the Markdown string to an API like CloudConvert or Pandoc (if running locally/via script) to generate a .docx file.
Alternative: If you don’t need the advanced “LLM-optimized” parsing of LlamaParse and just want a visual conversion, consider using the Adobe PDF Services API or CloudConvert API directly instead of LlamaParse.
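For the local Pandoc route, the conversion step could look roughly like the sketch below. It assumes Pandoc is installed and on the PATH of the machine running the script, which is usually not the case on hosted n8n; pandocArgs and convertMarkdownToDocx are hypothetical helper names, and the file paths are placeholders.

```javascript
const { execFileSync } = require("child_process");

// Build the Pandoc argument list: read Markdown, write DOCX.
function pandocArgs(inputPath, outputPath) {
  return ["--from", "markdown", "--to", "docx", "--output", outputPath, inputPath];
}

// Run Pandoc synchronously; throws if pandoc is missing or exits with an error.
function convertMarkdownToDocx(inputPath, outputPath) {
  execFileSync("pandoc", pandocArgs(inputPath, outputPath), { stdio: "inherit" });
  return outputPath;
}
```

The Markdown returned by /result/markdown would first be written to a file (for example result.md), then passed through convertMarkdownToDocx("result.md", "result.docx").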
Pro Tip: Better “Wait” Logic
Instead of a fixed 60-second wait (which is brittle), use a “Loop” or “Retry” mechanism:
Loop Node: check the job status endpoint: https://api.cloud.llamaindex.ai/api/parsing/job/{{ $node["HTTP Request"].json.id }}
Condition: Continue only when status equals SUCCESS.
Result: Then fetch the /result/markdown endpoint.
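The loop above can be driven by a small decision function, for example in a Code node placed after the status request. This is only a sketch: nextAction is a hypothetical name, SUCCESS and PENDING are the status values mentioned in this thread, and ERROR and CANCELED are assumed names for failed jobs that should be checked against the API documentation.

```javascript
// Decide what the workflow should do after each status check.
// Returns "fetch" (download the result), "wait" (loop back to the Wait node),
// or "fail" (stop the workflow).
function nextAction(status, attempt, maxAttempts = 20) {
  if (status === "SUCCESS") {
    return "fetch";
  }
  // Assumed failure statuses; adjust to what the API actually returns.
  if (status === "ERROR" || status === "CANCELED") {
    return "fail";
  }
  // Give up after a bounded number of checks instead of looping forever.
  if (attempt >= maxAttempts) {
    return "fail";
  }
  // PENDING or any other in-progress status: check again later.
  return "wait";
}
```

An IF or Switch node can then route on the returned value, sending "wait" back to a short Wait node (a few seconds) and "fetch" on to the /result/markdown request.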
Hi @abd2001!
You are getting a 404 because that resource does not exist: /result/docx is not a supported endpoint. The supported result endpoints are:
/result/markdown
/result/text
/result/json
/result/pdf
For your use case, retrieve the result as Markdown or text through /result/markdown or /result/text instead, and that should work. Hope this helps.
Hi @abd2001 ,
Looking at your screenshot, I can see you’re working on an n8n workflow for the Roland Project to convert PDFs to DOCX using the LlamaParse API. The 404 error you’re getting is frustrating, but it’s likely one of a few common issues.
The main problem is probably that 60 seconds isn’t enough time for the file to be fully processed, or you might need to check the job status before trying to download the result.
Here’s what I’d suggest:
First, instead of just waiting 60 seconds and hoping the job is done, you should check if the job actually completed. Add a step that calls: https://api.cloud.llamaindex.ai/api/parsing/job/{{ $node["HTTP Request"].json.id }}
This will tell you if the status is “SUCCESS”, “PENDING”, or “PROCESSING”. Only try to download the result when it’s actually successful.
Second, double-check your result endpoint. You might need to try it without the /docx part: https://api.cloud.llamaindex.ai/api/parsing/job/{{ $node["HTTP Request"].json.id }}/result
And then specify the format using a query parameter like ?result_type=docx or through headers.
As a quick test, you could also try increasing your wait time to 2-3 minutes just to see if it’s purely a timing issue.
Technical Report: File Name Loss After Gemini Node Processing
1. The Issue:
When processing files (images or documents) through the Gemini node, the output in the subsequent Code Node loses its original “fileName”. The system then falls back to a hardcoded default name (like “مستند_نور”), even though we want the output file to match the input file’s name.
2. The Root Cause:
The Gemini node is primarily a JSON-based processing node. When it receives binary data, it extracts the content to analyze it but does not pass the binary metadata (specifically the fileName) to the next node in the chain. This creates a “data disconnection” where the downstream JavaScript code can see the generated text but has no record of the original file’s identity.
3. The Fallback Logic Problem:
The current code includes a try-catch safety net. Because the immediate input from Gemini lacks the filename, the code “gives up” and triggers the catch block, which assigns the static placeholder name. This is why the default name appears every time.
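The original code is not shown in this report, so the following is only an assumed reconstruction of the pattern described above. It shows why an item without a binary key always ends in the catch block, and how optional chaining reaches the same fallback without relying on an exception. The function names and the sample item are hypothetical.

```javascript
const FALLBACK_NAME = "Default_Fallback_Name";

// Assumed shape of the failing lookup: the property chain throws a TypeError
// as soon as item.binary is undefined, so the catch block always runs.
function nameWithTryCatch(item) {
  try {
    return item.binary.data.fileName;
  } catch (e) {
    return FALLBACK_NAME;
  }
}

// Same lookup with optional chaining: a missing link yields undefined,
// and ?? substitutes the fallback.
function nameWithOptionalChaining(item) {
  return item?.binary?.data?.fileName ?? FALLBACK_NAME;
}

// An item shaped like the Gemini output described above: JSON only, no binary.
const geminiItem = { json: { output: "generated text" } };
```

Both functions return the fallback for geminiItem, which matches the symptom: the default name appears on every run because the name is simply not present on the incoming item.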
4. The Implemented Solution (Data Napping):
To fix this, we implemented a “Global Historical Search” strategy. Instead of relying on Gemini’s output, we forced the Code Node to scan the entire execution history to “snatch” the original name.
5. The Solution Code:
// 1. Extract the text generated by Gemini
const rawText = items[0].json.output || "No text received";
// 2. Global search for the original fileName
let baseName = "";
try {
  // $("*").all() is used to scan the items of previous nodes in the workflow.
  // n8n documents $("<node name>"), so if "*" is rejected in your version,
  // put the name of the node that received the file here instead.
  const allNodesData = $("*").all();
  for (const item of allNodesData) {
    // Take the first item that carries binary data with a fileName
    if (item.binary && item.binary.data && item.binary.data.fileName) {
      const fullName = item.binary.data.fileName;
      // Strip the extension; names without a dot are kept whole
      baseName = fullName.substring(0, fullName.lastIndexOf(".")) || fullName;
      break; // Stop once the original name is found
    }
  }
} catch (e) {
  // The lookup itself failed; the fallback below applies
  baseName = "";
}
// Also fall back when the scan finished without finding a name
if (!baseName) {
  baseName = "Default_Fallback_Name";
}
// 3. Inject the original name back into the new files (Word/HTML)
return [
  {
    json: { type: "doc_file", fileName: baseName },
    binary: {
      data: {
        data: "…", // Formatted Content
        fileName: baseName + ".doc"
      }
    }
  }
];
6. Conclusion:
By using the $("*").all() method, we bypassed the metadata loss in the Gemini node and successfully re-established the link between the input source and the final output document.
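The name lookup can also be pulled out into a plain function that works on any list of items, which makes it testable outside n8n. This is a sketch rather than the code used in the project: findOriginalBaseName and stripExtension are hypothetical names, and unlike the solution code it checks every binary property on an item, not only the one called data.

```javascript
// Remove the last extension; names without one (or dotfiles) are kept whole.
function stripExtension(fileName) {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(0, dot) : fileName;
}

// Return the base name of the first binary entry that has a fileName,
// or the fallback when no item carries one.
function findOriginalBaseName(items, fallback = "Default_Fallback_Name") {
  for (const item of items) {
    for (const entry of Object.values(item.binary ?? {})) {
      if (entry && entry.fileName) {
        return stripExtension(entry.fileName);
      }
    }
  }
  return fallback;
}
```

Inside a Code node, the items argument would come from the node that originally received the file, for example the result of calling .all() on that node's reference.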