Document Loader node fails with “DOMMatrix is not defined” while parsing PDF files

Hi everyone,

I’m facing an issue with the Document Loader node while ingesting PDF files in a Gemini RAG workflow on self-hosted n8n.

Workflow:

Webhook
→ Read/Write Files from Disk
→ Document Loader
→ Vector Store Insert

The PDF file is successfully detected and passed as binary data into the Document Loader node, but the node fails while parsing the PDF.

CSV ingestion works correctly. The issue only occurs with PDFs.

Error:

DOMMatrix is not defined

Input received by the Document Loader node:

{
  "mimeType": "application/pdf",
  "fileType": "pdf",
  "fileName": "agents.pdf",
  "fileExtension": "pdf",
  "fileSize": "2.03 MB"
}

Document Loader configuration:

  • Type of Data: Binary

  • Mode: Load All Input Data

  • Data Format: Automatically Detect by Mime Type

  • Text Splitting: Custom

Environment:

  • n8n version: 2.23.0

  • Database: SQLite

  • Running via: Docker (Self Hosted)

  • OS: Ubuntu Linux

Can you revert to 2.21.7, @Rahul_Parmar? 2.23 is still in beta.

@Rahul_Parmar this is a known regression since n8n 1.98, when pdfjs-dist was upgraded to a version that needs browser APIs like DOMMatrix. Multiple GitHub issues are open on it (#16593, #16438, #16422). A workaround that works on 2.23.0 without downgrading: bypass the Document Loader for PDFs and use a Code node with the pdf-parse npm package instead. Set NODE_FUNCTION_ALLOW_EXTERNAL=pdf-parse in your env vars, extract the text in the Code node, and pass the extracted text into Vector Store Insert directly. Is the rest of your workflow heavily tied to the Document Loader specifically, or are you just using it for PDF text extraction?
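To keep CSVs on the Document Loader and send only PDFs down the Code node path, a small check on the file metadata should be enough. This is a minimal sketch, not n8n's own logic: `pickRoute` and the two route labels are hypothetical names, and it assumes the metadata fields shown in the original post (`mimeType`, `fileExtension`). Inside n8n, an IF or Switch node on the mime type should be able to do the same job without code.

```javascript
// Hypothetical helper: decide which branch of the workflow handles a file.
// The route labels are illustrative strings, not n8n identifiers.
function pickRoute(meta) {
  const mime = (meta.mimeType || '').toLowerCase();
  const ext = (meta.fileExtension || '').toLowerCase();
  // PDFs go to the Code node workaround; everything else stays on the loader.
  const isPdf = mime === 'application/pdf' || ext === 'pdf';
  return isPdf ? 'code-node-pdf' : 'document-loader';
}

// Example with the metadata from the original post.
const route = pickRoute({ mimeType: 'application/pdf', fileExtension: 'pdf' });
console.log(route);
```

Checking both fields is a deliberate choice: some upload paths report a generic mime type, so the extension acts as a fallback.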

I was previously using 2.21.7, but I upgraded because of the pdf-parse v1.1 issue reported in “Pdf-parse v1.1 Error in Node Default Data Loader”, which appears to have been fixed in 2.23.0.

However, after upgrading, I’m now hitting the DOMMatrix is not defined error specifically during PDF ingestion in the Document Loader node.

Will wait for the stable release.

reference:

  1. 2.23 · Releases · n8n-io/n8n · GitHub
  2. Pdf-parse v1.1 Error in Node Default Data Loader - #5 by tamy.santos

good morning @Rahul_Parmar
please share your json.

good morning, @tamy.santos

here is the json file.

@Rahul_Parmar
I didn’t find any issues with your code, so it might be an environment compatibility problem.
Try changing the Document Loader’s typeVersion to 1.0, which should force the loader to use pdf-parse instead of pdfjs-dist and remove the DOM dependency. Alternatively, add a Code node as a polyfill before the loader.
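For readers wondering what a polyfill means here, the sketch below defines a stand-in `DOMMatrix` only when the runtime lacks one. Treat it as an illustration under loud assumptions: the class is a bare stub that stores the six 2D matrix values, pdfjs-dist may call methods this stub does not implement, and nothing in this thread confirms that a global set inside a Code node is visible to the Document Loader, since that depends on how n8n isolates Code node execution.

```javascript
// Minimal stand-in for DOMMatrix, defined only if the runtime has none.
// Assumption: a stub like this is NOT a full implementation of the browser API.
if (typeof globalThis.DOMMatrix === 'undefined') {
  globalThis.DOMMatrix = class DOMMatrix {
    // Accepts the six values of a 2D affine matrix; defaults to identity.
    constructor(init = [1, 0, 0, 1, 0, 0]) {
      [this.a, this.b, this.c, this.d, this.e, this.f] = init;
    }
  };
}

// After this runs, code that only checks for the global no longer throws
// "DOMMatrix is not defined".
const identity = new globalThis.DOMMatrix();
console.log(typeof globalThis.DOMMatrix, identity.a, identity.d);
```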

I have the same problem as reported above (DOMMatrix is not defined). As you suggested, I’ve reverted the Data Loader to typeVersion 1.0 and added a Recursive Character Text Splitter to the node, but I’m still getting the “DOMMatrix is not defined” error.

(I have n8n self-hosted on version 2.23.1. Like the original poster, I was also hitting the pdf-parse v1.1 issue reported in “Error in Node Default Data Loader”.)

hi @atdev150, welcome to the n8n community.
each case is different, it’s necessary to evaluate your json. please open a question so the community can support you directly or check if my previous answer also helps you.

Sorry, first time posting :sweat_smile: .

Will keep an eye on the responses from Rahul, since I think I have the same issue. If it’s not solved, I will open a new question. Thanks a lot, Tamy!

Count on us :dizzy:

good morning, @tamy.santos

hello, @atdev150

I just wanted to know which n8n version you tested this on. Also, did you use the same Document Ingestion workflow? Were you able to successfully upload and process a PDF using it?

@Rahul_Parmar 2.22.5

A new version, 2.25.1 (beta), was released today on n8n Cloud, but the issue still persists. Still waiting for the fix though :slightly_smiling_face:

Welcome to the n8n community @BotRidwan

It’s recommended to use the stable version rather than the beta.

While waiting for the official fix, a working workaround on self-hosted is to swap the Document Loader node for a Code node that uses pdf-parse instead. If you add NODE_FUNCTION_ALLOW_EXTERNAL=pdf-parse to your n8n environment variables and install the package in the container, you can parse the PDF binary directly without the DOMMatrix dependency:

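// Code node, "Run Once for All Items"; assumes the PDF is in the binary property "data"
// and that the installed pdf-parse exposes the 1.x API (a function returning { text }).
const pdfParse = require('pdf-parse');
// $binary.data is a metadata object, not a base64 string, so read the buffer via the helper.
const buffer = await this.helpers.getBinaryDataBuffer(0, 'data');
const result = await pdfParse(buffer);
return [{ json: { text: result.text } }];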

Not as clean as the native node, but unblocks the RAG pipeline until the fix lands in stable.
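Since the Code node route skips the Document Loader's text splitting, the extracted text probably needs to be chunked before it is embedded. The following is a minimal sketch of fixed-size chunking with overlap. `chunkText` and its default sizes are hypothetical, not n8n settings, and how the chunks reach the vector store depends on your workflow.

```javascript
// Hypothetical helper: split text into fixed-size chunks that overlap,
// so sentences cut at a boundary still appear whole in a neighbouring chunk.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize');
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// In an n8n Code node, the parsed text could then be emitted as one item per chunk:
//   return chunkText(result.text).map((chunk) => ({ json: { text: chunk } }));
const sample = chunkText('abcdefghij', 4, 2);
console.log(sample);
```

Character-based splitting is the simplest option; a splitter that respects paragraph or sentence boundaries would likely give better retrieval results.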

Hello, in my opinion the issue has been definitively resolved with version 2.23.4. Many thanks to everyone who pursued this matter so persistently! Best regards, Thomas

Actually, the problem hasn’t been resolved yet. The “DOMMatrix is not defined” error appears again in version 2.25.7. In version 2.23.4, the workflow runs smoothly. It also runs smoothly in the “unofficial” version 2.26.0.