Hi all, I would like to know whether it's possible to read encrypted PDFs (using a saved password), with or without the 'Read PDF' node.
My use-case:
IMAP (Download pdf attachments) > Read PDF (after decrypting with saved password) > other nodes…
Thanks.
lublak
December 17, 2020, 11:34am
Hi @shrey-42 ,
I'm currently working on a new Read PDF node.
opened 03:04PM - 09 Dec 20 UTC
**Describe the bug**
pdf-parse looks completely unsupported, and it uses an … old pdf.js version.
**Expected behavior**
A maintained alternative to pdf-parse should be used, with a newer pdf.js.
**Additional context**
I'm currently looking into whether we can just use pdf.js without a wrapper, since pdf.js has closed some issues.
I copied the tests from pdf-parse and wrote a new wrapper based on the new version, using the npm package https://www.npmjs.com/package/pdfjs-dist. I will make a pull request if it is simple to integrate.
**Some links**
https://github.com/n8n-io/n8n/blob/master/packages/nodes-base/package.json
https://github.com/n8n-io/n8n/blob/master/packages/nodes-base/nodes/ReadPdf.node.ts
https://gitlab.com/autokent/pdf-parse
**Update 2020-12-17**
I read a lot about the transform in textContent and how to read PDFs with pdfjs.
I implemented a quick-and-dirty JS library to test it out, and it works fine.
Currently I'm trying to implement a tool that gets some outline information (bookmarks).
After that I will translate the code to TypeScript.
But at the moment I don't know whether I should create an external library with a test environment, which would then be integrated, or whether the implementation should be built into the node directly on pdfjs, without extra dependencies.
**Update 2021-02-02**
Because of other projects it took a little longer, but the rewrite in TypeScript is progressing. It just depends on this pull request:
https://github.com/DefinitelyTyped/DefinitelyTyped/pull/50979
**Update 2021-04-15**
Sorry for the delay, but I have now built a TypeScript project.
(https://github.com/lublak/pdfdataextract)
There are still 3.5 things missing.
**Update 2022-01-17**
And another delay (sorry). But development has begun and we are gradually replacing it. More functions that were missing will also be added.
Other functions will also follow, but they still need a bit of work: https://github.com/lublak/pdfdataextract/tree/contentinfoextractor
**Todos**
- [x] tests (jest is already done, only pdf examples are missing)
- [x] npm release
- [ ] rewriting the current node
and the 3.5 thing:
- [x] improve the code
PDF.js supports password-protected PDFs,
so I included that in the new node.
But at the moment I don't know whether I should create an external library with a test environment, which would then be integrated, or whether the implementation should be built into the node directly on pdfjs.
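For anyone who wants to try this before the new node lands, here is a minimal sketch of reading text from a password-protected PDF with PDF.js. It is not the code of the new node. `readPdfText` and the `*Like` interfaces are hypothetical names that only describe the small part of the pdfjs-dist API used here (`getDocument` with a `password` option, `numPages`, `getPage`, `getTextContent`); the library is passed in as an argument so the sketch does not depend on a particular pdfjs-dist version or import path.

```typescript
// Assumed shapes, modelled on the parts of pdfjs-dist used below.
// The real package ships its own, much richer typings.
interface TextContentLike {
  items: object[];
}
interface PdfPageLike {
  getTextContent(): Promise<TextContentLike>;
}
interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>;
}
interface PdfJsLike {
  getDocument(src: { data: Uint8Array; password?: string }): {
    promise: Promise<PdfDocumentLike>;
  };
}

// Hypothetical helper: opens a (possibly encrypted) PDF and returns
// the text of each page as one string per page.
async function readPdfText(
  pdfjs: PdfJsLike,
  data: Uint8Array,
  password?: string,
): Promise<string[]> {
  // The password is handed to PDF.js together with the binary data.
  const doc = await pdfjs.getDocument({ data, password }).promise;
  const pages: string[] = [];
  // PDF.js page numbers start at 1.
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    const parts = content.items.map((item) => {
      // Only text items carry a `str` field; anything else is skipped.
      const str = (item as { str?: unknown }).str;
      return typeof str === 'string' ? str : '';
    });
    pages.push(parts.join(' '));
  }
  return pages;
}
```

In a workflow like the one in the question, `data` would be the binary attachment from the IMAP step and `password` the saved password. As far as I know, PDF.js rejects the loading promise with an error named `PasswordException` when the password is missing or wrong, so the call is best wrapped in a `try`/`catch`.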
Have you got a solution for this?