Regex in Function Node Fails Silently on Read PDF Output

Describe the issue/error/question

I’m trying to apply regex text matching (str.matchAll) to the output of the Read PDF node. The Function node seems to see the output but can’t actually process it. I’ve read through a lot of posts here, and they all seem to say that the node is extremely dated and possibly error-prone. Are there any workarounds? I’m getting the PDF file via an API call, so external processing is unfortunately not possible. Is this an error in my code, or am I crazy? This has been haunting me for hours now!

What is the error message (if any)?

None, fails silently.

Please share the workflow

Input: PDF file, with the string literal saved in the Function node.
The Function node works perfectly fine when it processes the string literal internally or from a Set node, but not when the text comes from the Read PDF node.

Share the output returned by the last node

Current:
[Screenshot: Screen Shot 2022-03-24 at 19.24.18]

Expected:

Information on your n8n setup

  • n8n version: Same problem on latest and on an older version. Desktop or self hosted with Docker.
  • Database you’re using (default: SQLite): Default

Hey @ugly, I am sorry to hear you’re having trouble here. Could you share the data returned by your Read PDF node so I can test this on my end against your actual data? Feel free to redact confidential parts, I am more interested in the structure than the actual content.

Is a screenshot okay?

Otherwise here’s the raw data:

[
  {
    "numpages": 1,
    "numrender": 1,
    "info": {
      "PDFFormatVersion": "1.4",
      "IsAcroFormPresent": false,
      "IsXFAPresent": false,
      "Title": "Test",
      "Producer": "Skia/PDF m101 Google Docs Renderer"
    },
    "metadata": null,
    "text": " from here 812062Shimano Ultegra R8000 GS Rear Derailleur Medium Cage 1 911126Shimano Ultegra R8000 Front Derailleur1 1013346Shimano Umwerferschelle 31.8mm1 1111123Shimano Ultegra Di2 ST-R8070/BR-R8070-L Brakeset Shift/Brakeset, Left 1 1211122Shimano Ultegra Di2 ST-R8070/BR-R8070-R Brakeset Shift/Brakeset, Right 1 1311548Shimano Ultegra R8000 Crankset 52/36 170mm1 1411557Shimano Ultegra R8000 CN-HG701 Chain 116g1 1511530Shimano Ultegra CS-R8000 Cassette 11-281 until here and from here 1611709Shimano Ultegra Icetech Disc Rotor RT800 CL, 140mm1 1711929Shimano Ultegra Icetech Disc Rotor RT800 CL, 160mm1 1810918Zipp Service Course SL70 Ergo Handlebar 38cm1 1910925Zipp Service Course SL Stem, 6°, 31.8, 110mm1 2010928Zipp Service Course SL Seatpost, 27.2, 400mm, 0mm, Black 1 2113597Fizik Vento Microtex 2mm Tacky, Black1 2212895Fizik Vento Argo R3 Saddle, 140mm1 2312065DT Swiss ER 1400 Dicut DB 21 DISC, Shimano1 2411426Vittoria Corsa Control 28mm, Tan Wall2 2511372Continental Schlauch Race 28 S422 until here",
    "version": "1.10.100"
  }
]

@ugly can you please share a copy of the PDF file?

Never mind … I already made a test.pdf myself and tested it. It’s very strange: your function works fine with sample data but not with the JSON from the Read PDF node, which looks exactly the same.

@ugly I think I found the solution by adding JSON.stringify(…), as follows:

const str = JSON.stringify(items[0].json.text);

@MutedJam what makes the JSON data from the Function and Read PDF nodes different?
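
For anyone landing here later, a minimal sketch of where the JSON.stringify line could sit. The wrapper function `extractMatches`, the sample item, the pattern, and the output field names are all assumptions made so the snippet runs outside n8n; they are not the workflow from this thread. In a Function node, `items` is already provided and only the body of the function would be needed.

```javascript
// Sketch only: `extractMatches`, the sample item, and the pattern are hypothetical.
// In an n8n Function node, `items` is supplied by the node and the wrapper is not needed.
function extractMatches(items) {
  // The workaround from this thread: stringify the text before matching.
  const str = JSON.stringify(items[0].json.text);

  // Hypothetical pattern: a run of digits directly followed by a capitalized word.
  const pattern = /(\d+)([A-Z][a-z]+)/g;

  // Return one n8n-style item ({ json: ... }) per match.
  return [...str.matchAll(pattern)].map((m) => ({
    json: { digits: m[1], word: m[2] },
  }));
}

// Hypothetical sample shaped like the Read PDF output shown earlier.
const sampleItems = [
  {
    json: {
      text: " from here 812062Shimano Ultegra R8000 Crankset 1 911126Shimano Ultegra R8000 Chain 1 until here",
    },
  },
];

const result = extractMatches(sampleItems);
console.log(result);
```

Note that `matchAll` requires the `g` flag on the regex and returns an iterator, which is why the sketch spreads it into an array before mapping.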


Wow. Tbh, I have no idea what is causing this: typeof items[0].json.text tells me the data coming from the Read PDF node is a valid string. I’ll see if I can revisit this when I have some more time on my hands. For now, thanks so much for sharing a solution here, @dickhoning!

This has indeed fixed it, so I can now use the output of the Read PDF node, but it has broken my regex because all the line breaks are now converted to the literal “\n” 😅
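
The side effect is expected from JSON.stringify: it wraps the string in double quotes and escapes control characters, so a real line break becomes the two characters backslash and n. Below is a small sketch of that behaviour and one possible way to undo it before matching. The sample string and the six-digit pattern are hypothetical, and the cleanup is deliberately naive: stringify also escapes quotes and backslashes, which this does not reverse.

```javascript
// Hypothetical sample with a real line break between two entries.
const raw = "812062 Rear Derailleur\n911126 Front Derailleur";

// The workaround: the result is a quoted string with escaped control characters.
const str = JSON.stringify(raw);

// The real newline is gone; a backslash followed by "n" took its place.
console.log(str.includes("\n"));  // real newline character
console.log(str.includes("\\n")); // literal backslash + n

// One option (assumption, not from the thread): drop the surrounding quotes
// and turn each escaped line break into a space before running the regex.
const cleaned = str.slice(1, -1).replace(/\\n/g, " ");

// Hypothetical pattern: standalone six-digit numbers.
const ids = [...cleaned.matchAll(/\b\d{6}\b/g)].map((m) => m[0]);
console.log(ids);
```

Alternatively, the regex itself can be changed to match the escaped sequence (`\\n` in a regex literal) instead of a real line break.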

I think I’m going to have to put this project on pause, because the Asana node has an issue now as well: when I try to create a subtask for each item from the Function node, the first item goes through and the rest fail with this cryptic error.

The second error is most likely caused by an expression that only works for the first item. It seems Asana is expecting a numeric value for your task field. Which expression are you using for that field?

Definitely a numeric value: the gid of the parent task. I have tried it with both the Asana node and the HTTP node (in JSON format), and both create the first item without problems, while every item from the second onwards fails with the same 400 error. The gid doesn’t change between tasks, which is strange.

Hey @ugly, and this gid of the parent task would be fetched using an expression? Are you using $item(0) in your expression as described here?

If not, your expression would, for the first item it processes, try to read the first element from the node providing your gid value. For the second item, it would then try to read the second element from that node, which I suspect might fail, from the sounds of it.
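
To illustrate the difference, here is a plain-JavaScript simulation of the two lookups. It does not use any n8n API; the arrays, the gid value, and the item names are made up, and the sketch assumes the node providing the gid returns a single item while the Function node returns several.

```javascript
// Simulation only (not n8n code). Hypothetical data:
// the node providing the gid returns ONE item, the Function node returns three.
const parentNodeOutput = [{ json: { gid: "1201" } }];
const functionNodeOutput = [
  { json: { name: "Rear Derailleur" } },
  { json: { name: "Front Derailleur" } },
  { json: { name: "Crankset" } },
];

// Default behaviour as described above: item i reads element i of the other node.
// Only index 0 exists there, so later items get nothing.
const perItem = functionNodeOutput.map((_, i) => parentNodeOutput[i]?.json.gid);

// Pinned lookup, the idea behind $item(0): every item reads element 0.
const pinned = functionNodeOutput.map(() => parentNodeOutput[0].json.gid);

console.log(perItem);
console.log(pinned);
```

In this simulation the first subtask gets a gid either way, which would match the symptom of the first item succeeding and the rest failing.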


Hot damn I learned something new again, thanks @MutedJam! And thanks @dickhoning for the stringify hack!

I’d be interested in knowing what the problem with the Read PDF Node was if you do ever figure it out.