PDF Extract node not returning candidate name – mapping issue in Code node?
Body:
Hi everyone,
I’m building a CV parsing + AI screening workflow in n8n, but I’m running into an issue where the candidate name (and sometimes email/phone) is not being populated, even though the name is clearly present in the resume.
Context:
Webhook (POST multipart/form-data) receives PDF
PDF Extract node parses the resume
AI node (OpenAI) screens the candidate and returns structured output
Code node normalizes fields before creating candidate records
Problem:
All other fields are extracted correctly, but candidate_name, email, phone number and location often come through empty, even though the name is visible in the extracted text. If anyone can help, please let me know. Thank you so much!
Please attach additional info (a minimal workflow, etc.) that others can use to help; a screenshot alone isn’t sufficient here!
Hi, thank you for answering. What do you mean by additional info?
I mean, could you show exactly where the reference/mapping issue is, so anyone can see it and help?
Or you could share the workflow up to that point only, so anyone can reproduce the issue, or even the entire workflow, whichever you prefer.
Thanks for the suggestion, good point.
From further debugging, it looks like the issue is happening before the mapping/reference step. The problem appears to be at the PDF text extraction + AI parsing stage.
Specifically:
- The PDF extraction node sometimes returns noisy or malformed text (e.g. OCR issues like `2O21` instead of `2021`).
- Because of this, the AI node is not only returning inconsistent or incorrect date formats, but it also seems to struggle extracting additional key information from the resume, such as the candidate’s name, email address, and phone number.
- Without strict normalization and validation, the AI output looks valid as strings, but fails or gets misinterpreted when mapped into structured or date-type fields downstream.
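One way to reduce the dependence on the AI node for contact info is to clean the raw text and pull email/phone out with plain regexes first. The following is a minimal sketch in JavaScript (the Code node's language), not a definitive implementation. The function names `repairOcrDigits` and `extractContacts` are hypothetical, and the patterns are assumptions that will need tuning for real resumes (for example, a token like `O2` would be rewritten to `02`).

```javascript
// Hypothetical helpers; names and patterns are illustrative assumptions.

// Replace letters commonly confused with digits (O/o -> 0, I/l -> 1), but only
// inside tokens that already contain at least one digit, so ordinary words
// such as "Oslo" are left untouched.
function repairOcrDigits(text) {
  return text.replace(/\b[0-9OoIl]*[0-9][0-9OoIl]*\b/g, (token) =>
    token.replace(/[Oo]/g, "0").replace(/[Il]/g, "1")
  );
}

// Pull the first email and the first plausible phone number out of raw text.
function extractContacts(rawText) {
  // Email is taken from the unrepaired text so that letters in the
  // address are never rewritten to digits.
  const emailMatch = rawText.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/);

  // Phone candidates come from the repaired text. Requiring 9 to 15 digits
  // filters out year ranges such as "2019 - 2021" (only 8 digits).
  const repaired = repairOcrDigits(rawText);
  let phone = null;
  for (const match of repaired.matchAll(/\+?\d[\d ().-]{7,}\d/g)) {
    const digitCount = match[0].replace(/\D/g, "").length;
    if (digitCount >= 9 && digitCount <= 15) {
      phone = match[0];
      break;
    }
  }

  return { email: emailMatch ? emailMatch[0] : null, phone };
}
```

In a Code node this could be applied to whichever field holds the extracted text on each item from `$input.all()`, with the AI node's values used only when the regex result is `null`. The candidate name is deliberately left out here, since it has no reliable pattern to match on.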
At the moment, I’m adding logging of the raw extracted text and enforcing strict ISO 8601 output and stricter structured output rules in the AI node to better isolate and fix both the date and contact info extraction issues. If needed, I can share a reduced version of the workflow focused on the PDF → AI → structured output part so others can reproduce and help debug.
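A minimal sketch of what that validation step could look like in the Code node, so bad AI output is flagged instead of silently mapped downstream. Only `candidate_name` comes from the workflow described above; the other field names (`email`, `phone`, `start_date`, `end_date`) and the function names are assumptions. The date check accepts only the `YYYY-MM-DD` calendar form, which is one of several formats ISO 8601 allows.

```javascript
// Hypothetical validator for the AI node's structured output.
// Field names other than candidate_name are assumptions for illustration.

// True only for a real calendar date written as YYYY-MM-DD.
function isIsoDate(value) {
  if (typeof value !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    return false;
  }
  const parsed = new Date(`${value}T00:00:00Z`);
  // Impossible dates (e.g. 2021-02-30) either fail to parse or roll over to
  // another day, so the round trip back to a string catches both cases.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

// Collect every problem instead of stopping at the first one, so the
// log shows the full picture for each resume.
function validateCandidate(candidate) {
  const errors = [];

  for (const field of ["candidate_name", "email", "phone"]) {
    const value = candidate[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${field} is missing or empty`);
    }
  }

  // Dates are optional here, but must be valid when present.
  for (const field of ["start_date", "end_date"]) {
    const value = candidate[field];
    if (value != null && !isIsoDate(value)) {
      errors.push(`${field} is not a valid YYYY-MM-DD date: ${value}`);
    }
  }

  return { valid: errors.length === 0, errors };
}
```

Items that fail validation could be routed to a separate branch together with the logged raw text, which makes it easier to tell whether a failure comes from the PDF extraction or from the AI parsing.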
Appreciate any advice on best practices for improving reliability of date and contact info extraction in this setup.
