RAG Agent returns inaccurate answers from uploaded PDF file

:wave: Hi everyone,

I’m building a RAG-based workflow in n8n to analyze financial reports (PDFs) of public companies.
The goal is to allow users to ask structured questions and get accurate answers based only on the contents of the uploaded reports.


:arrows_counterclockwise: Workflow Overview:

  1. The user uploads a PDF via a web form.
  2. The file is saved in Google Drive and retrieved by the workflow.
  3. Text is extracted using the Extract PDF Text node.
  4. The extracted content is split into chunks using a custom splitter (or the Text Splitter node).
  5. Each chunk is classified into a financial section (e.g., Balance Sheet, Income Statement, etc.).
  6. The chunks are then embedded (using OpenAI’s text-embedding-3-small) and stored in a Supabase Vector Store (a rough sketch of steps 4 to 6 follows this list).
  7. A LangChain RAG Agent (powered by GPT-4) is used to answer user questions based on those embeddings.
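
For anyone who wants to see the moving parts outside n8n, here is a minimal Python sketch of steps 4 to 6 (chunk, classify, embed, store). The table and column names (`documents`, `content`, `embedding`, `metadata`), the fixed-size splitter, and the `classify_section` placeholder are assumptions rather than my actual nodes; adjust them to match your own Supabase vector store setup.

```python
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client("https://YOUR_PROJECT.supabase.co", "YOUR_SERVICE_ROLE_KEY")


def chunk_text(text: str, size: int = 1000, overlap: int = 150) -> list[str]:
    """Step 4: simple fixed-size splitter with overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def classify_section(chunk: str) -> str:
    """Step 5 placeholder: replace with your classifier (LLM call, rules, etc.)."""
    return "Unknown"


def embed(texts: list[str]) -> list[list[float]]:
    """Step 6a: embed each chunk with text-embedding-3-small."""
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]


def store_report(text: str, year: int) -> None:
    """Step 6b: write chunks, embeddings, and metadata into Supabase."""
    chunks = chunk_text(text)
    vectors = embed(chunks)
    rows = [
        {
            "content": chunk,
            "embedding": vector,
            "metadata": {"section": classify_section(chunk), "year": year},
        }
        for chunk, vector in zip(chunks, vectors)
    ]
    supabase.table("documents").insert(rows).execute()
```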

:exclamation: The Problem:

Even when the data clearly exists in the report, the RAG Agent often returns inaccurate or vague answers, or no answer at all.

Example:
If a user asks about net profit, the agent sometimes responds with generic information or “data not found”, even though the figure is stated explicitly in the report.


:question: My Questions:

  • Could the chunking strategy (length or method) be causing this?
  • Is it better to split content by headings or logical sections instead of fixed-size chunks? (see the sketch after this list)
  • Could the classification or metadata fields (e.g., section, year, etc.) be affecting the retrieval?
  • Are there best practices to improve semantic linking between chunks and user intent?
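
To make the heading-based question concrete, here is a rough sketch of splitting by section headings rather than by fixed size, so each chunk stays inside one logical section. The heading list is only an illustration; it would need to match the actual section titles in the reports, and very long sections may still need sub-chunking.

```python
import re

# Illustrative heading list; extend it to match the reports you actually process.
SECTION_HEADINGS = [
    "Balance Sheet",
    "Income Statement",
    "Cash Flow Statement",
    "Notes to the Financial Statements",
]


def split_by_sections(text: str) -> list[dict]:
    """Return one chunk per detected section, tagged with its heading."""
    pattern = re.compile("|".join(re.escape(h) for h in SECTION_HEADINGS), re.IGNORECASE)
    matches = list(pattern.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"section": match.group(0), "content": text[match.start():end]})
    return chunks
```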

:bulb: I’d love to hear from others who have tried similar use cases or have ideas on improving RAG performance with long financial PDFs.

I’m also happy to share the full workflow JSON if that helps.

Thanks in advance :raised_hands:

Help? Anyone?

Hi

I’ve built something similar.
In my experience, text-embedding-3-large is more accurate, using a chunk size of 1024 with a 150 overlap.
Second, use Supabase hybrid search, as it returns more accurate results.
It’s still not perfect but it’s much better.
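Rough sketch of what I mean by the hybrid search call from Python. The `hybrid_search` function and its parameters are placeholders: you have to create a Postgres function like that yourself (keyword search plus vector search combined, e.g. with reciprocal rank fusion), along the lines of the Supabase hybrid search guide.

```python
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()
supabase = create_client("https://YOUR_PROJECT.supabase.co", "YOUR_SERVICE_ROLE_KEY")


def hybrid_search(question: str, match_count: int = 10) -> list[dict]:
    # Embed the question with the same model used for the stored chunks.
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=question,
    ).data[0].embedding
    # "hybrid_search" is a Postgres function you define yourself; the RPC call
    # just passes the question text and its embedding through to it.
    result = supabase.rpc(
        "hybrid_search",
        {
            "query_text": question,
            "query_embedding": query_embedding,
            "match_count": match_count,
        },
    ).execute()
    return result.data
```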
I’m currently testing contextual embeddings, which should also improve retrieval.
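And a tiny sketch of the contextual-embedding idea: prepend a short description of where a chunk comes from before embedding it, so the vector carries that context. The wording of the header is just an illustration.

```python
def contextualize(chunk: str, company: str, year: int, section: str) -> str:
    """Prepend a situating header to the chunk before it is embedded."""
    header = f"From the {year} annual report of {company}, section: {section}."
    return f"{header}\n\n{chunk}"
```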
I hope that helps!
B