OpenAI Model responding with "I don't know" when it should be answering from the vector store

Response: “I don’t have the information required to provide the full section of Section 41.12 from Chapter 41 of the Texas Property Tax Code.”

Workflow code -

Share the output returned by the last node

Output:
“I don’t have the information needed to provide the full section of Sec. 41.12. You may want to check the Texas Administrative Code or the applicable legal texts for the detailed contents of Sec. 41.12.”

Expected Output:
“Sec. 41.12. Approval of Appraisal Records by Board.
(a) By July 20, the appraisal review board shall:
(1) hear and determine all or substantially all timely filed protests;
(2) determine all timely filed challenges;
(3) submit a list of its approved changes in the records to the chief appraiser; and
(4) approve the records.
(b) The appraisal review board must complete substantially all timely filed protests before approving the appraisal
records and may not approve the records if the sum of the appraised values, as determined by the chief appraiser, of all
properties on which a protest has been filed but not determined is more than five percent of the total appraised value
of all other taxable properties.
(c) The board of directors of an appraisal district established for a county with a population of at least one million by
resolution may:
(1) postpone the deadline established by Subsection (a) for the performance of the functions listed in that subsection
to a date not later than August 30; or
(2) provide that the appraisal review board may approve the appraisal records if the sum of the appraised values,
as determined by the chief appraiser, of all properties on which a protest has been filed but not determined does not
exceed 10 percent of the total appraised value of all other taxable properties.
HISTORY: Enacted by Acts 1979, 66th Leg., ch. 841 (S.B. 621), § 1, effective January 1, 1982; am. Acts 1981, 67th Leg., 1st C.S., ch.
13 (H.B. 30), § 136, effective August 14, 1981; am. Acts 1985, 69th Leg., ch. 312 (H.B. 2301), § 4, effective June 7, 1985; am. Acts 1985,
69th Leg., ch. 630 (S.B. 575), § 1, effective June 14, 1985; am. Acts 1993, 73rd Leg., ch. 1031 (S.B. 893), §§ 7, 8, effective September 1,
1993; am. Acts 2007, 80th Leg., ch. 626 (H.B. 538), § 1, effective January 1, 2008.”

Information on your n8n setup

  • n8n version: 1.46
  • Database (default: SQLite): Supabase
  • n8n EXECUTIONS_PROCESS setting (default: own, main): default I think
  • Running n8n via (Docker, npm, n8n cloud, desktop app): n8n cloud
  • Operating system: Microsoft Windows 11 Home (Version 10.0.22631 Build 22631)

@Jon

Hi @dustinAIAIDE,

would it be possible to share the PDF you’re embedding so I could reproduce the workflow? :pray:
It seems like it’s returning the context doc but perhaps it’s only returning it partially due to chunking. It might also help to tone down the temperature of the model a bit.

Sure thing, of course.

TEXAS PROPERTY TAX CODE

https://comptroller.texas.gov/taxes/property-tax/docs/96-297-21.pdf

Let me know if there is anything else I can do. Thank you for any of your time or consideration @oleg

Thanks! I tried with embedding the doc and came up with pretty much the same results as you did when asking about a specific section content.
I think the problem comes from the fact that embeddings represent the “relatedness” of text, which doesn’t work great when searching for specific numeric sections. That would require a more custom approach, like creating an agent that has access to a tool that can extract a full section.
Using Question and Answer Chain is generally more useful when asking questions about specific information that could be found in the context.


@oleg I see. Do you have any potential solutions? Really need this resolved for us to continue utilizing this chatbot.

If the requirement is to be able to QA sections of the document, you could try the following approach:

  1. Ingest the PDF and split it into an array of section objects. Each object would contain a section number and its content.
  2. Populate the vector store with section_number metadata.
  3. Create two sub-workflow tools for retrieving data from the vector store: one generic and one for searching by a specific section number.
  4. Set up a tools agent and configure it to use both tools.
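Steps 1 and 2 could be sketched in an n8n Code node roughly like this. This is only a sketch under my own assumptions: the PDF has already been extracted to plain text, and each section starts with a "Sec. 41.12."-style heading (the regex and function name are mine, not from the actual workflow):

```javascript
// Split extracted plain text into section objects, each carrying a
// section_number suitable for use as vector store metadata.
// Assumption: sections begin with headings like "Sec. 41.12."
function splitIntoSections(text) {
  const headings = [...text.matchAll(/Sec\.\s*(\d+\.\d+)\./g)];
  return headings.map((m, i) => {
    const start = m.index;
    const end = i + 1 < headings.length ? headings[i + 1].index : text.length;
    return {
      section_number: parseFloat(m[1]), // e.g. 41.12
      content: text.slice(start, end).trim(),
    };
  });
}

const sample =
  'Sec. 41.11. Notice. Some text here.\n' +
  'Sec. 41.12. Approval of Appraisal Records by Board. More text.';
console.log(splitIntoSections(sample).map((s) => s.section_number)); // [41.11, 41.12]
```

Each returned object can then be upserted into the vector store with `section_number` in its metadata.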

The implementation could look something like this:

Now when we ask for a specific section number, the agent uses the search_by_section tool and provides the correct response:

Thank you so much @oleg. I feel like I am super close to replicating what you did in my own workflow (using Supabase instead of Pinecone).

Running into an issue getting the workflow to utilize the section_number tool properly. See the attached screenshots.

I feel like I am so close. I will be pasting what I have done thus far so you can review and let me know where I am being an idiot.


I couldn’t get a checkmark by “Aggregate”… so, I’m guessing there is something wrong with the “Search with section number” node.

and it says “No output data returned”

I got it to work 1 time! But, it didn’t return a lengthy response.

I got it to partially work if I kept it as a super duper simple question. @oleg

what can i do to make this even smarter now?

Why did my response say May 15th in 1 spot… and then say July 20th in another spot. There should have been 0 mention of May 15th in this response.

Hi @dustinAIAIDE, glad to read you’re getting close!
A few things I noticed in your workflow:

  1. There’s a typo in the Search with section number meta1 node’s “Prompt”. You’re using expression syntax, but the field setting is set to “fixed”. So you’re not actually passing down the agent’s query for vector search.

  2. If the model isn’t sending the section number, you can try to force it by setting both fields as required in the schema:

{
  "type": "object",
  "properties": {
    "search_query": {
      "type": "string",
      "description": "Search query for vector store similarity matching"
    },
    "section_number": {
      "type": "number",
      "format": "float",
      "description": "A float number of the section to search in"
    }
  },
  "required": ["search_query", "section_number"]
}
  3. How did you create the Supabase embeddings table? I see your table name is “tptc”. But did you also update the matching function name, or is it the default one?

For example, I just created a new table on my Supabase instance called texas_tax_code but I also had to modify the matching function. So the SQL insert looks like this:

-- Create a table to store your texas_tax_code
create table texas_tax_code (
  id bigserial primary key,
  content text, -- corresponds to Document.pageContent
  metadata jsonb, -- corresponds to Document.metadata
  embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed
);

-- Create a function to search for texas_tax_code
create function match_texas_tax_code (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb DEFAULT '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select
    id,
    content,
    metadata,
    1 - (texas_tax_code.embedding <=> query_embedding) as similarity
  from texas_tax_code
  where metadata @> filter
  order by texas_tax_code.embedding <=> query_embedding
  limit match_count;
end;
$$;
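As a rough sketch of how the section-number tool could call this function from supabase-js: restricting results to one section works because the SQL above matches with `metadata @> filter` (jsonb containment). The helper names and `match_count` value are my own, and the client/embedding setup is omitted:

```javascript
// Build the metadata filter. Passing { section_number: 41.12 } restricts
// the search to that section via jsonb containment in the SQL function.
function buildSectionFilter(sectionNumber) {
  return sectionNumber != null ? { section_number: sectionNumber } : {};
}

// Hypothetical wrapper around the RPC call (the supabase client and the
// query embedding are assumed to be created elsewhere).
async function searchTexasTaxCode(supabase, queryEmbedding, sectionNumber) {
  return supabase.rpc('match_texas_tax_code', {
    query_embedding: queryEmbedding,
    match_count: 4,
    filter: buildSectionFilter(sectionNumber),
  });
}

console.log(buildSectionFilter(41.12)); // { section_number: 41.12 }
```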

When you do this, it’s important to set the “Query Name” in all the Supabase nodes. So in my case match_texas_tax_code:

I think the issues you’re seeing are a combination of the model hallucinating (lowering the temperature to something like 0.2 could help with this). It might also help to first implement the workflow without memory, so that the Agent doesn’t get steered by previous responses.

Here’s an updated version of the workflow, implemented with Supabase and with the improvements mentioned above. You can see that even with memory enabled, the agent correctly passes the section number in both cases.
