How to filter multiple metadata values in a Vector Store Retriever PGvector node?

I’m building a Q&A workflow using n8n with a Vector Store Retriever connected to Postgres PGVector.

The user sends a JSON payload from the frontend with the following structure:

{
  "question": "Is Benito Juárez's birthday considered a non-working day?",
  "collection": "1952708",
  "category_id": ["35724307", "35724308"]
}

So i receive the data and parse it in a Set for obtaining the question, collection and category

I want the Vector Store Retriever to:

  • Search only within the specified collection.
  • Filter results by one or multiple category_id values, which are sent as an array.

When I pass a single value for category_id in the Vector Store Retriever, the filtering works perfectly.

However, when I pass two or more values (as an array), the retriever returns no results.

No error message is displayed, but the retriever returns an empty result when filtering with multiple values.

It seems that the retriever interprets the array incorrectly, as if the metadata contains category_id = a,b rather than filtering by category_id = a OR category_id = b.

This is how my database looks like for metadata in embeddings:

{
  "loc": {
    "lines": {
      "to": 28,
      "from": 1
    }
  },
  "source": "blob",
  "blobType": "text/plain",
  "filename": "Oficio día inhábiles Aduana de Queretaro.pdf",
  "idTrafico": "1952708",
  "idDocumento": "35724300"
}

My question is: How can I properly filter by multiple category_id values in the Vector Store Retriever? I need to select one or more categories for retriever because the final goal is:
In my frontend, the user can:

  • Select one or more documents they want to query.
  • These documents are associated with one or more category_id values.
  • The frontend sends the selected category_id values as an array along with the question, so the retriever should only search within those categories.

Here is my workflow:

Hi @merarisosa
Welcome to the community!

I think this happens because you are joining the array of category_id’s into a comma separated string. This results in a query for documents with a category:'category1,category2' => empty response.

If I remember correctly to achieve a combination of filters with OR condition between them you have to pass them as a list/array.

Instead of using {{ $('set database search').item.json.category_id.join(',') }}
you can try to directly pass: {{ $('set database search').item.json.category_id }}.

Hi! @Ventsislav_Minev Thank u for u reply. I already tried just passing the array

i passed it to the pgvector store retriever but it didnt work as expected

It seems like the PGVector Store is ignoring the filter. I tried asking something that is not contained in the array I passed, but when the retriever looks up the embeddings, it ignores the filter and brings up an answer that actually would be correct if I had specified that category, even though it belongs to a category that was not included in the JSON.


as you can see, the json does not specified the category (or idDocumento)

"idDocumento": "35724300"

which is where the retriever is looking for the answers

1 Like