How to create meaningful chunk IDs / metadata in n8n vector store (instead of UUIDs)?

Hi everyone,

I’m currently building a RAG pipeline in n8n using a vector database, and I have a question regarding how chunk IDs and metadata are generated and stored.

Right now, the vector store automatically generates IDs like this:

000c27f1-d67b-4dae-a1b4-91d4c6f157ae

However, for debugging and traceability purposes, we would prefer something more meaningful.

What we are trying to achieve:

We would like the ID (or metadata) to include:

  • The original file name

  • The chunk index (e.g. 1, 2, 3, … per document)
    → Example: filename_1, filename_2, etc.
    OR

  • The line range of the chunk already available in metadata, for example:

"loc": {
  "lines": {
    "from": 450,
    "to": 512
  }
}

Ideally, we would like something like:

  • filename_chunk_3
    or

  • filename_450-512

Question:

Is there a recommended way in n8n (or LangChain vector store nodes) to customize:

  • the document chunk IDs, or

  • the metadata structure used for storage in pgvector?

Or is the UUID generation fixed, and we should instead handle this purely via metadata enrichment before insertion?

Any best practices would be really appreciated.

@Leon22 the UUIDs are generated by LangChain under the hood, and you can't override them from the n8n vector store node directly. What you want to do is enrich the metadata before insertion: put a Code node between your text splitter and the PGVector node, loop through the items, and add a label such as filename_chunk_1 to each item's metadata. For the line-range variant, use $json.metadata.loc.lines, which is already there. You can then query or filter by that field in pgvector. The Default Data Loader also has a metadata field where you can set key-value pairs per document if you want to keep it simpler.
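A minimal sketch of that enrichment step, written as a plain function so it can be tried outside n8n. It assumes the items reaching the Code node are already individual chunks and that each one carries a source_file value in its metadata. The function name labelChunks and the fields source_file, chunk_index, chunk_label and line_label are illustrative choices, not n8n or LangChain names.

```javascript
// Sketch: attach a per-file chunk index and readable labels to each item's metadata.
// Assumed item shape (hypothetical): { json: { metadata: { source_file, loc? } } }
function labelChunks(items) {
  const counters = new Map(); // running chunk count per source file

  return items.map((item) => {
    const metadata = { ...(item.json.metadata || {}) };
    const file = metadata.source_file || 'unknown';

    // 1-based index that restarts for every file.
    const index = (counters.get(file) || 0) + 1;
    counters.set(file, index);

    metadata.chunk_index = index;
    metadata.chunk_label = `${file}_chunk_${index}`;

    // Add the line-range label only when the splitter recorded a range.
    const lines = metadata.loc?.lines;
    if (lines && lines.from != null && lines.to != null) {
      metadata.line_label = `${file}_${lines.from}-${lines.to}`;
    }

    // Return a copy so the incoming items are left untouched.
    return { ...item, json: { ...item.json, metadata } };
  });
}

// Inside an n8n Code node this would be called as:
//   return labelChunks($input.all());

const sample = [
  { json: { metadata: { source_file: 'report.pdf', loc: { lines: { from: 1, to: 40 } } } } },
  { json: { metadata: { source_file: 'report.pdf', loc: { lines: { from: 41, to: 90 } } } } },
  { json: { metadata: { source_file: 'notes.txt' } } },
];
const labeled = labelChunks(sample);
console.log(labeled.map((i) => i.json.metadata.chunk_label));
// → [ 'report.pdf_chunk_1', 'report.pdf_chunk_2', 'notes.txt_chunk_1' ]
```

Keeping both a numeric chunk_index and a string label means you can filter on the exact label and still sort chunks of one file in order.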

Hi @Leon22
Honestly, UUID generation for chunk IDs is automatic and is not directly customizable in n8n's vector store nodes. For now you can read this:

I would say do not rely on the UUID; focus on metadata enrichment. You can get that working with the Default Data Loader's metadata option, or with a Code node if you prefer more customization during ingestion. Try code like this:

// Enrich each item's metadata before it reaches the vector store.
for (const item of $input.all()) {
  item.json.metadata = item.json.metadata || {};
  item.json.metadata.source_file = 'filename.pdf'; // placeholder: use the real file name
  const lines = item.json.metadata.loc?.lines;
  // Set the range only when the splitter provided one (avoids "undefined-undefined").
  if (lines) {
    item.json.metadata.line_range = `${lines.from}-${lines.to}`; // e.g. "450-512"
  }
}
return $input.all();

That’s a really great tip, thanks for pointing that out.

Quick follow-up question:

How can I actually verify that the metadata fields are being indexed and used at all?

Since pgvector stores all embeddings in the same column, I’m not sure how to confirm whether Postgres is really using the metadata indexes during retrieval.

@Leon22 run \d your_table_name in psql to see the existing indexes and the actual column names, then run EXPLAIN ANALYZE on a query that filters by your metadata field. If the plan shows a sequential scan instead of an index scan, the index isn't being used. Heads up: if you're on the default LangChain pgvector schema, the column is called cmetadata, not metadata. If the index is missing, you can add one with CREATE INDEX ON your_table USING gin (cmetadata jsonb_path_ops); jsonb_path_ops is the right opclass here, since LangChain filters use @> containment.
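The check described above can be sketched as a small helper that reads EXPLAIN output and reports which kind of scan appears. The table name, column name, filter value, and the two sample plans are placeholders written by hand for illustration, not output from a real database. Two things to keep in mind: a GIN index shows up in a plan as a "Bitmap Index Scan" under a "Bitmap Heap Scan", and on a small table the planner may pick a sequential scan even when a usable index exists.

```javascript
// Sketch: decide from EXPLAIN output whether Postgres used an index.
// SQL strings are illustrative; table, column, and value names are placeholders.
const createIndexSql =
  'CREATE INDEX ON your_table USING gin (cmetadata jsonb_path_ops);';
const explainSql =
  `EXPLAIN ANALYZE SELECT * FROM your_table WHERE cmetadata @> '{"source_file": "filename.pdf"}';`;

// planText: the plan as printed by psql, one plan node per line.
function classifyPlan(planText) {
  // "Index Scan" also matches inside "Bitmap Index Scan" and "Index Only Scan".
  if (/\bIndex (Only )?Scan\b/.test(planText)) return 'index';
  if (/\bSeq Scan\b/.test(planText)) return 'sequential';
  return 'unknown';
}

// Hand-written, abbreviated plan shapes (timings and costs omitted on purpose).
const planWithIndex = [
  'Bitmap Heap Scan on your_table',
  '  Recheck Cond: (cmetadata @> \'{"source_file": "filename.pdf"}\'::jsonb)',
  '  ->  Bitmap Index Scan on your_table_cmetadata_idx',
].join('\n');

const planWithoutIndex = [
  'Seq Scan on your_table',
  '  Filter: (cmetadata @> \'{"source_file": "filename.pdf"}\'::jsonb)',
].join('\n');

console.log(classifyPlan(planWithIndex));    // → index
console.log(classifyPlan(planWithoutIndex)); // → sequential
```

If the result is "sequential" on a table with only a few rows, that alone does not prove the index is unusable; loading more data, or running SET enable_seqscan = off; in a test session before the EXPLAIN, gives a clearer picture.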

Thanks again for all the help so far. I really appreciate the detailed guidance.

I have two follow-up questions regarding metadata and file handling:

1. File path in metadata
Right now I’m already able to pass things like the file name into the metadata, since I get that when uploading/processing the file.

But how can I also include the full file path (e.g. from the file server)?

Is there a way in n8n to:

  • read the original file path directly, or

  • pass it along from the file crawler / source node into the workflow?

And as a follow-up:
How would this work when using a Webhook?

Is it possible to pass the file name and file path via the webhook payload and then use them the same way as when uploading a file manually?

2. Clickable file links in chat
Is it possible to return the file path as a clickable link in the chat output?

For example:

  • User asks about a document

  • The system answers + provides the file path

  • And ideally, I can click it and jump directly to the file (e.g. file server path)

Does something like this work in n8n chats?
Are file path links (e.g. network paths or URLs) supported as clickable links?

It would be great to know how others are handling this.