According to the Qdrant documentation, inserting a point with an existing ID replaces that point. But in n8n I can’t see how to set the ID for Qdrant points. I set an id in the metadata, but that is not the point ID. I need this because every time I insert the same document, it gets duplicated.
Hey @Alex5
The built-in Qdrant node does not allow you to specify the ID for points, only content and metadata. To specify the ID, you’ll need to use an alternative method such as a custom API call.
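For reference, a custom call could look roughly like the sketch below. It builds (but does not send) a request to Qdrant's points upsert endpoint, `PUT /collections/<name>/points`, with explicit point IDs. Qdrant only accepts unsigned integers or UUIDs as point IDs, so the sketch derives a UUID from a document ID and a chunk index. The helpers `point_id` and `build_upsert_request`, the `docs` collection, the localhost URL, and the `content`/`metadata` payload layout are assumptions for illustration; the vectors would come from your own embedding step.

```python
import json
import uuid
import urllib.request


def point_id(custom_id, chunk_index):
    """Derive a stable UUID so the same chunk of the same document always gets the same ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{custom_id}#{chunk_index}"))


def build_upsert_request(base_url, collection, custom_id, chunks, vectors):
    """Build a PUT request that upserts one point per chunk (hypothetical helper)."""
    points = [
        {
            "id": point_id(custom_id, i),
            "vector": vector,
            # Assumed payload layout: text under "content", extra fields under "metadata".
            "payload": {"content": text, "metadata": {"custom_id": custom_id}},
        }
        for i, (text, vector) in enumerate(zip(chunks, vectors))
    ]
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/points",
        data=json.dumps({"points": points}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values; real vectors must match the collection's vector size.
req = build_upsert_request(
    "http://localhost:6333", "docs", "doc-42",
    ["page one", "page two"], [[0.1, 0.2], [0.3, 0.4]],
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

In n8n, the same request could be sent from an HTTP Request node instead of Python.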
Regarding your problem, however: if you are having trouble with duplicates, I’d recommend the “clean slate” approach, i.e. clearing all relevant points and inserting them again.
You can delete all previous points using the Qdrant Delete API which lets you delete using a filter:
POST http://<qdrant_endpoint>/collections/<name>/points/delete
{
  "filter": {
    "must": [
      {
        "key": "metadata.custom_id",
        "match": {
          "value": "<my_custom_id>"
        }
      }
    ]
  }
}
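If you would rather script this than call it by hand, here is a minimal Python sketch of the same filtered delete. It targets Qdrant's `POST /collections/<name>/points/delete` endpoint and only builds the request; nothing is sent until you call `urlopen`. The helper name `build_delete_request`, the `docs` collection, and the localhost URL are assumptions for illustration, and `metadata.custom_id` assumes you stored your own document ID under that metadata key when inserting.

```python
import json
import urllib.request


def build_delete_request(base_url, collection, custom_id):
    """Build a request that deletes every point whose metadata.custom_id matches."""
    body = {
        "filter": {
            "must": [
                {"key": "metadata.custom_id", "match": {"value": custom_id}}
            ]
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/points/delete",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


delete_req = build_delete_request("http://localhost:6333", "docs", "doc-42")
# urllib.request.urlopen(delete_req)  # uncomment to actually send the request
```

Because the filter matches on the document's own ID, points belonging to other documents in the collection are left alone.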
Downsides of Upserting
Say you have v1 of a document that is 3 pages long. When you insert it into your vector store, it creates about 9 points. Some time passes and you receive v2 of the same document, which is now a bit shorter, say 2 pages long.
When you upsert v2 into your vector store, v2 creates roughly 6 points and overwrites only the first 6 of the previous points, the ones with matching IDs. The problem is that 3 points from v1 still remain in the vector store. Now, 3 outdated points might not sound like much, but think about what happens with v3, v4 and so on.
Upserting only works cleanly if the new/updated document is exactly the same length, so that chunking it produces the same number of points. In real-world scenarios an updated document is most likely longer or shorter than the previous version, so it should be treated as a new document altogether when working with vector stores.
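To make the leftover-points problem concrete, here is a toy simulation. A plain Python dict stands in for the vector store, and the `<doc_id>-<chunk_index>` ID scheme is an assumption for illustration only (real Qdrant point IDs must be unsigned integers or UUIDs).

```python
def upsert(store, doc_id, chunks):
    """Insert or overwrite one point per chunk, keyed by document ID and chunk index."""
    for i, text in enumerate(chunks):
        store[f"{doc_id}-{i}"] = text


store = {}
upsert(store, "report", [f"v1 chunk {i}" for i in range(9)])  # v1: 9 points
upsert(store, "report", [f"v2 chunk {i}" for i in range(6)])  # v2: overwrites only 6

# Points 6..8 still hold v1 text because v2 never touched those IDs.
stale = sorted(pid for pid, text in store.items() if text.startswith("v1"))
print(len(store), stale)  # prints: 9 ['report-6', 'report-7', 'report-8']
```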
Best approach IMO to ensure you don’t get these outdated points in your vector store is just to clear everything* out and reinsert as new.
(*where everything is scoped to the document in question, not implying you need to empty your entire vector store each time!)
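As a rough sketch of what "scoped to the document" means, the example below again uses a plain dict as a stand-in for the vector store. The function `replace_document` and the ID scheme are hypothetical; against a real Qdrant collection, the first step would be a filtered delete on `metadata.custom_id` and the second step a normal insert.

```python
def replace_document(store, doc_id, chunks):
    """Delete every point tagged with doc_id, then insert the new chunks."""
    stale_ids = [
        pid for pid, point in store.items()
        if point["metadata"]["custom_id"] == doc_id
    ]
    for pid in stale_ids:
        del store[pid]
    for i, text in enumerate(chunks):
        store[f"{doc_id}-{i}"] = {"content": text, "metadata": {"custom_id": doc_id}}


store = {}
replace_document(store, "report", [f"v1 chunk {i}" for i in range(9)])
replace_document(store, "invoice", ["invoice chunk 0", "invoice chunk 1"])
replace_document(store, "report", [f"v2 chunk {i}" for i in range(6)])

report_points = [p for p in store.values() if p["metadata"]["custom_id"] == "report"]
print(len(report_points), len(store))  # prints: 6 8
```

Only the points belonging to "report" are cleared; the two "invoice" points survive, and no v1 chunks are left behind.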
Thanks so much, Jim. It’s really helpful
Hi Jim, can you give an example of how to scope a delete to the points associated with a given document?