Storing JSON in Qdrant

Describe the problem/error/question

Hi,

I have a massive JSON file containing a list of holiday destinations.

Here is an example:

[{
        "id": 1,
        "name": "AAA",       
        "rating": 5,
        "thematics": [
            "Plages de rêve",
            "Chil & Nature",
            "Best sellers"            
        ],
        "situation": {
            "latitude": 44.461913,
            "longitude": -1.131084,
            "address": "6523 Route de Bordeaux",
            "postalCode": "40600",
            "city": "Biscarrosse",
            "countryCode": "fr"            
        },
        "description": "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum",
        "url":"destination1url",
        "photos":[
            "photo1Url",
            "photo2Url",
            "photo3Url"
        ]
    },
    {
        "id": 2,
        "name": "BBBB",
        "rating": 5,
        "thematics": [            
            "Bord de lac",
            "Toboggans",
            "Jardins luxuriants",
            "Spring breaks"
        ],
        "situation": {
            "latitude": 44.461913,
            "longitude": -1.131084,
            "address": "6523 Route de Bordeaux",
            "postalCode": "40600",
            "city": "Biscarrosse",
            "countryCode": "fr",
            "department": "Landes",
            "region": "Aquitaine",
            "touristicZone": "Bisca Grands Lacs"
        },
        "description": "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum",
        "url":"destination2url",
        "photos":[
            "photo1Url",
            "photo2Url",
            "photo3Url"
        ]
    }
]

The ultimate goal of my project is for an AI agent to be able to use all the information contained in this JSON file to search for information and respond correctly based on user queries.

I made my first attempts by providing this JSON to the AI agent, but I realised that the processing time was too long and too costly in tokens.

After some research, it seemed to me that storing this JSON in a vector database was a good idea.

I did so (using Qdrant) and indeed, the AI’s response is significantly faster and less costly in terms of tokens.

However, my question is, how do I provide the data to Qdrant so that it is indexed in the best possible way?

I am using the ‘Qdrant Vector Store’ tool to send the data to Qdrant, but in my dataLoader, how do I choose what to put as “content” or “metadata”?

I am a beginner with vector databases, so…

Is my approach right? Do you have any advice?

Information on your n8n setup

  • n8n version: 1.107.3
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app): npm
  • Operating system: Windows 11

To index your JSON in Qdrant for AI retrieval, use each destination's description as the main “content” to vectorize, since it carries the richest semantic information. Store all the other fields (id, name, rating, thematics, situation, etc.) as “metadata” for filtering and context. The AI can then find relevant destinations through semantic search over the description, while the metadata allows precise filtering (e.g., by rating, location, or theme) without bloating the vectorized content. Use a sentence transformer model (e.g., all-MiniLM-L6-v2) to generate the embeddings from the description field.
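As a rough illustration of that split, here is a minimal Python sketch (plain Python, not n8n code). The function names `split_destination` and `rating_and_theme_filter` are made up for this example, and the `content` / `metadata` payload keys are an assumption about how the Qdrant Vector Store node stores documents, so check them against the points in your own collection. The filter dictionary follows Qdrant's filter syntax (`must`, `range`, `match`).

```python
from typing import Any


def split_destination(record: dict[str, Any]) -> dict[str, Any]:
    """Split one destination into the text to embed and the metadata to keep."""
    # The description is the only field that gets vectorized.
    content = record.get("description", "")
    # Every other field is stored as metadata for filtering and context.
    metadata = {key: value for key, value in record.items() if key != "description"}
    return {"content": content, "metadata": metadata}


def rating_and_theme_filter(min_rating: int, theme: str) -> dict[str, Any]:
    """Build a Qdrant-style filter: rating >= min_rating AND theme in thematics."""
    # Assumption: metadata is nested under a "metadata" payload key.
    return {
        "must": [
            {"key": "metadata.rating", "range": {"gte": min_rating}},
            {"key": "metadata.thematics", "match": {"value": theme}},
        ]
    }


# Shortened version of the second destination from the question.
destination = {
    "id": 2,
    "name": "BBBB",
    "rating": 5,
    "thematics": ["Bord de lac", "Toboggans"],
    "situation": {"city": "Biscarrosse", "countryCode": "fr"},
    "description": "Lorem Ipsum is simply dummy text...",
    "url": "destination2url",
}

document = split_destination(destination)
print(document["content"])
print(sorted(document["metadata"]))
print(rating_and_theme_filter(4, "Toboggans"))
```

In the n8n data loader, the same idea would mean mapping only the description to the text field and adding the remaining fields as metadata entries.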

If this helped, please mark it as the solution.
Thanks.

Thanks for your help.
What do you mean by “Utilize a sentence transformer model (e.g., all-MiniLM-L6-v2) to produce embeddings”?

In my “Embeddings OpenAI” tool, I have three model options:

  • text-embedding-3-large
  • text-embedding-3-small
  • text-embedding-ada-002

:thinking: