Trouble with Pinecone Vector Store node: Cannot read properties of undefined (reading 'toString')

Bret_Truchan · September 28, 2024, 8:27pm

Ok, I figured this out! The other user who asked this before had it correct, but I just didn’t wrap my brain around it.

Here’s what happened.

I used Python to load my data into my Pinecone database. The Python script was written by AI, and it did a relatively good job. However, one important part was missing: the “text” metadata key.

As ChatGPT defines it:

Metadata in vector databases refers to additional information associated with each vector, such as labels, tags, or attributes, that help in organizing, filtering, or querying the vectors.

When querying vectors, metadata is used to filter or refine search results, enabling more targeted retrieval. For example, you can specify conditions on metadata to only return vectors that meet certain criteria, like those belonging to a specific category or timestamp.
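
To make the filtering idea concrete, here is a minimal sketch of the arguments for a filtered Pinecone query. The helper name `build_query_kwargs` is hypothetical, and the `industry` field is borrowed from the metadata in my script further down; adjust both to your own data.

```python
def build_query_kwargs(query_embedding, industry, top_k=5):
    """Build keyword arguments for index.query() restricted to one industry.

    Hypothetical helper for illustration; "industry" is assumed to be a
    metadata field that was stored alongside each vector.
    """
    return {
        "vector": query_embedding,                  # the embedded search text
        "top_k": top_k,                             # number of matches to return
        "filter": {"industry": {"$eq": industry}},  # metadata condition
        "include_metadata": True,                   # return stored metadata with matches
    }


# Usage against a live index (not run here):
# results = index.query(**build_query_kwargs(embedding, "Software"))
```

Only vectors whose metadata satisfies the filter are considered as matches.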

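My workflow setup was essentially correct. However, the Pinecone node requires that the “text” key exists in each vector’s metadata. Without it, the node apparently has no document text to read, which would explain the 'toString' error in the title.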
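
In other words, every record you upsert needs its original text stored under “text”. A minimal sketch of that idea, assuming you already have an embedding for the text (the helper name `build_record` is made up for illustration):

```python
def build_record(record_id, embedding, text, extra_metadata=None):
    """Return an (id, values, metadata) tuple whose metadata always has "text".

    Hypothetical helper: it refuses to build a record without text, so a
    missing "text" key is caught before anything is sent to Pinecone.
    """
    if not text:
        raise ValueError(f"Record {record_id} has no text to store")
    # Drop empty optional fields, then add the required "text" key last
    metadata = {k: v for k, v in (extra_metadata or {}).items() if v}
    metadata["text"] = text
    return (record_id, embedding, metadata)


# Usage against a live index (not run here):
# index.upsert([build_record("profile-1", embedding, "Jane Doe, Engineer")])
```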

I updated my Python script, upserted the data into the Pinecone database, and there you have it! It worked!

Here’s my Python code, but BE AWARE that this was written by AI and is likely flawed in some way.

#
# Written by AI, and there are likely a lot of issues with this script.
# Please be wary of using it.
#

import openai
import json
from pinecone import Pinecone, ServerlessSpec
from tqdm import tqdm

# Set your API keys
PINECONE_API_KEY = ""
OPENAI_API_KEY = ""
index_name = "candidate-profiles"  # replace with your index name

# Initialize Pinecone
pc = Pinecone(api_key=PINECONE_API_KEY)

# Create or connect to an index
try:
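    # list_indexes() returns index descriptions, not bare names,
    # so compare against .names() to get a real membership check
    if index_name not in pc.list_indexes().names():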
        pc.create_index(
            name=index_name,
            dimension=1536,  # OpenAI's ada-002 model uses 1536 dimensions
            metric="cosine",  # Assuming cosine similarity is what you're using
            spec=ServerlessSpec(
                cloud="aws",
                region="us-east-1"
            )
        )
    else:
        print(f"Index '{index_name}' already exists.")
except Exception as e:
    if "ALREADY_EXISTS" in str(e):
        print(f"Index '{index_name}' already exists. Proceeding to use it.")
    else:
        raise e

index = pc.Index(index_name)

# Initialize OpenAI Client
client = openai.OpenAI(api_key=OPENAI_API_KEY)

# Function to get embeddings
def get_embedding(text):
    response = client.embeddings.create(input=text, model="text-embedding-ada-002")
    return response.data[0].embedding

# Load and process profiles
with open('profiles.json', 'r', encoding='utf-8') as f:
    profiles = json.load(f)

print("Loading data...")

# Prepare data for Pinecone
for profile in tqdm(profiles):
    # Safely extract all required fields with default empty strings
    name = profile.get('name') or ''
    title = profile.get('title') or ''
    location = profile.get('location') or ''
    industry = profile.get('industry') or ''
    experience = profile.get('experience') or ''
    skills = profile.get('skills') or ''
    detailsLink = profile.get('detailsLink') or ''
    profileId = profile.get('profileId') or ''

    # Skip the profile if profileId is missing
    if not profileId:
        print(f"Missing profileId for profile: {name}. Skipping.")
        continue

    # Combine text fields for embedding
    text_fields = [name, title, location, industry, experience, skills]
    text_for_embedding = ' '.join(filter(None, text_fields))

    # Get the embedding
    try:
        embedding = get_embedding(text_for_embedding)
    except Exception as e:
        print(f"Error generating embedding for profile {profileId}: {e}")
        continue

    # Prepare metadata and include the "text" key
    metadata = {
        "text": text_for_embedding,  # Add this line to include the "text" key
        "name": name,
        "title": title,
        "location": location,
        "industry": industry,
        "detailsLink": detailsLink,
        "profileId": profileId
    }

    # Remove metadata entries with None or empty values
    metadata = {k: v for k, v in metadata.items() if v}

    # Upsert into Pinecone index
    try:
        index.upsert([(profileId, embedding, metadata)])
    except Exception as e:
        print(f"Error upserting profile {profileId} into Pinecone: {e}")
        continue

print("Data loading complete!")