Pinecone Vector Store: Getting 132 records for a batch of 20 rows (can’t get 1 row → 1 chunk → 1 vector)

Hi everyone,

I’m trying to build a simple RAG-ish workflow where each row from a Google Sheet becomes exactly one document / one embedding / one vector in Pinecone – no extra splitting.

But I keep ending up with far more vectors than rows. For a single batch of 20 rows, I’m seeing 132 records in my Pinecone index.


What I’m trying to achieve

  • Read rows from a Google Sheet (one row per “tool”).

  • Build a single text field per row like:

    Tool: {{name}}\nDescription: {{description}}\nLink: {{link}}

  • Add metadata (name, link, description).

  • Generate embeddings with Gemini.

  • Insert into Pinecone Vector Store so that:

    • 1 sheet row → 1 text “chunk” → 1 embedding → 1 vector in Pinecone.
    • I’m not interested in further chunking; each tool is already small.

Current workflow

Nodes (simplified):

  1. Manual Trigger
  2. Read Google Sheets (Get Row(s) from a specific sheet)
  3. Code node: “Prepare Data with Metadata”
    • Takes each row and outputs:
      {
        "pageContent": "Tool: ...\nDescription: ...\nLink: ...",
        "metadata": {
          "name": "...",
          "link": "...",
          "description": "..."
        }
      }
      
  4. Split In Batches (Loop Over Items)
    • batchSize: 20
    • Used to avoid sending too many items to Pinecone at once.
  5. Embeddings Google Gemini Plus (3rd‑party embedding node)
    • Connected as ai_embedding into Pinecone.
  6. Character Text Splitter + Default Data Loader
    • Character Text Splitter connected into Default Data Loader as ai_textSplitter.
    • Default Data Loader connected into Pinecone as ai_document.
    • Text splitting mode: Custom, metadata mapped from metadata.name, metadata.link, metadata.description.
  7. Pinecone Vector Store (Insert Documents mode)
    • Connected to:
      • ai_document from Default Data Loader
      • ai_embedding from Embeddings Google Gemini Plus
    • mode: insert
  8. Wait + back to Split In Batches (classic looping pattern)
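For reference, the core of the “Prepare Data with Metadata” Code node looks roughly like this, written as a plain function so it runs standalone (the field names match my sheet columns; the demo row values are made up):

```javascript
// Sketch of the per-row transformation done in the "Prepare Data with Metadata"
// Code node. One sheet row in, one {pageContent, metadata} document out.
function buildDocument(row) {
  return {
    pageContent: `Tool: ${row.name}\nDescription: ${row.description}\nLink: ${row.link}`,
    metadata: {
      name: row.name,
      link: row.link,
      description: row.description,
    },
  };
}

// Standalone demo with a made-up row:
const doc = buildDocument({
  name: "ExampleTool",
  description: "Does example things",
  link: "https://example.com",
});
console.log(doc.pageContent);
```

Inside the actual Code node this is roughly `return $input.all().map(i => ({ json: buildDocument(i.json) }));`.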

What I expect vs what happens

Expected:

  • If I feed 20 rows into this flow, I expect:
    • 20 items going into Pinecone.
    • 20 vectors in my n8n-tools index (one per row).

Actual:

  • For 20 rows in a single batch, I end up with 132 records in Pinecone.
  • So something in the pipeline (the text splitter, it seems to me) is fanning out items. It feels like:
    • The text splitter / data loader is creating multiple “documents” per row.

I tried the suggested approach of:

  • Skipping the Default Data Loader and Text Splitter and just sending my prepared pageContent straight to Pinecone.
    However, in Insert Documents mode, the Pinecone Vector Store node requires a document sub‑node connection (data loader), so I can’t fully bypass that.

I also tried:

  • Setting the Character Text Splitter chunkSize to a very large value to “avoid” splitting (as suggested in another thread for similar use‑cases), so each input doc should become a single chunk.
  • But even with a huge chunkSize, the total number of records in Pinecone is still far above my input row count.

What I have checked so far

  • I understand that:

    • Split In Batches only limits how many items per iteration, not the total: if 132 items arrive at its input, they’ll all eventually be processed in a series of 20‑item batches.
    • The Pinecone Vector Store node can batch multiple records to upsert at once and doesn’t enforce 1:1 row mapping – it just takes all items it receives and inserts them.
    • Namespaces / batching behavior is similar to what’s described in the community thread about namespaces and batching: the node groups records to upsert for efficiency, not by source file/row.
  • However, even when I try to “disable” splitting via a large chunkSize, the math still doesn’t add up: a batch of 20 rows ends up as 132 vectors, which strongly suggests that more items are being produced somewhere between my sheet and the Pinecone node.

Unfortunately, with the info I found in the docs and forum, I still can’t pinpoint exactly where that multiplication happens in this workflow.


What I’m looking for

  1. Concrete debugging guidance:

    • Where exactly should I inspect item counts (per node) to confirm:
      • Input items at Prepare Data with Metadata
      • Items after Character Text Splitter
      • Items after Default Data Loader
      • Items the Pinecone node actually receives
  2. A reliable pattern for “no extra chunking”:

    • In Insert Documents mode, what’s the recommended way to wire things so that:
      • Each plain JSON item with pageContent and metadata becomes one document.
      • No automatic splitting happens (or is effectively a no‑op).
    • Is there a way to configure the Default Data Loader so that it simply wraps each input item as a single document without further splitting, while still being compatible with the Pinecone vector store node?
  3. Confirmation of expected behavior:

    • Is it expected that even with “custom” text splitting and a huge chunkSize, the Data Loader / Text Splitter combo might still produce more than one document per row?
    • Or am I missing some subtle setting that makes the loader re‑process earlier input items?
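In case it helps anyone answer: this is the throwaway pass‑through Code node I’ve been using to count items between nodes (a sketch; `$input` is n8n’s built‑in Code‑node helper, stubbed here with one fake item so the snippet also runs standalone):

```javascript
// Pass-through item counter for debugging fan-out between nodes.
// In an n8n Code node ("Run Once for All Items"), $input is provided by n8n;
// the stub below only exists so this sketch runs outside n8n too.
const $input = { all: () => [{ json: { name: "demo-row" } }] };

function countAndPassThrough() {
  const items = $input.all();
  console.log(`items reaching this node: ${items.length}`);
  return items; // unchanged, so the node is a no-op apart from the log
}

countAndPassThrough();
```

Dropping this right after “Prepare Data with Metadata”, and again as close to the Pinecone node as the connections allow, should show exactly where 20 items become 132.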

Thanks in advance for any help in getting to a clean 1 row → 1 document → 1 vector setup!

Hi @ShashwatSingh, vector stores don’t behave like spreadsheets with a strict “1 row → 1 fixed chunk → 1 vector” mapping; they’re built for semantic similarity search, where text is often split and embedded into multiple vectors. Even if you configure chunking to be a no‑op, the system is free to create multiple embeddings per input text, so you shouldn’t rely on getting exactly one stored vector per sheet row.

Thank you for your response. Is there no workaround if I want no splitting of the data at all? Or will I need to create a custom node for that?

@ShashwatSingh There are workarounds, like limiting retrieval to 1 result on the Pinecone side, but I think the best one would be to bypass that pair entirely (for example via a custom node or a simple wrapper that sends each {pageContent, metadata} as a single document to the Pinecone Vector Store node) rather than relying on the built‑in text‑splitting pipeline. That way you keep the pair segregated and can retrieve that one single chunk.

Thank you for your suggestions. I tried sending each {pageContent, metadata} as a single document to the Pinecone Vector Store node, but the problem of a single row being split into chunks still persists: it’s mandatory to attach a data loader, and once I add that, it splits the data. I tried increasing the chunk size, but for some reason it still breaks a single row into chunks.

@ShashwatSingh Maybe turn off/bypass splitting and send exactly one combined text field per sheet row into Pinecone? It must be the text splitting that’s duplicating your rows. Let me know what works.

Yeah, so I found a fix in the Default Data Loader. I was using “All Input Data”; I changed it to “Specific Data” and pointed that at the row. For some reason, with “All Input Data” (which was also only the row), the Default Data Loader was splitting it.
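For anyone hitting the same thing, my guess at why the mode mattered, as a standalone sketch (this is an assumption on my part: in “All Input Data” mode the loader seems to work on the whole serialized item rather than just my prepared text, so there is simply more text to split; the row values below are made up):

```javascript
// A prepared row as it leaves the "Prepare Data with Metadata" node.
const row = {
  pageContent: "Tool: ExampleTool\nDescription: Does example things\nLink: https://example.com",
  metadata: {
    name: "ExampleTool",
    link: "https://example.com",
    description: "Does example things",
  },
};

// "All Input Data" (my guess): the loader sees the serialized item...
const allInputText = JSON.stringify(row);
// ..."Specific Data" pointed at the row text: only the field I actually prepared.
const specificText = row.pageContent;

// The serialized item is always longer, so it crosses chunk boundaries sooner.
console.log(allInputText.length, specificText.length);
```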


That is a nice fix! But using a vector database and fetching a single chunk is a bit rocky, haha. This can’t really be used in production; Sheets is going to be the better alternative in most cases, since this approach increases embedding costs. Glad you found a workaround!
Cheers!

Yeah, it is rocky. This is one of my hobby projects. Also, all the rows had completely distinct data, so I wanted to store them as separate vectors.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.