Pinecone Vector Store: Getting 132 records for a batch of 20 rows (can’t get 1 row → 1 chunk → 1 vector)

Hi everyone,

I’m trying to build a simple RAG-ish workflow where each row from a Google Sheet becomes exactly one document / one embedding / one vector in Pinecone – no extra splitting.

But I keep ending up with far more vectors than rows. For a single batch of 20 rows, I’m seeing 132 records in my Pinecone index.


What I’m trying to achieve

  • Read rows from a Google Sheet (one row per “tool”).

  • Build a single text field per row like:

    Tool: {{name}}\nDescription: {{description}}\nLink: {{link}}

  • Add metadata (name, link, description).

  • Generate embeddings with Gemini.

  • Insert into Pinecone Vector Store so that:

    • 1 sheet row → 1 text “chunk” → 1 embedding → 1 vector in Pinecone.
    • I’m not interested in further chunking; each tool is already small.

Current workflow

Nodes (simplified):

  1. Manual Trigger
  2. Read Google Sheets (Get Row(s) from a specific sheet)
  3. Code node ("Prepare Data with Metadata")
    • Takes each row and outputs:
      {
        "pageContent": "Tool: ...\nDescription: ...\nLink: ...",
        "metadata": {
          "name": "...",
          "link": "...",
          "description": "..."
        }
      }
      
  4. Split In Batches (Loop Over Items)
    • batchSize: 20
    • Used to avoid sending too many items to Pinecone at once.
  5. Embeddings Google Gemini Plus (3rd‑party embedding node)
    • Connected as ai_embedding into Pinecone.
  6. Character Text Splitter → Default Data Loader
    • Character Text Splitter connected into Default Data Loader as ai_textSplitter.
    • Default Data Loader connected into Pinecone as ai_document.
    • Text splitting mode: Custom, metadata mapped from metadata.name, metadata.link, metadata.description.
  7. Pinecone Vector Store (Insert Documents mode)
    • Connected to:
      • ai_document from Default Data Loader
      • ai_embedding from Embeddings Google Gemini Plus
    • mode: insert
  8. Wait + back to Split In Batches (classic looping pattern)
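For reference, the "Prepare Data with Metadata" step is essentially the mapping below. This is a standalone sketch with hard-coded sample rows and field names; in the actual n8n Code node the rows come from `$input.all()` and the fields match my sheet columns:

```javascript
// Standalone sketch of the "Prepare Data with Metadata" Code node.
// In n8n the rows would come from $input.all(); here they are hard-coded samples.
const rows = [
  { name: "Tool A", description: "Does A things", link: "https://example.com/a" },
  { name: "Tool B", description: "Does B things", link: "https://example.com/b" },
];

// One input row -> exactly one output item with pageContent + metadata.
const items = rows.map((row) => ({
  json: {
    pageContent: `Tool: ${row.name}\nDescription: ${row.description}\nLink: ${row.link}`,
    metadata: { name: row.name, link: row.link, description: row.description },
  },
}));

console.log(items.length); // one item per row, so 2 here
```

So by the time items leave this node, the count should still equal the sheet row count; the fan-out has to happen later.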

What I expect vs what happens

Expected:

  • If I feed 20 rows into this flow, I expect:
    • 20 items going into Pinecone.
    • 20 vectors in my n8n-tools index (one per row).

Actual:

  • For 20 rows in a single batch, I end up with 132 records in Pinecone.
  • So something in the pipeline (the text splitter, I suspect) is fanning out items. It feels like:
    • The text splitter / data loader combination is creating multiple “documents” per row.

I tried the suggested approach of:

  • Skipping the Default Data Loader and Text Splitter and just sending my prepared pageContent straight to Pinecone.
    However, in Insert Documents mode, the Pinecone Vector Store node requires a document sub‑node connection (data loader), so I can’t fully bypass that.

I also tried:

  • Setting the Character Text Splitter chunkSize to a very large value to “avoid” splitting (as suggested in another thread for similar use‑cases), so each input doc should become a single chunk.
  • But even with a huge chunkSize, the total number of records in Pinecone is still far above my input row count.
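For intuition about why a huge chunkSize *should* give one chunk per document, here is a simplified model of a character text splitter (an illustration of the general split-then-merge technique, not the actual n8n/LangChain implementation): it splits on a separator, then greedily merges pieces back together up to chunkSize characters.

```javascript
// Simplified model of a character text splitter: split on a separator,
// then greedily merge adjacent pieces back together up to chunkSize chars.
// This is an illustration only, not the actual n8n/LangChain code.
function splitText(text, separator, chunkSize) {
  const pieces = text.split(separator);
  const chunks = [];
  let current = "";
  for (const piece of pieces) {
    const candidate = current ? current + separator + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      current = piece; // oversized pieces are kept whole in this model
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const doc = "Tool: A\nDescription: does things\nLink: https://example.com/a";
console.log(splitText(doc, "\n", 10000).length); // 1 — huge chunkSize, single chunk
console.log(splitText(doc, "\n", 20).length);    // 3 — small chunkSize splits on "\n"
```

Under this model, a large chunkSize really does collapse everything back into one chunk, which is why the 132-record result is so confusing.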

What I have checked so far

  • I understand that:

    • Split In Batches only limits how many items are processed per iteration, not the total: if 132 items arrive at its input, they’ll all eventually be processed in a series of 20‑item batches.
    • The Pinecone Vector Store node can batch multiple records to upsert at once and doesn’t enforce 1:1 row mapping – it just takes all items it receives and inserts them.
    • Namespaces / batching behavior is similar to what’s described in the community thread about namespaces and batching: the node groups records to upsert for efficiency, not by source file/row.
  • However, even when I try to “disable” splitting via a large chunkSize, the math still doesn’t add up: a batch of 20 rows ends up as 132 vectors, which strongly suggests that more items are being produced somewhere between my sheet and the Pinecone node.
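To make the Split In Batches point concrete: batching only groups items per iteration, it never drops any, so if something upstream fans 20 rows out into 132 items, all 132 still reach Pinecone (standalone sketch of the batching arithmetic):

```javascript
// Batching only groups items into fixed-size slices; it never drops any.
function batch(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const items = Array.from({ length: 132 }, (_, i) => ({ id: i }));
const batches = batch(items, 20);
console.log(batches.length);        // 7 iterations (6 full batches + 1 partial)
console.log(batches.flat().length); // still 132 items in total
```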

Unfortunately, with the info I found in the docs and forum, I still can’t pinpoint exactly where that multiplication happens in this workflow.


What I’m looking for

  1. Concrete debugging guidance:

    • Where exactly should I inspect item counts (per node) to confirm:
      • Input items at Prepare Data with Metadata
      • Items after Character Text Splitter
      • Items after Default Data Loader
      • Items the Pinecone node actually receives
  2. A reliable pattern for “no extra chunking”:

    • In Insert Documents mode, what’s the recommended way to wire things so that:
      • Each plain JSON item with pageContent and metadata becomes one document.
      • No automatic splitting happens (or is effectively a no‑op).
    • Is there a way to configure the Default Data Loader so that it simply wraps each input item as a single document without further splitting, while still being compatible with the Pinecone vector store node?
  3. Confirmation of expected behavior:

    • Is it expected that even with “custom” text splitting and a huge chunkSize, the Data Loader / Text Splitter combo might still produce more than one document per row?
    • Or am I missing some subtle setting that makes the loader re‑process earlier input items on each loop iteration?

Thanks in advance for any help in getting to a clean 1 row → 1 document → 1 vector setup!

Ah! This happens because the Default Data Loader + Character Text Splitter combo is producing multiple chunks per row, even with a huge chunkSize. That’s why 20 rows end up as 132 vectors: the loader treats each incoming item as splittable.

How to debug item counts:

  1. Prepare Data with Metadata → check output.items.length (should match your row count).
  2. After Character Text Splitter → see how many items exist now (likely multiplied).
  3. After Default Data Loader → final pre-Pinecone item count.
  4. Pinecone node → receives all items as-is.
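One way to instrument these checkpoints is a tiny pass-through counter dropped between steps: it logs how many items flow past and returns them unchanged. This is a standalone sketch; in an actual n8n Code node the body would be along the lines of `const items = $input.all(); console.log(items.length); return items;`:

```javascript
// Pass-through counter: logs how many items flow past, returns them unchanged.
// In an n8n Code node the items would come from $input.all();
// here it's a plain function for illustration.
function countItems(label, items) {
  console.log(`${label}: ${items.length} item(s)`);
  return items; // unchanged, so it can sit between any two nodes
}

const afterPrepare = [{ json: { pageContent: "Tool: A" } }];
const out = countItems("after Prepare Data", afterPrepare);
console.log(out === afterPrepare); // true — data passes through untouched
```

Comparing the logged counts at each checkpoint pinpoints exactly which node multiplies the items.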

Fix / recommended setup:

  • Skip Default Data Loader and Text Splitter if you don’t need splitting.
  • Use a Code node to output the exact JSON structure Pinecone expects:
{
  "document": {
    "pageContent": "Tool: ...\nDescription: ...\nLink: ...",
    "metadata": {
      "name": "...",
      "link": "...",
      "description": "..."
    }
  }
}
  • Connect the Code node directly to Pinecone → Insert Documents mode.
  • Result: 1 row → 1 document → 1 vector, no extra items.

If you want, I can make a visual workflow showing this clean setup.

I think you missed my message. I have already stated that I have tried this approach.