Hi everyone,
I’m trying to build a simple RAG-ish workflow where each row from a Google Sheet becomes exactly one document / one embedding / one vector in Pinecone – no extra splitting.
But I keep ending up with far more vectors than rows. For a single batch of 20 rows, I’m seeing 132 records in my Pinecone index.
What I’m trying to achieve
- Read rows from a Google Sheet (one row per “tool”).
- Build a single text field per row like:
  Tool: {{name}}\nDescription: {{description}}\nLink: {{link}}
- Add metadata (name, link, description).
- Generate embeddings with Gemini.
- Insert into Pinecone Vector Store so that:
  - 1 sheet row → 1 text “chunk” → 1 embedding → 1 vector in Pinecone.
  - I’m not interested in further chunking; each tool is already small.
Current workflow
Nodes (simplified):
- Manual Trigger
- Read Google Sheets (Get Row(s) from a specific sheet)
- Code – Prepare Data with Metadata
  - Takes each row and outputs:
    { "pageContent": "Tool: ...\nDescription: ...\nLink: ...", "metadata": { "name": "...", "link": "...", "description": "..." } }
- Split In Batches (Loop Over Items)
  - batchSize: 20 – used to avoid sending too many items to Pinecone at once.
- Embeddings Google Gemini Plus (3rd-party embedding node)
  - Connected as ai_embedding into Pinecone.
- Character Text Splitter → Default Data Loader
  - Character Text Splitter connected into Default Data Loader as ai_textSplitter.
  - Default Data Loader connected into Pinecone as ai_document.
  - Text splitting mode: Custom; metadata mapped from metadata.name, metadata.link, metadata.description.
- Pinecone Vector Store (Insert Documents mode)
  - Connected to ai_document from Default Data Loader and ai_embedding from Embeddings Google Gemini Plus.
  - mode: insert
- Wait + back to Split In Batches (classic looping pattern)
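For clarity, here is roughly what my Prepare Data with Metadata Code node does. The column names `name`, `description`, `link` match my sheet; the snippet below is a plain-JS model of the mapping, not the exact node code:

```javascript
// Model of the “Prepare Data with Metadata” Code node: one output item per row.
// Assumes sheet columns named `name`, `description`, `link`.
function prepareItem(row) {
  return {
    pageContent: `Tool: ${row.name}\nDescription: ${row.description}\nLink: ${row.link}`,
    metadata: { name: row.name, link: row.link, description: row.description },
  };
}

// In the real Code node (“Run Once for All Items”) this runs as:
//   return $input.all().map(item => ({ json: prepareItem(item.json) }));
const rows = [{ name: 'ExampleTool', description: 'Does a thing', link: 'https://example.com' }];
const items = rows.map(prepareItem);
console.log(items.length); // 1 – still exactly one item per row at this point
```

So after this node the item count still matches the row count; the fan-out has to happen later.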
What I expect vs what happens
Expected:
- If I feed 20 rows into this flow, I expect:
  - 20 items going into Pinecone.
  - 20 vectors in my n8n-tools index (one per row).
Actual:
- For 20 rows in a single batch, I end up with 132 records in Pinecone.
- So something in the pipeline (the text splitter, it seems) is fanning out items: the text splitter and/or data loader appears to be creating multiple “documents” per row.
I tried the suggested approach of:
- Skipping the Default Data Loader and Text Splitter and sending my prepared pageContent straight to Pinecone.
  However, in Insert Documents mode, the Pinecone Vector Store node requires a document sub-node connection (a data loader), so I can’t fully bypass that.
I also tried:
- Setting the Character Text Splitter chunkSize to a very large value to “avoid” splitting (as suggested in another thread for similar use-cases), so each input doc should become a single chunk.
- But even with a huge chunkSize, the total number of records in Pinecone is still far above my input row count.
What I have checked so far
- I understand that:
  - Split In Batches only limits how many items are processed per iteration, not the total: if 132 items arrive at its input, they’ll all eventually be processed in a series of 20-item batches.
  - The Pinecone Vector Store node can batch multiple records to upsert at once and doesn’t enforce a 1:1 row mapping – it takes all the items it receives and inserts them.
  - Namespace/batching behavior is similar to what’s described in the community thread about namespaces and batching: the node groups records to upsert for efficiency, not by source file/row.
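To make that batching point concrete: with 132 items at batchSize 20, the loop just runs 7 iterations and still processes everything, so Split In Batches can’t be what changes the total.

```javascript
// Split In Batches only caps items per iteration; every item is still processed.
function iterationCount(totalItems, batchSize) {
  return Math.ceil(totalItems / batchSize);
}

console.log(iterationCount(132, 20)); // 7 iterations, 132 items total either way
console.log(iterationCount(20, 20));  // 1 iteration for my 20-row test
```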
- However, even when I try to “disable” splitting via a large chunkSize, the math still doesn’t add up: a batch of 20 rows ends up as 132 vectors, which strongly suggests that extra items are being produced somewhere between my sheet and the Pinecone node.
- Unfortunately, with the info I found in the docs and forum, I still can’t pinpoint exactly where that multiplication happens in this workflow.
What I’m looking for
- Concrete debugging guidance:
  - Where exactly should I inspect item counts (per node) to confirm:
    - Input items at Prepare Data with Metadata
    - Items after Character Text Splitter
    - Items after Default Data Loader
    - Items the Pinecone node actually receives
- A reliable pattern for “no extra chunking”:
  - In Insert Documents mode, what’s the recommended way to wire things so that:
    - Each plain JSON item with pageContent and metadata becomes one document.
    - No automatic splitting happens (or is effectively a no-op).
  - Is there a way to configure the Default Data Loader so that it simply wraps each input item as a single document without further splitting, while still being compatible with the Pinecone Vector Store node?
- Confirmation of expected behavior:
  - Is it expected that even with “custom” text splitting and a huge chunkSize, the Data Loader / Text Splitter combo might still produce more than one document per row?
  - Or am I missing some subtle setting that makes the loader treat previous input items again?
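For the debugging part, what I had in mind is a pass-through counter dropped between each pair of nodes. In n8n this would be a Code node doing `const items = $input.all(); ... return items;`; the snippet below is a plain-JS model of the same idea:

```javascript
// Pass-through counter: logs how many items flow past a given point, unchanged.
function countAndPass(items, label) {
  console.log(`${label}: ${items.length} item(s)`);
  return items; // items are returned untouched, so the workflow behaves the same
}

const sample = [
  { json: { pageContent: 'Tool: A' } },
  { json: { pageContent: 'Tool: B' } },
];
const passed = countAndPass(sample, 'after Prepare Data');
console.log(passed.length); // 2 – same items in, same items out
```

Is this the right way to find where 20 items become 132, or is there a better built-in way to inspect what the ai_document sub-node actually emits?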
Thanks in advance for any help in getting to a clean 1 row → 1 document → 1 vector setup!