N8N + extract from xlsx + Aggregate + Summarize + insert into Supabase Vector Store

I’m extracting data from an .xlsx file in my Google Drive, processing it through an aggregate and summarize node, and then inserting it into my Supabase vector store (table=documents). The records update in the Supabase table, but I can’t chat with the data.

When testing with Google Docs and PDF files, everything works fine, but .xlsx files are problematic.

The N8N chat response says: “It seems I’m unable to access specific details about your media data in your Supabase table. However, I can guide you on how to analyze your media data and what kind of insights you might gather.”

I did get it to work once with only 3 columns and 28 rows, but now even that test doesn’t work anymore.

It looks like your topic is missing some important information. Could you provide the following, if applicable?

  • n8n version:
  • Database (default: SQLite):
  • n8n EXECUTIONS_PROCESS setting (default: own, main):
  • Running n8n via (Docker, npm, n8n cloud, desktop app):
  • Operating system:

Welcome to the community @ReinventNYC !

Tip for sharing information

Pasting your n8n workflow


Make sure to copy your n8n workflow and paste it into the code block, that is, between the pairs of triple backticks; you can also do this by clicking </> (preformatted text) in the editor and pasting in your workflow.

```
<your workflow>
```

The same applies to any JSON output you would like to share with us.

Make sure that you have removed any sensitive information from your workflow and include dummy or pinned data with it!


Can you share your workflow? I think either your aggregation/summarization does not produce a valid document, or there is an issue with how you load it into Supabase.

Hi @ReinventNYC, did you find a solution for that? I have the same problem… I think it’s better to store one entry per row, but the Supabase vector store and the text splitter cut it into many pieces…

:one: Layman’s Terms (No Jargon!)

Problem: Your Excel file isn’t “talking” to Supabase like PDFs/Docs do.
Why? Excel hides sneaky formatting (colors, formulas, blank rows) that breaks robots.

Simple Fix:

  1. Clean Your Excel:
  • Delete empty rows/columns
  • Remove colors/formulas (copy-paste values only)
  2. Test with Baby Steps:
  • Use 3 columns, 5 rows (like “Name, Age, Job”)
  • Reprocess → Chat again
  3. Still Broken? Let me know – we’ll debug like detectives!

:two: Experienced User Guide (You’ve Done This Before)

Root Cause: Supabase vector stores need strict formatting. Excel often includes:

  • Hidden metadata (e.g., cell styles)
  • Inconsistent data types (mixing text/numbers)
  • Auto-added “phantom” rows

Workflow Fix:

  1. Preprocess Excel:
  • Add a “Code Node” after the Google Drive node:

```javascript
// Remove hidden characters/formatting
return items.map(item => {
  const cleanData = {};
  Object.keys(item.json).forEach(key => {
    cleanData[key] = String(item.json[key]).trim();
  });
  return { json: cleanData };
});
```
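The snippet above trims values but keeps rows that are entirely blank. If “phantom” rows are part of the problem, a variant that also drops them might look like this (a sketch for the same Code node, assuming “Run Once for All Items” mode):

```javascript
// Trim every cell, then drop "phantom" rows where all cells are empty
return items
  .map(item => {
    const cleanData = {};
    Object.keys(item.json).forEach(key => {
      cleanData[key] = String(item.json[key] ?? '').trim();
    });
    return { json: cleanData };
  })
  .filter(item => Object.values(item.json).some(value => value !== ''));
```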
  2. Aggregate Node Settings:
  • Set “Group By” to a unique column (e.g., ID)
  • Disable “Keep All Fields” (prevents junk data)
  3. Supabase Node:
  • Enable “Skip if Empty” (ignores blank rows)
  • Check Primary Key matches your table’s unique ID

Test Script:

  1. Process a 3-row test file
  2. Check Supabase table RAW (pgAdmin/Supabase dashboard)
  3. If data exists but chat fails → Vector schema mismatch!
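For step 2, here is the kind of query you might run in the Supabase SQL Editor; a sketch assuming the commonly used layout with content and metadata columns:

```sql
-- Peek at the most recent rows to confirm the ingest actually landed
SELECT id, left(content, 80) AS content_preview, metadata
FROM documents
ORDER BY id DESC
LIMIT 5;
```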

:three: Professional Deep Dive (For Engineers)

Technical Analysis:

  • Failure Point: Data ingestion succeeds, but embeddings aren’t queryable.
  • Likely Culprits:
    • Schema Mismatch: the Supabase documents table expects a vector column of type vector(1536) (OpenAI standard).
    • Data Truncation: Excel’s row limits/aggregate node dropping fields.
    • Metadata Collision: Supabase auto-detects schema on first insert; subsequent changes break compatibility.
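For reference, the documents table that LangChain-style Supabase vector stores (including n8n’s node) are usually pointed at follows Supabase’s template; a minimal sketch, assuming 1536-dimension OpenAI embeddings and the default column names:

```sql
-- Enable pgvector and create the default documents table
create extension if not exists vector;

create table documents (
  id bigserial primary key,
  content text,           -- the document text
  metadata jsonb,         -- source info, file name, etc.
  embedding vector(1536)  -- 1536 dims for OpenAI embeddings
);
```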

Advanced Solution:

A. Validate Supabase Schema

```sql
-- Run in Supabase SQL Editor
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'documents';
```

Ensure:

  • vector column exists → vector type
  • All Excel columns are present → text/varchar
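It can also be worth confirming that stored embeddings have the dimension your model actually produces; pgvector’s vector_dims() can check this (a sketch assuming the embedding column is named embedding; some setups name it vector):

```sql
-- Should match your embedding model (1536 for OpenAI text-embedding-ada-002)
SELECT vector_dims(embedding) FROM documents LIMIT 1;
```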

B. Reprocess with Strict Typing

Add a “Function Node” before Supabase:

```javascript
// Enforce schema for vectors
return items.map(item => {
  return {
    json: {
      content: item.json.content,   // Your text data
      metadata: item.json.metadata, // Optional
      vector: item.json.vector ||   // Replace with embedding logic
        Array(1536).fill(0)         // Placeholder for testing
    }
  };
});
```
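One caveat on the placeholder above: an all-zeros vector has no direction, so cosine-similarity queries against it are meaningless (pgvector may return NaN distances), which can itself look like “the chat can’t see the data”. Swap in real embeddings (e.g., from an OpenAI embeddings node) before judging retrieval.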

C. Debug with n8n CLI

```bash
# Run the workflow file directly; the test .xlsx is read by the
# workflow’s own Google Drive / Spreadsheet File node
n8n execute --file your_workflow.json
```

Flags:

  • --rawOutput (print only the raw execution output)

D. Fallback: Bulk Delete & Re-ingest

```sql
-- Reset Supabase table
TRUNCATE documents RESTART IDENTITY;
```

Re-run the workflow to force schema detection.
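If rows exist after re-ingesting but chat still fails, also confirm that the match_documents function the vector store queries through is defined. A sketch based on Supabase’s LangChain template (assuming the default documents table and 1536 dims):

```sql
-- Similarity search RPC used by LangChain-style Supabase vector stores
create or replace function match_documents (
  query_embedding vector(1536),
  match_count int default null,
  filter jsonb default '{}'
) returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where documents.metadata @> filter
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
```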


:rotating_light: Critical Checks

  1. Excel → JSON Format: Use n8n’s “Spreadsheet File” node (not the Google Drive node alone) for direct .xlsx parsing.
  2. Vectorization: Confirm embeddings are generated before the Supabase insert (e.g., by an OpenAI embeddings node).
  3. Supabase Indexes: Create an IVFFlat index on the vector column if you’re using pgvector (see the sketch below).
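A sketch of that index, assuming cosine distance and an embedding column named embedding (swap in vector_l2_ops or your column name as needed):

```sql
-- Approximate-nearest-neighbour index for faster similarity search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```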

Need More? Share:

  • A screenshot of your aggregate node settings
  • Supabase documents table schema
  • 5 rows of sample Excel data

Please mark as solution if this resolved your issue, be blessed!


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.