I have a problem with too much data

joaobem · August 21, 2025, 1:36pm

Workflow Description (Tables A, B, C)

Table A (ideal): the complete and “official” set of valid rows. It can vary in size, reaching up to ~4,000 rows.
Table B (imperfect): the set of rows received from the form that need to be validated against A.
Table C (result): the set of rows from B that are not present in A.

Process

I receive two files from the form and process them separately.
Each file goes through extraction and the creation of a new field.
The flows are merged.
A Python script runs that:
- Compares each row in B against all rows in A.
- Produces C with the rows from B that were not found in A.
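
For reference, the comparison step described above could be sketched roughly as follows. This is only an illustration: `rows_missing_from_a`, `matches`, and `example_match` are hypothetical names, rows are assumed to be plain dicts, and the real matching rule (which the post does not spell out) would replace `example_match`.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # assumption: each row is a plain dict of field -> value


def rows_missing_from_a(
    a_rows: Iterable[Row],
    b_rows: Iterable[Row],
    matches: Callable[[Row, Row], bool],
) -> Iterator[Row]:
    """Yield each row of B for which no row of A satisfies `matches`."""
    a_list = list(a_rows)  # A is held in memory once, as a whole
    for b in b_rows:
        # any() stops at the first matching A row, so a full scan of A
        # only happens for rows that end up in C.
        if not any(matches(b, a) for a in a_list):
            yield b


def example_match(b: Row, a: Row) -> bool:
    # Hypothetical placeholder rule; the real comparison is more complex.
    return b.get("id") == a.get("id")
```

Because the function is a generator, rows of C are produced one at a time instead of being collected into a second large list first.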

Constraints and Current Problem

A must always be compared as a whole for every row in B (it cannot be split), because the validation requires scanning the entire ideal dataset.
When A is large (up to ~4,000 rows) and B is also large, the workflow crashes due to lack of memory.
A “loop and chunking” approach is only possible for B. For A, it doesn’t work, since comparisons must always use the full table.
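
To make the constraint concrete, here is a minimal sketch of chunking only B while every chunk is still compared against the full A. The names `chunked` and `diff_in_chunks`, the `matches` callback, and the default `chunk_size` of 500 are assumptions made for illustration, not part of the actual workflow.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def chunked(rows: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Split any iterable into lists of at most `size` items."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def diff_in_chunks(
    a_rows: Iterable[Any],
    b_rows: Iterable[Any],
    matches: Callable[[Any, Any], bool],
    chunk_size: int = 500,  # assumed value; tune to the memory available
) -> Iterator[List[Any]]:
    """For each chunk of B, yield the rows that match nothing in A."""
    a_list = list(a_rows)  # A is never split
    for batch in chunked(b_rows, chunk_size):
        yield [b for b in batch if not any(matches(b, a) for a in a_list)]
```

Each yielded list is the part of C that comes from one chunk of B, so only A plus one chunk of B has to be in memory at a time; concatenating the yielded lists gives the full C.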

Expected Result

C = B \ A (all rows in B that do not exist in A).

Note: The comparison is not just a simple equality check — it is more complex, but I didn’t go into details because they are irrelevant here. Keep in mind that this will only be executable through Python code. If anyone understands my problem and has a solution, please help.

Information on your n8n setup

n8n version: 1.107.4
Database (default: SQLite): SQLite
n8n EXECUTIONS_PROCESS setting (default: own, main): own
Running n8n via (Docker, npm, n8n cloud, desktop app): n8n Cloud
Operating system: Windows 11

jabbson · August 21, 2025, 5:43pm

Hey @joaobem, hope all is well. Welcome to the community.

Could you please provide an example of the input files and the expected output? That would really help us understand the requirements and give us enough context to suggest a solution.

Mohammed_BOUAZZA · August 21, 2025, 7:57pm

Hi, first of all, consider migrating to PostgreSQL (Supabase) if you have larger data.