Workflow Description (Tables A, B, C)
- Table A (ideal): the complete, “official” set of valid rows. Its size varies, up to about 4,000 rows.
- Table B (imperfect): the rows received from the form, which must be validated against A.
- Table C (result): the rows from B that are not present in A.
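For reference, if the match could be reduced to equality on a normalized key, C would be a plain set difference: build A's keys once, then keep every B row whose key is missing. This is only a baseline sketch. It assumes rows are dicts and uses a hypothetical `row_key` helper that trims and lowercases the chosen fields (the new field created during extraction might play this role, but that is an assumption). The real comparison described in this post is more complex.

```python
from typing import Iterable


def row_key(row: dict, fields: Iterable[str]) -> tuple:
    """Hypothetical normalization: trimmed, lowercased values of the chosen fields."""
    return tuple(str(row.get(f, "")).strip().lower() for f in fields)


def b_minus_a(a_rows: list[dict], b_rows: list[dict], fields: list[str]) -> list[dict]:
    """Return the rows of B whose key is absent from A's keys (C = B minus A)."""
    a_keys = {row_key(r, fields) for r in a_rows}  # built once, reused for every B row
    return [r for r in b_rows if row_key(r, fields) not in a_keys]


A = [{"name": "Alice", "code": "X1"}, {"name": "Bob", "code": "Y2"}]
B = [{"name": " alice ", "code": "x1"}, {"name": "Carol", "code": "Z3"}]
C = b_minus_a(A, B, ["name", "code"])
print(C)  # [{'name': 'Carol', 'code': 'Z3'}]
```

Each lookup in the set is constant time on average, so this case needs roughly |A| + |B| key computations instead of |A| × |B| comparisons.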
Process
- I receive two files from the form and process them separately.
- Each file goes through extraction and the creation of a new field.
- The two flows are merged.
- A Python script runs that:
  - compares each row in B against all rows in A;
  - produces C from the rows in B that were not found in A.
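Because the real comparison is not plain equality, the script's two steps can be sketched with a pluggable match function: A is held in memory once and reused, and each B row is scanned against all of A, stopping at the first match. `is_match` is a hypothetical stand-in (case-insensitive name plus an amount tolerance), not the actual rule from this workflow.

```python
from typing import Callable

Row = dict
Matcher = Callable[[Row, Row], bool]


def is_match(a: Row, b: Row) -> bool:
    """Hypothetical rule: same name ignoring case/whitespace, amounts within 0.01."""
    same_name = a["name"].strip().lower() == b["name"].strip().lower()
    close_amount = abs(float(a["amount"]) - float(b["amount"])) <= 0.01
    return same_name and close_amount


def rows_not_in_a(a_rows: list[Row], b_rows: list[Row], match: Matcher) -> list[Row]:
    """Keep each B row that matches no A row; any() stops scanning A at the first hit."""
    return [b for b in b_rows if not any(match(a, b) for a in a_rows)]


A = [{"name": "Alice", "amount": "10.00"}, {"name": "Bob", "amount": "5"}]
B = [{"name": "alice", "amount": "10.004"}, {"name": "Bob", "amount": "7"}]
print(rows_not_in_a(A, B, is_match))  # [{'name': 'Bob', 'amount': '7'}]
```

Memory stays proportional to |A| + |B| + |C| because no (A, B) pairs are stored, even though the worst case still calls `match` |A| × |B| times.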
-
Constraints and Current Problem
- A must always be compared as a whole against every row in B (it cannot be split), because the validation requires scanning the entire ideal dataset.
- When A is large (up to ~4,000 rows) and B is also large, the workflow runs out of memory and crashes.
- Looping and chunking are only possible for B. They don't work for A, because every comparison must use the full table.
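The constraint can still be respected while bounding memory: only B is split, and every chunk is checked against the complete A. Chunking does not reduce the total work (if B also had 4,000 rows, that is up to 4,000 × 4,000 = 16,000,000 comparisons), but it limits how many B rows are handled at once. The `chunk_size` and the `matches` placeholder below are illustrative assumptions, not values from the workflow.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Row = dict


def chunks(rows: Iterable[Row], size: int) -> Iterator[list[Row]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def matches(a: Row, b: Row) -> bool:
    """Hypothetical placeholder for the real (non-equality) comparison."""
    return a["id"] == b["id"]


def compute_c(a_rows: list[Row], b_rows: Iterable[Row],
              match: Callable[[Row, Row], bool], chunk_size: int = 500) -> list[Row]:
    """Split only B; each chunk is compared against the whole of A."""
    c_rows: list[Row] = []
    for batch in chunks(b_rows, chunk_size):
        c_rows.extend(b for b in batch if not any(match(a, b) for a in a_rows))
    return c_rows


A = [{"id": i} for i in range(0, 4000, 2)]  # even ids only
B = ({"id": i} for i in range(10))          # B can be streamed as a generator
print(compute_c(A, B, matches, chunk_size=3))
# [{'id': 1}, {'id': 3}, {'id': 5}, {'id': 7}, {'id': 9}]
```

Inside a single script this only saves memory if B arrives as a stream or each chunk's results are written out before the next chunk; collecting everything into `c_rows`, as here, keeps all of C in memory.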
Expected Result
- C = B \ A (all rows in B that do not exist in A).
Note: The comparison is not a simple equality check. It is more complex, but I have left out the details because they are not relevant here. Keep in mind that this can only be implemented in Python code. If anyone understands the problem and has a solution, I would appreciate the help.
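Since the comparison is more than equality, the expected result can be stated with a general predicate, where `match` is a placeholder for the undisclosed rule. The definition also shows why each row of B needs all of A: "no matching a exists" can only be confirmed after every row of A has been checked. When `match` is plain row equality, it reduces to the set difference B \ A.

```latex
C = \{\, b \in B \;:\; \nexists\, a \in A \text{ with } \operatorname{match}(a, b) \,\}
\\
\operatorname{match}(a, b) \iff (a = b) \quad\Longrightarrow\quad C = B \setminus A
```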
Information on your n8n setup
- n8n version: 1.107.4
- Database (default: SQLite): SQLite
- n8n EXECUTIONS_PROCESS setting (default: own, main): own
- Running n8n via (Docker, npm, n8n cloud, desktop app): n8n Cloud
- Operating system: Windows 11