Extract from File fails entire file for individual row errors

I’m running into an issue where extracting data from a file errors out the entire workflow because of a single bad row in the file. I’ve got a file with 10k rows, and users modify this file manually (much to my chagrin).

Why does the entire file fail when only one row, or a few rows, are formatted incorrectly?

Is there another method I should be using to extract the data, so that I can keep the good rows, carry on with my workflow, and ship the error rows off to another workflow to deal with?

I’m not seeing a clear path forward here, and I didn’t find anything useful through search, but I feel like this must be an obvious issue others have run into.

Example File:

Field1|Field2
ValueA1|ValueA2
ValueB1|ValueB2
… thousands more lines…
ValueC1|ValueC2|Garbage

This entire process fails. Even when you choose the node setting “Continue (Using Error Output)”, none of the valid data shows up in the success branch.

I feel like this has to be a solved problem, because who would want to fail an entire file when 99.99% of it parses correctly? But I’m not finding anything useful in search.


Hello @Bryce_L,

That’s quite a common issue when you’re dealing with data parsing 🙂

You can try different options:

  1. Use the Starting Line and Max Rows to Load options of the Extract From File node (CSV mode), then loop over the file by offsetting these two values (via a Loop node or, better, as a sub-workflow).

  2. Try to clean up the file with some local scripts and then parse it via n8n.

  3. Load the file as text and parse it directly, like in this example:
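For option 3, something along these lines in a Code node (a rough sketch, not the exact workflow: it assumes the previous Extract From File node ran in Text mode and put the raw contents into a field named `data`, and that the delimiter and expected column count below match your file):

```js
// n8n Code node, "Run Once for All Items" mode — sketch only.
// Assumes the raw file contents arrive in $json.data from a Text-mode extract.
const text = $input.first().json.data;

const DELIMITER = '|';       // adjust to your file's delimiter
const EXPECTED_COLUMNS = 2;  // the correct column count for your file

const lines = text.split(/\r?\n/).filter(line => line.trim() !== '');
const header = lines.shift().split(DELIMITER);

const out = [];
for (const line of lines) {
  const cells = line.split(DELIMITER);
  if (cells.length === EXPECTED_COLUMNS) {
    // Good row: build an object keyed by the header names
    const row = { _parseError: false };
    header.forEach((name, i) => { row[name] = cells[i]; });
    out.push({ json: row });
  } else {
    // Bad row: keep the raw line so the error branch can see exactly what was wrong
    out.push({ json: { _parseError: true, rawLine: line, columnCount: cells.length } });
  }
}

return out;
```

An IF node on `_parseError` downstream then splits the items into a good branch to keep processing and an error branch to hand off to another workflow.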

Historically, parsers would process only the “good” lines, and later configurable “thresholds” were added before a whole file would be failed. It’s really only in the past three years that failing the whole file for one bad row has become the norm, and it’s pretty disappointing. I’ll have to see whether it’s reasonable to put together a PR with some better options.

Option 1 won’t work for our case, and Option 2 is disappointing because the goal is to remove on-device scripts and get processing into a visual, unified location.

Hmm, I had a similar idea, but I’ve been running into what look like very weird bugs with the Merge node. In this example the workflow has a new bug where it refuses to return anything when you query input2, but it also has the same bug as production where you can’t query input1.UserID. If you do a SELECT * on input1 it gives you UserID and DeptNumber, but it returns blank results when you ask for input1.UserID.

It’s not weird, as input2 is not connected.

It works after connecting the input.

Figures I’d miss something when quickly putting together a sample; thanks for that.

However, there’s still the primary issue of not being able to query UserID. It shows up when you query for a.*, but you can’t select the field itself.

Can’t say about SQL mode, but Choose Branch mode works as intended.

Well, that was fun. It looks like the second “Extract From CSV” node needs the “Exclude BOM” setting enabled, otherwise it adds the zero-width non-breaking space to the name of the first field.

I included an example of it working and not working, just for posterity.
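If the Exclude BOM setting gets missed somewhere, another option is to strip the BOM in a Code node before the Merge. A small sketch, assuming the items come straight out of the extract node (the \uFEFF character is the zero-width no-break space / UTF-8 BOM that was being prepended to the first field name):

```js
// n8n Code node, "Run Once for All Items" mode — sketch of stripping a BOM
// that snuck into a field name, as an alternative to the Exclude BOM setting.
return $input.all().map(item => {
  const cleaned = {};
  for (const [key, value] of Object.entries(item.json)) {
    // Remove a leading \uFEFF (BOM) from each field name
    cleaned[key.replace(/^\uFEFF/, '')] = value;
  }
  return { json: cleaned };
});
```

With the BOM gone, the field name is plain UserID again, so queries like input1.UserID should resolve.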

After going through all this, I think this approach is fine for simple files, but for critical workflows it makes the most sense to use a lightweight script wrapper around qsv that generates two files: one with all the rows that have the correct number of columns, and an error file with the rest. Then I’d process only the good file in n8n and send the error file off to someone to look at if it has any content.

For anyone from the future, here’s a qsv command to filter by the number of columns in each row:

qsvp luau filter --delimiter="|" --no-headers --colindex "#col ~=5" NameOfFile.txt > GoodFile.txt

qsv/qsvp: just has to do with how the binary was compiled.
luau: This is the scripting language in qsv
filter: The operation
--delimiter="|": This is where you specify your delimiter if it’s not a comma.
--no-headers: This option is because I don’t use headers on that specific file. If your file has headers, make sure you take out this option.
--colindex: This lets you access columns by index, which allows you to use #col to get the number of columns in the row. (You don’t technically need it if you use --no-headers; I’m including it specifically so you don’t get tripped up if your file does have headers.)
~=5: This is your filter condition. The correct number of columns for this file is 5; change this to whatever your specific file needs. I don’t like that Luau uses tilde-equals for “not equal to”, but I don’t make the rules here.

If you don’t like qsv, you can do the same thing with xan:
xan filter --delimiter="|" --no-headers 'len(cols()) != 5' NameOfFile.txt > GoodFile.txt
