Analyzing a log file, line by line

Describe the issue/error/question

(newbie question)

I want to analyze a log file (text, one line at a time), the problem being that it can be quite large (between 500k and 1M lines). What is the proper way of doing that with n8n? I tried loading the file as binary and using the “spreadsheet” module; I also tried “split in batches”, but I cannot figure out how to merge after batching.

What is the error message (if any)?

“Workflow execution process did crash for an unknown reason!”, a Chrome crash, or error code 413.

Please share the workflow

I tried various things but I think I’m doing it wrong.

Share the output returned by the last node

Information on your n8n setup

  • n8n version: 0.206.1
  • Database you’re using (default: SQLite): default
  • Running n8n with the execution process [own(default), main]: ?
  • Running n8n via [Docker, npm, n8n.cloud, desktop app]: Docker self hosted

Hey @Lecture_Forums,

Welcome to the community :tada:

There isn’t going to be an easy way as it could be a lot of data. When we read a file we load the entire file into memory, so there is potential for it to run out of memory, which is likely what you have seen.

Depending on what you are doing with the file, it might be worth using the Execute Command node to parse the file or split it into smaller chunks before working with it.
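
For example (just a sketch, assuming the log is mounted at /logs/my-file.txt inside the n8n container and that chunks of 10,000 lines are a workable size), an Execute Command node could run:

# split the log into numbered chunk files of 10,000 lines each
mkdir -p /logs/chunks
split -l 10000 /logs/my-file.txt /logs/chunks/part_

Each resulting part_ file can then be read and processed on its own, so only one chunk is in memory at a time.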

Thanks for your answer. Is there a way to execute the parsing without loading the whole file? For example, a parameter on “load binary”? Because, as far as I understand, I cannot split it from inside n8n; that would mean using an external tool, so would it be possible to loop according to an external command’s output?
TIA

I was thinking that if no “brick” exists that loads a text file in chunks of lines, I’ll try to write one.

Hey @Lecture_Forums,

The only thing I can think of at the moment would be to use an external command to split the file, then return the output of an ls command, extract the resulting file paths by splitting on a newline character (or space, whatever is used), and read them in to process.
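
As a rough sketch (reusing the hypothetical /logs/chunks directory from above), the Execute Command node could return one path per line:

# -1 prints one path per line, which is easy to split on "\n" in a later node
ls -1 /logs/chunks/part_*

A later node can then split that output on the newline character and loop over the paths, reading one chunk file at a time.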

You can also use the command

sed -n '100,150p' /logs/my-file.txt

It will then output only lines 100-150. If you add that to an Execute Command node, set the line numbers dynamically via an expression, and loop until the output is empty, you should be able to keep the memory usage very low, at least if you keep the reading part of the loop in a sub-workflow which does not return any data.
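
As a sketch, the command field of that Execute Command node could look something like this (offset and chunkSize are hypothetical fields supplied by the previous node in the loop, and the log path is just an assumption):

sed -n '{{ $json.offset + 1 }},{{ $json.offset + $json.chunkSize }}p' /logs/my-file.txt

Once sed returns an empty output, the loop can stop.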

Here is an example workflow that does at least part of it (reading the file and then splitting each line into a different item):

