I am using the RabbitMQ trigger in a very simple workflow: get the messages from a queue and then append the raw data to a Google Sheet.
The issue I have found is that if there are multiple messages in a queue and the trigger is activated, the workflow is not executed sequentially. In fact, there is strange behavior with Google Sheets: it appends the queued messages, but a cell's value changes before the workflow continues to the next row (please see the gif below):
To reproduce the issue:
1- Create the following workflow. Save but NOT activate.
2- Send multiple messages to the RabbitMQ queue (example, messages with just: 1; 2; 3; 4; 5; 6…)
3- Activate the workflow created at step 1
4- Take a look at the Google Sheet. The data is incomplete and/or randomly skipped during the workflow.
UPDATE: I did another test, this time with Postgres instead of Google Sheets. This time the data is complete, so the trigger seems to be working normally, apart from the ordering. For each run, it inserts the data into the database and there are no missing messages.
The issue seems to be happening only with Google Sheets. For some reason, for each trigger run, the data in a cell can change if the next run is executed very quickly.
I wonder what would happen if the workflow were more complicated / longer to run… Would the RabbitMQ trigger wait for each workflow execution to finish before consuming the message and processing the next one?
Hi @pbdco, many thanks for the detailed description you have provided! I have managed to reproduce this in the meantime and what is happening here is this:
The RabbitMQ trigger doesn’t fetch all queued items in a single execution. Instead, it will start a single workflow execution for each item it receives, meaning you have multiple executions running in parallel.
So we now have each workflow execution using the Google Sheets node which itself is a wrapper around the Google Sheets API. It uses this method to append data to an existing sheet. So it would appear the behaviour you are seeing here is simply how Google’s API processes (nearly) parallel requests. My guess would be that it internally just checks for the next available row and then writes the data in there. If there are overlapping requests it might use the same “available” row for each.
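To make the suspected race concrete, here is a minimal sketch (my own simplification, not Google's actual implementation): each append-style request first looks up the next free row and then writes into it, so two overlapping requests can resolve to the same row and one value gets overwritten.

```python
# Simulate the suspected "find next free row, then write" behaviour.
sheet_seq = {}  # row index -> value, standing in for a spreadsheet

def next_free_row(sheet):
    """Find the first empty row, like an append-style lookup."""
    return len(sheet)

def write(sheet, row, value):
    """Write a value into a specific row (last write wins)."""
    sheet[row] = value

# Sequential requests: each append sees the previous write, nothing is lost.
for msg in ["1", "2", "3"]:
    write(sheet_seq, next_free_row(sheet_seq), msg)
assert list(sheet_seq.values()) == ["1", "2", "3"]

# Overlapping requests: both resolve the free row BEFORE either writes.
sheet_par = {}
row_a = next_free_row(sheet_par)  # request A resolves row 0
row_b = next_free_row(sheet_par)  # request B also resolves row 0
write(sheet_par, row_a, "1")
write(sheet_par, row_b, "2")      # overwrites "1" - one message disappears
assert list(sheet_par.values()) == ["2"]
```

This would explain both symptoms you saw: cells changing value shortly after being written, and rows going missing entirely.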
So if you have the option of using a proper database that doesn't behave like this (such as PostgreSQL), that would certainly be preferable here.
You could also use PostgreSQL as an intermediary via an additional workflow if you want your data to end up in a Google Sheet eventually: have one workflow use the RabbitMQ trigger to write into PostgreSQL, and another workflow write from PostgreSQL to Google Sheets at regular intervals to avoid the aforementioned overlap.
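The buffer pattern behind those two workflows might look like this sketch, using an in-memory SQLite database in place of PostgreSQL and illustrative table/column names (`message_buffer`, `payload`, `exported` are all my own, not anything n8n generates):

```python
import sqlite3

# Buffer table: one row per consumed RabbitMQ message.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE message_buffer (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    exported INTEGER NOT NULL DEFAULT 0
)""")

def consume(payload):
    """Workflow 1: RabbitMQ trigger -> insert one row per message.
    Parallel inserts are safe; the database serializes them."""
    db.execute("INSERT INTO message_buffer (payload) VALUES (?)", (payload,))

def export_batch():
    """Workflow 2 (on a schedule): read all pending rows in order,
    append them to the sheet in ONE request, then mark them exported."""
    rows = db.execute(
        "SELECT id, payload FROM message_buffer WHERE exported = 0 ORDER BY id"
    ).fetchall()
    db.executemany(
        "UPDATE message_buffer SET exported = 1 WHERE id = ?",
        [(r[0],) for r in rows],
    )
    return [r[1] for r in rows]  # stand-in for the Sheets append payload

for msg in ["1", "2", "3"]:
    consume(msg)
assert export_batch() == ["1", "2", "3"]
assert export_batch() == []  # nothing pending after the export
```

Because the scheduled workflow issues a single append per interval, the Sheets API never sees overlapping requests, so the race goes away.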
Hey @MutedJam thank you very much for looking further into this issue, and for your clear explanation.
The workaround you are proposing is very logical. Besides this straightforward solution, have you ever thought about adding a feature to n8n's workflows in which the user could easily enable or disable parallel processing, at the workflow - or even better, node - level?
Hey @pbdco, I can see how this would be useful in many situations, but it can also put quite a strain on resources (as it means potentially holding a lot of data in memory at the same time). Definitely worth raising as a feature request though!
You could implement a check like this manually using the REST API endpoints used by n8n’s UI when opening the execution list though. This would be undocumented (and I haven’t tried this myself) and the endpoints might change in future versions without warning, but it has been done before. Check out this topic for more details:
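To illustrate what such a check might look like: the UI reads running executions from an internal REST endpoint (commonly `GET /rest/executions` with a status filter). The exact path and response shape are undocumented assumptions here and may differ between versions, so verify them against your own instance before relying on this. The sketch below only shows the decision logic on an assumed payload shape:

```python
import json

def has_running_execution(response_body, workflow_id):
    """Return True if the executions payload lists a running execution
    for the given workflow. The {"data": {"results": [...]}} shape is an
    ASSUMPTION about the internal API, not a documented contract."""
    data = json.loads(response_body)
    return any(
        execution.get("workflowId") == workflow_id
        for execution in data.get("data", {}).get("results", [])
    )

# Simulated response in the assumed shape:
body = json.dumps({"data": {"results": [{"id": "17", "workflowId": "42"}]}})
assert has_running_execution(body, "42") is True
assert has_running_execution(body, "99") is False
```

A workflow could call the endpoint, run this check, and wait (or re-queue the message) while an execution of the same workflow is still in flight.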