Working execution shows as failed

Hi,

I have a workflow that ingests data from Mailchimp to Bigquery and Updates Mailchimp profiles with data from a Bigquery table. The execution lasts around 10 minutes and ends with a memory error even though all nodes are succesfully executed and all batches seem processed. The error shows up when the batch node goes to the done output.

The last part that uploads data BQ to Mailchimp works if I execute it separately

I have tried different batch sizes and the behaviour is the same.

Would a subworkflow help here? Use a Set node with no items after ingesting data in Bigquery and then trigger the read from Bigquery and update Mailchimp workflow.

Do you see any other issue with the wokflow in total?

Thank you in advance!

Information on your n8n setup

  • n8n version:1.6.1
  • Running n8n via: cloud

Hi @xristos :wave: Breaking this up into subworkflows would definitely be recommended if you’re getting memory errors :+1: It might just be slightly too much data at once :bowing_man:

A quick, sneaky edit:

You may also want to ensure the sub-workflow doesn’t return all of its data to the parent - you can do that with a Set node, like this:

The “Execute Once” will return only a single (very small or even empty) item :+1:

1 Like

Hi @EmeraldHerald,

thank you for your response! I actually tried putting a set node in the end of the subworflow so that no data is transferred to the parent one but still get an error in the parent workflow. Subworkflow finishes successfully and parent workflow fails.

Parent:

Subworkflow:

Hi @xristos - thanks for sharing that! Out of curiosity, how much data is this processing? :thinking:

@EmeraldHerald currently it is importing ~30k contacts from Mailchimp and is updating 2700 contacts. The data size is not big. In Bigquery the whole contacts table is 5MB and the Update Table is less than 20MB.

In production there will be higher amounts of data in the update node (~10x) so the workflow should be robust. Does it make sense to split the workflow in more subworkflows i.e. after IF Node?

@xristos that might be too many calls at once - you could split into another sub-workflow or split this in batches if you want to make absolutely certain :bowing_man:

1 Like

@EmeraldHerald could you elaborate more on this?
I am already using split in batches in the update node to update 100 contacts each time. I have tried with less and it just takes more time.

In the HTTP request there is also an offset for pagination and I am requesting 500 contacts each time. There is a loop with IF node. I guess only the 500 contacts are stored in memory on each run right?

Hi @xristos - sorry, I didn’t mean to be confusing!

The sub-workflow may not be working to help with memory issues as it’s still processing a large amount of data (such as thousands of items), so this may need an overall change in workflow architecture to ensure that you’re processing everything in smaller chunks :bowing_man:

One issue that you may want to take a peek at - it looks like you’re using the HTTP Request node to call the Mailchimp API in your parent workflow, which then loops until you have all the items in the parent, which might be where some of the memory issues start.

A suggestion here would be fully processing only one page of data from Mailchimp at a time in the sub-workflow - as in downloading the Mailchimp data, inserting it into BigQuery, querying the profiles in BigQuery, and then updating Mailchimp per page. You could then return a single response item with information such as the processed page (and perhaps the number of total results to the parent) and then have the parent workflow decide if another sub-workflow execution is needed, if that makes sense?

I’m not sure what n8n cloud plan you’re currently using, but an upgrade to that plan (if possible) might help too, as each tier of n8n cloud has double the resources of the previous tier. :bowing_man:

2 Likes

Hi @EmeraldHerald ,

I am already in the Pro plan :slight_smile:
Unfortunately querying and udpating in batches does not really fit my case but I splitted the workflow into 2 subworkflows and it is working when updating ca. 3k profiles.
Parent


Workflow 1

Workflow 2

The issue I am having now is that the profiles table can be quite large i.e. 30k-50k Profiles that need to be updated in production. After 45 min Workflow 2 fails which I suppose is some timeout error because it takes too long. 3k Profiles take around 15 minutes so if the size is x10 the execution will last very long.

Do you have any Idea how to speed it up? is there a sweet spot in the update batches in Mailchimp node? currently I am using 50. I found out that with higher batches i.e. 500 it took longer. I fear this can be a dealbreaker for using n8n.

Hi @xristos :wave: Thanks for going into that detail.

Is there any possibility for you to break this down further into smaller batches? By that I mean having the workflows handle things in even smaller batches than they do currently - possibly even through an additional sub-workflow that has the empty Set node at the end as well?

If you truly can’t break this down further, then looking into self-hosting may unfortunately be a good shout :bowing_man:

I see a couple of options here. One option is to use a global variable to store the offset value (of the pagination). The logic would be:
Master flow that will receive one page with contacts at a time (e.g. first/next 500 contacts) and pass those contacts to the next workflow to process it. Then that master flow saves the next offset value (or next page url) to the global variable and exits (and thus releases the memory). That master flow should be run with a schedule trigger with enough time to not overlap the executions (e.g. run every 15 minutes if the flow works for 10-12 minutes)

So the overall time for 10k contacts would be:
10 000 / 500 x 15 = 300 minutes in that setup
and the memory will be limited to 500 contacts + processing of them (two workflows at a time)

It can be adjusted to pass X contacts to the sub-flow and exit.

It has two disadvantages:

  1. It is bound to the fixed time (every 15 minutes in the example above) even if the sub-flow can process contacts faster.
  2. If the contacts will be processed for a longer time (more than 15 minutes), then the overlapping may occur (e.g. the first execution is still running and the second one is started with the same set of data).

The other option is to use a queue service like RabbitMQ to store the contacts. And a workflow with a trigger node to process the contacts. The main problem here is how to get all 30-50k contacts as the workflow will definitely run out of memory at some point. I use an AWS lambda function with s3 storage for such cases

Thank you for your replies. I have solved the issue by not using the Mailchimp node to update contacts.
Instead I am using an HTTP request to the batch operations endpoint of Mailchimp which is much faster.
It seems like the node was making a single request for each email in the input .

so in a subworkflow I create the table with the data that need to be uploaded in Mailchimp, I iterate through them and call another workflow with the batch update request


This takes ~4 min to update 30k contacts.

1 Like

Yes, the batch method will always be better, if it is supported by the API :slight_smile:

Hi @EmeraldHerald, the parent workflow in the above post ends as failed when the subworkflow ends. There is no error message in the executions and the data is actually uploaded to mailchimp. Do you know what would cause that? In the subworkflow I am using a set node in the end to avoid carrying data back to parent workflow. Thanks!

Hi @xristos - Could you try updating to the latest version of n8n and letting me know if this still happens to you?

Hi @EmeraldHerald , I updated to 1.11.2 and now it fails after the the subworkflow has been called 2 times. This means, still no errored node or error message but also no complete upload of the data in mailchimp. Before the update I was running v.1.8

Hi @xristos - can you DM me your cloud username? :pray:

1 Like

Hi @xristos - it looks like your cloud instance is running out of memory :see_no_evil: There are two tiers of Pro, and the higher tier may help - but if this cannot be broken down further, I’m afraid you might need to look into self-hosting :bowing_man:

Thank you for looking into this. Can you see which operation is causing the memory runout?
It is strange because the error was showing only after the last batch was completed and the subworkflow ended.
If the subworkflow is not returning any data to the parent workflow why would this cause a memory error to it instead of just finishing?
After upgrading it fails before completing all the subworkflow iterations so I guess sth has changed?

Self hosting is not an option so I will try is in smaller chunks.
For this I would appreciate it if you could point me to the nodes I should focus more.
i.e. Should I reduce the profiles I get from BQ in the parent node to 1-2k instead of 4k at the moment and call the subworkflow more times than I am doing it now? there are 65k items that need to be read from BQ and synced in Mailchimp. Async Batch requests to Mailchimp can hold up to 1000 items per request. So by getting 1k each time from BQ I would call the subworkflow around 65 times.

Is that what you mean by breaking it down further?

Hi @xristos :wave: It looks like it is indeed your Mailchimp workflows - keep in mind I can only see a minimal amount of data through the logs, and I can’t see your actual workflows beyond anything you’ve shared here through the forum thread.

Out of curiosity, are you also saving successful executions? The fact that it was only erroring in the way you describe suggests it could be something like this causing it :thinking:

It’s difficult to say exactly which node, but the usual nodes on the “naughty” list that I can see here would be the Code nodes. You can see some strategies to avoid out of memory errors here: Memory-related errors | n8n Docs

That’s partially what I mean by breaking it down further - I would definitely suggest reducing the amount of profiles you’re getting from BigQuery and seeing if that helps. But also if you can split up certain operations into other sub-workflows, that may help, too.

Otherwise you may need to yet again upgrade, but even then I can’t promise it will fully fix the issue :bowing_man: