n8n fails to run a webhook-triggered workflow and shows no helpful information in the execution history

Describe the issue/error/question

I’ve deployed n8n on a 2-core AWS t2.medium machine using the own execution process mode, and my main workflow starts from a webhook sent by Invision Community. Sometimes my workflows fail, and when I go to the execution, none of the nodes has received any data and none shows either a success check mark or an error sign. The execution time is less than a second, and retrying it does not make it succeed either.

I thought the webhooks themselves might be the issue and simply not send any data, so I created a small ASP.NET Core app, sent the same webhook to it as well, and logged the date, time, and body of each request to a file. The webhooks are being sent correctly.

Please share the workflow

As you can see, I’m not even parsing the webhook body. It is sent as application/x-www-form-urlencoded, and I could not parse it using the raw body option, so for now I am pulling the information from their API instead.
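A minimal TypeScript sketch of parsing such a body outside n8n, assuming the raw request body is available as a string. The helper name `parseWebhookBody` is hypothetical, and the "starts with `{`" check is an assumption chosen because the payload posted later in this thread is JSON even though its header says form-urlencoded. Whether it runs unchanged inside an n8n Function node has not been verified.

```typescript
// Hypothetical helper: parse a webhook body whose Content-Type header may not
// match the actual contents (assumption: JSON bodies start with "{").
function parseWebhookBody(raw: string): Record<string, unknown> {
  const trimmed = raw.trim();
  if (trimmed.startsWith("{")) {
    // The body is JSON, whatever the header claims.
    return JSON.parse(trimmed) as Record<string, unknown>;
  }
  // Otherwise treat it as application/x-www-form-urlencoded ("a=1&b=x+y").
  // URLSearchParams decodes "+" to a space and %XX escapes to characters.
  const fields: Record<string, unknown> = {};
  new URLSearchParams(trimmed).forEach((value, key) => {
    fields[key] = value;
  });
  return fields;
}

console.log(parseWebhookBody("member_id=2000&title=Hello+world"));
```

Checking the body rather than trusting the header trades strictness for robustness; a sender that mislabels its Content-Type still gets parsed.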

Information on your n8n setup

  • n8n version: the latest one on npm (0.185.0, as confirmed further down)
  • Database you’re using (default: SQLite): default (SQLite)
  • Running n8n with the execution process [own(default), main]: own (this was my first suspicion)
  • Running n8n via [Docker, npm, n8n.cloud, desktop app]: npm, running under pm2

I’m new to n8n, but I’ve been a professional developer for more than 13 years, so I can send you as much technical information as you need.

I don’t know whether the Webhook node somehow has a timeout that doesn’t wait long enough to receive the data, or what else happens. The errors usually come in bursts, one after another, and then go away. I checked the system’s memory and CPU usage: CPU was under 8%, and memory and network were fine as well.

How can I find out why these errors happen, where no data reaches the webhook and re-executing doesn’t help?

Hi @Ashkan, welcome to the community! I am sorry to hear you’re having trouble here.

Which status does your execution have in the execution list? Is this problem reproducible, for example by sending a certain payload or a certain amount of requests to your webhook endpoint?

Hi @MutedJam
The status is Error, and re-running it doesn’t change anything (it seems the input data is either not read or not stored for the execution at all). I have not tried to reproduce it by resending specific payloads, but you might be right that specific payloads trigger it. However, every few days several executions fail in a row.

@MutedJam Since I have logs of older requests, I resent one of them to the server and it worked. However, if I don’t send the Content-Type header, the webhook runs forever and doesn’t even honor my timeout. The contents include personal customer information, so I cannot post them here, but message or email me at ashkan (dot) saeedi (dot) 1989 (at) gmail (dot) com if having the content helps.

That said, the requests sent by the actual Invision Community server don’t time out; they finish quickly without the webhook receiving any data.

Also, in case it helps: when I try to retry those faulty executions that cannot even read the webhook data, I get a toast notification at the bottom right saying:

Problem with retry

Cannot read properties of undefined (reading ‘nodeExecutionStack’)
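The toast text is a standard JavaScript TypeError: some code read `nodeExecutionStack` from a value that was `undefined`. That is consistent with these executions having no stored input data. A sketch with a hypothetical, simplified record type (not n8n’s actual types or retry code) reproduces the same message and shows a defensive alternative:

```typescript
// Hypothetical, simplified record shape. This is NOT n8n's real type; it only
// reuses the property name that appears in the error message.
interface StoredExecution {
  id: string;
  data?: {
    executionData?: {
      nodeExecutionStack: unknown[];
    };
  };
}

// Assumes execution data was saved; the non-null assertions only silence the
// compiler, so a missing executionData still fails at runtime.
function stackUnchecked(execution: StoredExecution): unknown[] {
  return execution.data!.executionData!.nodeExecutionStack;
}

// Defensive variant: returns undefined instead of throwing.
function stackChecked(execution: StoredExecution): unknown[] | undefined {
  return execution.data?.executionData?.nodeExecutionStack;
}

// An execution that was recorded without any input data.
const emptyExecution: StoredExecution = { id: "1", data: {} };

try {
  stackUnchecked(emptyExecution);
} catch (err) {
  // On recent Node.js versions: "Cannot read properties of undefined (reading 'nodeExecutionStack')"
  console.log((err as Error).message);
}
console.log(stackChecked(emptyExecution)); // undefined
```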

Hi @Ashkan, which version of n8n are you running at the moment (you can run the n8n --version CLI command to display the current version or click About n8n in the Help menu of the sidebar in the UI)?

Can you perhaps share a redacted version of your payload, ideally as an easy to copy curl request? I am not interested in your personal data but rather the data format, structure and size.
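One way to produce such a redacted copy while keeping format, structure and size intact is to replace personal values with same-length placeholders. A minimal sketch; the `redact` helper and the key list are hypothetical choices for this payload shape, not anything n8n provides:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Illustrative list of keys treated as personal data; adjust for the real payload.
// Note that "name" also matches group and field names, which over-redacts a little.
const SENSITIVE_KEYS = new Set(["name", "formattedName", "email", "registrationIpAddress", "profileUrl"]);

// Replace sensitive string values with same-length "x" runs, so the structure,
// key order and approximate size of the payload are preserved.
function redact(value: Json, key?: string): Json {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = redact(v, k);
    }
    return out;
  }
  if (typeof value === "string" && key !== undefined && SENSITIVE_KEYS.has(key)) {
    return "x".repeat(value.length);
  }
  return value;
}

const sample: Json = { member: { id: 2000, name: "Ann", email: "a@b.c", secondaryGroups: [] } };
console.log(JSON.stringify(redact(sample)));
// {"member":{"id":2000,"name":"xxx","email":"xxxxx","secondaryGroups":[]}}
```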

@MutedJam The version is 0.185.0

A redacted version of the request body is below. The Content-Type header is application/x-www-form-urlencoded, not application/json.

{"member":{"id":2000,"name":"xxxxxxx","title":null,"timeZone":"Europe\/Berlin","formattedName":"xxxxxxx","primaryGroup":{"id":13,"name":"Professional","formattedName":"Professional"},"secondaryGroups":[],"email":"[email protected]","joined":"2022-06-29T15:57:31Z","registrationIpAddress":"2004:d1:6f04:3e07:e9f6:f425:8c5a:e864","warningPoints":0,"reputationPoints":0,"photoUrl":"data:image\/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%201024%201024%22%20style%3D%22background%3A%238bc462%22%3E%3Cg%3E%3Ctext%20text-anchor%3D%22middle%22%20dy%3D%22.35em%22%20x%3D%22512%22%20y%3D%22512%22%20fill%3D%22%23ffffff%22%20font-size%3D%22700%22%20font-family%3D%22-apple-system%2C%20BlinkMacSystemFont%2C%20Roboto%2C%20Helvetica%2C%20Arial%2C%20sans-serif%22%3ER%3C%2Ftext%3E%3C%2Fg%3E%3C%2Fsvg%3E","photoUrlIsDefault":true,"coverPhotoUrl":"","profileUrl":"https:\/\/canopy.procedural-worlds.com\/profile\/2750-xxxxxx\/","validating":false,"posts":0,"lastActivity":"2022-07-15T12:49:08Z","lastVisit":"2022-07-15T09:22:09Z","lastPost":null,"birthday":null,"profileViews":2,"customFields":{"1":{"name":"Personal Information","fields":{"1":{"name":"About Me","value":null}}},"3":{"name":"Company Information","fields":{"3":{"name":"Company Name","value":null}}}},"rank":{"id":3,"name":"Newbie","icon":"\/\/content.invisioncic.com\/d317391\/monthly_2021_12\/1_Newbie.svg","points":0},"achievements_points":20,"allowAdminEmails":false},"changes":{"last_visit":1657876929,"last_activity":1657889348}}

This is sent as a POST request body without any query string, and it is one of the requests that failed when sent as a webhook. I took it from my ASP.NET logs and resent it to n8n using a client-side .NET app, and it worked. However, if I don’t send any header, the Webhook node runs forever, doesn’t stop at my 50-second timeout, and cannot be killed either.
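One plausible reason this body is hard to read as form data: it is JSON, and form-decoding it changes its contents. In particular `svg+xml` becomes `svg xml`, and `%22` becomes a literal quote inside a JSON string. The sketch uses a shortened copy of the `photoUrl` field from the payload above; it illustrates generic form decoding with `URLSearchParams` and is not a claim about what n8n’s Webhook node does internally.

```typescript
// Shortened copy of the photoUrl field from the payload above. String.raw keeps
// the JSON escape "\/" exactly as it appears on the wire.
const raw = String.raw`{"photoUrl":"data:image\/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E"}`;

// Read as JSON: "\/" becomes "/", and the percent-escapes stay untouched.
const asJson = JSON.parse(raw) as { photoUrl: string };

// Read as form data: the body contains no literal "=" or "&", so it becomes a
// single key with an empty value. Decoding that key turns "+" into a space and
// "%22" into a double quote, which breaks the JSON string it sits in.
const form = new URLSearchParams(raw);
const decodedKey = Array.from(form.keys())[0];

let decodedKeyIsJson = true;
try {
  JSON.parse(decodedKey);
} catch {
  decodedKeyIsJson = false;
}

console.log(asJson.photoUrl.startsWith("data:image/svg+xml,"), decodedKeyIsJson); // true false
console.log(decodedKey);
```

So a form parser can only recover this body as one oddly named field, and even that field no longer parses as JSON.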

I can give you credentials to look at our n8n instance if it helps as well.

To be clear, the requests that fail when sent as webhooks don’t run for a long time: they run for a fraction of a second and, as noted above, cannot be retried and contain no data. I only mentioned the header-related incident as an example of the system’s fragility.
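Since a missing Content-Type header made the Webhook node hang, a client-side timeout at least keeps the sender from waiting forever. A minimal sketch on Node.js’s built-in `http` module; `replay` is a hypothetical helper, the URL in the example is a placeholder (5678 is n8n’s default port), and the timeout is a socket inactivity timeout, not a cap on total request time. It does not fix the hang on the n8n side.

```typescript
import * as http from "node:http";

// Hypothetical helper: POST a logged webhook body with an explicit Content-Type.
// The "timeout" option is a socket inactivity timeout, so a server that accepts
// the request but never answers makes this reject instead of hanging forever.
function replay(
  url: string,
  body: string,
  contentType: string,
  timeoutMs: number,
): Promise<{ status: number; text: string }> {
  return new Promise((resolve, reject) => {
    const req = http.request(
      url,
      {
        method: "POST",
        headers: { "Content-Type": contentType, "Content-Length": Buffer.byteLength(body) },
        timeout: timeoutMs,
      },
      (res) => {
        let text = "";
        res.setEncoding("utf8");
        res.on("data", (chunk: string) => (text += chunk));
        res.on("end", () => resolve({ status: res.statusCode ?? 0, text }));
      },
    );
    // Destroying the request with an error emits "error", which rejects.
    req.on("timeout", () => req.destroy(new Error(`no activity for ${timeoutMs} ms`)));
    req.on("error", reject);
    req.end(body);
  });
}

// Example call; the path is a placeholder for the real webhook path.
// replay("http://localhost:5678/webhook/<your-path>", loggedBody, "application/x-www-form-urlencoded", 60_000)
//   .then((r) => console.log(r.status, r.text), (e) => console.error(e));
```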

@MutedJam It seems our 2-core t2.medium with the own execution process mode can handle up to 30 simultaneous requests. If I send 40-50, it starts to misbehave: it returns OK but doesn’t process the requests. Maybe I would get better results by raising my timeout from 50 seconds to 5 minutes, so that executions eventually finish when webhooks arrive in bursts, and by running them in the main process.

I am not sure that is the only problem, but I suspect it because our failures sometimes come as 5-10 errors within a single second, or within 2-3 seconds.
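To reproduce the burst behavior on demand, a small TypeScript sketch can fire n requests at once and count the outcomes. `burst` is a hypothetical helper and `send` stands for whatever performs one request and resolves to an HTTP status code; the stand-in below only simulates results. Because the thread reports requests that return OK without being processed, the HTTP tally should be compared against the number of executions that actually appear in n8n.

```typescript
// Fire n requests together and count outcomes as "status 200", ... or "rejected".
async function burst(n: number, send: (index: number) => Promise<number>): Promise<Map<string, number>> {
  // Start all n requests before awaiting any of them, so they overlap.
  const labels = await Promise.all(
    Array.from({ length: n }, (_, i) =>
      // Promise.resolve().then(...) also turns a synchronous throw into "rejected".
      Promise.resolve()
        .then(() => send(i))
        .then(
          (status) => `status ${status}`,
          () => "rejected",
        ),
    ),
  );
  const tally = new Map<string, number>();
  for (const label of labels) {
    tally.set(label, (tally.get(label) ?? 0) + 1);
  }
  return tally;
}

// Stand-in for a real HTTP call, for illustration only: the first 30 calls
// resolve with 200 and the rest reject.
const fakeSend = (i: number): Promise<number> =>
  i < 30 ? Promise.resolve(200) : Promise.reject(new Error("dropped"));

burst(45, fakeSend).then((tally) => console.log(tally));
```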

Another thing I noticed: a single request takes less than a second across my two workflows combined (a webhook plus two HTTP requests), which is not bad. But when I send 5 requests with the payload I posted above, all of them together take about 10 seconds. That could again be fine for a single Node.js process, but the executions still report finishing in 0.3 or 0.7 seconds. While running, they show 8 or 9 seconds, but once they finish successfully, they apparently don’t count their waiting time.
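One possible reading, not confirmed here: if queued requests are processed first come, first served and the displayed duration only covers the time an execution spends actually running, a short run time can coexist with a long end-to-end latency. A small simulation with illustrative numbers (5 simultaneous requests, 2 s of work each, not measurements) shows the gap; `simulateFifo` is a hypothetical helper.

```typescript
interface Timing {
  id: number;
  waited: number; // seconds spent queued before starting
  ran: number; // seconds of actual processing
  endToEnd: number; // seconds from arrival to completion
}

// First-come-first-served simulation with a fixed number of workers.
// `arrivals` must be sorted ascending (seconds since the burst started).
function simulateFifo(arrivals: number[], serviceSeconds: number, workers: number): Timing[] {
  const freeAt: number[] = new Array<number>(workers).fill(0);
  return arrivals.map((arrival, id) => {
    // Pick the worker that becomes free first.
    let w = 0;
    for (let i = 1; i < workers; i++) {
      if (freeAt[i] < freeAt[w]) w = i;
    }
    const start = Math.max(arrival, freeAt[w]);
    const end = start + serviceSeconds;
    freeAt[w] = end;
    return { id, waited: start - arrival, ran: serviceSeconds, endToEnd: end - arrival };
  });
}

// Five requests arriving together, 2 s of work each, one worker.
for (const t of simulateFifo([0, 0, 0, 0, 0], 2, 1)) {
  console.log(`#${t.id}: ran ${t.ran}s, end-to-end ${t.endToEnd}s`);
}
// #0: ran 2s, end-to-end 2s ... #4: ran 2s, end-to-end 10s
```

Every request "ran" for the same 2 s, while the last one waited 8 s before starting.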

Using EXECUTIONS_PROCESS=main instead of own did not make much difference either. It still hangs completely when 50 requests are sent, given the processor capacity of my 2 cores (t2 instances have a 20% CPU baseline plus some burst CPU time).

Anyway, this means the failed executions might not actually have finished quickly; they may have timed out, or something else may have happened to them over a longer period.

I think this throughput is too low, and I wonder whether SQLite performance, or the disk or network performance of the t2 instance, is causing this behavior.

I’m tempted to compare it against a Node-RED process as well.

I’m sure my .NET Core process on the same machine can handle 100 times more requests per second, although it does much less work than a workflow here. Even Cadence and similar systems perform at least more reliably, if not faster. We are still in the evaluation phase, which is why I’m asking so many questions. Your large number of nodes is interesting to us, and n8n is much better suited to workflows than Node-RED, while still lacking compared to Elsa or Cadence/Temporal/…

Do you have any idea why performance is so low and how to improve it? I ran a Microsoft Orleans app with its frontend socket server for a matchmaker on a t2.micro under much more load, so the machine should not be the whole problem, at least.

To help judge whether SQLite is the issue: we have more than 10k executions stored and 5-6 workflows.

I’m willing to do whatever is needed so that together we can find out how to make this faster and more efficient, because your UI, system concepts, and ready-made nodes are rather nice. While I’m new to Node.js, TypeScript, and JavaScript, I’ve done a lot of distributed and network programming.

@MutedJam Can you tell me whether it is normal for n8n to freeze completely when 20-30 executions run at the same time on a 2-core machine (a t2.medium in our case)? What should I do to improve n8n’s performance? To be honest, it looks more like a deadlock in the webhook than anything else.

Hi @Ashkan, based on your description it seems like the available resources could be the problem here, but you mentioned that CPU and memory consumption were really low.

As for whether the performance is normal this very much depends on the complexity of your workflows. I have seen instances processing hundreds of webhooks each minute without a problem while only using a single CPU core.

You could consider a scalable setup as described here to increase performance; the page also describes an approach to scaling webhook processing.

Thanks for the response, @MutedJam. Actually, when I now start 20 requests and they never finish, CPU usage goes through the roof, as if all the webhooks were busy-waiting for a lock or something. I can implement the scaling setup and see how it works, and even with the current setup I would not have issues with 300 requests per minute if they arrived in bursts of 5. What concerns me is why a single process cannot work through 20 simultaneous webhooks one by one.
Let’s say I have a 4-core machine and 4 of these processes: would the limit be 80?

Is the limit for me clearly the Webhook node? Or is it a limitation of Express, or of SQLite when reading and writing executions? Do you have any idea?
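The 4-core, 4-process question can be framed as back-of-the-envelope arithmetic: how many requests can drain before the 50-second timeout? Using about 2 s per request under load (derived from 5 requests taking about 10 seconds) gives about 25 per process, which is in the same range as the observed 20-30 but does not prove the cause, and 4 processes would give about 100 rather than 80. These helpers are hypothetical, and the multiplication by process count only holds if nothing is shared; a single SQLite file written by every process would break that assumption.

```typescript
// Back-of-the-envelope burst capacity. Assumptions (not measured facts):
//  - each process works through requests one at a time,
//  - every request costs the same secondsPerRequest under load,
//  - processes scale linearly (no shared bottleneck such as one SQLite file).

// Largest burst in which every request still completes within the timeout.
function maxBurstWithinTimeout(timeoutSeconds: number, secondsPerRequest: number, processes: number): number {
  return Math.floor(timeoutSeconds / secondsPerRequest) * processes;
}

// Time until the last request of a burst completes.
function drainSeconds(burst: number, secondsPerRequest: number, processes: number): number {
  return Math.ceil(burst / processes) * secondsPerRequest;
}

// ~10 s for 5 requests suggests ~2 s per request under load.
console.log(maxBurstWithinTimeout(50, 2, 1)); // 25
console.log(maxBurstWithinTimeout(50, 2, 4)); // 100
console.log(drainSeconds(80, 2, 4)); // 40
```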