I have an n8n setup deployed in Azure Container Apps in Queue mode. Things seem to be going well all told, though I’m not 100% sure I set everything up quite right and the docs aren’t that helpful for my case.
One particular issue I’m running into is the following: I have a workflow that starts with a Form trigger, goes through some steps (including calling a sub-workflow), then calls out to Azure DevOps to run a pipeline and waits for the pipeline to call back to the Wait node’s $execution.resumeUrl. Unfortunately, this is where things go poorly.
When I test the workflow, I get through all the preceding nodes just fine, then kick off the pipeline and start waiting. In the debug logs, I can see a few headscratchers: the main node shows a final log of “Execution removed” for my test run’s execution ID, and then, when the pipeline finally gets around to calling the webhook URL around 30 seconds later, the logs show “Started execution of workflow ‘My Workflow’ from webhook”, as if it were starting the whole thing over again. That fails, because of course the Wait node is not the entrypoint for the workflow. I also see the error log for the execution failure on the main node, the webhook processor, and the worker node, which is odd because I was under the impression that the main node handled everything when running workflow tests.
I’m not sure what to do here. I don’t really want to split this workflow into two, with the second one having an actual webhook entrypoint, but it seems like that’s where this is heading, since the Wait node isn’t working the way I’d expect on my setup.
What is the error message (if any)?
No node to start the workflow from could be found (execution 559)
Ensure Proper Workflow Structure: The main workflow should start with a Webhook node. If your workflow begins with a form or another trigger, the resume might fail because the system doesn’t know where to resume. Consider restructuring your workflow to start with a Webhook node.
Split Workflow if Necessary: If restructuring isn’t feasible, create a separate workflow that starts with a Webhook node. This workflow can trigger the main workflow, allowing the Wait node to resume correctly.
>> Also, I suggest you share the whole workflow JSON in a viewable format; it would be much easier to debug and understand.
Once the workflow structure and configuration are addressed, the Wait node should function as intended.
I tried adding the workflow JSON to the OP but ran into character limit issues. Our workflow does indeed start with a Form trigger, but we were having issues with the Next Form Page node as well, so we were hoping to get the waiting behavior we need from the Wait node’s webhook, since it’s easy to GET a webhook URL from a YAML pipeline.
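For reference, the callback side of our pipeline looks roughly like this (a trimmed sketch; n8nResumeUrl is just a placeholder name for however you hand $execution.resumeUrl into the run):

```yaml
# Trimmed sketch of the pipeline side. n8n passes the Wait node's
# $execution.resumeUrl into the run as a runtime parameter; the parameter
# name below is a placeholder, not anything n8n or Azure DevOps defines.
parameters:
  - name: n8nResumeUrl
    type: string
    default: ''

steps:
  - script: echo "actual pipeline work happens here"
    displayName: Do the work

  - script: |
      # Tell n8n the pipeline finished so the Wait node can resume.
      curl --fail --silent --show-error "${{ parameters.n8nResumeUrl }}"
    displayName: Notify n8n
    condition: succeeded()
```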
Hey @awalbear
In Queue mode, n8n distributes workflow executions across multiple workers for scalability. However, this setup can introduce challenges with nodes that rely on state persistence, such as the Wait node. The Wait node generates a unique $execution.resumeUrl during execution, expecting the same worker to handle the incoming call to this URL to resume the workflow. In a distributed environment like Queue mode, the worker handling the resume call might not have the necessary context, leading to unexpected behavior, such as restarting the workflow.
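To picture the moving parts, here is a minimal docker-compose-style sketch of a queue mode deployment. It is purely illustrative; in Azure Container Apps each service below would map to its own container app, and every value is a placeholder:

```yaml
# Illustrative only: the three n8n roles in queue mode, all pointing at the
# same Postgres database and Redis queue. Hostnames and values are placeholders.
x-n8n-env: &n8n-env
  EXECUTIONS_MODE: queue
  QUEUE_BULL_REDIS_HOST: redis
  DB_TYPE: postgresdb
  DB_POSTGRESDB_HOST: postgres
  N8N_ENCRYPTION_KEY: same-key-on-every-role   # must match across main/webhook/worker

services:
  main:                       # editor UI and orchestration
    image: n8nio/n8n
    environment: *n8n-env
  webhook:                    # dedicated webhook processor (`n8n webhook`)
    image: n8nio/n8n
    command: webhook
    environment: *n8n-env
  worker:                     # picks up queued executions (`n8n worker`)
    image: n8nio/n8n
    command: worker
    environment: *n8n-env
  redis:
    image: redis:7
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me
```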
Potential Solutions:

1. Review Workflow Design
   - Separate Workflows: Consider splitting your workflow into two distinct workflows (see the sketch after this list):
     - Primary Workflow: Handles the initial form input, processes data, and triggers the Azure DevOps pipeline.
     - Secondary Workflow: Starts with a Webhook node to receive the callback from Azure DevOps and continues the process.
   - Advantages: This design aligns with stateless execution models, reducing reliance on in-memory state and ensuring that each workflow is independently triggerable.

2. Utilize External Storage for State Management
   - Persisting State: Store the necessary state information (e.g., execution ID, context data) in an external storage system (like a database) after initiating the Azure DevOps pipeline.
   - Resuming Execution: When the callback is received, retrieve the stored state to determine the appropriate execution path, ensuring continuity.

3. Evaluate Deployment Configuration
   - Single Worker Mode: If feasible, configure your n8n deployment to use a single worker. This setup ensures that the same worker handles both the initial execution and the resume call, maintaining the necessary context.
   - Considerations: While this approach simplifies state management, it may impact scalability and fault tolerance.
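To make option 1 (and the external-state idea in option 2) concrete: the pipeline would call a static production webhook on the secondary workflow instead of the per-execution resume URL, passing along whatever correlation data you stored. A rough sketch, with a made-up webhook path, host, and variable name:

```yaml
# Sketch of the split design: the pipeline posts to the *static* production
# webhook that starts the secondary workflow, instead of the per-execution
# $execution.resumeUrl. Webhook path, host, and variable name are made up.
steps:
  - script: |
      curl --fail --request POST \
        --header "Content-Type: application/json" \
        --data '{"correlationId": "$(n8nCorrelationId)", "status": "succeeded"}' \
        "https://your-n8n-host/webhook/devops-pipeline-complete"
    displayName: Notify secondary n8n workflow
```

The secondary workflow’s Webhook node would then look up the stored state by correlationId and carry on from there.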
Hmm. I guess I find that behavior kind of strange; I would expect that, since the main node and the worker nodes share the same database, any execution information (including the in-flight context needed to cross an async boundary like a webhook wait or a Next Form Page wait) would be available to whichever node handles the resume, so we could avoid breaking the workflow chain to accommodate intermediate wait behavior. Furthermore, I was under the impression that when running in Test mode, all executions run on the main node, and only when the production URLs, etc. are used do executions get scheduled on a worker. Is this a bug, perhaps? Maybe an issue in my queue mode setup?
I did; that’s how I set up the Queue mode configuration we have right now. I think I’m just going to split the baby and make another workflow. Not my ideal solution, but at least it’ll unblock me.