Feature: Add Parallel Execution Support for Agent Nodes

The idea is:

Implement true parallel execution capabilities for Agent nodes or LLM processing steps within a workflow. This feature should allow multiple independent LLM calls (e.g., multiple agents analyzing different documents, or multiple tools being called simultaneously) to run concurrently rather than sequentially. This is crucial for optimizing workflows that rely heavily on high-latency operations like complex LLM reasoning or external API calls.

My use case:

I need to conduct multi-angle, in-depth analysis and seek opinions on a high-value input document (e.g., a complex legal contract, a detailed annual financial report, or a critical technical design document).

Current Challenge:
Although the platform supports single-model batch processing, it cannot efficiently handle scenarios where one document requires the collaboration of multiple different Agents for analysis.
Suppose we need to run the following three independent analysis tasks to generate a comprehensive summary:

  1. Legal Compliance Agent: reviews risk clauses in the contract (takes 8 seconds).
  2. Financial Analysis Agent: extracts key financial indicators and predicts trends (takes 12 seconds).
  3. Summarization and Refinement Agent: consolidates the first two reports into the final executive summary (takes 5 seconds).

In the current serial execution mode, the total time is $8 + 12 + 5 = 25$ seconds (ignoring data-transfer overhead).

Desired Parallel Solution:

Task 1 and Task 2 are completely independent; both can read the original document and start working simultaneously. Only Task 3 depends on the output of the first two tasks.

  1. Parallel Execution (8 seconds, 12 seconds): The Legal Compliance Agent and the Financial Analysis Agent start concurrently. The parallel phase time is determined by the slowest task, which is 12 seconds.
  2. Serial Execution (5 seconds): Task 3 starts after Task 1 and Task 2 are complete.

Total Time: $12 + 5 = 17$ seconds.
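The fan-out/fan-in structure above can be sketched with Python's `asyncio`. This is only an illustration of the desired scheduling semantics, not the platform's actual API: the agent functions and document name are hypothetical, and `asyncio.sleep` stands in for the high-latency LLM calls.

```python
import asyncio

# Hypothetical agents; asyncio.sleep stands in for real LLM/API latency
# (scaled down from seconds so the sketch runs quickly).
async def legal_compliance_agent(doc: str) -> str:
    await asyncio.sleep(0.08)  # stands in for the ~8 s legal review
    return f"legal report on {doc}"

async def financial_analysis_agent(doc: str) -> str:
    await asyncio.sleep(0.12)  # stands in for the ~12 s financial analysis
    return f"financial report on {doc}"

async def summarization_agent(legal: str, financial: str) -> str:
    await asyncio.sleep(0.05)  # stands in for the ~5 s summarization
    return f"summary of [{legal}] and [{financial}]"

async def run_workflow(doc: str) -> str:
    # Fan-out: tasks 1 and 2 are independent, so run them concurrently;
    # this phase takes as long as the slowest of the two.
    legal, financial = await asyncio.gather(
        legal_compliance_agent(doc),
        financial_analysis_agent(doc),
    )
    # Fan-in: task 3 starts only once both upstream results are ready.
    return await summarization_agent(legal, financial)

summary = asyncio.run(run_workflow("contract.pdf"))
print(summary)
```

The key point is that the workflow engine would need to schedule by dependency (a DAG) rather than by node order, so that nodes with no data dependency between them run in the same phase.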

By introducing parallel functionality, we can reduce the waiting time for critical tasks by approximately 32% (from 25 seconds to 17 seconds), significantly improving the response speed and efficiency of complex decision-support systems. This feature is crucial for scenarios requiring the rapid synthesis of multiple expert opinions (such as risk control, auditing, or real-time customer service decision-making).

I think it would be beneficial to add this because:

  1. Significantly reduce execution time: especially in workflows involving large amounts of data or high-latency external services, running independent steps concurrently can cut overall execution time dramatically (in large batch scenarios, from hours to minutes).
  2. Improve resource utilization: Allows underlying computational resources (CPU, network bandwidth, LLM API limits) to be utilized more fully, rather than waiting for a single request to complete.
  3. Lower costs: where infrastructure is billed by compute time (e.g., a self-hosted workflow engine), faster end-to-end processing translates directly into lower costs.
  4. Enhance user experience: Makes it possible to build near real-time, high-throughput automated processes.

Any resources to support this?

Are you willing to work on this?

No, but I am happy to provide further testing and detailed feedback on the implementation.

Agreed. This is a frustrating limitation.