The Summarization Chain currently sends multiple requests in parallel, which can cause timeouts with local models such as Ollama that can only process requests sequentially. This has been reported in this thread:
To address this, I propose adding a concurrency limit option to the Summarization Chain. This would let users cap the number of parallel requests, preventing timeouts with models that cannot handle high parallelism; a rough sketch of the idea follows.
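
A minimal sketch of how such an option could work, independent of the chain's internals. The `max_concurrency` parameter and the `summarize_chunk` helper are hypothetical names used for illustration, not part of the existing API; the point is that a semaphore caps how many summarization requests are in flight at once.

```python
import asyncio
from typing import Awaitable, Callable, List


async def summarize_all(
    chunks: List[str],
    summarize_chunk: Callable[[str], Awaitable[str]],
    max_concurrency: int = 1,
) -> List[str]:
    """Run the map step over `chunks`, never exceeding `max_concurrency`
    in-flight model calls. With max_concurrency=1 the calls are effectively
    sequential, which matches what single-request local models need."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(chunk: str) -> str:
        # Wait for a free slot before issuing the request to the model.
        async with semaphore:
            return await summarize_chunk(chunk)

    # gather still schedules everything up front, but the semaphore
    # ensures only `max_concurrency` requests run concurrently.
    return await asyncio.gather(*(bounded(c) for c in chunks))
```

With `max_concurrency=1`, requests to a local model like Ollama would be issued one at a time, trading throughput for reliability; users with backends that handle parallelism well could raise the limit and keep the current behavior.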