TL;DR (the debugging flow)
Do not start with configuration changes. Start with isolation: reproduce the slowness, then remove variables until you find the slow layer.
Once you know the slow layer, the fix is usually obvious: cap timeouts, cap retries, or reduce work per run.
- Step 1: reproduce with one channel and one workflow
- Step 2: isolate model latency vs tool latency vs network latency
- Step 3: cap retries and timeouts
- Step 4: reduce scope per run
- Step 5: capture a minimal repro for future debugging
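
Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not a real integration: `timed`, `slowest_layer`, and `call_with_caps` are hypothetical helper names, the layer labels are arbitrary, and the `time.sleep` calls stand in for your actual model, tool, and network calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(layer, timings):
    """Add the wall-clock time spent inside the block to timings[layer]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[layer] = timings.get(layer, 0.0) + elapsed


def slowest_layer(timings):
    """Return the layer name with the largest accumulated time."""
    return max(timings, key=timings.get)


def call_with_caps(fn, max_attempts=3, deadline_s=10.0):
    """Call fn until it succeeds, with a cap on attempts and on total time.

    The deadline is only checked between attempts; it does not interrupt
    a call that is already running, so fn still needs its own timeout.
    """
    start = time.monotonic()
    attempts = 0
    last_error = None
    while attempts < max_attempts and time.monotonic() - start < deadline_s:
        attempts += 1
        try:
            return fn()
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {attempts} attempt(s)") from last_error


if __name__ == "__main__":
    timings = {}
    # Placeholder work: replace each sleep with the real call for that layer.
    with timed("model", timings):
        time.sleep(0.02)
    with timed("tool", timings):
        time.sleep(0.05)
    with timed("network", timings):
        time.sleep(0.01)
    print(timings)
    print("slowest layer:", slowest_layer(timings))
```

Wrapping each layer separately is what turns "it feels slow" into a number per layer. The cap in `call_with_caps` bounds the worst case to `max_attempts` calls, which keeps a failing dependency from multiplying the latency of every run.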