[[Language model agents]] are too linear - they're stuck in a call-and-response paradigm. A user invokes an agent, the agent runs off to do its task (maybe sending intermediate steps back), then returns with its final answer. [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT), ChatGPT, and every other tool-using agent work this way.
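To make the shape of the problem concrete, here's a minimal sketch of that linear loop. The `call_llm` and `run_tool` functions are hypothetical stubs, not any particular framework's API - the point is just that the caller blocks until the agent's final answer comes back.

```python
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    is_final: bool

def call_llm(history: list[str]) -> Step:
    # Hypothetical stand-in for a real model call.
    return Step(text="final answer to: " + history[0], is_final=True)

def run_tool(step: Step) -> str:
    # Hypothetical stand-in for executing whatever tool the model requested.
    return "tool output for: " + step.text

def run_agent(user_request: str) -> str:
    # The linear pattern: nothing else can happen until this returns.
    history = [user_request]
    while True:
        step = call_llm(history)
        if step.is_final:
            return step.text
        history.append(run_tool(step))

print(run_agent("summarize today's news"))
```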
This is fine for one-off tasks, but it's insufficient for many key use cases, including:
- **Embodied agents.** Something receiving realtime video, audio, or other input can't just disappear for a while.
- **Persistent agents.** If a system needs to operate continuously, a single-threaded loop can't incorporate other processes (e.g. a self-monitoring process, a memory retrieval process) without adding unacceptable latency.
- **Self-regulating agents.** Related to the above: a system should be able to output accurate information & take actions with low latency. An ideal self-regulation system would run in parallel with output generation, so it protects the system and the user without slowing either down. This applies to many self-regulation processes - avoiding loops, learning over time, adapting responses based on fetched memory or tool output, etc. These can all run in a single thread, but at a steep latency cost, especially as more steps are added. Necessary to create [[Language Model Entities (LMEs)]]. (A rough sketch of the parallel arrangement follows this list.)
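Here's roughly what "runs parallel to output generation" could look like, using `asyncio` and a toy repetition check. The token stream, the loop heuristic, and the cancellation policy are all assumptions for illustration, not a real agent framework:

```python
import asyncio

async def generate_tokens(out: asyncio.Queue) -> None:
    # Hypothetical stand-in for streaming generation from a model.
    for token in ("plan", "step", "step", "step", "done"):
        await out.put(token)
        await asyncio.sleep(0.1)
    await out.put(None)  # end-of-stream marker

async def self_monitor(out: asyncio.Queue, halt: asyncio.Event) -> list[str]:
    # Runs alongside generation instead of as an extra step inside the loop.
    emitted: list[str] = []
    while (token := await out.get()) is not None:
        emitted.append(token)
        if emitted.count(token) > 2:   # crude loop detection; a real monitor could
            halt.set()                 # use better heuristics or a cheaper model here
            break
    return emitted

async def main() -> None:
    out: asyncio.Queue = asyncio.Queue()
    halt = asyncio.Event()
    gen = asyncio.create_task(generate_tokens(out))
    emitted = await self_monitor(out, halt)
    if halt.is_set():
        gen.cancel()                   # regulation intervenes without slowing generation
    else:
        await gen
    print("emitted:", emitted)

asyncio.run(main())
```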
Human brains do this well - almost all stimuli are filtered out, leaving you with only the most relevant info.
A good solution to this could make existing agents into systems capable of more realtime or long-term use.
An early workaround: [[Language model agents can approximate being proactive by scheduling their own self-invocations]]
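As a toy illustration of that workaround, an agent can end each run by booking its own next run. The delay, the task string, and the `invoke_agent` stub below are made up for the example; a real version would hand scheduling to a cron job or task queue rather than an in-process scheduler.

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def invoke_agent(task: str, runs_left: int) -> None:
    # Hypothetical stand-in for a full agent run. A real agent would decide from
    # its own output whether and when it wants to be woken up again.
    print(f"agent run: {task!r} ({runs_left} follow-ups remaining)")
    if runs_left > 0:
        # The agent books its own next invocation, approximating proactivity
        # without waiting for a user to call it.
        scheduler.enter(2, 1, invoke_agent, argument=(task, runs_left - 1))

invoke_agent("check whether the deploy finished", runs_left=2)
scheduler.run()  # blocks until no self-scheduled invocations remain
```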