[[Language model agents]] are too linear - they're stuck in a call-and-response paradigm. A user invokes an agent, the agent runs off to do its task (maybe sending intermediate steps back), then returns with its final answer. [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT), ChatGPT, and every other tool-using agent work this way.
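To make the shape of the problem concrete, here's a minimal sketch of that linear loop. The `call_llm` and `run_tool` functions are hypothetical stubs, not any particular framework's API - the point is just that the caller blocks until the agent's final answer comes back.

```python
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    is_final: bool

def call_llm(history: list[str]) -> Step:
    # Hypothetical stand-in for a real model call.
    return Step(text="final answer to: " + history[0], is_final=True)

def run_tool(step: Step) -> str:
    # Hypothetical stand-in for executing whatever tool the model requested.
    return "tool output for: " + step.text

def run_agent(user_request: str) -> str:
    # The linear pattern: nothing else can happen until this returns.
    history = [user_request]
    while True:
        step = call_llm(history)
        if step.is_final:
            return step.text
        history.append(run_tool(step))

print(run_agent("summarize today's news"))
```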
This is fine for one-off tasks, but it's insufficient for many key use cases, including:
- **Embodied agents.** Something receiving realtime video, audio, or other input can't just disappear for a while.
- **Persistent agents.** If a system needs to operate continuously, a single-threaded loop can't incorporate other processes (e.g. a self-monitoring process, a memory retrieval process) without adding unacceptable latency.
- **Self-regulating agents.** Related to the above: a system should be able to output accurate information & take actions with low latency. An ideal self-regulation system would run in parallel with output generation, so it protects the system and the user without slowing either down. This applies to many self-regulation processes - avoiding loops, learning over time, adapting responses based on fetched memory or tool output, etc. These can all run in a single thread, but at a steep latency cost, especially as more steps are added. Necessary to create [[Language Model Entities (LMEs)]]. (A rough sketch of the parallel arrangement follows this list.)
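Here's roughly what "runs parallel to output generation" could look like, using `asyncio` and a toy repetition check. The token stream, the loop heuristic, and the cancellation policy are all assumptions for illustration, not a real agent framework:

```python
import asyncio

async def generate_tokens(out: asyncio.Queue) -> None:
    # Hypothetical stand-in for streaming generation from a model.
    for token in ("plan", "step", "step", "step", "done"):
        await out.put(token)
        await asyncio.sleep(0.1)
    await out.put(None)  # end-of-stream marker

async def self_monitor(out: asyncio.Queue, halt: asyncio.Event) -> list[str]:
    # Runs alongside generation instead of as an extra step inside the loop.
    emitted: list[str] = []
    while (token := await out.get()) is not None:
        emitted.append(token)
        if emitted.count(token) > 2:   # crude loop detection; a real monitor could
            halt.set()                 # use better heuristics or a cheaper model here
            break
    return emitted

async def main() -> None:
    out: asyncio.Queue = asyncio.Queue()
    halt = asyncio.Event()
    gen = asyncio.create_task(generate_tokens(out))
    emitted = await self_monitor(out, halt)
    if halt.is_set():
        gen.cancel()                   # regulation intervenes without slowing generation
    else:
        await gen
    print("emitted:", emitted)

asyncio.run(main())
```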
Human brains do this well - almost all stimuli are filtered out, leaving you with only the most relevant info.
A good solution to this could make existing agents into systems capable of more realtime or long-term use.
An early workaround: [[Language model agents can approximate being proactive by scheduling their own self-invocations]]
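As a toy illustration of that workaround, an agent can end each run by booking its own next run. The delay, the task string, and the `invoke_agent` stub below are made up for the example; a real version would hand scheduling to a cron job or task queue rather than an in-process scheduler.

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def invoke_agent(task: str, runs_left: int) -> None:
    # Hypothetical stand-in for a full agent run. A real agent would decide from
    # its own output whether and when it wants to be woken up again.
    print(f"agent run: {task!r} ({runs_left} follow-ups remaining)")
    if runs_left > 0:
        # The agent books its own next invocation, approximating proactivity
        # without waiting for a user to call it.
        scheduler.enter(2, 1, invoke_agent, argument=(task, runs_left - 1))

invoke_agent("check whether the deploy finished", runs_left=2)
scheduler.run()  # blocks until no self-scheduled invocations remain
```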