How Long Contexts Fail

How to Fix Your Context

Long context is your friend… when we are talking about summarization and retrieval. For agentic workflows it is often detrimental, for reasons such as:

  1. Context poisoning, where the model hallucinates and the hallucination makes it into its own context, getting referenced again and again. The ripples are powerful and they die slowly;

  2. Context distraction, where the context grows so long that the model starts repeating its past actions instead of trying new strategies;

  3. Context confusion, which happens when one gives the model too many tools (sometimes 2 is too many);

  4. Context clash, where the model cannot let go of its initial wrong assumptions. And it makes assumptions at each turn of the conversation, be it a tool call, a file read, or a user request.

Those problems become especially pronounced as the context grows longer.

Fortunately, there are some methods to minimize the impact of those issues:

  1. RAG. Rumors of its death are greatly exaggerated. It is still a very powerful way to control context length;

  2. Tool loadout. Give the model only the tools it needs for the task at hand. Has the task changed? Select another set of tools, or let a RAG system or another LLM do the selection.

  3. Context quarantine. You can delegate, so why shouldn't the agent? Split tasks into subtasks, each with its own context, and consume the results.

  4. Context pruning. Trim the context mercilessly. The model doesn't need to remember its reasoning traces or tool call results from 10 turns back. And keeping everything in some structured format lets you efficiently develop pruning strategies.

  5. Context summarization. When the context gets too long, summarize it and discard the original. Don't forget to reinitialize the agent afterwards. If you have files with instructions, add them back into the resulting summary.

  6. Context offloading. Give the model a scratchpad. Or, better yet, several. Let it write down, forget, consult, repeat.
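The tool-loadout idea above can be sketched in a few lines. This is a minimal sketch, not a production selector: the tool registry, keyword sets, and the `select_tools` helper are all hypothetical, and a real system might score tools with embeddings or another LLM instead of keyword overlap.

```python
# Hypothetical tool registry: tool name -> keywords describing its use.
TOOL_REGISTRY = {
    "read_file":  {"keywords": {"file", "read", "open"}},
    "web_search": {"keywords": {"search", "web", "lookup"}},
    "run_tests":  {"keywords": {"test", "pytest", "ci"}},
    "send_email": {"keywords": {"email", "notify"}},
}

def select_tools(task: str, registry=TOOL_REGISTRY, limit: int = 2):
    """Return up to `limit` tool names whose keywords overlap the task text."""
    words = set(task.lower().split())
    scored = [
        (len(meta["keywords"] & words), name)
        for name, meta in registry.items()
    ]
    # Keep only tools with at least one matching keyword,
    # best matches first, ties broken alphabetically.
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

Only the selected tools are then included in the prompt, which keeps the context short and avoids confusing the model with irrelevant options.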
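Context quarantine boils down to giving each subtask a fresh context and passing only the results back to the parent. A minimal sketch, assuming a chat-style message format; `call_model` is a stand-in for a real LLM call:

```python
def call_model(context):
    # Stand-in for an actual LLM call; echoes the last user message.
    return f"result for: {context[-1]['content']}"

def run_quarantined(subtasks, system_prompt="You are a helpful subagent."):
    """Run each subtask in its own isolated context; collect only the results."""
    results = []
    for task in subtasks:
        # Fresh context per subtask: nothing leaks between them,
        # and nothing but the result flows back to the parent agent.
        context = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task},
        ]
        results.append(call_model(context))
    return results
```

The parent agent then sees only the returned results, not the subagents' intermediate reasoning or tool chatter.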
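Context pruning is easy to demonstrate on a structured message list. A minimal sketch, assuming messages are role/content dicts and that old `tool` and `reasoning` entries are safe to drop while recent turns stay intact:

```python
def prune_context(messages, keep_recent: int = 4):
    """Drop tool results and reasoning traces older than the last `keep_recent` messages."""
    if keep_recent <= 0 or keep_recent >= len(messages):
        return list(messages)
    head, tail = messages[:-keep_recent], messages[-keep_recent:]
    # Old tool output and reasoning rarely matter 10 turns later;
    # system/user/assistant messages in the head are kept.
    kept = [m for m in head if m["role"] not in ("tool", "reasoning")]
    return kept + tail
```

Because the context is structured, the filter predicate is one place to experiment with more aggressive pruning strategies.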
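The scratchpad for context offloading can be as simple as a keyed note store exposed to the model as a tool. A minimal sketch; the class name and methods are illustrative, not a standard API:

```python
class Scratchpad:
    """Named scratchpads the agent can write to, read from, and clear.

    Notes live outside the prompt until the agent explicitly consults them.
    """

    def __init__(self):
        self._pads = {}  # pad name -> list of notes

    def write(self, pad: str, note: str) -> None:
        self._pads.setdefault(pad, []).append(note)

    def read(self, pad: str) -> str:
        return "\n".join(self._pads.get(pad, []))

    def clear(self, pad: str) -> None:
        self._pads.pop(pad, None)
```

Exposing `write`/`read`/`clear` as separate tools lets the model write down, forget, consult, repeat, without every note bloating the context.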

All of this will make the agent system not only more reliable, but also cheaper.