I usually approach shiny new things with a healthy dose of skepticism. Until recently, this was precisely my attitude toward multi-agent systems. This is hardly surprising, given the immense hype surrounding them and the conspicuous absence of genuinely successful examples. Most implementations that actually worked fell into one of the following categories:
- Agentic systems following a predefined plan. These are essentially LLMs with tools, trained to automate a very specific process. This approach allows each step to be tested individually and its results verified. Such systems are typically described as a directed acyclic graph (DAG), sometimes a dynamic one, and built with now-standard primitives from frameworks like LangChain and Griptape[1]. The early implementation of Gemini Deep Research worked this way: first a search plan was created, then the search was executed, and finally the results were compiled.
- Solutions operating inside a feedback loop. Claude Code, Cursor, and other code-generating agents fall into this group. The stronger the feedback loop (that is, the better the tooling and the stricter the type checking), the greater the chance they won't completely wreck your codebase[2].
- Models trained with Reinforcement Learning, such as those with interleaved thinking like OpenAI's o3. This is a separate and very interesting conversation, but even these models have a particular modus operandi defined by the specifics of their training.

Meanwhile, open-ended multi-agent systems have largely remained at the proof-of-concept stage due to their general unreliability, and the community lacked a clear understanding of where and how to apply them. That was the case until Anthropic published a deeply technical article on how they built their Deep Research system. It lays out a reasonably clear framework for building such systems, and that framework is what we will examine today.
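The first category, the predefined-plan pattern, can be sketched in a few lines. This is a minimal illustration, not any framework's actual API: the workflow is a fixed DAG of steps (plan, search, compile), executed in topological order, so each node can be tested and its output verified in isolation. The step functions here are hypothetical stand-ins for what would be LLM and tool calls.

```python
# Sketch of an agentic pipeline described as a DAG: plan -> search -> compile.
# Step functions are placeholders for LLM/tool calls in a real system.
from graphlib import TopologicalSorter

def plan(state):
    # In a real system, an LLM would draft the search queries here.
    state["queries"] = ["query A", "query B"]
    return state

def search(state):
    # Placeholder for executing each query against a search tool.
    state["results"] = [f"result for {q}" for q in state["queries"]]
    return state

def compile_report(state):
    # Placeholder for an LLM summarizing the gathered results.
    state["report"] = "; ".join(state["results"])
    return state

STEPS = {"plan": plan, "search": search, "compile": compile_report}
# Each node maps to the set of nodes it depends on.
DAG = {"plan": set(), "search": {"plan"}, "compile": {"search"}}

def run_pipeline():
    state = {}
    # static_order() yields the nodes in a dependency-respecting order,
    # so every step runs only after its inputs exist and can be checked.
    for name in TopologicalSorter(DAG).static_order():
        state = STEPS[name](state)
    return state

print(run_pipeline()["report"])
```

Because the graph is fixed, each node is a unit-testable function, which is exactly why this class of system was reliable long before open-ended multi-agent setups were.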
...