The Treachery of Memory: On Long Contexts and Agentic Failures

“How Long Contexts Fail” · “How to Fix Your Context”

Long context is your friend… when we are talking about summarization and retrieval. For agentic workflows it is often detrimental, due to failure modes such as: context poisoning, where the model hallucinates and pollutes its own context (the ripples are powerful, and they die slowly); context distraction, where the model starts to repeat itself instead of trying new strategies; context confusion, which happens when you give the model too many tools (sometimes two is too many); ...
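The failure modes above share one mitigation: basic context hygiene. A minimal sketch (all names hypothetical, not from the post) that keeps the system prompt and drops stale turns before each model call, so old hallucinations and repeated strategies fall out of the window:

```python
# Minimal context-pruning sketch: keep the system prompt plus only the
# most recent turns. All names here are hypothetical.

def prune_context(messages, max_turns=6):
    """Keep the system message plus the last `max_turns` messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]

history = [{"role": "system", "content": "You are a travel agent."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(20)]

pruned = prune_context(history, max_turns=4)
# pruned holds the system message plus turns 16 through 19
```

Summarizing the dropped turns instead of discarding them is the usual next step; the point is simply that the context is curated, not append-only.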

June 30, 2025 · 2 min

Claude Deep Research, or How I Learned to Stop Worrying and Love Multi-Agent Systems

I usually approach shiny new things with a healthy dose of skepticism. Until recently, this was precisely my attitude toward multi-agent systems. This is hardly surprising, given the immense hype surrounding them and the conspicuous absence of genuinely successful examples. Most implementations that actually worked fell into one of the following categories:

- Agentic systems following a predefined plan. These are essentially LLMs with tools, trained to automate a very specific process. This approach allows each step to be tested individually and its results verified. Such systems are typically described as a directed acyclic graph (DAG), sometimes dynamic, and developed using now-standard primitives from frameworks like LangChain and Griptape 1. The early implementation of Gemini Deep Research operated this way: first, a search plan was created, then the search was executed, and finally, the results were compiled.
- Solutions operating in systems with a feedback loop. Claude Code, Cursor, and other code-generating agents fall into this group. The stronger the feedback loop (that is, the better the tooling and the stricter the type checking), the greater the chance they won’t completely wreck your codebase 2.
- Models trained using reinforcement learning, such as those with interleaved thinking, like OpenAI’s o3. This is a separate, very interesting conversation, but even these models have a certain modus operandi defined by the specifics of their training.

Meanwhile, open-ended multi-agent systems have largely remained in the proof-of-concept stage due to their general unreliability; the community lacked a clear understanding of where and how to apply them. This was the case until Anthropic published a deeply technical article on how they developed their Deep Research system. It defines a reasonably clear framework for building such systems, and that is what we will examine today. ...

June 23, 2025 · 13 min

MCP's June Update: Safer, Smarter, Simpler?

The Model Context Protocol, despite its aggressive adoption (or perhaps because of it), continues to evolve. Anthropic recently updated the MCP specification, and below we’ll look at the main changes.

Security Enhancements

An MCP server is now always classified as an OAuth Resource Server, and clients are required to implement Resource Indicators (RFC 8707). This is necessary to protect against attacks like the Confused Deputy. Previously, tokens requested by a client from an authorization server were “impersonal,” meaning they could be used by anyone. This allowed an attacker to create a phishing MCP server, deceive a client, steal the token, and use that token to gain access to the real MCP server. ...
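RFC 8707 fixes the “impersonal token” problem by having the client name the exact resource server it intends to call in the token request, so the issued token is audience-bound and useless against any other server. A minimal sketch of such a request body (URLs and identifiers are illustrative, not from the spec update itself):

```python
# Sketch of an RFC 8707 authorization-code token request: the `resource`
# parameter audience-restricts the issued token to one MCP server, so a
# phishing server that steals it cannot replay it elsewhere.
# All concrete values below are illustrative.
from urllib.parse import urlencode

def token_request_body(code: str, client_id: str, mcp_server: str) -> str:
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        # RFC 8707 Resource Indicator: bind the token to this server
        "resource": mcp_server,
    })

body = token_request_body("abc123", "my-client", "https://mcp.example.com")
```

The authorization server can then mint a token whose audience claim matches the indicated resource, and the real MCP server rejects tokens minted for anything else.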

June 19, 2025 · 5 min

Blogs People Write

At a time when the words “AI” and “hype” have become almost synonymous, it’s crucial to be smart about choosing your sources of information. There is far too much information noise out there, and sifting through the sea of articles from various AI evangelists and generated garbage to find something truly worthwhile is incredibly difficult. In this post, I’ll share the materials I read to stay up-to-date on the latest developments. ...

June 15, 2025 · 6 min

Poisoned Context: The Hidden Threat of Using Multiple GPTs

It’s summer. Time to plan a vacation getaway. You open ChatGPT, select the increasingly popular “Travel Advisor” GPT, and start discussing options. The advisor gives excellent suggestions, offers fascinating details about local attractions, generates pretty good itineraries, and generally leaves a great impression. Sure, some oddities pop up here and there, but you dismiss them as harmless hallucinations. You settle on Barcelona. Excellent choice. In the same chat, you switch to another familiar and popular GPT, “Booking Agent,” which has never let you down, and book your accommodations. ...

June 11, 2025 · 9 min

Griptape, Part 2: Building Graphs

In the previous post, I broke down the basic concepts of the Griptape AI framework, and now it’s time to put them into practice. We’ll try to use them to develop a small application that helps run a link-blog on Telegram. The application will receive a URL, download its content, run it through an LLM to generate a summary, translate that summary into a couple of other languages, combine everything, and publish it to Telegram via a bot. The general flow can be seen in the diagram below: ...
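The flow described above can be sketched as plain functions, with stand-ins for the LLM and Telegram calls; all names are hypothetical, and the actual post wires these steps into a Griptape graph instead:

```python
# Link-blog pipeline sketch: fetch -> summarize -> translate -> combine
# -> publish. The LLM, download, and Telegram steps are stubbed out;
# every name here is hypothetical.

def summarize(content: str) -> str:
    # stand-in for an LLM summarization call
    return f"summary: {content}"

def translate(summary: str, lang: str) -> str:
    # stand-in for an LLM translation call
    return f"[{lang}] {summary}"

def build_post(url: str) -> str:
    content = f"page text of {url}"  # stand-in for downloading the URL
    summary = summarize(content)
    translations = [translate(summary, lang) for lang in ("de", "fr")]
    # combine everything into one message for the Telegram bot to publish
    return "\n\n".join([summary, *translations, url])

post = build_post("https://example.com/article")
```

Each step has a single input and output, which is what makes the whole thing natural to express as a graph of tasks.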

June 5, 2025 · 11 min

OpenAI Codex Gains Internet Access: First Impressions

What on Earth is Codex?

Good question, right? The thing is, until recently, OpenAI had a model called Codex, which was used as the foundation for autocompletion in GitHub Copilot. Then, OpenAI released a console agent for development, which they named, so no one would get confused, Codex. 1 Everyone had a laugh at OpenAI’s naming skills 2, and life went on. Until the fateful day when a tweet like this appeared from Sam Altman: ...

June 4, 2025 · 5 min

Griptape: A Framework for AI Applications, Part 1: Introduction

Today we will look at Griptape, a framework for building AI applications that offers a clean Pythonic API for those tired of LangChain’s abstraction layers. It provides primitives for building assistants, RAG systems, and integrations with external tools. Honestly, in my experience, most people tired of LangChain switch to custom-written wrappers around lower-level libraries like OpenAI or LiteLLM. But who knows, maybe they’re missing out. Let’s dive in.

A Bit of History

Personally, I’ve been hearing about Griptape for about a year and a half. As far as I remember, it started as a sort of LangChain competitor with quite similar primitives, but their paths gradually diverged. As of this writing, it has 2.3k stars on GitHub, which is somewhat less than LangChain’s 109k, but still enough to consider the project quite mature. Besides the open-source framework, it has also developed its own cloud, where you can run your applications, ETLs, and RAGs, as well as a visual builder, Griptape Nodes, which lets non-professionals click together applications in minutes. 1 ...

May 30, 2025 · 4 min

Seeed Re:Camera Review, Part 1

Alright, Here We Go

I got my hands on the Re:Camera from Seeed. Essentially, it’s a small box (a cube about 4 cm per side) wrapped in a heatsink. Inside, there’s a dual-core RISC-V MPU (update: only one core is visible to the system; the second is apparently reserved for special operations), an ancient 8051 microcontroller, an OmniVision camera sensor, LEDs for illumination, Wi-Fi, Bluetooth, and, you know, all sorts of peripherals. RAM is a bit scarce, only 256 megabytes, so getting Greengrass onto it will be problematic. You can connect Ethernet via a special dongle adapter that barely stays put, but for development there’s no point, because the camera shares its network over USB-C, and it’s easier to work that way. If you’re short on storage (the device comes in 8 GB and 64 GB built-in storage options), you can stick in a MicroSD card. You can also stick the box to something metallic, as it has magnets on one side. ...

May 29, 2025 · 5 min