The Treachery of Memory: On Long Contexts and Agentic Failures

Long context is your friend… when we are talking about summarization and retrieval. For agentic workflows, it is often detrimental, for reasons such as: context poisoning, where the model hallucinates and pollutes its own context (the ripples are powerful, and they die slowly); context distraction, where the model starts to repeat itself instead of trying new strategies; context confusion, which happens when you give the model too many tools (sometimes two is too many); ...

June 30, 2025 · 2 min

Claude Deep Research, or How I Learned to Stop Worrying and Love Multi-Agent Systems

I usually approach shiny new things with a healthy dose of skepticism. Until recently, this was precisely my attitude toward multi-agent systems. This is hardly surprising, given the immense hype surrounding them and the conspicuous absence of genuinely successful examples. Most implementations that actually worked fell into one of the following categories:

Agentic systems following a predefined plan. These are essentially LLMs with tools, trained to automate a very specific process. This approach allows each step to be tested individually and its results verified. Such systems are typically described as a directed acyclic graph (DAG), sometimes dynamic, and developed using now-standard primitives from frameworks like LangChain and Griptape. The early implementation of Gemini Deep Research operated this way: first, a search plan was created, then the search was executed, and finally, the results were compiled.

Systems operating with a feedback loop. Claude Code, Cursor, and other code-generating agents fall into this group. The stronger the feedback loop—that is, the better the tooling and the stricter the type checking—the greater the chance they won’t completely wreck your codebase.

Models trained using Reinforcement Learning with interleaved thinking, such as OpenAI’s o3. This is a separate, very interesting conversation, but even these models have a certain modus operandi defined by the specifics of their training.

Meanwhile, open-ended multi-agent systems have largely remained in the proof-of-concept stage due to their general unreliability. The community lacked a clear understanding of where and how to implement them. This was the case until Anthropic published a deeply technical article on how they developed their Deep Research system. It defined a reasonably clear framework for building such systems, and that is what we will examine today. ...

June 23, 2025 · 13 min

Beyond Supply and Demand: The Real Labor Pains of the AI Revolution

The public conversation about AI and labor is stuck in a tedious loop. “AI will take our jobs,” declare the headlines, a statement of faith in technological determinism that serves as a conversation-stopper, not a starter. A more useful, if still imperfect, entry point begins with a simple economic model. It starts with an observation, such as Arvind Narayanan’s on radiology: AI has surpassed human performance on many discrete tasks, yet the number of human radiologists continues to grow. This suggests the dominant effect isn’t automation, but augmentation. My initial take was that this boils down to a classic supply and demand problem. One AI-augmented specialist can do the work of many, increasing supply. In fields with vast, unsaturated demand—think of the queues at hospitals or the perpetual backlogs in software development—this new capacity will simply be absorbed. Problem solved. ...

June 20, 2025 · 5 min

MCP's June Update: Safer, Smarter, Simpler?

The Model Context Protocol, despite its aggressive adoption (or perhaps because of it), continues to evolve. Anthropic recently updated the MCP specification, and below, we’ll look at the main changes. Security Enhancements An MCP server is now always classified as an OAuth Resource Server, and clients are required to implement Resource Indicators (RFC 8707). This is necessary to protect against attacks like the Confused Deputy. Previously, tokens requested by a client from an authorization server were “impersonal,” meaning they could be used by anyone. This allowed an attacker to create a phishing MCP server, deceive a client, steal the token, and use that token to gain access to the real MCP server. ...
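For a sense of what Resource Indicators look like in practice, here is a sketch of an OAuth token request carrying one, per RFC 8707 (the host names and parameter values are illustrative, not taken from the MCP spec):

```http
POST /token HTTP/1.1
Host: auth.example.com
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=SplxlOBeZQQYbYS6WxSbIA
&client_id=mcp-client
&resource=https%3A%2F%2Fmcp.example.com
```

The `resource` parameter asks the authorization server to bind the issued token to one specific MCP server, so a token phished through a look-alike server cannot be replayed against the real one.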

June 19, 2025 · 5 min

Blogs People Write

At a time when the words “AI” and “hype” have become almost synonymous, it’s crucial to be smart about choosing your sources of information. There is far too much information noise out there, and sifting through the sea of articles from various AI evangelists and generated garbage to find something truly worthwhile is incredibly difficult. In this post, I’ll share the materials I read to stay up-to-date on the latest developments. ...

June 15, 2025 · 6 min

Poisoned Context: The Hidden Threat of Using Multiple GPTs

It’s summer. Time to plan a vacation getaway. You open ChatGPT, select the increasingly popular “Travel Advisor” GPT, and start discussing options. The advisor gives excellent suggestions, offers fascinating details about local attractions, generates pretty good itineraries, and generally leaves a great impression. Sure, some oddities pop up here and there, but you dismiss them as harmless hallucinations. You settle on Barcelona. Excellent choice. In the same chat, you switch to another familiar and popular GPT, “Booking Agent,” which has never let you down, and book your accommodations. ...

June 11, 2025 · 9 min

Griptape, Part 2: Building Graphs

In the previous post, I broke down the basic concepts of the Griptape AI framework, and now it’s time to put them into practice. We’ll try to use them to develop a small application that helps run a link-blog on Telegram. The application will receive a URL, download its content, run it through an LLM to generate a summary, translate that summary into a couple of other languages, combine everything, and publish it to Telegram via a bot. The general flow can be seen in the diagram below: ...
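In plain Python terms, the flow from the diagram amounts to something like the sketch below (the helper functions are placeholders standing in for the Griptape tasks built in the post, not real Griptape APIs):

```python
def fetch(url: str) -> str:
    # Placeholder: download the page content (the post uses a Griptape tool for this)
    return f"<content of {url}>"

def summarize(text: str) -> str:
    # Placeholder: LLM call producing a short summary
    return f"summary({text})"

def translate(text: str, lang: str) -> str:
    # Placeholder: LLM call translating the summary into another language
    return f"{lang}:{text}"

def build_post(url: str, languages: list[str]) -> str:
    """Run the whole pipeline: fetch -> summarize -> translate -> combine."""
    content = fetch(url)
    summary = summarize(content)
    translations = [translate(summary, lang) for lang in languages]
    # Combine the original summary with its translations into one post body,
    # ready to be sent to Telegram via a bot.
    return "\n\n".join([summary, *translations])
```

The real version wires these steps together as Griptape tasks in a graph rather than sequential function calls; the sketch only shows the data flow.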

June 5, 2025 · 11 min

OpenAI Codex Gains Internet Access: First Impressions

What on Earth is Codex? Good question, right? The thing is, until recently, OpenAI had a model called Codex, which was used as the foundation for autocompletion in GitHub Copilot. Then, OpenAI released a console agent for development, which they named, so no one would get confused, Codex. Everyone had a laugh at OpenAI’s naming skills, and life went on. Until the fateful day when a tweet like this appeared from Sam Altman: ...

June 4, 2025 · 5 min

Mastering AI Crawler Control: A Guide to `robots.txt` and Advanced Webmaster Tools

1. Introduction: The Imperative of AI Crawler Management
2. Understanding robots.txt: The Foundation of Crawler Instruction
   2.1. Core Syntax and Directives
   2.2. File Placement and Formatting
   2.3. How Crawlers Interpret robots.txt
   2.4. Testing Your robots.txt
3. Identifying Key Crawler Types: AI Agents vs. Search Engine Bots
   3.1. Distinguishing Characteristics
   3.2. Categories of AI Crawlers and Their User Agents
      3.2.1. AI Crawlers for Model Training
      3.2.2. AI Crawlers for Live Retrieval and Search Assistance
   3.3. Standard Search Engine Crawlers (to Be Allowed)
   3.4. Table of Prominent AI Crawler User Agents
4. Strategically Blocking AI Crawlers with robots.txt
   4.1. Targeting Specific AI User Agents
   4.2. Applying Rules to Specific Pages or Directories
   4.3. Ensuring Search Engines Are Not Blocked from Specific Pages
   4.4. The Challenge of “All Possible AI”
5. Advanced Methods for Granular AI Crawler Control
   5.1. HTML Meta Tags (Page-Level Control)
   5.2. HTTP X-Robots-Tag Headers (Server-Level Page Control)
   5.3. Server-Side Blocking
   5.4. Web Application Firewalls (WAFs) and Content Delivery Networks (CDNs)
   5.5. Table: Comparison of AI Crawler Control Mechanisms
6. Limitations and Best Practices
   6.1. robots.txt Is a Directive, Not an Enforcement Mechanism
   6.2. Importance of Regular Review and Updates
   6.3. Testing robots.txt Changes
   6.4. Avoiding Common Pitfalls
   6.5. Log File Analysis
7. Conclusion: Implementing a Robust, Layered AI Crawler Defense

1. Introduction: The Imperative of AI Crawler Management

The proliferation of Artificial Intelligence (AI) has introduced a new class of web crawlers designed to gather vast quantities of data for training Large Language Models (LLMs) and powering AI-driven applications. While these advancements offer significant potential, website operators often require precise control over which content AI crawlers can access, particularly to protect intellectual property, sensitive information, or manage server resources. Simultaneously, maintaining visibility and crawlability for traditional search engine bots like Googlebot and Bingbot remains paramount for organic search performance. ...
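As a taste of what the guide covers, a minimal robots.txt that blocks some widely documented AI training crawlers while leaving search engines untouched could look like this (user-agent strings should be verified against each vendor’s current documentation before deploying):

```
# Block AI training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Search engine bots remain allowed by default;
# an empty Disallow makes this explicit.
User-agent: Googlebot
Disallow:
```

Remember that robots.txt is a directive, not an enforcement mechanism: well-behaved crawlers honor it, while others require the server-side measures discussed later in the post.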

June 3, 2025 · 29 min

Griptape: A Framework for AI Applications, Part 1: Introduction

Today we will look at Griptape, a framework for building AI applications that offers a clean Pythonic API for those tired of LangChain’s abstraction layers. It provides primitives for building assistants, RAG systems, and integrating with external tools. Honestly, in my experience, most people tired of LangChain switch to custom-written wrappers around lower-level libraries like OpenAI or LiteLLM. But who knows, maybe they’re missing out. Let’s dive in. A Bit of History Personally, I’ve been hearing about Griptape for about a year and a half. As far as I remember, it started as a sort of LangChain competitor with quite similar primitives, but their paths gradually diverged. As of this writing, it has 2.3k stars on GitHub, which is somewhat less than LangChain’s 109k, but still enough to consider the project quite mature. Besides the open-source framework, the team has also developed its own cloud where you can run your applications, ETLs, and RAGs, and a visual builder, Griptape Nodes, allowing non-professionals to click together applications in minutes. ...

May 30, 2025 · 4 min