Clean your docs first
Lately, there have been numerous alerts about security vulnerabilities connected to indirect prompt injection attacks. The main conclusion is always the same: LLMs are gullible, and nobody knows how to make them completely robust; therefore, no AI system is safe.
All of this is, of course, completely true, and the attention these attacks attract is most definitely welcome, but the main question is left unanswered: what is to be done about it?
Purists would push for banning all external inputs that could lead to prompt injection, but, frankly, I wouldn’t be so rigid. A lot of genuinely useful applications do rely on external input, and so, I am afraid, we have to retreat to the last resort: engineering discipline.
There are various schemes that use a clever interaction of different models to minimize their exposure to attacks (see, for example, the CaMeL paper). I believe, though, that we don’t pay enough attention to simple input sanitization.
A lot of such attacks rely on text hidden from the human but visible to the machine. Detecting and removing such text is relatively trivial in a programmatic sense, although it requires working at the container level rather than the text level. This technique (called Content Disarm & Reconstruction, by the way) has been around for quite some time. Honestly, I am surprised that it is not implemented everywhere. It wouldn’t prevent all attacks, but it would make an unsophisticated attacker’s life harder.
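To give a taste of the text-level half of this sanitization, here is a minimal sketch that strips invisible Unicode characters commonly used to smuggle instructions past a human reader. Real CDR operates on the container (removing white-on-white runs from a DOCX or PDF, for instance), which this deliberately does not attempt:

```python
import unicodedata

def strip_invisible(text: str) -> str:
    """Drop characters a human won't see but an LLM will happily read.

    Covers zero-width characters and the Unicode "tag" block
    (U+E0000-U+E007F), both popular carriers for hidden instructions.
    """
    cleaned = []
    for ch in text:
        # Tag characters invisibly mirror ASCII; always drop them.
        if 0xE0000 <= ord(ch) <= 0xE007F:
            continue
        # Cf = format characters (zero-width space/joiner, BOM, ...),
        # Co = private use; neither belongs in user-visible text.
        if unicodedata.category(ch) in ("Cf", "Co"):
            continue
        cleaned.append(ch)
    return "".join(cleaned)
```

The tag-block check is largely redundant with the `Cf` filter, but it is the part worth keeping even if you later need to whitelist some legitimate format characters.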
And to those who insist that 99% in security is a failing score, I would like to remind them of a fundamental security principle: no system is 100% secure. The role of security is to make an attack more costly than the potential benefit gained from it (see the Gordon–Loeb model).
With this paradigm in mind, we can build reliable and secure AI systems, even with insecure individual components.
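The Gordon–Loeb logic mentioned above can be sketched with a back-of-the-envelope calculation. All the numbers below are illustrative (not from the paper), and the breach-probability function is one of the classes the paper analyzes; the punchline is that the optimal security spend lands well below the model's famous 1/e (~37%) bound on the expected loss:

```python
import math

L = 1_000_000          # loss if a breach succeeds, in dollars (made up)
v = 0.6                # breach probability with zero security investment
alpha, beta = 0.01, 1  # productivity parameters of the security investment

def breach_prob(z: float) -> float:
    """Class-I Gordon-Loeb breach probability: v / (alpha*z + 1)**beta."""
    return v / (alpha * z + 1) ** beta

def net_benefit(z: float) -> float:
    """Expected loss avoided minus the money spent on security."""
    return (v - breach_prob(z)) * L - z

# Crude grid search for the optimal investment level.
z_opt = max(range(0, 50_000), key=net_benefit)

expected_loss = v * L
assert z_opt <= expected_loss / math.e  # the 1/e bound holds
```

With these parameters the optimal spend comes out around $7,600 against a $600,000 expected loss: spending more buys less protection than it costs, which is exactly the "make the attack more costly than the benefit, then stop" paradigm.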
Some of the things you need to know about the latest GPT-5 release that evangelists don’t talk about:
GPT-5 is not one model. What is called GPT-5 outside of the API context is a router that sends your request to whichever model it thinks will handle it most efficiently. OpenAI’s promise to provide access to everyone should be read in this light: they provide access to the router, and you don’t know which configuration it applies to you or whether you will actually get to test the most powerful model. This inevitably produces completely different experiences for different users.
GPT-5 is not a PhD. It is a pretty capable model (at least, one of them) that excels at some tasks. You can expect improvements in:
- coding capabilities (they are impressive according to some vibe tests, but according to the SWE-bench Verified benchmark, it has just a minor lead over Claude Opus 4.1);
- tool calling capabilities, which are the most important for agentic workloads;
- other tasks where OpenAI had access to immediate feedback.
While those are important improvements for a lot of areas, they don’t make the model PhD-level. The hallucination about the airfoil during the demo perfectly demonstrates that it still internalizes the most common belief, not the most current. It is a very hard problem to solve and actually a major roadblock on the way to AGI.
Diminished hallucinations are a double-edged sword. On the one hand, the less a model hallucinates, the better, as you can trust it more. On the other hand, the more you trust the model, the more likely you are to miss the hallucinations that remain. A model that never hallucinates would be ideal; short of that, a model that hallucinates in 0.01% of cases can be more dangerous than one that hallucinates in 10% of cases, because nobody bothers to double-check the former.
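The danger of the rare hallucination is easy to see in expected-cost terms. The numbers below are entirely made up for illustration: the frequent hallucinator gets reviewed and is only trusted with low-stakes work, while the "reliable" model is trusted blindly in consequential decisions:

```python
def expected_harm(hallucination_rate, review_rate, cost_per_miss):
    """Expected cost per response from hallucinations that slip past review."""
    return hallucination_rate * (1 - review_rate) * cost_per_miss

# Sloppy model: hallucinates often, so 95% of answers get reviewed,
# and it is only used for low-stakes tasks (cost of a missed error: 1 unit).
sloppy = expected_harm(0.10, review_rate=0.95, cost_per_miss=1)

# "Reliable" model: hallucinates rarely, so nobody reviews it, and it is
# wired into high-stakes decisions (cost of a missed error: 1000 units).
reliable = expected_harm(0.0001, review_rate=0.0, cost_per_miss=1000)

assert reliable > sloppy  # 0.1 vs 0.005 expected units of harm per response
```

The exact figures don't matter; the point is that trust and stakes scale with perceived reliability, so the residual error rate is not the whole risk picture.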
My personal impression so far is that it still has the same issue as previous OpenAI models, namely that it is really superficial without careful prompting. It provides the most shallow analysis it can get away with and hides this fact behind very well-structured responses.

Google has started a public preview for its new tool for graphical creation of multi-step AI workflows. While not a tool for production use, it is a great helper for building personal tools (and everybody should build personal tools, really, it is the biggest differentiator now).
It works on the Gemini platform with Gemini models (well, of course), and since it is a preview tool, it is free for now. In the future it will most likely use the Gemini Plan if the user has it. I am not sure about custom API keys, as this looks like a general public tool, but we’ll see.
Right now it is available only in the US.
X is buzzing with a new Horizon Alpha model that is beating all previous models hands down on various vibe tests (read: unicorns on bicycles and so on). The model has a 256K context window, which is a solid, albeit not the most impressive, number.
Most probably it is a new OpenAI model (GPT-5?), as they have pulled the same trick before. You can try it on openrouter.ai completely for free, but remember not to give it any private data, as prompts are collected and used for model improvement.
Anthropic has added the ability to create and use specialized sub-agents in Claude Code. These sub-agents use a separate context window, which allows you to run separate tasks without polluting the main context, limiting the context rot effect. You can run the created sub-agents manually or let Claude Code decide when to use them.
What can be a good sub-agent? Anything that needs to be an expert in its area, doesn’t need to share the context with the main agent, can have dedicated tools, and can be designed to run self-sufficient tasks. To give you a taste of what can be made into a sub-agent, here are a couple of examples:
- A git sub-agent to which you can offload various git-related operations. Since it only needs the local git state and no access to the global context, it is a good fit for a tool with its own context window.
- A book-writing sub-agent (I know, but I create book-like documents in a specific style for self-education). You provide it with a high-level plan, a set of materials, and one or two sections as examples, and let it work on a section separately from other sub-agents.
The second example would really benefit from the ability to run several sub-agents in parallel, but it looks like I’m asking too much.
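For reference, a Claude Code sub-agent is just a Markdown file with YAML frontmatter placed under `.claude/agents/` in the project (or `~/.claude/agents/` for user-level agents). A hypothetical git sub-agent along the lines of the first example might look like this; the `name`, `description`, and `tools` fields follow Anthropic's documented format, and the body becomes the agent's system prompt:

```markdown
---
name: git-helper
description: Handles git operations (status, diffs, commits, branch management). Use proactively for any git-related request.
tools: Bash, Read, Grep
---

You are a git specialist. Work only from the local repository state.
Prefer small, well-described commits; never force-push or rewrite
published history without explicit confirmation from the user.
```

The `description` matters more than it looks: it is what Claude Code uses to decide when to delegate to the sub-agent automatically.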
If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much about AI’s economic impacts. We are nothing without our tools.
TwelveLabs video understanding models are now available in Amazon Bedrock
AWS adds native video embedding and video understanding models to Amazon Bedrock. This opens up a lot of potential use cases for which I previously reached for Gemini models. One example is an educational system that watches how the learner performs a task and provides feedback based on the educational materials.
Bedrock had workflows for video understanding, but they were exactly that: workflows, not native models. You can imagine what they looked like: take a video, split it into frames, feed the frames to a VLM, try to maintain temporal consistency, despair, come to terms with the system’s performance, and go on vacation.
Now, however, there are not one, but two different native video models:
- TwelveLabs Marengo, for creating video embeddings;
- TwelveLabs Pegasus, for video-based text generation.
Pricing depends on whether your video has an audio track, but you should expect roughly $2.50-$3.00 per hour of video for Marengo and $1.80 per hour for Pegasus.
One form of context rot is what I call self-reinforced structure. When you accept a long-form model answer, you signal that this structure is acceptable, and so it tries to generate subsequent responses in a similar way. It can be destructive for any long-form creative work.
The only real defense is to make sure the history the model receives doesn’t contain such replies: either prevent them early, or fix things later by providing a summary of the previous conversation instead of the actual history.
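A minimal sketch of the second option, assuming a chat API that takes a list of `{role, content}` messages; `summarize` here stands in for whatever you use to condense the old turns (often just another LLM call):

```python
def compact_history(messages, summarize, keep_last=4):
    """Replace older turns with a summary so that style-reinforcing
    replies drop out of the context the model actually sees."""
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {
        "role": "system",
        "content": "Summary of the conversation so far: " + summarize(older),
    }
    return [summary] + recent
```

The important part is that the verbose replies themselves never reach the model again; only a neutral brief of them does.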
Introducing ChatGPT agent: bridging research and action
OpenAI released an agent that operates its own virtual computer. It uses its advanced reasoning capabilities to plan and solve tasks in applications like Excel and PowerPoint.
While Sam Altman “feels the agi” looking at how the system works, I find it incredibly clunky. Instead of concentrating on providing the models with native tools (MCP is a good step forward, although not without its problems), they try to emulate hands and eyes for them, so models can do the same things we do, but slowly and awkwardly.
So I would consider this type of agent a temporary workaround until we develop better machine-to-machine communication mechanisms. After this, it will be used to serve an increasingly long tail of legacy systems that will not have such machine-usable interfaces.
P.S. Gemini mentioned a point of view I didn’t consider, namely that such systems can collect data necessary to train better embodied intelligence, meaning one that can act in the real world. It’s a perfectly valid point that shouldn’t be left without attention.
Stanford’s 2025 AI Index Report
Stanford published its annual report. It’s pretty important, because it separates speculation from pure numbers. Along with some obvious things (AI is getting better, cheaper, widespread, duh), there are some very interesting facts:
While almost every organization is using AI now (78% in 2024, although, no doubt, for most of them it boils down to using chatbots to compose emails), the actual results are somewhat modest. The productivity increase is on the scale of 10% (to be honest, such an increase in one year is almost unprecedented), while the increase in revenue for most industries is only about 5%. Why? Because, as with any general-purpose technology, realizing the full benefit requires completely rebuilding organizational structures and processes. The problem is that no one knows what these new processes should look like, so we will have to learn from our own mistakes.
Maybe old news, but AI gives more leverage to less experienced employees: the great equalizer of modern times. Again, this means we need to rethink our approach to team staffing. I would only add that it helps only if you have at least a remote understanding of what you’re doing, so if you are applying for an entry-level position, do your homework well.
The number of AI-related incidents continues to rise: a twofold increase in 2024 vs 2023, and this is before the frantic adoption of agents and MCP we are seeing in 2025. So we need to brace ourselves for more and more data leaks and integrity breaches.
The report contains a lot more nuggets, but it is almost 500 pages long, so I would really recommend using AI to extract whatever you fancy.