Voxtral

Mistral introduces Voxtral, a family of open-source speech recognition and understanding models. It’s about time. We haven’t seen a comparable open-source model since OpenAI’s Whisper, and that was quite a while ago.

The models come in 3B and 24B sizes and outperform Whisper on most benchmarks. However, they require more powerful hardware, as the largest Whisper variant is just 1.5B. The size difference is a direct consequence of Voxtral also being a regular language model. Another consequence is that controlling the models in a pure transcription setting is harder.

The models are available on Hugging Face as well as through the Mistral API and Le Chat.

What they also currently lack is diarization (speaker recognition) support. It’s on the roadmap, but in the meantime, we still have to rely on the somewhat clunky pyannote-audio for this purpose.

Strategic Intelligence in Large Language Models: Evidence from Evolutionary Game Theory

The paper shows that different models behave completely differently when placed in game-theoretic settings. This means that testing and evals play an increasingly critical role in developing agentic systems: updating or swapping the underlying model will lead to unpredictable changes in an agent’s behaviour.
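To make the setting concrete, here is a toy iterated prisoner’s dilemma in Python — a sketch of the kind of game used in such experiments, with classic baseline strategies standing in for the models:

```python
# Toy iterated prisoner's dilemma — an illustrative sketch only;
# the strategies here are textbook baselines, not the models from the paper.

PAYOFFS = {  # (my move, their move) -> (my score, their score)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(history):
    """Cooperate first, then mirror the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    """Defect unconditionally."""
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b = [], []  # each entry: (own move, opponent's move)
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a)
        move_b = strategy_b(history_b)
        pa, pb = PAYOFFS[(move_a, move_b)]
        score_a += pa
        score_b += pb
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return score_a, score_b

print(play(tit_for_tat, always_defect))  # → (9, 14)
```

Swapping in a different strategy changes the outcome of the whole tournament — which is exactly why swapping the underlying model under an agent calls for re-running the evals.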

Introducing Kiro

AWS jumps on the agentic IDE bandwagon with Kiro. To set itself apart from the vibe-coding approach, which has accumulated a considerable amount of ill repute, AWS emphasizes the “spec-driven development” method: the agent first helps the user create a full requirements document for the feature, then analyzes the existing code base, and only then starts implementing.

This approach definitely makes sense, and it’s a step forward from blindly running into the fray that is vibe-coding. The fact that those specs are updated along with the code makes them even more valuable, minimizing the problem of stale documentation. Hooks can automatically run repeated agentic tasks, such as making sure a new feature has sufficient tests.

It is interesting to watch different tools adopt different methodologies, as it allows the developer community to find and disseminate the techniques that really work.

TIL: ccusage — a nice tool to track and analyze Claude Code usage.

Anthropic released four new courses in its Academy:

  1. Claude Code in Action with practical advice on using the CLI agent.
  2. Claude with the Anthropic API, a comprehensive course on using all current API capabilities, from single-shot text generation to agents.
  3. Introduction to Model Context Protocol and Model Context Protocol: Advanced Topics for those interested in MCP.

Each course comes in video and text formats and provides a certificate of completion.

TIL: DBML - Database Markup Language

A markup language for describing database schemas that can be handy when you need to feed a schema to AI tools.
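For a taste of the syntax, a minimal two-table schema in DBML looks roughly like this:

```dbml
Table users {
  id int [pk, increment]
  email varchar [unique, not null]
  created_at timestamp
}

Table posts {
  id int [pk, increment]
  user_id int [not null]
  title varchar
}

Ref: posts.user_id > users.id // many-to-one
```

It is compact and plain text, which makes it cheap to paste into a prompt compared to full SQL DDL.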

The AI Productivity Paradox

Research shows that modern AI agents can actually slow experienced developers down by ~19%.

It is very important to understand that AI models are not deterministic and they cannot be made deterministic without severely restricting the environment they run in. Fixing seeds doesn’t help. Setting temperature to 0 doesn’t help. Every small floating point rounding error can ultimately lead to drastically different results.

The complex models are, in essence, chaotic systems and should be treated as such.
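A minimal illustration of the floating point part of the argument: addition is not associative, so the same sum computed in a different order (as happens across GPU kernels or batch sizes) yields different bits:

```python
# Floating point addition is not associative, so even a fixed seed and
# temperature 0 cannot guarantee bit-identical results when the order
# of operations varies between runs.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # → False
```

In a deep network, billions of such tiny discrepancies accumulate, and one flipped logit early in generation can send the whole output down a different path.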

AI coding agents are just another tool in a good developer’s toolbox. Such a developer starts with a slab of stone and uses those agents as a metaphorical sledgehammer to give it the rough form they envision. Then they reach for AI-assisted coding tools, such as GitHub Copilot, to work with more precision, akin to a smaller hammer. And finally, they use the smallest chisel to carve out the finest details by hand.

How funny it is to hear that the art of programming is dead because we no longer carve the slabs with those small chisels alone.

A funny thing: Gemini Deep Research is programmatically discouraged from finishing early. I found that out when I tried to use it to extract information from unstructured text and fill in a template. Although it is a research tool, it has everything necessary for such a task: access to Google Docs, the ability to create long-form documents, and a meticulous agentic flow. Of course, if we want to just restructure a document, it is important that the model does not use internet search at all.

The agent did the work well and quickly. However, just as it was about to finish, it received several “continue research” nudges from the programmatic orchestrator, and guess what? It started browsing the internet for the missing information.

The moral is: you can be creative with such systems, but you should expect the scaffolding to throw a wrench in the works.