Notes

Some of the things you need to know about the latest GPT-5 release that evangelists don’t talk about:

GPT-5 is not one model. What they call GPT-5 outside of the API is a router that sends your request to the model it thinks will handle it most efficiently. You need to look at OpenAI’s promise to provide access to everyone in this light. They provide access to the router, and you don’t know which configuration it applies to you or whether you will ever actually get to test the most powerful model. That will undoubtedly lead to completely different experiences for different users.
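To make the routing point concrete, here is a toy sketch. It is not OpenAI’s implementation, which is not public; the model names, the `RouterConfig` fields, and the word-count heuristic are all made up for illustration. It only shows how the same prompt can land on different models depending on a configuration the user never sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    # Hypothetical per-user settings, invisible to the user.
    allow_reasoning_model: bool
    complexity_threshold: int  # word count above which a prompt counts as "hard"


def route(prompt: str, config: RouterConfig) -> str:
    """Return the (made-up) name of the model that would serve this prompt."""
    looks_hard = len(prompt.split()) > config.complexity_threshold
    if looks_hard and config.allow_reasoning_model:
        return "reasoning-model"
    return "fast-model"


prompt = "Prove that the sum of two odd integers is always even and explain each step"

# Two users, identical prompt, different hidden configuration.
user_a = RouterConfig(allow_reasoning_model=True, complexity_threshold=10)
user_b = RouterConfig(allow_reasoning_model=False, complexity_threshold=10)

print(route(prompt, user_a))
print(route(prompt, user_b))
```

Both users would say they “tried GPT-5”, yet in this sketch they are talking to different models.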
GPT-5 is not a PhD. It is a pretty capable model (or at least one of them is) that excels at some tasks. You can expect improvements in:
- coding capabilities (they are impressive according to some vibe tests, but according to the SWE-bench Verified benchmark, it has just a minor lead over Claude Opus 4.1);
- tool calling capabilities, which are the most important for agentic workloads;
- other tasks where OpenAI had access to immediate feedback.
While those are important improvements in a lot of areas, they don’t make the model PhD-level. The hallucination about the airfoil during the demo perfectly demonstrates that it still internalizes the most common belief, not the most current one. This is a very hard problem to solve and a major roadblock on the way to AGI.
Diminished hallucinations are a double-edged sword. On the one hand, the less a model hallucinates, the better, as you can trust it more. On the other hand, the more you trust the model, the more likely you are to miss the hallucinations that remain. In principle, a model that never hallucinates is the best. In practice, a model that hallucinates in 0.01% of cases can be more dangerous than one that hallucinates in 10% of cases, because nobody bothers to double-check the former.
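A rough back-of-the-envelope version of this argument, as a sketch: what matters is not the raw hallucination rate but the share of wrong answers that slip past the user. The 0.01% and 10% rates come from the text above; the checking habits (`check_rate`) are assumed numbers, and the model ignores how severe each error is.

```python
def undetected_rate(hallucination_rate: float, check_rate: float) -> float:
    """Fraction of all responses that are wrong AND slip past the user."""
    return hallucination_rate * (1.0 - check_rate)


def break_even_check_rate(low_rate: float, low_check: float, high_rate: float) -> float:
    """Check rate at which the noisier model leaks as many errors as the cleaner one."""
    return 1.0 - undetected_rate(low_rate, low_check) / high_rate


# Assumed habits: the "reliable" model is never checked,
# the "unreliable" one is checked almost every time.
trusted = undetected_rate(0.0001, check_rate=0.0)
distrusted = undetected_rate(0.10, check_rate=0.9995)
threshold = break_even_check_rate(0.0001, 0.0, 0.10)

print(f"0.01% model, never checked:   {trusted:.4%} of responses are undetected errors")
print(f"10% model, checked 99.95%:    {distrusted:.4%} of responses are undetected errors")
print(f"break-even check rate for the 10% model: {threshold:.2%}")
```

Under these assumptions the claim only holds if users of the noisier model verify nearly everything, so the danger comes from the drop in vigilance, not from the low error rate itself.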

My personal impression so far is that it still has the same issue as the previous OpenAI models: it is really superficial without careful prompting. It gives you the shallowest analysis it can get away with and hides this behind very well-structured responses.