What on Earth is Codex?
Good question, right? The thing is, until recently OpenAI had a model called Codex, which served as the foundation for autocompletion in GitHub Copilot. Then OpenAI released a console agent for development and, so that no one would get confused, named it Codex. 1 Everyone had a laugh at OpenAI’s naming skills 2, and life went on, until the fateful day when a tweet like this appeared from Sam Altman:

I was intrigued: firstly, by the hype-building phrase “low-key research preview” 3, and secondly, by that very name. And indeed, I wasn’t disappointed:

With this post, the family of entities named Codex gained two new members:
- An o3 model variant, specifically fine-tuned for programming, named codex-1.
- A cloud-based agent capable of autonomously performing several different tasks on a GitHub repository 4.
It’s the latter we’ll be talking about today.
How does it work?
The sequence of actions required to use the agent is quite simple:
Step 1. We go to https://chatgpt.com/codex. There we see a list of tasks the agent is executing or has executed, plus an input window where we select the repository and branch we want to work with. There are two buttons: “Ask”, which analyzes the code and automatically suggests specific sub-tasks, and “Code”, which goes straight to editing the code.
Step 2. To add a new repository, we go to the Environments menu and create a new environment there. We can specify the repo itself, configure the workspace settings, and test it out.
Step 3. We return to the main screen and write our request.
Step 4, the most interesting part. The environment deploys an Ubuntu-based container, installs the necessary packages, applies the custom environment settings, and clones the repository inside. After this, internet access is cut off 5 (not always anymore, which is what this post is about), and the agent begins its work. It works long and diligently: the planner comes from o3, and it’s a good planner. The process can be observed in the Logs window:

Step 5. After several minutes of work, we are presented with a diff, which we can review and either turn into a pull request on GitHub or steer the process back onto the right track and run another iteration.
Steps 3-5 can be run in parallel, allowing you to go about your business while the agent works. The diffs it produces are very compact and pleasant to read, which increases confidence in the correctness of the result.
Internet Access
If you look closely at the process, you can see a certain limitation. The lack of internet access during execution breaks many build and testing processes, which limits the agent’s capabilities. This usually leads to correct results accompanied by the note “Running tests: failed.” It happens for various reasons, but primarily because builds often need to download libraries. Of course, this can be worked around at the environment setup stage, while access is still available, but doing so is far from always trivial.
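One common mitigation (my own sketch, not an official Codex feature) is to make network-dependent tests detect the offline sandbox and skip instead of failing. The probe host and timeout below are arbitrary choices:

```python
import socket

def network_available(host: str = "pypi.org", port: int = 443,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With pytest, network-dependent tests could then be skipped when offline:
#   pytestmark = pytest.mark.skipif(not network_available(),
#                                   reason="sandbox has no internet access")
```

This way the diff comes back with “tests skipped” rather than a misleading “failed”.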
This limitation was quite painful, which is why OpenAI recently announced that internet access can now be left on for the model. Of course, letting the model roam the internet without any restrictions is dangerous, so we are given a choice of access modes.

Firstly, we can leave everything as is and not allow internet access at all. Secondly, we can grant access, but only to a limited set of resources 6. On top of that, we can restrict the model to read-only operations. And the most daring are allowed to give the agent full, unrestricted access and hope for the best.
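The interplay of these modes can be pictured as a simple request filter. The sketch below is purely my own illustration of the idea, not OpenAI’s implementation; the domain list and function name are hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical allowlist -- in Codex the list is configured per environment.
ALLOWED_DOMAINS = {"pypi.org", "files.pythonhosted.org", "github.com"}
READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}

def is_request_allowed(url: str, method: str = "GET",
                       mode: str = "allowlist", read_only: bool = True) -> bool:
    """Decide whether the agent may perform this HTTP request."""
    if mode == "off":                                  # no internet at all
        return False
    if read_only and method.upper() not in READ_ONLY_METHODS:
        return False                                   # block write operations
    if mode == "full":                                 # unrestricted access
        return True
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

In this sketch the read-only switch is orthogonal to the allowlist, so even “full” access can be limited to GET-like methods.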
Security
Why hope? Because direct access to unverified resources invites a whole host of problems, about which OpenAI persistently warns.

This warning jumbles various concerns together, listing the attack itself and its consequences side by side:
- Prompt injection: The model downloads a webpage, sees text that looks like an instruction, and happily executes it. The instruction might very well ask it to…
- Exfiltration of code or secrets: …upload the repository contents and secrets to a third-party resource;
- Inclusion of malware or vulnerabilities: …or use a malicious library instead of a legitimate one.
- Use of license-restricted content: the model might also decide on its own that GPL-licensed code it found in some repository is the best possible addition to ours, which could lead to certain legal problems.
Therefore, it is strictly recommended to limit the model’s access only to trusted resources, and only when necessary.
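To make the prompt-injection risk concrete: anything the agent fetches becomes part of its context, so a page can smuggle instructions into it. Below is a deliberately naive keyword scanner, purely for illustration — real defenses cannot rely on pattern matching alone, which is exactly why restricting access to trusted resources matters:

```python
import re

# Naive, illustrative patterns -- trivially bypassed by any real attacker.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"upload .* (secret|token|key)s?",
    r"disregard the system prompt",
]

def looks_like_injection(fetched_text: str) -> bool:
    """Flag fetched content that resembles an instruction to the model."""
    text = fetched_text.lower()
    return any(re.search(p, text) for p in SUSPICIOUS_PATTERNS)
```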
A Small Example and Conclusion
Of course, internet access opens up a lot of interesting possibilities beyond simplifying builds. For example, the first thing I asked the model to do was to check this website for potential SEO issues. The result can be seen below.

To sum up, I can say that I generally like the tool. The small size of the diffs and the ability to run multiple tasks in parallel allow for small refactorings and improvements on a fire-and-forget basis, without having to dive deep into the context or get distracted from other tasks. This saves a lot of time and reduces cognitive load.
It will be interesting to see what competitors offer in response. 7
1. With the CLI suffix, although it’s referred to almost everywhere simply as Codex. ↩︎
2. After all, compared to their model naming, such a mix-up is child’s play. ↩︎
3. Sam is a master of mutually exclusive statements. ↩︎
4. Others are not (yet?) supported. ↩︎
5. For security reasons, which we’ll discuss further. ↩︎
6. In this case, the list of resources with various repositories is already pre-defined. ↩︎
7. A few days later at Google I/O 2025, Jules was introduced, working on a very similar principle, but due to the large size of the diffs generated by Gemini 2.5 Pro, it’s significantly harder to use. ↩︎