Your starting mental model matters more than any prompting trick you'll ever learn. Codex is not Copilot with a bigger context window. It's not a chat assistant that can read your repo. Those comparisons feel intuitive and are wrong — they lead people to write tasks that are shaped for the wrong thing.
The same is true of Claude Code, GitHub Copilot Workspace, and other modern agentic coding tools: all of them follow the same pattern. They are autonomous software engineering agents. When you give one a task, the following happens — every time, in this order:
- It receives your task description and any context you have given it (repo contents, convention file, prior examples).
- It spins up a fresh, isolated sandbox with your repo cloned into it.
- It plans a series of steps to complete your task.
- It executes those steps — reading files, editing code, running tests, reading the output, revising.
- When it thinks it's done, it produces a patch — a diff of the changes it wants to make.
- That patch lands as a pull request (or a diff, depending on the tool) for you to review and merge.
None of this is magic, and none of it is interactive. You hand over a task. Some minutes later, a PR shows up. In between, the agent was alone in a box.
This framing fixes most failure modes. A junior engineer with a vague ticket flails and delivers the wrong thing. A junior engineer with no acceptance criteria ships something that kind of works. A junior engineer with no access to your test suite guesses at correctness. The fixes for all three are obvious when you name them — and we'll spend the next eleven lessons doing exactly that.
One more mental model worth pinning: think of the agent's context window as a blackboard. Everything the agent knows about your task, your codebase, and your conventions must be written on that blackboard before the run starts. Nothing persists between runs. When the session ends, the blackboard is erased. Your AGENTS.md, your task spec, and your test suite are how you write on that blackboard — reliably, every time.
The three things in the contract
Every Codex run is a small contract between you and the agent. Three things are always in play. If you can name them, you can reason about why runs succeed or fail.
What you're asking for
Goal, constraints, acceptance criteria, out-of-scope. This is the entire situational brief. If a requirement isn't here, it doesn't exist to Codex.
What the agent can see
The full codebase, the tests, the AGENTS.md conventions, configured lint and CI commands. Whatever the sandbox has installed.
What the agent can do
Run commands, read files, write files, execute tests, install dependencies. No network unless you grant it. Ephemeral — every run starts clean.
AGENTS.md) → generic PR. Broken sandbox (tests don't run) → unverifiable changes. Every technique in this course tightens one of these three.What it looks like, actually
Words are cheap. Here's a condensed replay of what Codex does when you give it the task "add rate limiting to the /login endpoint". This is the real shape of a run — plan, probe, try, test, revise.
Quick check
The three failure modes, named
Most bad Codex runs trace to one of these three. Learn to name them and the fix is usually obvious.
Vague spec
The agent interprets an ambiguous goal, picks the most plausible interpretation, and commits to it. PR arrives solving the wrong problem. Fix: tighten goal and acceptance criteria.
No conventions
Without AGENTS.md, Codex writes reasonable generic code. It won't match your style, your error-handling pattern, your test layout. Fix: commit conventions to the repo.
Broken feedback loop
Tests don't run in the sandbox, or the command isn't documented. Agent can't verify its own work. Ships hopeful code. Fix: make tests runnable with one command.
One exercise before you move on
Here are three questions about Codex. Your answer doesn't matter — this is a self-check. Read each, say the answer out loud, then flip the card.
Try it: bespoke interactive
A hands-on simulation for this lesson. Click around, drag things, and feel the shape of the concept.