lesson 01 · track 01 — fundamentals

What Codex Actually Is.

An autonomous agent, not an autocomplete. Get this wrong and every task spec you write afterward works against the model's actual shape.


Your starting mental model matters more than any prompting trick you'll ever learn. Codex is not Copilot with a bigger context window. It's not a chat assistant that can read your repo. Those comparisons feel intuitive and are wrong: they lead people to write tasks shaped for a different kind of tool.

Other agentic coding tools, such as Claude Code and GitHub Copilot Workspace, follow broadly the same pattern, though details such as where the sandbox runs and how interactive the session is vary by tool. They are autonomous software engineering agents. When you give Codex a task, the following happens, in this order:

  1. It receives your task description and any context you have given it (repo contents, convention file, prior examples).
  2. It spins up a fresh, isolated sandbox with your repo cloned into it.
  3. It plans a series of steps to complete your task.
  4. It executes those steps — reading files, editing code, running tests, reading the output, revising.
  5. When it thinks it's done, it produces a patch — a diff of the changes it wants to make.
  6. That patch lands as a pull request (or a diff, depending on the tool) for you to review and merge.
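The six steps above can be sketched as a toy pipeline. This is a minimal illustration, not Codex's actual implementation: `AgentRun`, `run_agent`, and the hard-coded plan are hypothetical names invented here, and in a real run the planning and editing are done by the model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRun:
    """Record of one run. Hypothetical structure, for illustration only."""
    task: str
    context: list[str]
    log: list[str] = field(default_factory=list)
    patch: Optional[str] = None


def run_agent(task: str, context: list[str]) -> AgentRun:
    run = AgentRun(task=task, context=context)          # 1. receive task and context
    run.log.append("sandbox: fresh clone of the repo")  # 2. isolated sandbox
    plan = ["read files", "edit code", "run tests"]     # 3. plan (hard-coded stand-in)
    for step in plan:                                   # 4. execute each step
        run.log.append(f"step: {step}")
    run.patch = f"diff for: {task}"                     # 5. produce a patch
    run.log.append("pull request opened for review")    # 6. hand back to you
    return run


run = run_agent("add rate limiting to the /login endpoint", ["AGENTS.md"])
print(run.patch)
```

Notice what the sketch does not contain: there is no step where the agent stops to ask you a question. Whatever it needs has to arrive in `task` and `context`.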

None of this is magic, and none of it is interactive. You hand over a task. Some minutes later, a PR shows up. In between, the agent was alone in a box.

Think "junior engineer you handed a Jira ticket," not "assistant you're pair-programming with."

This framing fixes most failure modes. A junior engineer with a vague ticket flails and delivers the wrong thing. A junior engineer with no acceptance criteria ships something that kind of works. A junior engineer with no access to your test suite guesses at correctness. The fixes for all three are obvious when you name them — and we'll spend the next eleven lessons doing exactly that.

One more mental model worth pinning: think of the agent's context window as a blackboard. Everything the agent knows about your task, your codebase, and your conventions has to land on that blackboard, either supplied up front or read from the repo during the run. Nothing persists between runs: when the session ends, the blackboard is erased. Your AGENTS.md, your task spec, and your test suite are how you write on that blackboard reliably, every time.
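To make the blackboard concrete, here is a minimal sketch of assembling a run's starting context from scratch. It assumes conventions live in an `AGENTS.md` at the repo root; `build_context` is a hypothetical helper written for this lesson, not how any particular tool loads its context.

```python
from pathlib import Path


def build_context(repo: Path, task_spec: str) -> str:
    """Assemble the starting context for one run; rebuilt from scratch every time."""
    parts = [f"# Task\n{task_spec.strip()}"]
    conventions = repo / "AGENTS.md"  # assumed location: repo root
    if conventions.is_file():
        text = conventions.read_text(encoding="utf-8").strip()
        parts.append(f"# Conventions (AGENTS.md)\n{text}")
    else:
        # Nothing on the blackboard about conventions: expect generic code.
        parts.append("# Conventions\n(none found)")
    return "\n\n".join(parts)
```

The function takes no argument for "what the agent learned last time", which is the point: anything you want remembered has to live in the repo or the task spec.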

The three things in the contract

Every Codex run is a small contract between you and the agent. Three things are always in play. If you can name them, you can reason about why runs succeed or fail.

01 · the task

What you're asking for

Goal, constraints, acceptance criteria, out-of-scope. This is the entire situational brief. If a requirement isn't here, it doesn't exist to Codex.

02 · the repo

What the agent can see

The full codebase, the tests, the AGENTS.md conventions, and the configured lint and CI commands, plus whatever tooling the sandbox has installed.

03 · the sandbox

What the agent can do

Run commands, read files, write files, execute tests, install dependencies. No network unless you grant it. Ephemeral — every run starts clean.

The contract rule. A Codex run's quality is bounded by these three things. Vague task → flaky run. Missing repo context (no AGENTS.md) → generic PR. Broken sandbox (tests don't run) → unverifiable changes. Every technique in this course tightens one of these three.
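The task side of the contract (goal, constraints, acceptance criteria, out-of-scope) can be written down as a small structure that tells you which parts you left blank. This is a sketch under assumptions: `TaskSpec` and its methods are hypothetical, and the example constraint and acceptance criterion are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    goal: str
    constraints: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of empty sections; each one is a place the agent has to guess."""
        missing = [] if self.goal.strip() else ["goal"]
        for name in ("constraints", "acceptance_criteria", "out_of_scope"):
            if not getattr(self, name):
                missing.append(name)
        return missing

    def render(self) -> str:
        """Format the spec as a plain-text brief."""
        lines = [f"Goal: {self.goal}"]
        for title, items in (
            ("Constraints", self.constraints),
            ("Acceptance criteria", self.acceptance_criteria),
            ("Out of scope", self.out_of_scope),
        ):
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


vague = TaskSpec(goal="add rate limiting to the /login endpoint")
print(vague.missing_sections())  # ['constraints', 'acceptance_criteria', 'out_of_scope']

# Hypothetical tightened version; the specific values are made up.
tight = TaskSpec(
    goal="add rate limiting to the /login endpoint",
    constraints=["no new third-party dependencies"],
    acceptance_criteria=["a 6th login attempt within 60 s from one IP gets HTTP 429"],
    out_of_scope=["other endpoints"],
)
print(tight.render())
```

A spec with nothing missing is not automatically a good spec, but a spec with three empty sections is a reliable predictor of a flaky run.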

What it looks like, actually

Words are cheap. Take the task "add rate limiting to the /login endpoint". The real shape of a Codex run on it is plan, probe, try, test, revise.
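The try, test, revise part of that shape can be sketched as a loop. This is an assumption-laden illustration rather than a trace of a real run: `revise_until_green`, the two callbacks, and the budget of three attempts are all hypothetical.

```python
from typing import Callable


def revise_until_green(
    attempt: Callable[[int, str], None],
    run_tests: Callable[[], tuple[bool, str]],
    max_attempts: int = 3,
) -> tuple[bool, int]:
    """Try, test, revise: stop when tests pass or the attempt budget runs out.

    Returns (tests_passed, attempts_used).
    """
    feedback = ""  # test output from the previous attempt; empty on the first
    for n in range(1, max_attempts + 1):
        attempt(n, feedback)          # edit code, informed by the last failure
        passed, output = run_tests()  # verify against the test suite
        if passed:
            return True, n
        feedback = output             # read the output, then revise
    return False, max_attempts
```

The loop only works if `run_tests` returns something meaningful. If the suite can't run in the sandbox, every attempt looks the same to the agent and the loop has nothing to revise against.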


The three failure modes, named

Most bad Codex runs trace to one of these three. Learn to name them and the fix is usually obvious.

Vague spec

The agent interprets an ambiguous goal, picks the most plausible interpretation, and commits to it. PR arrives solving the wrong problem. Fix: tighten goal and acceptance criteria.

No conventions

Without AGENTS.md, Codex writes reasonable generic code. It won't match your style, your error-handling pattern, your test layout. Fix: commit conventions to the repo.

Broken feedback loop

Tests don't run in the sandbox, or the command isn't documented. Agent can't verify its own work. Ships hopeful code. Fix: make tests runnable with one command.
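One way to get "runnable with one command" is a small wrapper script that runs every check in order and fails fast. The sketch below assumes a Python project tested with pytest; `run_checks` and the `CHECKS` list are hypothetical, so swap in whatever commands your repo actually uses.

```python
import subprocess
import sys


def run_checks(commands: list[list[str]]) -> int:
    """Run each check in order; return the first non-zero exit code, else 0."""
    for cmd in commands:
        print("$", " ".join(cmd), flush=True)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # fail fast so the first error is the visible one
    return 0


# Hypothetical project checks; replace with your repo's real commands.
CHECKS = [
    [sys.executable, "-m", "pytest", "-q"],
]
```

Saved as a script that ends with `sys.exit(run_checks(CHECKS))`, this gives the agent a single command to run and a single exit code to read. Document that command in AGENTS.md so the agent doesn't have to discover it.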

One exercise before you move on

Here are three questions about Codex. Your answers aren't graded; this is a self-check. Read each, say the answer out loud, then flip the card.
