DEV Community

Cover image for I run Claude Code and Codex side by side. Here's the division of labor that actually works.
Rapls
Rapls

Posted on

I run Claude Code and Codex side by side. Here's the division of labor that actually works.

For a while I felt slightly embarrassed about keeping two agentic coding tools open at once. Claude Code in one terminal, Codex in another. It looked like I couldn't commit to one. Then I noticed I was reaching for each of them at different moments, on purpose, and the embarrassment turned into a workflow.

The short version: one of them is for building and exploring, the other is for running the boring, repeatable work. This post is the division of labor I landed on, built around the routine automation that made it obvious, plus the cost logic underneath it. I build WordPress plugins, so my examples lean that way, but the split is general.

The split that took me a while to see

Some tasks are a conversation. You poke at the problem, change your mind, follow a thread, back up. Other tasks are a straight line. You know exactly what you want done, you just don't want to do it by hand for the fortieth time.

I use Claude Code for the first kind. It holds the whole project in its head and is comfortable going back and forth while a design takes shape. I use Codex, specifically its non-interactive mode, for the second kind: the straight-line, do-this-exact-thing work that I want to fire from a script.

Once I framed it as conversation versus straight line, the choice of tool stopped being a vibe and became a question I could answer in a second.

Codex in non-interactive mode is the automation workhorse

The piece that made the split practical is codex exec. Instead of opening a chat, you hand Codex one instruction and it runs once and prints the result to stdout. That is the part you can put in a script.

codex exec "summarize the structure of this repo in one paragraph"
Enter fullscreen mode Exit fullscreen mode

I set the model and reasoning once, in ~/.codex/config.toml:

model = "gpt-5.5"
model_reasoning_effort = "medium"
approval_policy = "on-request"
sandbox_mode = "workspace-write"
Enter fullscreen mode Exit fullscreen mode

Medium reasoning is a deliberate choice. Routine work is not hard design thinking, it's mechanical edits and summaries, and pointing heavy reasoning at it just makes the run slower and pricier without changing the output. GPT-5.5 at medium is plenty for this, and I bump it up in the moment only when a task actually turns hard. approval_policy = "on-request" makes Codex ask before it writes files or runs commands, and sandbox_mode = "workspace-write" keeps it from touching anything outside the working folder. Both are safety rails I leave on by default.

Project conventions go in AGENTS.md, which is Codex's version of a CLAUDE.md. Codex reads it before each task, so the output stays consistent with how the project wants things done.

The routine work I actually automate

Here is the boring stuff that used to nibble at my day.

Commit messages, from the staged diff:

git add -A
codex exec "read git diff --staged and output a single-line commit message that summarizes the change. No preamble, message only."
Enter fullscreen mode Exit fullscreen mode

Version bumps, which is the one that earns its keep. A WordPress plugin keeps its version in two places that have to match: the Version: header in the main PHP file and the Stable tag: in readme.txt. Miss one and the release breaks. By hand, I get this wrong often enough to dread it.

codex exec "bump this plugin's version from 1.0.9.10 to 1.0.9.11. Change two places: the Version: header in the main PHP file and the Stable tag in readme.txt. Change nothing else."
Enter fullscreen mode Exit fullscreen mode

With on-request, Codex shows me the diff before applying it, so I confirm the two changes are exactly what I asked for. Then I wrap the release chores into one script:

#!/usr/bin/env bash
set -euo pipefail
NEW_VERSION="$1"

codex exec "bump the plugin version to $NEW_VERSION in the PHP header and readme.txt Stable tag, nothing else."
codex exec "add a $NEW_VERSION section to the top of CHANGELOG.md from recent commits, matching the existing format."
git diff   # I read this before anything ships
Enter fullscreen mode Exit fullscreen mode
bash release-prep.sh 1.0.9.11
Enter fullscreen mode Exit fullscreen mode

The thing that used to be a careful five-minute ritual is now one command and a diff review.

Where the second tool earns its place

If codex exec handles the straight-line work, why keep Claude Code in the loop at all? Because the two are good at different things, and a few patterns only work when you have both.

The one I use most is cross-model review. I build something with Claude Code, then have Codex review the diff:

codex exec "review git diff for security issues and bugs. Cite file and line for each problem. Give findings only, not praise or general impressions."
Enter fullscreen mode Exit fullscreen mode

A model reviewing its own output tends to like what it wrote. Hand the diff to a different model and it trips on things the first one walked past as obvious. The instruction to skip praise matters more than it looks. Without it you get "this looks solid" followed by a soft non-answer. Ask for problems and locations, nothing else, and the review gets useful.

The second pattern is extract-the-repeat. I explore a new feature interactively in Claude Code, and somewhere in that mess I notice a step I'm going to do every time. That step gets pulled out into a codex exec line and added to a script. The thinking stays in the conversational tool, the repetition moves to the straight-line one.

The third I save for changes I can't afford to get wrong: run the same request through both and compare.

claude -p "propose a refactor for this function" > claude.txt
codex exec "propose a refactor for this function" > codex.txt
diff claude.txt codex.txt
Enter fullscreen mode Exit fullscreen mode

If both land in the same place, I relax. If they diverge, that gap is exactly where a human decision is needed. It's too heavy to do constantly, so it's reserved for the scary diffs.

Handing context between them

Switching tools has a small tax, and how you pay it matters. Claude Code reads CLAUDE.md, Codex reads AGENTS.md. I keep both in the repo with the same conventions so either tool behaves the same way. The trap is updating one and forgetting the other, so changing a convention means editing both, every time.

When I move a long task from one tool to the other, I don't dump the whole history across. I have the first tool summarize where things stand, and hand over the summary. These tools can only hold so much at once, so moving the gist instead of the full transcript keeps the second tool sharp.

The cost logic, including the June 15 change

Money is part of why two tools beats one here. My rough rule: do the long, exploratory work where it's flat-rate, and the short, mechanical work where metered is cheap anyway. Interactive Claude Code runs inside the subscription. A codex exec call is small, so even metered it costs little per run.

This got sharper on June 15, 2026, when Anthropic moved programmatic Claude use, the claude -p headless path and the Agent SDK, off the subscription and onto separate metered credit. Interactive Claude Code in the terminal stayed on the plan. So scripting Claude with claude -p is no longer a flat-rate move. Which lines up neatly with the split I already had: explore interactively in Claude Code on the flat plan, run short automation through codex exec where metered is cheap. Pricing and terms shift, so check the current numbers on each vendor, but the shape of the logic holds.

The loop, end to end

Put together, a small feature looks like this:

  1. Build it interactively in Claude Code.
  2. Have Codex review the diff.
  3. Fold in the findings, read the diff myself, commit.
  4. Run release-prep.sh for the version bump and changelog.
  5. Read git diff one more time, then push.

Build, check, tidy, each handed to the tool that's good at it, with judgment and the final read kept in my hands.

What goes wrong with two tools

Running two has its own friction, and pretending it doesn't is how you lose the benefit.

  • Two convention files drift. CLAUDE.md and AGENTS.md falling out of sync means the tools start behaving differently. Edit both.
  • Don't point both at the same files at once. One tool's edit can stomp the other's. Work one at a time.
  • Reviewers drift into agreement. Without "findings only," the review turns into polite approval.
  • Don't over-tool. Two tools on a job that needs one is just more setup and more cost.

Two is not always better than one. If one tool covers it, use one. Reach for both only when the split pays: a separate reviewer, a flat-versus-metered cost difference, a build phase and a repeat phase that genuinely want different strengths.

One rule I don't break

The speed is real, and it makes a bad habit tempting: approving diffs without reading them, or running automation with the approval prompt turned off. Treat anything either tool writes as untrusted input until you've read it. Version numbers, config, anything touching user input, get a human diff review before they commit or ship, no matter which model produced them. Keep the approval prompt on outside of contexts you fully understand. The point of automating the boring work is to free up attention, so spend some of it on the review.

A note to my next self

The division held up because it maps to something real: some work is a conversation and some work is a straight line, and the tools are honestly better at one or the other. Build and explore in the conversational one, run and repeat in the straight-line one, let a different model check the first model's work, and put the boring release chores behind a single command. Keep judgment and the last diff for yourself. That's the whole system, and the day one tool covers the job, I'll happily use one.

References

I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.

Top comments (1)

Collapse
 
mehmetcanfarsak profile image
Mehmet Can Farsak

Great breakdown of the conversation vs straight-line split. That got me thinking about a third mode I keep running into: brainstorming/ideation. Both Claude Code and Codex tend to jump straight to implementation when asked to explore ideas — that execution drift is a real productivity killer.

I put together a small hook-based plugin (Brainstorm-Mode by mehmetcanfarsak on GitHub) that uses PreToolUse hooks to keep agents in ideation mode and block premature tool calls. Three modes (divergent, actionable, academic) so you pick the right headspace. Plugs right into the hook system if anyone's interested in the thinking behind it.