Codex Review Gate

Request structured code review from Codex CLI, evaluate pass/fail verdict, and manage the review-improve-retry workflow automatically.

Overview


name: codex-review-gate
description: Automated code review gate using Codex CLI. Use when completing a task and needing external AI review before proceeding. Requests structured review from Codex, determines pass/fail verdict, auto-retries on failure with improvements, and reports to user if gate fails twice. Triggers on phrases like "review my work", "check this code", "run review gate", "get codex review", or when a significant implementation task is completed.

Prerequisites

  • Codex CLI installed and in PATH
  • Valid credentials at ~/.codex/config.toml
  • Verify: codex --version

Review Gate Workflow

┌─────────────────────────────────────────────────────────────┐
│                       REVIEW GATE                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. PREPARE CONTEXT                                         │
│     └─ Summarize: what was done, what changed, why          │
│                                                             │
│  2. REQUEST REVIEW (scripts/codex_review.py)                │
│     └─ sandbox: read-only (safe, no file modifications)     │
│     └─ reasoning: high (default, configurable via env)      │
│                                                             │
│  3. EVALUATE VERDICT                                        │
│     ├─ PASS → Proceed to next task                          │
│     └─ FAIL → Go to step 4                                  │
│                                                             │
│  4. IMPROVE (Claude fixes based on review feedback)         │
│     └─ Address issues raised in review                      │
│                                                             │
│  5. RETRY REVIEW (2nd attempt)                              │
│     ├─ PASS → Proceed to next task                          │
│     └─ FAIL → Go to step 6                                  │
│                                                             │
│  6. REPORT TO USER                                          │
│     └─ Summarize issues, explain what was tried             │
│     └─ Ask for guidance                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
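
The whole review-improve-retry loop can be scripted end to end. Below is a minimal Python sketch of the gate; run_review and the improve callback are illustrative, and it assumes scripts/codex_review.py prints the JSON described under Output Format:

import json
import subprocess

def run_review(context, code_path, project_dir, session_id=None):
    """Invoke the review script and parse the JSON verdict it prints."""
    cmd = ["python", "scripts/codex_review.py",
           "--context", context, "--code", code_path, "--cd", project_dir]
    if session_id:
        cmd += ["--session-id", session_id]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

def review_gate(context, code_path, project_dir, improve):
    """Review, improve on failure, retry once; False means escalate to the user."""
    first = run_review(context, code_path, project_dir)
    if first["passed"]:
        return True
    improve(first["issues"], first["review"])  # fixes are applied externally (by Claude)
    retry = run_review("Addressed review feedback: " + "; ".join(first["issues"]),  # assumes issues are strings
                       code_path, project_dir, session_id=first["session_id"])
    return retry["passed"]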

Context Preparation (Critical)

Good context = good review. Structure the review request clearly:

Context Template

## Task Completed
[What was the goal? What feature/fix was implemented?]

## Changes Made
[List of files modified, functions added/changed]

## Key Decisions
[Any architectural choices, tradeoffs made]

## Areas of Concern
[Anything you're uncertain about, edge cases]

Example Context

## Task Completed
Implemented user authentication with JWT tokens.

## Changes Made
- Added auth/jwt.py: JWT token generation and validation
- Modified routes/user.py: Added login/logout endpoints
- Updated models/user.py: Added password hashing

## Key Decisions
- Used HS256 algorithm for JWT (simpler, sufficient for single-server)
- Token expiry set to 24 hours
- Refresh tokens not implemented yet

## Areas of Concern
- Password validation regex might be too strict
- Need to verify token invalidation on logout works correctly
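
If you assemble this context programmatically rather than by hand, a small helper keeps the four sections consistent. A minimal sketch (build_context is a hypothetical helper, not part of the skill):

def build_context(task, changes, decisions, concerns):
    """Render the four-section review context as markdown."""
    def bullets(items):
        return "\n".join(f"- {item}" for item in items)
    return (
        f"## Task Completed\n{task}\n\n"
        f"## Changes Made\n{bullets(changes)}\n\n"
        f"## Key Decisions\n{bullets(decisions)}\n\n"
        f"## Areas of Concern\n{bullets(concerns)}\n"
    )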

Running the Review

Basic Usage

python scripts/codex_review.py \
  --context "Implemented feature X with files A, B, C" \
  --code "path/to/file_or_diff.py" \
  --cd "/project/root"

With Model Selection

python scripts/codex_review.py \
  --context "..." \
  --code "..." \
  --cd "/project" \
  --model "gpt-5" \
  --reasoning-effort "high"

Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| --context | Yes | - | Task description and changes made |
| --code | Yes | - | File path (relative to --cd) or inline code/diff |
| --cd | Yes | - | Project root directory |
| --model | No | Codex default | Model. Env: CODEX_MODEL |
| --reasoning-effort | No | high | none/low/medium/high/xhigh. Env: CODEX_REASONING_EFFORT |
| --log-dir | No | ./codex-review-logs | Log directory |
| --session-id | No | - | Resume previous session |
| --return-all-messages | No | false | Include full reasoning trace |
| --timeout | No | 1800 (30 min) | Timeout in seconds. Env: CODEX_REVIEW_TIMEOUT |

Output Format

{
  "success": true,
  "passed": true,
  "verdict": "PASS",
  "session_id": "thread_abc123",
  "review": "Full review text from Codex...",
  "issues": [],
  "summary": "Code looks good, no critical issues found",
  "suggestions": ["Consider adding input validation"],
  "log_file": "/project/codex-review-logs/review_20250101_120000.json"
}
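
A caller can branch on the passed field and surface the rest of the payload. A minimal Python sketch, assuming the script's JSON output is piped in on stdin:

import json
import sys

result = json.loads(sys.stdin.read())  # e.g. codex_review.py ... | python check_gate.py
if result["passed"]:
    print(f"PASS: {result['summary']}")
else:
    print(f"FAIL ({len(result['issues'])} issues):")
    for issue in result["issues"]:
        print(f"- {issue}")
print("log:", result["log_file"])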

Gate Logic Implementation

When invoking the review gate, follow this logic:

Step 1: First Review

RESULT=$(python scripts/codex_review.py \
  --context "$CONTEXT" \
  --code "$CODE_PATH" \
  --cd "$PROJECT_DIR")

if echo "$RESULT" | jq -e '.passed == true' > /dev/null; then
  echo "✅ Review PASSED - proceeding to next task"
else
  echo "❌ Review FAILED - attempting improvements"
  # Go to Step 2
fi

Step 2: Improve Based on Feedback

Read the issues and review from the result:

import json

result = json.loads(RESULT)  # RESULT: the JSON string captured in Step 1
issues = result.get("issues", [])
review = result.get("review", "")

Address each issue systematically:

  1. Parse the issues list
  2. Fix each issue in the code
  3. Verify fixes locally if possible

Step 3: Retry Review

SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')  # reuse the first review's session

RESULT2=$(python scripts/codex_review.py \
  --context "Addressed review feedback: $ISSUES_FIXED" \
  --code "$UPDATED_CODE_PATH" \
  --cd "$PROJECT_DIR" \
  --session-id "$SESSION_ID")

if echo "$RESULT2" | jq -e '.passed == true' > /dev/null; then
  echo "✅ Review PASSED on retry"
else
  echo "⚠️ Review still FAILING - escalating to user"
  # Go to Step 4
fi

Step 4: Report to User

When the gate fails twice, report clearly:

## Review Gate Failed

### Summary
The code review gate failed after 2 attempts.

### Original Issues (1st review)
- [List issues from first review]

### Remaining Issues (2nd review)
- [List issues from second review]

### Actions Taken
- [What was fixed between reviews]

### Recommended Next Steps
1. [Specific suggestion]
2. [Specific suggestion]

### Review Logs
- 1st review: /path/to/review_1.json
- 2nd review: /path/to/review_2.json

Logging

All reviews are automatically logged to codex-review-logs/:

codex-review-logs/
├── review_20250101_120000.json
├── review_20250101_121500.json
└── ...

Each log contains:

  • Timestamp and duration
  • Context provided
  • Code reviewed (truncated if large)
  • Model and settings used
  • Full result with verdict
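
To spot patterns across runs, the logs can be folded into a one-line-per-review summary. A sketch, assuming each log file carries the verdict field shown above:

import json
from pathlib import Path

# One line per review: filename and verdict, oldest first.
for log in sorted(Path("codex-review-logs").glob("review_*.json")):
    data = json.loads(log.read_text())
    print(f"{log.name}: {data.get('verdict', 'UNKNOWN')}")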

Safety Configuration

The review runs with these safe defaults:

| Setting | Value | Reason |
| --- | --- | --- |
| sandbox | read-only | Cannot modify files |
| -a never | enabled | Full automation without approval prompts |
| skip-git-repo-check | enabled | Works outside git repos |

Note: --full-auto is intentionally NOT used because it overrides sandbox to workspace-write. Instead, we use -a never separately to achieve automation while keeping read-only sandbox.

For HPC/Slurm environments, add --yolo flag if Landlock errors occur.
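
Composed as a command, those defaults look roughly like the following Python sketch, which uses only the flags discussed above; the skill's actual script may assemble the call differently:

import subprocess

prompt = "Review the following changes and give a PASS/FAIL verdict: ..."  # prepared review prompt

cmd = [
    "codex", "exec",
    "--sandbox", "read-only",   # reviewer cannot modify files
    "-a", "never",              # full automation, no approval prompts (not --full-auto)
    "--skip-git-repo-check",    # works outside git repos
    prompt,
]
subprocess.run(cmd, check=True)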

Model Selection Guide

| Review Type | Recommended Model | Reasoning Effort |
| --- | --- | --- |
| Quick sanity check | gpt-5-mini | low |
| Standard code review | gpt-5 | medium |
| Security audit | gpt-5 | high |
| Complex algorithm | gpt-5-codex | high |
| Architecture review | gpt-5 | xhigh |
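
If you want to encode this table in a driver, a lookup keeps model and effort in sync. A sketch (REVIEW_PROFILES is a hypothetical mapping, not part of the skill):

# Hypothetical mapping from review type to (model, reasoning effort).
REVIEW_PROFILES = {
    "sanity":       ("gpt-5-mini",  "low"),
    "standard":     ("gpt-5",       "medium"),
    "security":     ("gpt-5",       "high"),
    "algorithm":    ("gpt-5-codex", "high"),
    "architecture": ("gpt-5",       "xhigh"),
}

model, effort = REVIEW_PROFILES["security"]
# pass to the script as: --model {model} --reasoning-effort {effort}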

Troubleshooting

No Response from Codex

  1. Verify Codex CLI: codex --version
  2. Check credentials: cat ~/.codex/config.toml
  3. Test minimal command: codex exec "hello world"

Verdict Not Parsed

The script attempts multiple parsing strategies:

  1. JSON block in response
  2. Verdict markers (PASS/FAIL, LGTM, etc.)
  3. Issue count heuristics

If parsing fails, check verdict_parsed: false in output and review the raw review text.
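
The script's exact strategies are not shown here, but a marker-based fallback (strategy 2) looks roughly like this sketch; it is illustrative, not the script's actual implementation:

import re

def parse_verdict(review_text):
    """Crude marker scan: an explicit FAIL wins; None means unparsed."""
    text = review_text.upper()
    if re.search(r"\bFAIL(ED)?\b", text):
        return "FAIL"
    if re.search(r"\bPASS(ED)?\b|\bLGTM\b", text):
        return "PASS"
    return None  # caller should then report verdict_parsed: false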

Review Too Slow

  • Lower --reasoning-effort to "medium" or "low"
  • Use faster model (gpt-5-mini)
  • Reduce code size (review diffs instead of full files)

Best Practices

  1. Be specific in context: The more detail, the better the review
  2. Review diffs, not full files: For changes, provide a unified diff (see the sketch after this list)
  3. Set appropriate reasoning: Use "high" for important code, "low" for minor changes
  4. Check logs: Review logs help track patterns and improve process
  5. Trust but verify: Use review as input, not absolute truth
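
As an example of practice 2, you can capture a unified diff and pass it inline via --code. A sketch, assuming a git working tree and the inline-diff support described in the Parameters table:

import subprocess

# Capture working-tree changes as a unified diff and pass them inline via --code.
diff = subprocess.run(["git", "diff"], capture_output=True, text=True, check=True).stdout
subprocess.run([
    "python", "scripts/codex_review.py",
    "--context", "Refactored auth module; only the attached diff changed",
    "--code", diff,  # inline diff, as the Parameters table allows
    "--cd", ".",
], check=True)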
