Why agent-first
Most teams still treat AI agents like fancy autocomplete: open the IDE, ask Copilot, paste, repeat. That’s a tool, not a workflow. We’ve all seen the hype around tools like OpenClaw and Hermes Agent, but many users install them, poke around, and then get lost because they have no operating loop for using agents effectively.
As a Forward Deployed Engineer, my job is to redesign the workflow itself: agents handle the repeatable execution, humans review the highest-leverage decisions, and the system keeps shipping.
Fleet
| Role | Agent | Model | Job |
|---|---|---|---|
| Master PM | Droid | Opus | Breaks goals into tickets, assigns work, tracks state |
| Backend / Platform | Claude Code | Opus | API, services, infra, CI/CD, deploys |
| Reviewer | Codex | GPT-5.5 | Checks risk, tests, correctness, business intent |
| Junior Dev | Pi Coding | Qwen3.6-27B local | Cheap long-tail work, scaffolding, docs, small refactors |
Pi Coding runs the junior dev loop on my DGX Spark, RTX Pro 6000, and RTX 3090. The point is cost, latency, and privacy: simple work stays local; sensitive context does not need to leave the machine.
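A minimal Python sketch of that local-versus-premium dispatch decision. The `Ticket` fields, the 200-line threshold, and the backend labels are hypothetical illustrations, not this fleet’s actual configuration; the policy shown is one possible reading of the rule above: sensitive context stays local, and small long-tail work runs locally for cost and latency.

```python
from dataclasses import dataclass

# Hypothetical backend labels standing in for the fleet's agents.
LOCAL_JUNIOR = "local-junior"        # e.g. Pi Coding on local GPUs
PREMIUM_BACKEND = "premium-backend"  # e.g. Claude Code on a hosted model


@dataclass
class Ticket:
    title: str
    estimated_lines: int     # rough diff-size estimate (assumption)
    touches_sensitive: bool  # secrets, customer data, private code
    is_long_tail: bool       # scaffolding, docs, small refactors


def pick_backend(ticket: Ticket, max_local_lines: int = 200) -> str:
    """Return which backend should take the ticket (illustrative policy)."""
    # Sensitive context never leaves the machine, whatever its size.
    if ticket.touches_sensitive:
        return LOCAL_JUNIOR
    # Small long-tail work is cheapest and fastest to run locally.
    if ticket.is_long_tail and ticket.estimated_lines <= max_local_lines:
        return LOCAL_JUNIOR
    # Everything else goes to the premium backend agent.
    return PREMIUM_BACKEND


print(pick_backend(Ticket("Add README section", 40, False, True)))  # local-junior
```

Checking sensitivity first means a size threshold can never accidentally push private context to a hosted model.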
Strategy 1: human-reviewed agent fleet
Best for ambiguous, high-risk, or product-sensitive work.
```mermaid
flowchart LR
    H[Human goal] --> PM[Droid PM]
    PM --> BE[Backend / Platform]
    PM --> JR[Local Junior]
    JR -. asks when stuck .-> BE
    BE --> PR[PR]
    JR --> PR
    PR --> CR[Codex review]
    CR --> HR[Human review]
    HR --> Merge[(Merge)]
```
Loop:
- Human gives the goal.
- PM turns it into tickets.
- Agents ship PRs.
- Codex reviews for correctness, tests, risky diffs, and intent mismatch.
- Human merges or sends it back.
The human enters only at goal-setting and final review. Everything else is agents coordinating through tickets.
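The loop can be modeled as a small state machine in which exactly one transition belongs to the human: goal-setting is the entry point, so the only human-owned move is the final review. This is a hypothetical sketch; the stage names and actor labels (`pm_agent`, `dev_agent`, `codex`, `human`) are illustrative and are not the API of Droid, Claude Code, or Codex.

```python
from enum import Enum


class Stage(Enum):
    GOAL = "goal"          # human has stated the goal
    TICKETS = "tickets"    # PM agent has split it into tickets
    PR_OPEN = "pr_open"    # a dev agent has opened a PR
    REVIEWED = "reviewed"  # Codex has reviewed the PR
    MERGED = "merged"


# Each stage maps to (next stage, actor allowed to make that move).
NEXT = {
    Stage.GOAL: (Stage.TICKETS, "pm_agent"),
    Stage.TICKETS: (Stage.PR_OPEN, "dev_agent"),
    Stage.PR_OPEN: (Stage.REVIEWED, "codex"),
    Stage.REVIEWED: (Stage.MERGED, "human"),
}


def advance(stage: Stage, actor: str, approved: bool = True) -> Stage:
    """Move a work item one step forward, enforcing who owns each step."""
    if stage not in NEXT:
        raise ValueError(f"{stage.value} is terminal")
    target, owner = NEXT[stage]
    if actor != owner:
        raise PermissionError(f"{actor} cannot advance {stage.value}")
    # The human either merges or sends the PR back for another pass.
    if stage is Stage.REVIEWED and not approved:
        return Stage.PR_OPEN
    return target


stage = Stage.GOAL
for actor in ("pm_agent", "dev_agent", "codex", "human"):
    stage = advance(stage, actor)
print(stage)  # Stage.MERGED
```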
Strategy 2: two-reviewer auto-merge loop
Best for repeatable batches with clear acceptance criteria and reliable tests.
```mermaid
flowchart TD
    You[You: @CEO start batch for feature X] --> CEO[CEO creates issues + queue]
    CEO --> Confirm[You confirm batch]
    Confirm --> Dev[Blockchain Dev takes next issue]
    Dev --> PR[Open PR]
    PR --> R1[Reviewer 1]
    PR --> R2[Reviewer 2, different LLM]
    R1 --> Gate{Both approve?}
    R2 --> Gate
    Gate -->|yes| Head[Head merges]
    Gate -->|no| Fix[Back to dev]
    Fix --> Dev
    Head --> Dev
```
Reviewer diversity matters. One reviewer can be strict on tests/security; the other can focus on product intent, integration risk, and maintainability. I do not use this for taste-heavy or roadmap-sensitive work. It is better for small protocol changes, CRUD flows, test coverage, migrations, docs, and isolated feature slices.
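A sketch of the merge gate, assuming each review arrives as a simple record carrying the reviewer’s model and verdict. The `Review` type and `gate` function are hypothetical; the point is that the gate refuses to count two reviews from the same LLM and merges only when both approve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str  # agent name
    model: str     # underlying LLM
    focus: str     # e.g. "tests/security" or "product intent"
    approved: bool


def gate(reviews: list[Review]) -> str:
    """Return "merge" only when two reviewers on different LLMs both approve."""
    if len(reviews) != 2:
        raise ValueError("expected exactly two reviews")
    first, second = reviews
    # Diversity rule: two reviews from the same model count as one opinion.
    if first.model == second.model:
        raise ValueError("reviewers must run on different LLMs")
    if first.approved and second.approved:
        return "merge"
    return "back_to_dev"


strict = Review("reviewer-1", "model-a", "tests/security", approved=True)
product = Review("reviewer-2", "model-b", "product intent", approved=False)
print(gate([strict, product]))  # back_to_dev
```

Raising on a same-model pair, rather than quietly merging, makes a misconfigured reviewer pool fail loudly instead of silently weakening the gate.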
CI/CD is the safety rail
Auto-merge without CI/CD is automated risk. At minimum, I want:
- Required tests, lint, typecheck, and build checks on every PR
- A staging/dev branch before production
- A protected production branch with stricter rules
- Preview deployments for UI/product changes
- Rollback paths and observability
- Clear ownership for secrets, migrations, infra, and irreversible actions
With those gates, the system becomes a factory line: agents produce, reviewers inspect, CI enforces the baseline, staging catches integration issues, and production stays protected.
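Those rails can be expressed as a pre-merge policy check that lists every reason a PR may not auto-merge. This is a sketch under stated assumptions: the check names, the protected branch names (`main`, `production`), and the human-owned path prefixes are hypothetical placeholders for whatever a given repo’s CI actually enforces. In practice, branch protection rules on the Git host would typically enforce much of this; the function only illustrates the logic.

```python
from dataclasses import dataclass, field

# Hypothetical names; substitute the checks and paths your CI really uses.
REQUIRED_CHECKS = {"tests", "lint", "typecheck", "build"}
PROTECTED_BRANCHES = ("main", "production")
HUMAN_OWNED_PREFIXES = ("secrets/", "migrations/", "infra/")


@dataclass
class PullRequest:
    target_branch: str
    passed_checks: set[str] = field(default_factory=set)
    changed_files: list[str] = field(default_factory=list)
    touches_ui: bool = False
    has_preview: bool = False


def auto_merge_blockers(pr: PullRequest) -> list[str]:
    """List every reason this PR must not auto-merge; empty means clear."""
    blockers = []
    missing = REQUIRED_CHECKS - pr.passed_checks
    if missing:
        blockers.append("missing checks: " + ", ".join(sorted(missing)))
    # Agents merge into staging/dev; production stays behind stricter rules.
    if pr.target_branch in PROTECTED_BRANCHES:
        blockers.append(f"{pr.target_branch} is protected")
    # Secrets, migrations, and infra have explicit human owners.
    for path in pr.changed_files:
        if path.startswith(HUMAN_OWNED_PREFIXES):
            blockers.append(f"{path} needs its human owner")
    if pr.touches_ui and not pr.has_preview:
        blockers.append("UI change has no preview deployment")
    return blockers


pr = PullRequest("staging", {"tests", "lint", "typecheck", "build"}, ["src/api/orders.py"])
print(auto_merge_blockers(pr))  # []
```

Returning all blockers instead of the first one gives the dev agent a complete fix list in a single round trip.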
What changed
Before this setup, I was the bottleneck on every PR. Now the loop is risk-based:
- Ambiguous work → human-reviewed agent fleet
- Clear batch work → two-reviewer auto-merge
- Long-tail/simple work → local junior model
The hard failure mode is rarely “agent wrote bad code.” It is “agent wrote technically correct code that missed the business intent.” That is why review by a Forward Deployed Engineer (FDE) still matters.
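The risk-based routing above can be written as a small decision function. The input flags are hypothetical simplifications of judgment calls a human or the PM agent would make; the ordering encodes one choice: ambiguity and risk are checked first, and anything that does not clearly qualify for auto-merge falls back to the human-reviewed loop.

```python
from enum import Enum


class Strategy(Enum):
    HUMAN_REVIEWED_FLEET = "human-reviewed agent fleet"
    TWO_REVIEWER_AUTO_MERGE = "two-reviewer auto-merge"
    LOCAL_JUNIOR = "local junior model"


def route(*, ambiguous: bool, high_risk: bool, long_tail: bool,
          clear_criteria: bool, reliable_tests: bool) -> Strategy:
    """Pick an operating loop for a ticket (illustrative risk policy)."""
    # Ambiguity or risk always keeps a human in the review path.
    if ambiguous or high_risk:
        return Strategy.HUMAN_REVIEWED_FLEET
    # Simple long-tail work goes to the cheap local model.
    if long_tail:
        return Strategy.LOCAL_JUNIOR
    # Auto-merge only when criteria and tests can carry the review.
    if clear_criteria and reliable_tests:
        return Strategy.TWO_REVIEWER_AUTO_MERGE
    # When in doubt, fall back to the safer human-reviewed loop.
    return Strategy.HUMAN_REVIEWED_FLEET


print(route(ambiguous=False, high_risk=False, long_tail=False,
            clear_criteria=True, reliable_tests=True).value)
# two-reviewer auto-merge
```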
What I measure
- Cycle time from ticket to reviewed PR
- Useful PRs per human review hour
- Rework caused by misunderstood intent
- Premium API spend vs local LLM work
- Review quality: tests, risk notes, missed edge cases
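A sketch of how the quantitative metrics above could be computed from per-PR records. The `PRRecord` fields are hypothetical; the share of PRs produced locally stands in as a rough proxy for premium spend versus local work, and review quality is left out because it is mostly a qualitative judgment.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PRRecord:
    ticket_opened: datetime
    pr_reviewed: datetime
    useful: bool         # merged, or otherwise worth the review time
    intent_rework: bool  # sent back because the intent was misunderstood
    ran_locally: bool    # produced by the local junior model


def summarize(prs: list[PRRecord], human_review_hours: float) -> dict[str, float]:
    """Aggregate cycle time, review leverage, rework, and local share."""
    cycle_hours = [
        (p.pr_reviewed - p.ticket_opened).total_seconds() / 3600 for p in prs
    ]
    return {
        # Median resists the occasional ticket that stalls for days.
        "median_cycle_hours": median(cycle_hours),
        "useful_prs_per_review_hour": sum(p.useful for p in prs) / human_review_hours,
        "intent_rework_rate": sum(p.intent_rework for p in prs) / len(prs),
        "local_share": sum(p.ran_locally for p in prs) / len(prs),
    }
```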
This is the practical version of my Forward Deployed Engineer work in Malaysia and Singapore: embed with a team, find the real bottleneck, and redesign the operating loop so agents execute while humans review what matters. If you want to build something similar, reach me on Twitter/X or LinkedIn.
