Why agent-first

Most teams still treat AI agents like fancy autocomplete: open the IDE, ask Copilot, paste, repeat. That’s a tool, not a workflow. Tools like OpenClaw and Hermes Agent generated plenty of hype, but many users install them, poke around, and then get lost because they have no operating loop for using agents effectively.

As a Forward Deployed Engineer, my job is to redesign the workflow itself: agents handle the repeatable execution, humans review the highest-leverage decisions, and the system keeps shipping.

Fleet

| Role | Agent | Model | Job |
|---|---|---|---|
| Master PM | Droid | Opus | Breaks goals into tickets, assigns work, tracks state |
| Backend / Platform | Claude Code | Opus | API, services, infra, CI/CD, deploys |
| Reviewer | Codex | GPT-5.5 | Checks risk, tests, correctness, business intent |
| Junior Dev | Pi Coding | Qwen3.6-27B (local) | Cheap long-tail work, scaffolding, docs, small refactors |

Pi Coding runs the junior dev loop on my DGX Spark, RTX Pro 6000, and RTX 3090. The point is cost, latency, and privacy: simple work stays local; sensitive context does not need to leave the machine.

Strategy 1: human-reviewed agent fleet

Best for ambiguous, high-risk, or product-sensitive work.

flowchart LR
    H[Human goal] --> PM[Droid PM]
    PM --> BE[Backend / Platform]
    PM --> JR[Local Junior]
    JR -.asks when stuck.-> BE
    BE --> PR[PR]
    JR --> PR
    PR --> CR[Codex review]
    CR --> HR[Human review]
    HR --> Merge[(Merge)]

Loop:

  1. Human gives the goal.
  2. PM turns it into tickets.
  3. Agents ship PRs.
  4. Codex reviews for correctness, tests, risky diffs, and intent mismatch.
  5. Human merges or sends it back.

The human enters only at goal-setting and final review. Everything else is agents coordinating through tickets.
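
The loop above can be sketched as a small ticket state machine. This is only an illustration under my own assumptions: the stage names, the actor labels (`pm_agent`, `dev_agent`, `review_agent`, `human`), and the `advance` helper are hypothetical and not part of any specific tool. The point it shows is that the human is the only actor allowed to move a reviewed PR to merged or send it back.

```python
from enum import Enum


class Stage(Enum):
    GOAL = "goal"                      # human has stated the goal
    TICKETED = "ticketed"              # PM agent has broken it into tickets
    PR_OPEN = "pr_open"                # a dev agent has shipped a PR
    AGENT_REVIEWED = "agent_reviewed"  # the review agent has checked it
    MERGED = "merged"


# Allowed transitions; the value names who is expected to act.
# All actor labels are hypothetical.
TRANSITIONS = {
    (Stage.GOAL, Stage.TICKETED): "pm_agent",
    (Stage.TICKETED, Stage.PR_OPEN): "dev_agent",
    (Stage.PR_OPEN, Stage.AGENT_REVIEWED): "review_agent",
    (Stage.AGENT_REVIEWED, Stage.MERGED): "human",
    (Stage.AGENT_REVIEWED, Stage.TICKETED): "human",  # sent back for rework
}


def advance(current: Stage, target: Stage, actor: str) -> Stage:
    """Move a ticket to `target` if `actor` is allowed to make that move."""
    expected = TRANSITIONS.get((current, target))
    if expected is None:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if actor != expected:
        raise PermissionError(f"{actor} cannot move {current.value} -> {target.value}")
    return target
```

For example, `advance(Stage.AGENT_REVIEWED, Stage.MERGED, "review_agent")` raises `PermissionError`, which is the property this strategy depends on: agents coordinate freely, but they cannot merge.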

Strategy 2: two-reviewer auto-merge loop

Best for repeatable batches with clear acceptance criteria and reliable tests.

flowchart TD
    You[You: @CEO start batch for feature X] --> CEO[CEO creates issues + queue]
    CEO --> Confirm[You confirm batch]
    Confirm --> Dev[Blockchain Dev takes next issue]
    Dev --> PR[Open PR]
    PR --> R1[Reviewer 1]
    PR --> R2[Reviewer 2, different LLM]
    R1 --> Gate{Both approve?}
    R2 --> Gate
    Gate -->|yes| Head[Head merges]
    Gate -->|no| Fix[Back to dev]
    Fix --> Dev
    Head --> Dev

Reviewer diversity matters. One reviewer can be strict on tests/security; the other can focus on product intent, integration risk, and maintainability. I do not use this for taste-heavy or roadmap-sensitive work. It is better for small protocol changes, CRUD flows, test coverage, migrations, docs, and isolated feature slices.
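
A minimal sketch of the "both approve" gate, assuming each review records which model family produced it. The `Review` type, the `model_family` labels, and `can_auto_merge` are hypothetical names for illustration. The check requires approvals from at least two different model families and no rejections, so two reviewers backed by the same LLM do not satisfy the gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str       # hypothetical reviewer name
    model_family: str   # illustrative label, e.g. "family_a" or "family_b"
    approved: bool


def can_auto_merge(reviews: list[Review]) -> bool:
    """True only if nobody rejected and at least two distinct model families approved."""
    approving_families = {r.model_family for r in reviews if r.approved}
    any_rejection = any(not r.approved for r in reviews)
    return len(approving_families) >= 2 and not any_rejection
```

Any rejection sends the PR back to the dev agent, matching the "no" branch of the diagram.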

CI/CD is the safety rail

Auto-merge without CI/CD is automated risk. At minimum, I want:

  • Required tests, lint, typecheck, and build checks on every PR
  • A staging/dev branch before production
  • A protected production branch with stricter rules
  • Preview deployments for UI/product changes
  • Rollback paths and observability
  • Clear ownership for secrets, migrations, infra, and irreversible actions

With those gates, the system becomes a factory line: agents produce, reviewers inspect, CI enforces the baseline, staging catches integration issues, and production stays protected.
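
As a rough sketch of the baseline gate, the function below refuses a merge unless every required check reports success, and applies a stricter rule to the production branch. The check names mirror the list above; the `"success"` status string, the `"production"` branch name, and the choice of "human approval" as the stricter production rule are my assumptions for illustration, not a description of any particular CI provider.

```python
REQUIRED_CHECKS = ("tests", "lint", "typecheck", "build")


def missing_or_failed(check_results: dict[str, str]) -> list[str]:
    """Return required checks that are absent or did not report 'success'."""
    return [name for name in REQUIRED_CHECKS if check_results.get(name) != "success"]


def merge_allowed(
    check_results: dict[str, str],
    target_branch: str,
    human_approved: bool = False,
) -> bool:
    """Baseline gate: all required checks pass; production also needs a human."""
    if missing_or_failed(check_results):
        return False
    if target_branch == "production":  # hypothetical branch name
        return human_approved          # assumed stricter rule for production
    return True
```

A missing check counts as a failure here on purpose: an agent that skips the typecheck job should not get through just because nothing reported red.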

What changed

Before this setup, I was the bottleneck on every PR. Now the loop is risk-based:

  • Ambiguous work → human-reviewed agent fleet
  • Clear batch work → two-reviewer auto-merge
  • Long-tail/simple work → local junior model
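
The three lanes above can be written down as a small routing function. This is a sketch under stated assumptions: the `Ticket` fields, the `Lane` names, and the precedence order (ambiguity first, then simplicity, then batch readiness, with the human-reviewed lane as the conservative default) are my own illustration of the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    HUMAN_REVIEWED_FLEET = "human_reviewed_fleet"
    TWO_REVIEWER_AUTO_MERGE = "two_reviewer_auto_merge"
    LOCAL_JUNIOR = "local_junior"


@dataclass
class Ticket:
    title: str
    ambiguous: bool                # unclear intent or product-sensitive
    simple: bool                   # long-tail work: docs, scaffolding, small refactors
    has_acceptance_criteria: bool
    has_reliable_tests: bool


def route(ticket: Ticket) -> Lane:
    """Pick a lane by risk; fall back to the most conservative lane."""
    if ticket.ambiguous:
        return Lane.HUMAN_REVIEWED_FLEET
    if ticket.simple:
        return Lane.LOCAL_JUNIOR
    if ticket.has_acceptance_criteria and ticket.has_reliable_tests:
        return Lane.TWO_REVIEWER_AUTO_MERGE
    return Lane.HUMAN_REVIEWED_FLEET
```

Defaulting to the human-reviewed lane means a ticket that fits no rule costs review time instead of risking an unreviewed merge.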

The hard failure mode is rarely “agent wrote bad code.” It is “agent wrote technically correct code that missed the business intent.” That is why review by a Forward Deployed Engineer (FDE) still matters.

What I measure

  • Cycle time from ticket to reviewed PR
  • Useful PRs per human review hour
  • Rework caused by misunderstood intent
  • Premium API spend vs local LLM work
  • Review quality: tests, risk notes, missed edge cases
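
Three of these metrics reduce to simple arithmetic over PR records. The sketch below assumes a hypothetical `PRRecord` shape and a hand-entered count of human review hours; the field names and the use of the median for cycle time are my choices for illustration, and spend and review-quality tracking are left out because they depend on billing data and rubric details not covered here.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PRRecord:
    ticket_opened: datetime
    review_done: datetime
    useful: bool               # assumed meaning: merged rather than discarded
    reworked_for_intent: bool  # rework caused by misunderstood intent


def summarize(records: list[PRRecord], human_review_hours: float) -> dict[str, float]:
    """Compute cycle time, review leverage, and intent-rework rate."""
    if not records or human_review_hours <= 0:
        raise ValueError("need at least one record and positive review hours")
    cycle_hours = [
        (r.review_done - r.ticket_opened).total_seconds() / 3600 for r in records
    ]
    useful = sum(r.useful for r in records)
    reworked = sum(r.reworked_for_intent for r in records)
    return {
        "median_cycle_hours": median(cycle_hours),
        "useful_prs_per_review_hour": useful / human_review_hours,
        "intent_rework_rate": reworked / len(records),
    }
```

The median is used for cycle time so that one stuck ticket does not dominate the number.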

This is the practical version of my work as a Forward Deployed Engineer in Malaysia and Singapore: embed with a team, find the real bottleneck, and redesign the operating loop so agents execute while humans review what matters. If you want to build something similar, reach me on Twitter/X or LinkedIn.