My Multica workflow: a human + local-LLM agent fleet

Why agent-first

Most teams still treat AI agents like fancy autocomplete: open the IDE, ask Copilot, paste, repeat. That’s a tool, not a workflow. Tools like OpenClaw and Hermes Agent get plenty of hype, but many users install them, poke around, and then get lost because they have no operating loop for using agents effectively.

As a Forward Deployed Engineer (FDE), my job is to redesign the workflow itself: agents handle the repeatable execution, humans review the highest-leverage decisions, and the system keeps shipping.

Figure: a hand-drawn wizard penguin operating an agent fleet loop from goal to PR, Codex review, and human merge.

Fleet

| Role | Agent | Model | Job |
|---|---|---|---|
| Master PM | Droid | Opus | Breaks goals into tickets, assigns work, tracks state |
| Backend / Platform | Claude Code | Opus | API, services, infra, CI/CD, deploys |
| Reviewer | Codex | GPT-5.5 | Checks risk, tests, correctness, business intent |
| Junior Dev | Pi Coding | Qwen3.6-27B (local) | Cheap long-tail work, scaffolding, docs, small refactors |

Pi Coding runs the junior dev loop on my DGX Spark, RTX Pro 6000, and RTX 3090. The point is cost, latency, and privacy: simple work stays local; sensitive context does not need to leave the machine.
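The routing rule in this paragraph can be sketched as data plus one function. This is a minimal Python sketch, not the author's actual configuration: the `Agent` and `Ticket` records, the `sensitive` and `simple` flags, and `pick_agent` are hypothetical names, and the rule assumes that any sensitive or simple ticket always goes to the local junior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    role: str
    name: str
    model: str
    local: bool  # True if the model runs on the author's own GPUs


# Hypothetical roster mirroring the fleet table; not a real config format.
FLEET = [
    Agent("Master PM", "Droid", "Opus", local=False),
    Agent("Backend / Platform", "Claude Code", "Opus", local=False),
    Agent("Reviewer", "Codex", "GPT-5.5", local=False),
    Agent("Junior Dev", "Pi Coding", "Qwen3.6-27B", local=True),
]


@dataclass(frozen=True)
class Ticket:
    title: str
    sensitive: bool  # context that should not leave the machine
    simple: bool     # scaffolding, docs, small refactors


def pick_agent(ticket: Ticket, fleet: list[Agent]) -> Agent:
    """Assign a ticket to a coding agent (hypothetical routing rule)."""
    by_role = {agent.role: agent for agent in fleet}
    if ticket.sensitive or ticket.simple:
        # Privacy and cost: keep this work on the local junior model.
        return by_role["Junior Dev"]
    # Everything else goes to the premium backend/platform agent.
    return by_role["Backend / Platform"]
```

The sketch ignores escalation, where the junior asks the backend agent for help when stuck; for sensitive tickets, that escalation path would need its own privacy rule.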

Strategy 1: human-reviewed agent fleet

Best for ambiguous, high-risk, or product-sensitive work.

```mermaid
flowchart LR
    H[Human goal] --> PM[Droid PM]
    PM --> BE[Backend / Platform]
    PM --> JR[Local Junior]
    JR -.asks when stuck.-> BE
    BE --> PR[PR]
    JR --> PR
    PR --> CR[Codex review]
    CR --> HR[Human review]
    HR --> Merge[(Merge)]
```

Loop:

1. Human gives the goal.
2. PM turns it into tickets.
3. Agents ship PRs.
4. Codex reviews for correctness, tests, risky diffs, and intent mismatch.
5. Human merges or sends it back.

The human enters only at goal-setting and final review. Everything else is agents coordinating through tickets.
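The loop can be read as a small state machine. The sketch below is a hypothetical Python encoding: the `Stage` names, the `TRANSITIONS` table, and the `can_advance` and `run` helpers are illustrative, not part of any tool. It follows the diagram in that Codex review always hands off to a human, and it assumes that "sends it back" means returning to the ticket stage.

```python
from enum import Enum, auto


class Stage(Enum):
    GOAL = auto()          # human sets the goal
    TICKET = auto()        # PM has turned the goal into a ticket
    PR_OPEN = auto()       # an agent has shipped a PR
    CODEX_REVIEW = auto()  # Codex checks correctness, tests, risk, intent
    HUMAN_REVIEW = auto()  # human merges or sends it back
    MERGED = auto()


# Allowed transitions (hypothetical encoding of the diagram).
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.GOAL: {Stage.TICKET},
    Stage.TICKET: {Stage.PR_OPEN},
    Stage.PR_OPEN: {Stage.CODEX_REVIEW},
    Stage.CODEX_REVIEW: {Stage.HUMAN_REVIEW},
    Stage.HUMAN_REVIEW: {Stage.MERGED, Stage.TICKET},  # merge or send back
    Stage.MERGED: set(),
}

# The only stages where a human is involved.
HUMAN_STAGES = {Stage.GOAL, Stage.HUMAN_REVIEW}


def can_advance(current: Stage, nxt: Stage) -> bool:
    return nxt in TRANSITIONS[current]


def run(path: list[Stage]) -> Stage:
    """Walk a sequence of stages, rejecting any illegal transition."""
    for current, nxt in zip(path, path[1:]):
        if not can_advance(current, nxt):
            raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return path[-1]
```

Encoding the transitions explicitly makes the "human enters only at goal-setting and final review" claim checkable: `HUMAN_STAGES` holds exactly two stages, and no path can skip from a PR straight to a merge.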

Strategy 2: two-reviewer auto-merge loop

Best for repeatable batches with clear acceptance criteria and reliable tests.

```mermaid
flowchart TD
    You[You: @CEO start batch for feature X] --> CEO[CEO creates issues + queue]
    CEO --> Confirm[You confirm batch]
    Confirm --> Dev[Blockchain Dev takes next issue]
    Dev --> PR[Open PR]
    PR --> R1[Reviewer 1]
    PR --> R2[Reviewer 2, different LLM]
    R1 --> Gate{Both approve?}
    R2 --> Gate
    Gate -->|yes| Head[Head merges]
    Gate -->|no| Fix[Back to dev]
    Fix --> Dev
    Head --> Dev
```

Reviewer diversity matters. One reviewer can be strict on tests/security; the other can focus on product intent, integration risk, and maintainability. I do not use this for taste-heavy or roadmap-sensitive work. It is better for small protocol changes, CRUD flows, test coverage, migrations, docs, and isolated feature slices.
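The gate in the diagram reduces to one rule: merge only when both reviewers approve, and require the two reviewers to run on different LLMs. A minimal sketch under those assumptions; `Review`, `gate`, and the placeholder model names `model-a` and `model-b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewer: str   # e.g. "Reviewer 1"
    model: str      # underlying LLM; must differ between the two reviewers
    approved: bool


def gate(r1: Review, r2: Review) -> str:
    """Decide one PR: 'merge' (Head merges) or 'back_to_dev' (hypothetical rule)."""
    if r1.model == r2.model:
        # Diversity rule from the diagram: Reviewer 2 runs a different LLM.
        raise ValueError("the two reviewers must run on different LLMs")
    if r1.approved and r2.approved:
        return "merge"
    return "back_to_dev"


# Usage with placeholder model names.
verdict = gate(Review("Reviewer 1", "model-a", True), Review("Reviewer 2", "model-b", False))
```

In this sketch the split in focus (tests and security for one reviewer, product intent and integration risk for the other) is assumed to live in each reviewer's instructions; the gate only sees the two verdicts.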

CI/CD is the safety rail

Auto-merge without CI/CD is automated risk. At minimum, I want:

- Required tests, lint, typecheck, and build checks on every PR
- A staging/dev branch before production
- A protected production branch with stricter rules
- Preview deployments for UI/product changes
- Rollback paths and observability
- Clear ownership for secrets, migrations, infra, and irreversible actions

With those gates, the system becomes a factory line: agents produce, reviewers inspect, CI enforces the baseline, staging catches integration issues, and production stays protected.
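The CI/CD list above translates into a merge-eligibility check that runs after both reviewers approve. This is a hedged sketch with assumed names: the check identifiers (`tests`, `lint`, `typecheck`, `build`, `preview-deploy`) and the branch names `staging` and `production` are placeholders, and in practice this enforcement would normally live in the CI provider's branch protection settings rather than in agent code.

```python
# Placeholder check names; substitute whatever your CI actually reports.
BASE_CHECKS = frozenset({"tests", "lint", "typecheck", "build"})

# Only the staging branch accepts auto-merge; production needs a human.
AUTO_MERGE_BRANCHES = frozenset({"staging"})


def required_checks(ui_change: bool) -> set[str]:
    """Checks a PR must pass; UI/product changes also need a preview deploy."""
    checks = set(BASE_CHECKS)
    if ui_change:
        checks.add("preview-deploy")
    return checks


def can_auto_merge(target: str, passed: set[str],
                   ui_change: bool = False, irreversible: bool = False) -> bool:
    """Hypothetical gate: may this PR merge without a human?"""
    if target not in AUTO_MERGE_BRANCHES:
        return False  # protected production branch: stricter rules, human merge
    if irreversible:
        return False  # secrets, migrations, infra: route to a clear owner
    return required_checks(ui_change) <= passed  # every required check passed
```

Keeping irreversible actions out of auto-merge entirely, rather than adding more checks for them, matches the list's emphasis on clear ownership.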

What changed

Before this setup, I was the bottleneck on every PR. Now the loop is risk-based:

- Ambiguous work → human-reviewed agent fleet
- Clear batch work → two-reviewer auto-merge
- Long-tail/simple work → local junior model
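The risk-based split can be written as a routing function. A minimal sketch, assuming each task has already been tagged with boolean traits; the trait names, the `Loop` enum, and the fallback to the human-reviewed loop when criteria are unclear are assumptions for illustration, not rules stated in this post.

```python
from enum import Enum


class Loop(Enum):
    HUMAN_REVIEWED = "human-reviewed agent fleet"
    TWO_REVIEWER_AUTO_MERGE = "two-reviewer auto-merge"
    LOCAL_JUNIOR = "local junior model"


def choose_loop(*, ambiguous: bool, high_risk: bool, long_tail: bool,
                clear_criteria: bool, reliable_tests: bool) -> Loop:
    """Pick an operating loop for a task (hypothetical tagging scheme)."""
    if ambiguous or high_risk:
        return Loop.HUMAN_REVIEWED           # keep a human on sensitive work
    if long_tail:
        return Loop.LOCAL_JUNIOR             # cheap, simple work stays local
    if clear_criteria and reliable_tests:
        return Loop.TWO_REVIEWER_AUTO_MERGE  # repeatable batch with CI as the rail
    return Loop.HUMAN_REVIEWED               # assumed default when unsure
```

Checking ambiguity and risk first means a task that is both "simple" and product-sensitive still lands in the human-reviewed loop.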

The hard failure mode is rarely “agent wrote bad code.” It is “agent wrote technically correct code that missed the business intent.” That is why FDE review still matters.

What I measure

- Cycle time from ticket to reviewed PR
- Useful PRs per human review hour
- Rework caused by misunderstood intent
- Premium API spend vs local LLM work
- Review quality: tests, risk notes, missed edge cases
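Most of these metrics are simple ratios over PR records. The sketch below assumes a hypothetical `PRRecord` with timestamps, a few flags, and human review minutes. It uses the share of PRs produced by the local model as a rough proxy for premium-versus-local spend, and it leaves review quality out because that is a qualitative judgment.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class PRRecord:
    ticket_opened: datetime
    review_done: datetime       # when the PR finished review
    useful: bool                # merged, not discarded
    reworked_for_intent: bool   # sent back because it missed the intent
    local_model: bool           # produced by the local junior model
    human_review_minutes: float


def fleet_metrics(prs: list[PRRecord]) -> dict[str, float]:
    """Compute illustrative fleet metrics from a non-empty list of PRs."""
    if not prs:
        raise ValueError("need at least one PR record")
    n = len(prs)
    cycle_hours = [(p.review_done - p.ticket_opened).total_seconds() / 3600 for p in prs]
    review_hours = sum(p.human_review_minutes for p in prs) / 60
    useful = sum(p.useful for p in prs)
    return {
        "median_cycle_time_hours": median(cycle_hours),
        # No human review time at all means the ratio is unbounded.
        "useful_prs_per_review_hour": useful / review_hours if review_hours else float("inf"),
        "intent_rework_rate": sum(p.reworked_for_intent for p in prs) / n,
        "local_share": sum(p.local_model for p in prs) / n,
    }
```

Using the median rather than the mean keeps one stuck ticket from dominating the cycle-time figure.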

This is the practical version of my Forward Deployed Engineer work in Malaysia and Singapore: embed with a team, find the real bottleneck, and redesign the operating loop so agents execute while humans review what matters. If you want to build something similar, reach out on X (Twitter) or LinkedIn.