Inside the work

The case studies state what each system does. This page is for diving deeper into the agentic process itself: the routing, the parallel waves, the auditor gates, the autonomy dials, and the safety floor that makes the whole thing trustworthy. Start with the console below. Configure a run, then watch the engine work.

The Orchestrator Console

Pick a mission, set how much autonomy the system has and how far it can reach, then press Run. The engine steps through waves of parallel agents, gates each wave through an auditor, catches a contradiction and routes it back, and stops cleanly at the definition of done or the run budget.

How this works An interactive model of the orchestration engine I build. The control logic, gating, autonomy dials, run budget, and safety floor are real and run in your browser. The agent outputs shown are illustrative, not live model calls. Same inputs, same run, every time. Nothing is sent anywhere.

What you are seeing The mechanics this console models, named honestly. The control logic is real; the agent text is illustrative.

  • Stop-reason loop The agent loop advances on the model's stop reason, tool use versus end turn, not by parsing the text or hitting an arbitrary step cap.
  • Isolated subagents Each agent in a wave runs with its own context. The coordinator passes an explicit handoff in and reads a result out; subagents do not inherit the coordinator's history.
  • Parallel within a wave Independent agents in the same wave are dispatched in parallel. The next wave waits on the gate, not on any single agent.
  • Programmatic floor and gates The safety floor and the prerequisite gates are enforced in code, not requested in a prompt. They hold even at the highest authority tier.
  • Tool choice: auto, any, or forced A step can let the model choose a tool freely, require it to call some tool, or force one specific tool, depending on how tightly that step is constrained.
What the system is being asked to deliver.
Execution autonomy
How much a human stays in the loop, per step.
External authority
How far the system can reach into the real world.

Wave 0 of 4
Dispatches 0 of 14
State Idle

Wave plan

Press Run to dispatch the first wave.

Run log Illustrative output

The run log streams here. Lines are illustrative, not live model calls.

The real rollout

The clearest proof of the method is this site. It was built by orchestrating my own agents, with a human owning the decisions. Here is the actual rollout: the real waves, in order, and the real calls made along the way.

How to read this Each step below is a wave the orchestrator actually ran, gated by an auditor before the next began. Expand any wave to see what happened inside it and the decisions it forced. This is the rollout's own record, not a business metric.

  1. Ideation think-tank, parallel Pass

    Agents dispatched3Gate verdictPass

    Several ideation agents explored different framings at the same time: one led on hands-on experience, one on credibility and proof, one on narrative. They worked independently, then their directions were compared side by side.

    They converged on the same answer without being told to: lead with the operator identity. Someone who builds and runs real systems against real stakes, not someone who only describes them. Independent agents landing on the same lead is a strong signal, so that became the spine of the site.

  2. Synthesis and gate orchestrator composes, desk-check Pass

    Agents dispatched1Gate verdictPass

    The orchestrator selected the composing concepts from the ideation wave and shaped them into one direction. Before anything was built, a desk-check went and read the actual public repos to confirm a claim the draft wanted to make.

    Verified A claim was checked against the repos, and it held

    The draft claimed "16+ skills and agents." The desk-check read the public repos and confirmed the number was real, and if anything understated. The claim stayed because it was true and verifiable, not because it sounded good.

  3. Build specialists, parallel Pass

    Agents dispatched5Gate verdictPass

    Build specialists each took a slice and implemented in parallel: the operator hero, the guardrail panel, the verify links to the real repos, the two production case studies, and the teardown that explains how the site holds together.

    Promoted Real proof was elevated from tiles to full case studies

    The two production agents, with the +25% and +30% results, started as small tiles. They were promoted to full case studies because they are the strongest, most verifiable proof on the site. The strongest evidence earns the most room.

    Reframed Claims were framed honestly, not inflated

    Fleet work was framed as instrumented operations rather than a side hustle. Illustrative charts were labeled as illustrative. Only real, measured metrics were presented as fact. Where something was an estimate, it was shown as an estimate with its assumptions.

  4. Auditor gate pass, warn, or block Warn

    Agents dispatched1Gate verdictWarn

    Every wave was reviewed before the next one began. The auditor returns one of three verdicts: pass, warn, or block. Nothing moved forward on an unreviewed change. This wave is where two of the sharper calls on the whole build were made.

    Rejected A false credential was turned down

    A tester suggested adding a "CISA cert" for extra credibility. The gate rejected it because it was not true. Honesty over polish: a portfolio that bends the truth to look stronger is worth less than one that holds up to a reference check.

    Measured An accessibility issue was caught by measuring, not guessing

    A contrast problem was found by measuring the real ratios in the browser, not eyeballing them. One text color sat at 4.07 to 1, under the 4.5 to 1 AA bar. It was fixed to 6.32 to 1. Measured, not asserted.

  5. Persona testing recruiter and engineering-lead personas Warn

    Agents dispatched2Gate verdictWarn

    Two personas scored the result and clicked through it like real visitors: a recruiter and an engineering lead. Their first pass surfaced concrete fixes rather than vague impressions, which is the point of testing against a persona.

    First pass, then after the fixes were applied: the recruiter persona moved from 6 to 8.5, and the engineering-lead persona from 7.5 to 8.4. These are this project's own scores, shown as its record, not a benchmark.

  6. Iterate apply fixes, re-gate Pass

    Agents dispatched2Gate verdictPass

    The fixes from the audit and the persona tests were applied, then the work was re-gated. Review and testing are part of the loop, not a phase bolted on at the end, so a finding turns into a fix and a re-check in the same cycle.

    Caught A determinism check caught a bug before it shipped

    A headless determinism check on the Orchestrator Console caught that the run budget would clip before reaching the scripted safety-floor step. It was fixed before shipping, so the console you can run on this page always reaches the floor.

  7. Deploy preview, verify, then promote Pass

    Agents dispatched1Gate verdictPass

    The site went to a preview first, was verified there, and only then was promoted to the live domain. Promoting to production is irreversible, so it stays a human decision: the run prepares it and a person makes the call.

Triage Desk

A use-it-yourself tool. Paste a batch of lines, one per line, and a real client-side classifier proposes a first-pass label for each, then a human confirms or overrides every call. This is the same first-pass-then-human pattern behind a production labeling pipeline I built, where a confirmed first pass lifted data-grading output by 25 percent. That number is from the real system, not this page. The counts below are this tool's own.

How this works This is a demonstrator of the first-pass-then-human pattern. It uses a transparent rule-based classifier, a set of signal terms per category, not a live model and not AI. It runs entirely in your browser. Nothing you paste is sent anywhere or leaves this page. The same input always produces the same result, and every proposal is yours to confirm or override.

One item per line. Paste real feedback, tickets, or notes, or load a labeled sample.

Proposed labels Rule-based, low confidence floats up

    Press Triage to label your lines. Each row shows the proposed category, the confidence, and the signals that fired. You confirm or override every call.