Infrastructure for Multi-Agent Systems

In this lesson: set the rules and let the agent finish, treat the state as the hard part, and build swarms with a watchdog.

Matt Wood · 60 min · May 2026
Released May 20, 2026

Top 3 takeaways

01

Define the rules and let the agent finish the job

Managed agents move you away from chatting back and forth. You define the primitives (the agent, the environment, and the session), and the work then runs on hosted infrastructure until it is done.

02

State is the hard part

A demo becomes a product once sessions, memory, and environments are managed for you. A finished session goes idle so you pay nothing for compute while it sits, and anything you want to keep lives in a memory store.

03

Build swarms with a watchdog

A coordinator plus specialist sub-agents keeps context isolated while work runs in parallel, and a sentinel agent watches long-running jobs so they do not break. You can reuse the same pattern inside Claude Code.

Matt Wood


Teacher, Gauntlet AI

Matt Wood is an AI instructor at Gauntlet and a veteran software builder with more than 20 years of experience creating products for companies ranging from JPMorgan to NBC. He specializes in AI agents, analytics, and turning complex systems into practical tools that teams can use every day.

Lesson notes

A written walkthrough of the lecture, covering the patterns, the code, and the things that trip people up.

Managed Agents Move Work Off Your Machine

Claude Managed Agents represent a shift away from the traditional AI workflow where engineers run agents on their own machines and stitch together infrastructure with frameworks like LangChain or the AI SDK.

Instead, Anthropic hosts and manages the infrastructure. You define the rules, tools, and environment, and the agent then executes the work on your behalf. The focus moves from continuously interacting with an agent to designing a system and letting it run.

The key concept is primitives: reusable building blocks that can be combined to create workflows declaratively.

The Core Primitives

Managed Agents are built from a handful of foundational components:

  • Agents define the model, instructions, and available tools.
  • Environments provide the compute environment where work runs.
  • Sessions combine agents and environments to execute tasks.
  • Skills package prompts, scripts, and reusable workflows.
  • Vaults securely store credentials and external connections.

Together, these primitives make it possible to build sophisticated workflows without managing servers or custom infrastructure.
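To make the relationships between the primitives concrete, here is a minimal sketch in plain Python. Every class, field, and value below (including the `example-model` and `python-3.12` labels) is a hypothetical stand-in chosen for illustration; it is not the managed-agents API, only a picture of how a session ties an agent to an environment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    model: str                       # placeholder model name, not a real ID
    instructions: str
    tools: tuple[str, ...] = ()


@dataclass(frozen=True)
class Environment:
    image: str                       # hypothetical base image label
    packages: tuple[str, ...] = ()


@dataclass(frozen=True)
class Skill:
    name: str
    prompt: str


@dataclass
class Vault:
    # Secrets are excluded from repr so they do not leak into logs.
    secrets: dict[str, str] = field(default_factory=dict, repr=False)

    def names(self) -> list[str]:
        return sorted(self.secrets)


@dataclass
class Session:
    # A session is the pairing of one agent with one environment.
    agent: Agent
    environment: Environment
    skills: tuple[Skill, ...] = ()
    vault: Vault = field(default_factory=Vault)

    def describe(self) -> str:
        skill_names = ", ".join(s.name for s in self.skills) or "none"
        return (f"{self.agent.model} in {self.environment.image} "
                f"with skills: {skill_names}")


reviewer = Agent("example-model", "Review pull requests.", ("read_file",))
sandbox = Environment("python-3.12", ("pytest",))
session = Session(reviewer, sandbox, (Skill("lint", "Run the linter."),))
```

Keeping the agent and environment as separate immutable values means the same agent definition can be paired with different environments across sessions.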

New Capabilities After Launch

Anthropic later introduced additional primitives that expanded what managed agents can do.

Outcomes allow developers to define success criteria and evaluation rubrics. Sandboxes let sensitive workloads run on infrastructure you control while still leveraging Claude's reasoning capabilities. MCP Connectors make it possible to connect agents to external tools and internal systems.
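One way to think about success criteria and a rubric is as a weighted checklist applied to the agent's output. The sketch below assumes that framing; the `Criterion` type, the `score` function, and the two example checks are invented for illustration and do not reflect how Outcomes are actually declared on the platform.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    description: str
    weight: float
    check: Callable[[str], bool]     # returns True when the output passes


def score(output: str, rubric: list[Criterion]) -> float:
    """Return the weighted fraction of criteria the output satisfies."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric needs a positive total weight")
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


# Hypothetical rubric for a short written report.
rubric = [
    Criterion("mentions a summary", 2.0, lambda s: "summary" in s.lower()),
    Criterion("under 200 characters", 1.0, lambda s: len(s) < 200),
]
```

A score of 1.0 means every criterion passed; a caller could compare the score against a threshold to decide whether the job counts as done.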

The result is a workflow that feels much closer to a serverless application than a traditional chatbot.

Building Agent Workforces

These primitives make it possible to build multi-agent systems where a coordinator agent delegates work to specialized sub-agents.

A single workflow might include planners, researchers, designers, reviewers, and watchdog agents running in parallel. Breaking work into specialized agents helps reduce context overload and improves reliability.
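The coordinator pattern can be sketched with `asyncio`. This is a simplified local model, not the hosted implementation: `specialist`, `watchdog`, and `coordinator` are hypothetical names, the model call is replaced by a no-op sleep, and the watchdog is reduced to a timeout, which is only one of the things a sentinel agent might check.

```python
import asyncio


async def specialist(role: str, task: str) -> dict:
    # Each specialist builds its own context, so nothing leaks between roles.
    context = [f"role={role}", f"task={task}"]
    await asyncio.sleep(0)           # stand-in for model and tool calls
    return {"role": role,
            "result": f"{role} finished: {task}",
            "context_size": len(context)}


async def watchdog(job, timeout_s: float, label: str) -> dict:
    # Cancel a job that runs too long and report it instead of hanging.
    try:
        return await asyncio.wait_for(job, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"role": label, "result": "timed out", "context_size": 0}


async def coordinator(task: str, roles: list[str],
                      timeout_s: float = 1.0) -> list[dict]:
    # Fan out to every specialist in parallel, each wrapped by the watchdog.
    jobs = [watchdog(specialist(role, task), timeout_s, role)
            for role in roles]
    return await asyncio.gather(*jobs)   # results keep the order of roles


def run(task: str, roles: list[str]) -> list[dict]:
    return asyncio.run(coordinator(task, roles))
```

Calling `run("draft a launch plan", ["planner", "researcher", "reviewer"])` returns one result per role, and a stalled specialist shows up as a `"timed out"` entry rather than blocking the others.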

The lecture also demonstrates Braid, a tool that generates managed-agent workflows from natural-language descriptions. Rather than manually configuring every component, users can describe the workflow they want and generate the underlying agents, environments, and evaluation criteria automatically.

The Tradeoffs and Costs

Managed Agents simplify infrastructure but introduce new considerations.

They are billed through API usage rather than a standard Claude subscription, so costs scale with the workload and can range from a few dollars to hundreds of dollars. Environments are ephemeral, so anything that needs to persist must be stored in dedicated memory systems rather than on the underlying machine.
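The difference between ephemeral scratch space and durable memory can be shown locally. In this sketch a temporary directory stands in for the environment and a JSON file stands in for a memory store; `MemoryStore` and `run_session` are hypothetical names, and the real memory systems will have their own interface.

```python
import json
import tempfile
from pathlib import Path


class MemoryStore:
    """Hypothetical durable key-value store backed by a JSON file."""

    def __init__(self, path: Path):
        self.path = path

    def _load(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def save(self, key: str, value: str) -> None:
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data))

    def get(self, key: str, default=None):
        return self._load().get(key, default)


def run_session(store: MemoryStore, note: str) -> Path:
    # The temporary directory is deleted on exit, like an ephemeral environment.
    with tempfile.TemporaryDirectory() as workdir:
        scratch = Path(workdir) / "scratch.txt"
        scratch.write_text(note)
        # Copy anything worth keeping into the store before the session ends.
        store.save("last_note", scratch.read_text())
    return scratch                   # this path no longer exists
```

After `run_session` returns, the scratch file is gone, but `store.get("last_note")` still returns the note.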

Failures should be expected during development. The platform provides visibility into tool calls, reasoning steps, and execution history, making it easier to debug workflows and improve reliability over time.
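When debugging, a common first step is to scan the execution history for the earliest failed tool call. The sketch below assumes the history has been exported as a list of simple records; the `Event` shape and its field names are invented for illustration and are not the platform's actual trace format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    step: int
    kind: str                        # "reasoning" or "tool_call"
    name: str
    ok: bool = True
    detail: str = ""


def failed_tool_calls(history: list[Event]) -> list[Event]:
    """Keep only the tool calls that reported a failure."""
    return [e for e in history if e.kind == "tool_call" and not e.ok]


def first_failure(history: list[Event]) -> Optional[Event]:
    """Return the earliest failed tool call, or None if all succeeded."""
    failures = failed_tool_calls(history)
    return min(failures, key=lambda e: e.step) if failures else None


# Hypothetical history for a short run.
history = [
    Event(1, "reasoning", "plan"),
    Event(2, "tool_call", "read_file"),
    Event(3, "tool_call", "run_tests", ok=False, detail="exit code 1"),
    Event(4, "tool_call", "write_file", ok=False, detail="permission denied"),
]
```

Starting from the earliest failure usually helps, since later failures are often consequences of the first one.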

The Bigger Trend

The broader takeaway is that agent infrastructure is becoming a managed service. Instead of building and operating custom agent systems from scratch, developers can increasingly focus on defining workflows, evaluation criteria, and business logic while infrastructure providers handle the execution layer.

As these platforms mature, the challenge shifts away from running agents and toward designing effective systems of agents that can operate reliably at scale.

FAQ

What is a multi-agent system?
A multi-agent system is an application where several AI agents coordinate, sharing memory, state, and environments, to do work that a single agent cannot. The hard part is not the agents themselves but managing state reliably across the steps.
What does it take to ship a multi-agent app?
It takes managed state, since a demo becomes a real product once agent sessions, memory, and environments are handled reliably. Braid supplies that infrastructure layer, so you build the front end and back end while it runs the sessions and environments for you.
What is Braid?
Braid is an infrastructure layer for multi-agent applications that wraps Claude's APIs. You build the front end and back end, and Braid runs the agentic sessions and environments on top of Anthropic's managed-agents infrastructure.
Why is state the hard part of multi-agent apps?
Agents need persistent memory and consistent environments across steps, so without managed state a multi-agent demo falls apart the moment it has to remember something or coordinate with another agent.
What can you build with it?
You can build web-based, multi-agent multimedia products. One example stitches prompts, models, and generative sessions with persistent memory into an interface made for creatives.
How does this fit with my existing stack?
You keep your own front end and back end, and the infrastructure layer slots in to run the sessions and environments, so you do not have to rebuild agent infrastructure yourself.
What separates a multi-agent demo from a shipped app?
A demo runs once in ideal conditions, while a shipped app has to handle persistent state, recovery, and real users, and closing that gap is the infrastructure question.

What's next?

Keep building with the rest of Night School, or apply to Gauntlet: twelve weeks of technical intensity with the best AI engineers we can find.