
How to Orchestrate AI Agents in Production

Jon Mandraki

I'm going to walk you through building a real agent system. Not a toy. Three agents in a pipeline: a researcher that gathers information, an analyst that evaluates it, and a writer that produces a report. With governance so the agents can't do things they shouldn't. With cost controls so you know what you're spending.

By the end, you'll have a working system you can extend.

What you need

  • Orloj installed (go install github.com/OrlojHQ/orloj/cmd/...@latest or grab a release binary)
  • An API key for at least one model provider (OpenAI, Anthropic, or any OpenAI-compatible endpoint)
  • About 20 minutes

Step 1: Define your agents

Create a directory for your system. Inside it, create agents.yaml:

apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: researcher
spec:
  description: "Gathers information on a given topic from available tools"
  model:
    endpoint: openai
    model: gpt-4o
  tools:
    - web-search
    - read-url
  limits:
    max_steps: 8
    timeout: 60s
---
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: analyst
spec:
  description: "Evaluates research findings for accuracy and relevance"
  model:
    endpoint: openai
    model: gpt-4o
  tools: []
  limits:
    max_steps: 4
    timeout: 30s
---
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: writer
spec:
  description: "Produces a structured report from analyzed findings"
  model:
    endpoint: openai
    model: gpt-4o-mini
  tools: []
  limits:
    max_steps: 4
    timeout: 30s

A few things to notice. Each agent has explicit tool access. The researcher can search the web and read URLs. The analyst and writer get no tools. They work only with what the previous agent passes them. This is the principle of least privilege applied to agents.

The writer uses gpt-4o-mini because report formatting doesn't need the expensive model. Small decision, real cost savings over thousands of runs.
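That saving is easy to ballpark. A back-of-envelope sketch, assuming illustrative per-million-input-token prices of $2.50 for gpt-4o and $0.15 for gpt-4o-mini, a 5,000-token writer prompt, and 10,000 runs (check your provider's current pricing; these numbers are assumptions, not quotes):

```shell
# Rough input-token cost of the writer step at the assumed prices above.
awk 'BEGIN {
  runs = 10000; tokens = 5000
  big   = 2.50 * tokens / 1e6 * runs   # gpt-4o
  small = 0.15 * tokens / 1e6 * runs   # gpt-4o-mini
  printf "gpt-4o: $%.2f  gpt-4o-mini: $%.2f  saved: $%.2f\n", big, small, big - small
}'
# prints: gpt-4o: $125.00  gpt-4o-mini: $7.50  saved: $117.50
```

Even at toy prices, the ratio is what matters: the cheap model is over an order of magnitude cheaper for a step that doesn't need the expensive one.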

Step limits and timeouts prevent runaway agents. If the researcher can't find what it needs in 8 steps, something is wrong. Fail fast.

Step 2: Compose agents into a system

Create system.yaml:

apiVersion: orloj.dev/v1
kind: AgentSystem
metadata:
  name: research-pipeline
spec:
  agents:
    - researcher
    - analyst
    - writer
  graph:
    researcher:
      next: analyst
    analyst:
      next: writer

This is a pipeline. Researcher feeds analyst, analyst feeds writer. Orloj handles message passing, error propagation, and lifecycle management.

You could change the topology without touching agent code. Want the researcher to fan out to multiple analysts? Change the graph. Want a review loop where the writer sends back to the analyst? Add a cycle with an exit condition. The agents themselves don't change.

Step 3: Add governance

This is where most frameworks stop. You have agents. They run. But can the researcher access any website? Can the analyst's prompts leak sensitive data to a model you don't control? In production, these questions matter.

Create governance.yaml:

apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: research-pipeline-policy
spec:
  scope:
    system: research-pipeline
  rules:
    max_tokens_per_task: 50000
    max_cost_per_task_usd: 0.50
    allowed_models:
      - gpt-4o
      - gpt-4o-mini
    denied_tools:
      - execute-code
      - file-write
  audit:
    log_all_tool_calls: true
    log_all_model_calls: true
---
apiVersion: orloj.dev/v1
kind: AgentRole
metadata:
  name: research-role
spec:
  agents:
    - researcher
  permissions:
    tools:
      - web-search
      - read-url
    max_steps: 8
---
apiVersion: orloj.dev/v1
kind: ToolPermission
metadata:
  name: web-search-permission
spec:
  tool: web-search
  allowed_agents:
    - researcher
  denied_agents:
    - analyst
    - writer

The policy caps the entire pipeline at 50,000 tokens and $0.50 per run. It blocks code execution and file writing entirely. Every tool call and model call gets logged for audit.

The role and permission resources enforce that only the researcher can search the web. If the analyst somehow tries to call web-search, the runtime blocks it. Fail-closed. Logged.

This is governance as code. It lives in version control next to your agent definitions. It deploys with the same orlojctl apply command. It's not a checkbox in a dashboard somewhere. It's infrastructure.

Step 4: Create a model endpoint

You need to tell Orloj where to send model requests. Create endpoint.yaml:

apiVersion: orloj.dev/v1
kind: ModelEndpoint
metadata:
  name: openai
spec:
  provider: openai
  api_key_env: OPENAI_API_KEY

Orloj reads the API key from the environment variable. If you want to switch to Anthropic later, you add another endpoint and change the agent manifests. The agents themselves don't know or care which provider they're using.
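For example, an Anthropic endpoint might look like the sketch below, assuming the same ModelEndpoint schema accepts anthropic as a provider value (check the Orloj docs for the supported provider names):

```yaml
apiVersion: orloj.dev/v1
kind: ModelEndpoint
metadata:
  name: anthropic
spec:
  provider: anthropic
  api_key_env: ANTHROPIC_API_KEY
```

An agent picks up the new provider by pointing its model.endpoint field at the endpoint's metadata.name, the same way the agents above reference openai.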

Step 5: Deploy and run

Start Orloj in development mode (single process, in-memory storage):

export OPENAI_API_KEY="your-key-here"
orlojd --embedded-worker --storage-backend=memory

In another terminal, apply your manifests:

orlojctl apply -f agents.yaml
orlojctl apply -f system.yaml
orlojctl apply -f governance.yaml
orlojctl apply -f endpoint.yaml

Now run a task:

orlojctl run research-pipeline \
  --input "Analyze the current state of AI agent governance in enterprise. What are the main approaches and what's missing?"

Check status:

orlojctl get task <task-id>
# Status: Running | Succeeded | Failed

Get the output:

orlojctl get task <task-id> --output

You should see a structured report produced by the full pipeline: the researcher gathered information, the analyst evaluated it, and the writer formatted it.

Step 6: See what happened

This is where orchestration earns its keep. Check the audit log:

orlojctl logs <task-id>

You'll see every agent action: which models were called, how many tokens were used, which tools were invoked, how long each step took, and whether any governance policies were triggered.

If an agent hit a token limit, you'll see it. If a tool call was denied by policy, you'll see it. If the total cost approached the budget cap, you'll see it.

This is the information you need to debug, optimize, and trust your agent system. Without it, you're guessing.
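The exact log format depends on your Orloj version; a purely hypothetical excerpt, arranging the fields described above (agent, event, model, tokens, tool, duration, policy outcome), might look like:

```
[researcher] model_call  model=gpt-4o       tokens=1832  duration=2.1s
[researcher] tool_call   tool=web-search    status=allowed  duration=0.8s
[analyst]    tool_call   tool=web-search    status=denied  policy=web-search-permission
[writer]     model_call  model=gpt-4o-mini  tokens=964   duration=1.4s
```

The field names and layout here are illustrative assumptions; the point is that every model call, tool call, and policy decision appears as a discrete, attributable event.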

Going further

This tutorial used a pipeline. Orloj also supports:

Hierarchical systems where a supervisor agent delegates to workers:

graph:
  supervisor:
    delegates_to:
      - worker-a
      - worker-b
      - worker-c

Fan-out/fan-in where one agent's output is processed by multiple agents in parallel:

graph:
  researcher:
    fan_out:
      - analyst-finance
      - analyst-legal
      - analyst-tech
    join: report-writer

Swarm loops where agents iterate until a condition is met:

graph:
  drafter:
    next: critic
  critic:
    next: drafter
    exit_when: approved
    max_iterations: 3

For production deployment with Postgres state and NATS messaging:

# Server
orlojd --storage-backend=postgres

# Workers (run as many as you need)
orlojworker --agent-message-bus-backend=nats-jetstream

The blueprints in the Orloj repo (examples/blueprints/) have working configurations for each pattern. Clone the repo and try them.

What you just built

A 3-agent production pipeline with:

  • Explicit agent definitions with tool access controls
  • Pipeline orchestration with automatic message routing
  • Governance policies: token budgets, cost caps, tool restrictions
  • Role-based permissions: per-agent tool access
  • Full audit logging of every action
  • Development-to-production deployment path

The total YAML was about 110 lines. The runtime handles scheduling, execution, governance enforcement, failure recovery, and observability.

That's what agent orchestration gives you. Not more code. Less code, with more control.
