I'm going to walk you through building a real agent system. Not a toy. Three agents in a pipeline: a researcher that gathers information, an analyst that evaluates it, and a writer that produces a report. With governance so the agents can't do things they shouldn't. With cost controls so you know what you're spending.
By the end, you'll have a working system you can extend.
What you need
- Orloj installed (`go install github.com/OrlojHQ/orloj/cmd/...@latest` or grab a release binary)
- An API key for at least one model provider (OpenAI, Anthropic, or any OpenAI-compatible endpoint)
- About 20 minutes
Step 1: Define your agents
Create a directory for your system. Inside it, create agents.yaml:
```yaml
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: researcher
spec:
  description: "Gathers information on a given topic from available tools"
  model:
    endpoint: openai
    model: gpt-4o
  tools:
    - web-search
    - read-url
  limits:
    max_steps: 8
    timeout: 60s
---
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: analyst
spec:
  description: "Evaluates research findings for accuracy and relevance"
  model:
    endpoint: openai
    model: gpt-4o
  tools:
    - none
  limits:
    max_steps: 4
    timeout: 30s
---
apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: writer
spec:
  description: "Produces a structured report from analyzed findings"
  model:
    endpoint: openai
    model: gpt-4o-mini
  tools:
    - none
  limits:
    max_steps: 4
    timeout: 30s
```
A few things to notice. Each agent has explicit tool access. The researcher can search the web and read URLs. The analyst and writer get no tools. They work only with what the previous agent passes them. This is the principle of least privilege applied to agents.
The writer uses gpt-4o-mini because report formatting doesn't need the expensive model. Small decision, real cost savings over thousands of runs.
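To put a rough number on that, here's a back-of-the-envelope comparison. The per-token prices and token counts below are illustrative assumptions, not current OpenAI list prices; check your provider's pricing page before relying on them.

```python
# Back-of-the-envelope cost comparison for the writer step.
# Prices are illustrative assumptions (USD per 1M tokens), not live quotes.
PRICE_PER_MTOK = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def run_cost(model, input_tokens, output_tokens):
    """Cost in USD of one model call at the assumed prices."""
    p = PRICE_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Assume the writer step sees ~3,000 input tokens and emits ~1,000 output tokens.
big = run_cost("gpt-4o", 3_000, 1_000)
small = run_cost("gpt-4o-mini", 3_000, 1_000)
print(f"per run: gpt-4o=${big:.4f}  gpt-4o-mini=${small:.4f}")
print(f"savings over 10,000 runs: ${(big - small) * 10_000:.2f}")
```

At these assumed prices the writer step costs roughly 17x less on the smaller model, which compounds quickly across thousands of runs.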
Step limits and timeouts prevent runaway agents. If the researcher can't find what it needs in 8 steps, something is wrong. Fail fast.
Step 2: Compose agents into a system
Create system.yaml:
```yaml
apiVersion: orloj.dev/v1
kind: AgentSystem
metadata:
  name: research-pipeline
spec:
  agents:
    - researcher
    - analyst
    - writer
  graph:
    researcher:
      next: analyst
    analyst:
      next: writer
```
This is a pipeline. Researcher feeds analyst, analyst feeds writer. Orloj handles message passing, error propagation, and lifecycle management.
You could change the topology without touching agent code. Want the researcher to fan out to multiple analysts? Change the graph. Want a review loop where the writer sends back to the analyst? Add a cycle with an exit condition. The agents themselves don't change.
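As a concrete example, a fan-out variant of the same system might look like the sketch below. This assumes the `fan_out`/`join` graph keys; the extra analyst names are made up for illustration.

```yaml
# Hypothetical variant: one researcher feeding two analysts in parallel.
# analyst-a and analyst-b are placeholder agent names, not part of this tutorial.
apiVersion: orloj.dev/v1
kind: AgentSystem
metadata:
  name: research-pipeline-fanout
spec:
  agents:
    - researcher
    - analyst-a
    - analyst-b
    - writer
  graph:
    researcher:
      fan_out:
        - analyst-a
        - analyst-b
      join: writer
```

The agent manifests from Step 1 would be untouched; only the graph changes.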
Step 3: Add governance
This is where most frameworks stop. You have agents. They run. But can the researcher access any website? Can the analyst's prompts leak sensitive data to a model you don't control? In production, these questions matter.
Create governance.yaml:
```yaml
apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: research-pipeline-policy
spec:
  scope:
    system: research-pipeline
  rules:
    max_tokens_per_task: 50000
    max_cost_per_task_usd: 0.50
    allowed_models:
      - gpt-4o
      - gpt-4o-mini
    denied_tools:
      - execute-code
      - file-write
  audit:
    log_all_tool_calls: true
    log_all_model_calls: true
---
apiVersion: orloj.dev/v1
kind: AgentRole
metadata:
  name: research-role
spec:
  agents:
    - researcher
  permissions:
    tools:
      - web-search
      - read-url
    max_steps: 8
---
apiVersion: orloj.dev/v1
kind: ToolPermission
metadata:
  name: web-search-permission
spec:
  tool: web-search
  allowed_agents:
    - researcher
  denied_agents:
    - analyst
    - writer
```
The policy caps the entire pipeline at 50,000 tokens and $0.50 per run. It blocks code execution and file writing entirely. Every tool call and model call gets logged for audit.
The role and permission resources enforce that only the researcher can search the web. If the analyst somehow tries to call web-search, the runtime blocks it. Fail-closed. Logged.
This is governance as code. It lives in version control next to your agent definitions. It deploys with the same `orlojctl apply` command. It's not a checkbox in a dashboard somewhere. It's infrastructure.
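The fail-closed check itself is conceptually simple. Here is a minimal sketch in Python of what the ToolPermission enforcement amounts to; this is illustrative logic, not Orloj's actual runtime code.

```python
# Minimal sketch of fail-closed tool authorization, mirroring the
# ToolPermission resource above. Illustrative only, not Orloj source code.
TOOL_PERMISSIONS = {
    "web-search": {"allowed_agents": {"researcher"}},
    "read-url":   {"allowed_agents": {"researcher"}},
}

def authorize(agent: str, tool: str) -> bool:
    """Allow only if a permission exists AND the agent is explicitly listed.
    Unknown tools and unlisted agents are denied: fail-closed."""
    perm = TOOL_PERMISSIONS.get(tool)
    if perm is None:
        return False  # no permission resource for this tool -> deny
    return agent in perm["allowed_agents"]

assert authorize("researcher", "web-search") is True
assert authorize("analyst", "web-search") is False    # denied, and logged
assert authorize("writer", "execute-code") is False   # no permission exists -> deny
```

The key property is the default: anything not explicitly granted is refused, so a misconfigured or compromised agent can't reach a tool by omission.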
Step 4: Create a model endpoint
You need to tell Orloj where to send model requests. Create endpoint.yaml:
```yaml
apiVersion: orloj.dev/v1
kind: ModelEndpoint
metadata:
  name: openai
spec:
  provider: openai
  api_key_env: OPENAI_API_KEY
```
Orloj reads the API key from the environment variable. If you want to switch to Anthropic later, you add another endpoint and change the agent manifests. The agents themselves don't know or care which provider they're using.
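A second endpoint might look like the sketch below. The `anthropic` provider name and environment variable are assumptions patterned on the OpenAI endpoint above; check the Orloj docs for the exact values.

```yaml
# Hypothetical second endpoint; provider name and env var are assumptions.
apiVersion: orloj.dev/v1
kind: ModelEndpoint
metadata:
  name: anthropic
spec:
  provider: anthropic
  api_key_env: ANTHROPIC_API_KEY
```

Switching an agent over would then mean pointing its `model.endpoint` at the new name and adding the new model to `allowed_models` in the policy, so governance stays in sync.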
Step 5: Deploy and run
Start Orloj in development mode (single process, in-memory storage):
```shell
export OPENAI_API_KEY="your-key-here"
orlojd --embedded-worker --storage-backend=memory
```
In another terminal, apply your manifests:
```shell
orlojctl apply -f agents.yaml
orlojctl apply -f system.yaml
orlojctl apply -f governance.yaml
orlojctl apply -f endpoint.yaml
```
Now run a task:
```shell
orlojctl run research-pipeline \
  --input "Analyze the current state of AI agent governance in enterprise. What are the main approaches and what's missing?"
```
Check status:
```shell
orlojctl get task <task-id>
# Status: Running | Succeeded | Failed
```
Get the output:
```shell
orlojctl get task <task-id> --output
```
You should see a structured report that went through all three agents. The researcher gathered information, the analyst evaluated it, the writer formatted it.
Step 6: See what happened
This is where orchestration earns its keep. Check the audit log:
```shell
orlojctl logs <task-id>
```
You'll see every agent action: which models were called, how many tokens were used, which tools were invoked, how long each step took, and whether any governance policies were triggered.
If an agent hit a token limit, you'll see it. If a tool call was denied by policy, you'll see it. If the total cost approached the budget cap, you'll see it.
This is the information you need to debug, optimize, and trust your agent system. Without it, you're guessing.
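For instance, you can aggregate the log to see where tokens go. The snippet below assumes a JSON-lines shape for the audit log; the actual `orlojctl logs` output format may differ, and the entries here are made up for illustration.

```python
import json

# Hypothetical audit-log entries; the real log schema may differ.
raw_log = """
{"agent": "researcher", "kind": "model_call", "tokens": 1200}
{"agent": "researcher", "kind": "tool_call", "tool": "web-search", "tokens": 0}
{"agent": "analyst", "kind": "model_call", "tokens": 900}
{"agent": "writer", "kind": "model_call", "tokens": 400}
"""

def tokens_by_agent(lines: str) -> dict:
    """Sum token usage per agent across all log entries."""
    totals = {}
    for line in lines.strip().splitlines():
        entry = json.loads(line)
        totals[entry["agent"]] = totals.get(entry["agent"], 0) + entry["tokens"]
    return totals

print(tokens_by_agent(raw_log))
# {'researcher': 1200, 'analyst': 900, 'writer': 400}
```

A breakdown like this is how you'd spot, say, a researcher burning most of the budget before the analyst ever runs.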
Going further
This tutorial used a pipeline. Orloj also supports:
Hierarchical systems where a supervisor agent delegates to workers:
```yaml
graph:
  supervisor:
    delegates_to:
      - worker-a
      - worker-b
      - worker-c
```
Fan-out/fan-in where one agent's output is processed by multiple agents in parallel:
```yaml
graph:
  researcher:
    fan_out:
      - analyst-finance
      - analyst-legal
      - analyst-tech
    join: report-writer
```
Swarm loops where agents iterate until a condition is met:
```yaml
graph:
  drafter:
    next: critic
  critic:
    next: drafter
    exit_when: approved
    max_iterations: 3
```
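The control flow that graph describes can be sketched in plain Python. The stub agents and toy exit condition below are made up for illustration; this is not Orloj's internal loop.

```python
# Sketch of drafter/critic loop semantics: iterate until the critic
# approves or max_iterations is hit. Stub functions stand in for real agents.
def drafter(feedback):
    return f"draft (addressing: {feedback})" if feedback else "draft v1"

def critic(draft, iteration):
    # Toy exit condition: approve on the second pass.
    approved = iteration >= 2
    return approved, "tighten the intro"

def run_swarm(max_iterations=3):
    draft, feedback = None, None
    for i in range(1, max_iterations + 1):
        draft = drafter(feedback)
        approved, feedback = critic(draft, i)
        if approved:  # exit_when: approved
            return draft, i
    return draft, max_iterations  # iteration budget exhausted: stop, return last draft

draft, iterations = run_swarm()
print(iterations)  # 2
```

`max_iterations` plays the same role as the step limits in Step 1: a loop that never converges fails fast instead of running forever.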
For production deployment with Postgres state and NATS messaging:
```shell
# Server
orlojd --storage-backend=postgres

# Workers (run as many as you need)
orlojworker --agent-message-bus-backend=nats-jetstream
```
The blueprints in the Orloj repo (examples/blueprints/) have working configurations for each pattern. Clone the repo and try them.
What you just built
A 3-agent production pipeline with:
- Explicit agent definitions with tool access controls
- Pipeline orchestration with automatic message routing
- Governance policies: token budgets, cost caps, tool restrictions
- Role-based permissions: per-agent tool access
- Full audit logging of every action
- Development-to-production deployment path
The total YAML was about 80 lines. The runtime handles scheduling, execution, governance enforcement, failure recovery, and observability.
That's what agent orchestration gives you. Not more code. Less code, with more control.