
Fail-Closed vs Fail-Open: Why Your Agent's Default Matters More Than Its Logic

Jon Mandraki

If your AI agent hits an authorization rule it doesn't understand, what happens?

In most frameworks, the agent tries the action anyway. Unless you explicitly say "no," the agent proceeds. The call might fail silently, succeed, or hang. In every case, the agent has already crossed the boundary.

That's fail-open governance. It's standard in LangChain, CrewAI, and AutoGen because it's easier to build. You write an agent, add tools, and test. Constraints look like overhead.

Then, in production, your agent calls an API it shouldn't know about. Or reads files from the wrong directory. Or transfers funds while it is supposed to be in test mode.

Fail-open is incompatible with production.

What Fail-Closed Actually Means

Fail-closed is simpler than it sounds. It means: by default, everything is denied. An agent can only call a tool if you've explicitly granted it permission. No tool access without a policy. No data access without authorization. No exceptions.

When an agent tries to do something unauthorized, fail-closed systems don't guess. They fail fast, they fail loud, and they stop the operation.

Here's the operational difference:

Fail-open scenario:

  1. Agent tries to read /etc/passwd
  2. Framework checks: "Did someone explicitly say no to this?"
  3. No explicit denial found
  4. Agent reads the file
  5. You find out in logs 6 hours later (if you're monitoring)

Fail-closed scenario:

  1. Agent tries to read /etc/passwd
  2. Runtime checks: "Is this tool in the agent's manifest?"
  3. Tool is not listed
  4. Operation is denied
  5. Agent gets an error
  6. Error is logged and visible in real time

The difference is in how they handle uncertainty. Fail-open assumes "not forbidden means allowed." Fail-closed assumes "not permitted means forbidden."
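The two defaults can be sketched as a pair of one-line checks. This is a minimal illustration, not any framework's real API: the function names and tool names are hypothetical.

```python
# Hypothetical names throughout; this is not any framework's actual API.

def fail_open_allows(tool: str, denied: set[str]) -> bool:
    """Fail-open: anything not explicitly forbidden is allowed."""
    return tool not in denied


def fail_closed_allows(tool: str, permitted: set[str]) -> bool:
    """Fail-closed: anything not explicitly permitted is forbidden."""
    return tool in permitted


denied = {"transfer-funds"}              # what someone remembered to forbid
permitted = {"search-knowledge-base"}    # what someone deliberately granted

# A tool nobody thought about when writing either list.
unknown_tool = "read-file"

print(fail_open_allows(unknown_tool, denied))        # True
print(fail_closed_allows(unknown_tool, permitted))   # False
```

The same unknown tool gets opposite answers. The only thing that changed is which list the runtime consults when it has no information.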

In infrastructure, fail-closed is standard. Your database doesn't assume queries are safe because you didn't deny them. Your cloud provider doesn't assume users can delete because you forgot a policy. Kubernetes RBAC doesn't give a service account access to the API because you didn't forbid it.

Agent systems should work the same way.

Why Fail-Open Became the Default

Fail-open dominates because it's simpler to build. No permission system. No upfront authorization thinking. You write an agent, give it tools, run it. Frameworks that start permissive don't transition well to restrictive defaults, and authorization bolted on later tends to look like the afterthought it is.

Agent developers often aren't security engineers. They're researchers or product engineers. Permissiveness feels flexible; policies feel bureaucratic. But production needs both: flexibility to iterate and governance to scale safely.

The Operational Cost of Fail-Open

This matters less for a single agent in a sandbox. It matters enormously when you have multiple agents, multiple models, and multiple tools in a shared runtime.

You have five agents. One shouldn't call the payment API. Another can't read customer data. A third runs only on Mondays. With fail-open, you document this in comments or a policy file. Nothing enforces it. If the author forgets to check or the file isn't updated, the agent acts anyway. You might catch it in review. Might not.

With fail-closed, the runtime enforces it. The agent can't call the payment API because it's not in the manifest. Period. No exceptions. This eliminates entire categories of surprises in production.

How Orloj Enforces Fail-Closed Governance

Orloj's core design assumes fail-closed. When you define an agent in a manifest, you explicitly list what that agent can do. Its tools. Its models. Its permissions. The runtime doesn't infer anything beyond that.

Here's what a basic agent manifest looks like:

apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: customer-support-bot
spec:
  model: gpt-4
  tools:
    - search-knowledge-base
    - create-ticket
    - read-customer-data
  policies:
    - effect: allow
      resource: "knowledge-base/*"
      action: "read"
    - effect: deny
      resource: "payment-api/*"
      action: "*"

The agent can do exactly what's in the manifest. Nothing more. If the manifest says it can read customer data but not modify it, the agent can read but cannot modify. Nothing in this manifest grants access to the payment API, so the agent can't touch it: not to query it, not even to check whether it's available. The explicit deny rule only makes that intent visible to readers; the default already denies it.
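One way to picture how allow and deny rules like the ones in this manifest could be evaluated is the sketch below. It rests on assumptions: that resources match as shell-style globs, that an explicit deny overrides any allow, and that no match means denied. The Rule type and is_allowed function are hypothetical and are not Orloj's implementation.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    effect: str    # "allow" or "deny"
    resource: str  # glob pattern such as "knowledge-base/*" (assumed semantics)
    action: str    # an action name, or "*" for any action


def is_allowed(rules: list[Rule], resource: str, action: str) -> bool:
    """Default deny: allowed only if some allow rule matches and no deny rule does."""
    matching = [
        r for r in rules
        if fnmatchcase(resource, r.resource) and fnmatchcase(action, r.action)
    ]
    if any(r.effect == "deny" for r in matching):
        return False  # assumption: explicit deny always wins
    return any(r.effect == "allow" for r in matching)


# The two policies from the customer-support-bot manifest.
rules = [
    Rule("allow", "knowledge-base/*", "read"),
    Rule("deny", "payment-api/*", "*"),
]

print(is_allowed(rules, "knowledge-base/faq", "read"))    # True
print(is_allowed(rules, "knowledge-base/faq", "write"))   # False: no allow rule matches
print(is_allowed(rules, "payment-api/charges", "read"))   # False: explicit deny
print(is_allowed(rules, "customer-db/users", "read"))     # False: nothing matches
```

The last line is the fail-closed case: no rule mentions customer-db, so the answer is no.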

When an agent calls an unlisted tool, the runtime stops it immediately. No silent failure. The error is logged. This happens at the execution layer, not the model layer. The model never sees unauthorized tools. The runtime intercepts before the tool runs, preventing accidental calls and timing-based information leaks.
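A rough sketch of that interception point, under stated assumptions: GatedRuntime, ToolNotPermitted, and the tool registry are hypothetical names for illustration, not Orloj's API. The gate does two things: it advertises only permitted tools to the model, and it checks every call before the tool function runs.

```python
from typing import Callable


class ToolNotPermitted(Exception):
    """Raised when an agent calls a tool its manifest does not list."""


class GatedRuntime:
    def __init__(self, registry: dict[str, Callable[..., object]],
                 manifest_tools: list[str]) -> None:
        self._registry = registry
        self._permitted = frozenset(manifest_tools)
        self.audit_log: list[str] = []

    def tools_for_model(self) -> list[str]:
        """Only permitted tools are ever described to the model."""
        return sorted(name for name in self._registry if name in self._permitted)

    def call(self, tool: str, **kwargs: object) -> object:
        # The permission check happens before any tool code executes.
        if tool not in self._permitted:
            self.audit_log.append(f"DENIED {tool}")
            raise ToolNotPermitted(tool)
        handler = self._registry.get(tool)
        if handler is None:
            raise LookupError(f"permitted tool is not registered: {tool}")
        self.audit_log.append(f"ALLOWED {tool}")
        return handler(**kwargs)


registry = {
    "search-knowledge-base": lambda query: f"results for {query}",
    "charge-card": lambda amount: f"charged {amount}",
}
runtime = GatedRuntime(registry, ["search-knowledge-base"])
print(runtime.tools_for_model())  # ['search-knowledge-base']
```

Calling runtime.call("charge-card", amount=10) raises ToolNotPermitted and records a DENIED entry, and the charge-card function never runs.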

Real-World Examples of What Changes

Let me walk through three scenarios where fail-closed governance prevents production incidents.

Scenario 1: New Tool, Wrong Permissions

A developer adds a tool for a specific agent. In a fail-open system, they add the code but forget the authorization check, and other agents can call the tool anyway. With fail-closed, the tool goes in that agent's manifest and deploys with it. Other agents can't call it unless they declare it.

Scenario 2: Model Hallucination + Unauthorized Tool

An agent's model starts hallucinating. It tries to call a tool that doesn't exist: update-admin-password. In a fail-open system, the tool call fails, but the agent might retry, and there's ambiguity about why it failed. Was the API down? Did the agent not have permission?

With fail-closed, the runtime rejects it immediately because update-admin-password was never in the manifest. The agent gets a clear error: "Tool not available." No ambiguity.
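The practical payoff of an unambiguous error is that retry logic can tell a permanent denial from a transient outage. A minimal sketch, assuming the runtime raises distinct exception types; both exception names and call_with_retry are hypothetical.

```python
from typing import Callable


class ToolNotPermitted(Exception):
    """Permanent: the manifest does not grant this tool."""


class ToolTemporarilyUnavailable(Exception):
    """Transient: the tool is granted but its backend did not respond."""


def call_with_retry(call: Callable[[], object], max_attempts: int = 3) -> object:
    """Retry transient failures only; a denial is final on the first attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ToolNotPermitted:
            raise  # retrying cannot change the manifest
        except ToolTemporarilyUnavailable:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

A hallucinated call to update-admin-password is rejected once and never retried, while a flaky but permitted tool still gets its retries.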

Scenario 3: Blast Radius Control

An agent gets compromised or breaks. What damage can it do? With fail-open, it depends on what you remembered to forbid. The agent might reach any tool in the system. With fail-closed, the agent only touches tools in its manifest. Blast radius is bounded by default.

Governance Beyond Tool Access

Fail-closed extends beyond tools. It applies to models, to resources, to data access patterns, to rate limits.

In Orloj, an agent doesn't just declare what tools it can use. It declares what models it can call, what data stores it can access, and under what constraints.

apiVersion: orloj.dev/v1
kind: Agent
metadata:
  name: research-agent
spec:
  models:
    - gpt-4
    - gpt-4-turbo
  rateLimit:
    maxTokensPerDay: 100000
    maxCallsPerMinute: 10
  dataAccess:
    - resource: "internal-docs/*"
      action: "read"
    - resource: "external-apis/*"
      action: "read"
      rateLimitedBy: "callingAgent"

The agent can use GPT-4 or GPT-4-turbo. It can't use Claude. It can make 10 calls per minute and burn 100k tokens per day. It can read internal docs and hit external APIs, but nothing else.

If the agent tries to call a model that's not in the manifest, it fails closed. If it hits its token limit, the runtime stops it. If it tries to write to a data store, the runtime rejects it.
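A sketch of how a runtime might apply the model list and the daily token budget before a model call goes out. ModelGate and both exception names are hypothetical, and the sketch leaves out the daily reset and the per-minute call limit to stay short.

```python
from dataclasses import dataclass


class ModelNotPermitted(Exception):
    """The requested model is not listed in the manifest."""


class LimitExceeded(Exception):
    """The request would push the agent past its daily token budget."""


@dataclass
class ModelGate:
    allowed_models: frozenset[str]
    max_tokens_per_day: int
    tokens_used_today: int = 0  # a real runtime would reset this daily

    def authorize(self, model: str, tokens_requested: int) -> None:
        """Raise unless the model is listed and the budget covers the request."""
        if model not in self.allowed_models:
            raise ModelNotPermitted(model)
        if self.tokens_used_today + tokens_requested > self.max_tokens_per_day:
            raise LimitExceeded(f"{tokens_requested} tokens requested")
        # Budget is only consumed once both checks have passed.
        self.tokens_used_today += tokens_requested


# Values taken from the research-agent manifest.
gate = ModelGate(frozenset({"gpt-4", "gpt-4-turbo"}), max_tokens_per_day=100_000)
gate.authorize("gpt-4", 60_000)   # passes, 60k of the budget is now used
```

A later gate.authorize("gpt-4", 50_000) would raise LimitExceeded without consuming any budget, and a request for any unlisted model raises ModelNotPermitted before the budget is even considered.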

All enforced at the runtime layer. All fail-closed.

The Transition Problem

One legitimate concern: if you're running fail-open systems now and want to move to fail-closed, the transition can be painful.

You have to audit what each agent actually needs. You have to build the policy framework. You have to test that you didn't accidentally forbid something critical.
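One starting point for that audit is to draft each agent's tool list from what it has actually called. The sketch below assumes you can export (agent, tool) pairs from existing logs; that record format and the draft_tool_lists helper are hypothetical. Observed use is not the same as needed use, so the output is a draft for human review, not a finished manifest.

```python
from collections import defaultdict


def draft_tool_lists(observed_calls: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group observed tool calls by agent to seed a per-agent allowlist."""
    needed: dict[str, set[str]] = defaultdict(set)
    for agent, tool in observed_calls:
        needed[agent].add(tool)
    return {agent: sorted(tools) for agent, tools in needed.items()}


# Hypothetical log export: (agent name, tool name) per call.
observed = [
    ("customer-support-bot", "search-knowledge-base"),
    ("customer-support-bot", "create-ticket"),
    ("customer-support-bot", "search-knowledge-base"),
    ("research-agent", "fetch-document"),
]

for agent, tools in draft_tool_lists(observed).items():
    print(agent, tools)
```

Any tool that shows up in a draft but shouldn't be there is exactly the kind of access fail-open was silently allowing.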

Orloj is designed for new systems or systems where you're willing to take the time to do this right. If you have a large existing system with undocumented agent behavior, the transition requires actual work.

But the alternative is worse. Staying fail-open in production means you're accepting a known security risk indefinitely.

Why This Matters Now

As agents move from research to production, governance shifts from optional to mandatory. Single agents with fail-open permissions are manageable. Fleets with fail-open are disasters. You can't audit them, control their blast radius, or demonstrate their security to compliance auditors.

Fail-closed lets you run agents the same way you run databases and orchestrators: with clear boundaries and predictable behavior. Orloj enforces it at the runtime layer. The manifest declares what's allowed. The runtime ensures nothing else happens. Everything else is hope.
