AI Agent Governance for Financial Services
SEC, OCC, and FINRA-compliant AI agent orchestration. Model risk management, audit trails, and human-in-the-loop approvals for finserv agents.
The SEC's examination priorities explicitly call out AI governance. The OCC's guidance on AI in banking describes expectations for control frameworks, audit trails, and human oversight. An ungoverned AI system in financial services is a risk engine.
Firms are deploying agents for trading recommendations, customer service, portfolio rebalancing, and regulatory reporting. These agents call external systems, make decisions that affect customer money, and create audit obligations that traditional ML governance doesn't cover.
Why agents change the compliance surface
Agents act, not just predict
A traditional ML model predicts something: market direction, default probability, customer churn. The prediction feeds a human decision. An investment agent is different. It analyzes market data, evaluates positions, calls a trading tool, and records the decision.
The SEC wants to know: What data did it access? What decision rules did it follow? Was there human review? Can you reproduce the decision? Who is responsible?
Model risk management applies to agents too
SR 11-7 (the Federal Reserve's guidance on model risk management) requires a model inventory, independent validation before deployment, ongoing monitoring, and escalation procedures. A single agent might call five different models. You need to:
- Pin model versions and prevent silent upgrades
- Route decisions to appropriate models by risk level
- Monitor performance and alert on degradation
- Enforce model-specific constraints (token budgets, confidence thresholds)
Audit trails are compliance obligations
FINRA requires audit trails sufficient to reconstruct activity. The OCC expects documented decision chains. The SEC will ask for the logs.
With Orloj, every agent action creates a structured, queryable audit record: what the agent intended, what tools it called, what data it accessed, whether humans approved it, and what the outcome was.
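As a concrete sketch, a single audit record might look like the YAML below. The field names are illustrative assumptions on our part, not Orloj's documented schema:

```yaml
# Hypothetical audit record shape -- field names are illustrative,
# not Orloj's documented schema.
task_id: task-8f3a
agent: portfolioOptimizer
model: gpt-4-finance
model_version: 1.2.3
intent: rebalance-portfolio
tool_calls:
  - tool: market-data-api
    operation_class: read
    audit_level: sampled
  - tool: trading-execution
    operation_class: write
    verdict: approval_required
approvals:
  - approver: trading-desk-head
    decision: approved
outcome: executed
```

Because records are structured, compliance teams can query by agent, model version, tool, or approval outcome instead of grepping free-text logs.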
Declarative governance in practice
Governance in Orloj uses multiple resource kinds that work together. Define them as version-controlled YAML manifests; diff in PRs, roll back safely.
Constrain models and block dangerous tools with an AgentPolicy:
```yaml
apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: portfolio-governance
spec:
  apply_mode: scoped
  target_systems:
    - portfolio-rebalancing-system
  allowed_models:
    - gpt-4-finance
    - gpt-3.5-turbo
  blocked_tools:
    - account-close
    - wire-transfer
  max_tokens_per_run: 50000
```

Grant scoped permissions with AgentRoles:
```yaml
apiVersion: orloj.dev/v1
kind: AgentRole
metadata:
  name: trading-role
spec:
  description: Can invoke trading and market data tools.
  permissions:
    - tool:market-data-api:invoke
    - tool:trading-execution:invoke
    - capability:market.read
```

Require human approval before executing trades with ToolPermission:
```yaml
apiVersion: orloj.dev/v1
kind: ToolPermission
metadata:
  name: trading-execution-permission
spec:
  tool_ref: trading-execution
  match_mode: all
  required_permissions:
    - tool:trading-execution:invoke
  operation_rules:
    - operation_class: write
      verdict: approval_required
```

When a trade triggers `approval_required`, the task pauses and a ToolApproval resource is created. A trading desk head or risk officer reviews and approves or denies it via the API or web console. If the approval TTL expires, the task fails with `approval_timeout`.
What this governance model enforces:
- Model pinning prevents silent upgrades. Only listed models are allowed; agents requesting unlisted models are denied.
- Dangerous tools are blocked outright. `wire-transfer` and `account-close` can never be invoked, regardless of permissions.
- Trade execution requires human approval. The `ToolApproval` workflow pauses the task until a human signs off.
- Token budgets are enforced. `max_tokens_per_run` stops runaway chains of thought.
- Unauthorized calls fail closed. They are denied with `tool_permission_denied` and logged.
Regulatory requirement mapping
| Requirement | How Orloj addresses it |
|---|---|
| Decision rationale documentation | Full audit trail with model inputs, confidence scores, tool calls |
| Segregation of duties | Tool permissions by role; human approval gates |
| Reconstructable audit trails | Immutable ledger of agent actions, approvals, outcomes |
| Model inventory and versioning | Model endpoint pinning; version tracking in every decision |
| Performance monitoring | Confidence thresholds; alerting on approval rejections |
| Human oversight | Approval gates enforced at the execution layer |
| Exposure controls | Transaction value limits; blacklists; rate limits per tool |
| Business continuity | Lease-based task ownership; automatic failover |
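The "Exposure controls" row above can also be expressed declaratively. The sketch below assumes hypothetical `transaction_limits`, `rate_limits`, and `blocked_securities` fields; check your Orloj version's schema for the actual names:

```yaml
# Sketch of exposure controls as an AgentPolicy.
# transaction_limits, rate_limits, and blocked_securities are
# hypothetical field names, not confirmed schema.
apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: exposure-controls
spec:
  apply_mode: scoped
  target_systems:
    - portfolio-rebalancing-system
  transaction_limits:
    max_order_value_usd: 1000000
  rate_limits:
    - tool: trading-execution
      max_calls_per_hour: 20
  blocked_securities:
    - OTC:*
```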
Example: portfolio rebalancing agent
An investment firm runs an agent that monitors client portfolios and recommends rebalancing.
- Portfolio drifts 5% from target allocation
- Agent fetches current prices via `market-data-api` (sampled audit)
- Agent analyzes the portfolio using the pinned `gpt-4-finance` model (full audit)
- Agent determines: sell tech-heavy positions, buy bonds to restore the target
- Agent attempts to call `trading-execution`
Orloj intervenes:
- The tool requires human approval (defined in policy)
- Request routes to the trading desk head and risk officer
- Both must approve within 5 minutes or the action is rejected
- The trading desk head reviews the order, rationale, and confidence score
- They approve with a modification: "Execute at 50% of recommended position size"
Orloj logs the agent's recommendation, the human override, and executes the modified order.
| Time | Event |
|---|---|
| 10:23:45 | Agent: portfolioOptimizer |
| 10:23:46 | Model: gpt-4-finance v1.2.3 |
| 10:23:47 | Decision: Sell 1000 NVDA, Buy 2000 BND |
| 10:23:48 | Confidence: 0.82 |
| 10:24:30 | Approval: portfolioManager APPROVED (partial) |
| 10:24:31 | Approval: riskOfficer APPROVED |
| 10:24:32 | Execution: Sell 500 NVDA, Buy 1000 BND (modified) |
If regulators ask "Why did this trade happen?", the firm produces: the agent's analysis, model version and confidence score, human reviews, the actual executed order, and the modification applied.
Compliance checklist
Model risk management (SR 11-7)
- Maintain an inventory of all models agents use
- Enforce validation requirements before deployment
- Pin model versions and prevent automatic upgrades
- Monitor performance and alert on degradation
Audit and regulatory reporting
- Export audit logs in formats suitable for SEC examination
- Reconstruct any agent decision with full context and approvals
- Prove transactions were authorized before execution
- Identify all customer accounts affected by a specific agent
Human oversight
- Define which actions require approval
- Enforce time limits on approval windows
- Route approvals to the right role by transaction type and size
- Log all modifications and rejections
Risk controls
- Enforce transaction size limits
- Implement security and counterparty blacklists
- Prevent agents from operating outside business hours
- Auto-disable agents that violate policies repeatedly
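The last two checklist items might be captured in a policy like the sketch below. The `allowed_hours` and `violation_policy` fields are illustrative names only, not confirmed Orloj schema:

```yaml
# Hypothetical policy for trading hours and repeat-violation handling.
# allowed_hours and violation_policy are illustrative field names.
apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: trading-hours-guard
spec:
  apply_mode: scoped
  target_systems:
    - portfolio-rebalancing-system
  allowed_hours:
    timezone: America/New_York
    window: "09:30-16:00"   # reject tool calls outside market hours
  violation_policy:
    max_violations: 3
    action: disable_agent   # auto-disable after repeated policy breaches
```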
Getting started
Phase 1: Define compliance boundaries. Work with legal and compliance to define approval requirements, audit evidence needs, retention periods, and applicable frameworks (SEC, OCC, FINRA).
Phase 2: Write policy manifests. Translate those requirements into Orloj policies: tool permissions, model pinning, rate limits, and audit levels.
Phase 3: Deploy and validate. Run in observation mode. Verify audit logs capture what compliance needs. Validate with your examination team.
Phase 4: Enforce and monitor. Flip gates to enforcement. Track agent behavior, approval latency, and policy violations.
Most finserv firms move through these phases in 6–8 weeks. The investment in writing policies pays off the first time an examiner asks for evidence.
Frequently asked questions
**Does using Orloj make us compliant by itself?** No. Orloj is software you run on your infrastructure. The SEC regulates your use of AI; you configure the governance, Orloj enforces it.
**Can Orloj govern high-frequency trading systems?** Orloj is designed for semi-autonomous systems that benefit from human oversight and governance. Pure algorithmic systems with microsecond latency and no human involvement have different requirements.
**Can approval requirements scale with transaction size?** Define that in your AgentPolicy. A $100K trade might go to a junior trader. A $1M trade escalates to the portfolio manager and risk officer.
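One way to sketch that tiering is to extend the `operation_rules` pattern shown earlier. The `condition` and `approver_roles` fields here are hypothetical extensions, not confirmed schema:

```yaml
# Hypothetical value-tiered approval rules; 'condition' and
# 'approver_roles' are illustrative extensions of operation_rules.
apiVersion: orloj.dev/v1
kind: ToolPermission
metadata:
  name: tiered-trade-approval
spec:
  tool_ref: trading-execution
  operation_rules:
    - operation_class: write
      condition: order_value_usd <= 100000
      verdict: approval_required
      approver_roles: [junior-trader]
    - operation_class: write
      condition: order_value_usd > 1000000
      verdict: approval_required
      approver_roles: [portfolio-manager, risk-officer]
```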
**How do we catch model drift?** Orloj captures model version and confidence scores in every decision. Query for low-confidence decisions over time. If drift is detected, pin to an older model version while you investigate.
**What happens when an approval is denied?** Orloj logs the rejection and returns an error to the agent. The agent can retry with different parameters, escalate, or fail gracefully.
**How do we respond to a regulatory examination?** Export audit logs for the period under examination. Provide AgentPolicy manifests showing your governance rules. Examiners can verify that policies are enforced and decisions are documented.