AI Agent Governance for Healthcare
HIPAA-compliant AI agent orchestration. Control tool access to patient data, enforce audit trails, and govern clinical AI models with Orloj.
Healthcare organizations are deploying AI agents for clinical decision support, administrative workflows, and patient communication. Unlike batch prediction pipelines, these agents make decisions in real time, calling external tools, accessing patient records, and potentially influencing clinical care.
That changes everything about governance.
Why healthcare agents are different
Clinical agents face regulatory surfaces that traditional ML systems don't
Clinical decision support falls under FDA guidance. Agents that recommend clinical actions are regulated. You need to document:
- What data the agent accessed and when
- Who authorized each action
- What model version made each decision
- Whether a human was involved and what they changed
- Whether the action fell outside approved boundaries
Agents don't just predict; they act. Fetching a patient's medication history is a data access event. Updating a patient record is a clinical action. Both need audit trails. Both need authorization gates.
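As a sketch of how those two event types could be gated differently, a ToolPermission (the resource kind shown in full later on this page) might permit reads while routing writes to a human. The `ehr-update-record` tool name and the `allow` verdict are assumptions for illustration; the examples below only show `approval_required`:

```yaml
apiVersion: orloj.dev/v1
kind: ToolPermission
metadata:
  name: ehr-update-permission
spec:
  tool_ref: ehr-update-record        # hypothetical tool name
  match_mode: all
  required_permissions:
    - capability:phi.read
  operation_rules:
    - operation_class: read          # data access event: permitted, but audited
      verdict: allow                 # assumed verdict value
    - operation_class: write         # clinical action: gated behind a human
      verdict: approval_required
```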
PHI handling during inference is stricter than training
During inference, agents work with identified patient data flowing through model endpoints, tool calls, external APIs, and logs. Each is a potential exposure point. Weak governance makes it easy to accidentally log full patient records, send PHI to the wrong service, or grant a tool access it shouldn't have.
Human-in-the-loop is harder than it sounds
Teams want to automate low-risk tasks (scheduling follow-ups) while keeping high-risk decisions (adjusting medications, releasing test results) in human hands. That requires:
- Knowing which actions need human review
- Blocking agents from taking high-risk actions directly
- Routing decisions to the right person
- Tracking approvals, modifications, and rejections
Declarative governance in practice
Define which agents can access which data, which models are approved, and what human approvals are required, all as version-controlled manifests. Governance in Orloj uses multiple resource kinds that work together.
Constrain models and block dangerous tools with an AgentPolicy:
```yaml
apiVersion: orloj.dev/v1
kind: AgentPolicy
metadata:
  name: clinical-governance
spec:
  apply_mode: scoped
  target_systems:
    - clinical-decision-support
  allowed_models:
    - gpt-4-healthcare
    - gpt-3.5-administrative
  blocked_tools:
    - ehr-delete-record
    - ehr-update-medications
  max_tokens_per_run: 50000
```

Grant scoped permissions with AgentRoles:
```yaml
apiVersion: orloj.dev/v1
kind: AgentRole
metadata:
  name: clinical-reader
spec:
  description: Read-only access to patient records and lab results.
  permissions:
    - tool:ehr-fetch-patient:invoke
    - tool:ehr-fetch-recent-labs:invoke
    - capability:phi.read
```

Require human approval for sensitive tool calls with ToolPermission:
```yaml
apiVersion: orloj.dev/v1
kind: ToolPermission
metadata:
  name: ehr-labs-permission
spec:
  tool_ref: ehr-fetch-recent-labs
  match_mode: all
  required_permissions:
    - tool:ehr-fetch-recent-labs:invoke
    - capability:phi.read
  operation_rules:
    - operation_class: read
      verdict: approval_required
```

When a tool call triggers `approval_required`, the task pauses and a ToolApproval resource is created. A clinician reviews and approves or denies via the API or web console before the task can continue.
What this governance model enforces:
- Only approved models are used. Agents configured with unlisted models are denied at execution time.
- Dangerous tools are blocked outright. `blocked_tools` prevents record deletion regardless of agent permissions.
- PHI access requires the right role. Only agents with `clinical-reader` permissions can fetch patient data.
- Sensitive operations require human approval. Lab results trigger a `ToolApproval` that a clinician must resolve.
- Token budgets are enforced. `max_tokens_per_run` stops runaway execution.
- Unauthorized calls fail closed. They are denied with a `tool_permission_denied` error and logged.
Example: cardiology agent in production
A cardiology clinic runs an agent that reviews test results and flags abnormalities.
- New troponin result arrives
- Agent fetches patient demographics (approved, logged)
- Agent fetches historical labs (requires human approval)
- Agent checks clinical guidelines via external API (no PHI, sampled audit)
- Agent generates a summary: "Troponin elevated 3x baseline. Consistent with myocardial injury."
- Summary held pending; requires cardiologist approval before patient notification
The cardiologist reviews, confirms, and approves. Orloj logs every step:
| Time | Event |
|---|---|
| 14:23 | Agent request with full context |
| 14:23 | Tool calls: ehr-fetch-patient, ehr-fetch-recent-labs |
| 14:24 | Human approval gate: cardiologist reviewed and approved |
| 14:24 | Patient notification sent |
| 14:24 | Audit entry: PHI accessed, action completed, approval recorded |
If something goes wrong (say the agent pulls the wrong patient's records), the clinic can immediately disable the agent, query what data it accessed, identify affected patients, and generate a HIPAA breach assessment report.
Governance checklist
Data access controls
- Restrict which agents access which data classifications (PHI, SENSITIVE, public)
- Create agent-specific data views that filter results
- Audit every data access with timestamp, user, and agent context
- Enforce encryption for data in transit and at rest
Model governance
- Pin model versions and prevent automatic upgrades
- Restrict which models different agents use
- Enforce token budgets per request
- Track which model version made each decision
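Pinning could reuse the `allowed_models` list from the AgentPolicy above; the `@`-version syntax here is an assumption for illustration, not documented behavior:

```yaml
allowed_models:
  - gpt-4-healthcare@2024-05-01    # assumed pinning syntax: name@version
```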
Human-in-the-loop enforcement
- Define which actions require human approval
- Route approvals to the right role (clinician, compliance officer)
- Enforce timeouts; if no approval within 4 hours, the action is rejected
- Track what humans changed or rejected
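A timeout like the 4-hour rule above could be expressed declaratively alongside the approval gate. The `approval_timeout` and `on_timeout` field names are assumptions for illustration:

```yaml
operation_rules:
  - operation_class: write
    verdict: approval_required
    approval_timeout: 4h     # assumed field: reject if no review within the window
    on_timeout: reject       # assumed field: fail closed, never auto-approve
```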
Audit and compliance
- Export audit logs for compliance reviews
- Prove retention and access patterns over time
- Reconstruct incidents in chronological order
- Generate reports without manual log parsing
Getting started
Phase 1: Define governance. Write AgentPolicy manifests with your compliance and clinical teams. Define which agents do what, what data they access, and which actions require human review.
Phase 2: Deploy and observe. Run in observation mode for the first two weeks. Agents attempt actions, Orloj logs them, but gates aren't enforced yet. Review logs and tune your policies.
Phase 3: Enforce. Flip gates to enforcement. Unauthorized tool calls are denied. Approval gates become active.
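If observation mode is driven by the policy manifest, flipping to enforcement could be a one-line change. The `enforcement_mode` field name is an assumption; the AgentPolicy example above only shows `apply_mode`:

```yaml
spec:
  apply_mode: scoped
  enforcement_mode: enforce   # assumed field; 'observe' would log without blocking
```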
Phase 4: Respond. When incidents happen, use the audit trail to understand what happened. Disable agents, revoke tasks, and generate compliance reports.
Most healthcare teams move through these phases in 4–6 weeks.
Frequently asked questions
Does using Orloj make us HIPAA compliant?
No. HIPAA compliance is your responsibility. Orloj provides the governance infrastructure: audit trails, access controls, fail-closed enforcement. You configure the policies; Orloj enforces them.
Does Orloj help with FDA documentation requirements?
Yes. FDA guidance on AI in healthcare requires documented decision-making processes. Orloj's audit trails and version control provide that documentation. You still need internal validation and clinical review.
How are human overrides of agent decisions handled?
Define that in your AgentPolicy. Set approval gates, time limits, and escalation paths. Overrides are logged with identity and timestamp.
How long should we retain agent audit logs?
HIPAA doesn't specify a retention period for agent logs. Most healthcare orgs use 7–10 years to match medical record retention. Configure `dataRetention` in your policy accordingly.
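A retention setting matching medical record policy might look like this; beyond the `dataRetention` field name mentioned above, its placement and value format are assumptions:

```yaml
spec:
  dataRetention: 10y   # match medical record retention; assumed value format
```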
Can one deployment govern agents across multiple EHR systems?
Yes. Define tools for each EHR. Agents call the appropriate tools. Orloj enforces the same governance across all of them.
What happens if a worker fails mid-task?
Orloj uses lease-based task ownership. If a worker loses its lease, the task returns to the queue and retries on a healthy worker.