NEW: New Research: AI Agents and Algorithmic Redlining

Read Now

Trinitite

Tool GovernanceResearchBlog

AGRC Framework / Domain 5

05

HUM

Human Factors & Cognitive Social Engineering

MITRE ATLAS AML.T0051 (Prompt Injection), The Anthropic GTG-1002 "Persona Loophole".

Domain Objective

Adversaries no longer hack the code; they hack the alignment. State-sponsored actors bypass AI safety constraints by adopting authorized personas (e.g., posing as "CTF Researchers"), weaponizing the agent's RLHF "Helpfulness" training against the enterprise. This domain establishes Context-Blind Execution Verification, shifting the evaluation from who the user claims to be to what the vector actually does.

Controls

8

HUM-5.1

Strict Persona Constriction

The Rule — Control Statement

Corporate policies and system prompts shall algorithmically forbid agents from adopting user-defined personas or deviating from a rigid "Corporate Persona."

The Why — Fiduciary Rationale

Closes the "Persona Loophole" exploited in GTG-1002. Attackers bypass safety filters by placing the model in a hypothetical context (e.g., "Act as a penetration tester" or "We are playing a game").

The How — Implementation Standard

The orchestration layer must prepend absolute, non-overrideable behavioral constraints to every context window. The Governor must geometrically reject any output vector where the agent acknowledges or accepts a context-shift request from the user.

The Proof — Continuous Attestation Evidence

Version-controlled system prompts demonstrating mandatory persona-lock directives, supported by State-Tuple Ledger logs showing the deterministic blocking of role-play initiation.

HUM-5.2

Context-Blind Action Governance

The Rule — Control Statement

The enterprise GRC policy must operate on the physical absolute that Intent Does Not Excuse Action. Execution must be evaluated blindly.

The Why — Fiduciary Rationale

Relying on a model's "intent classification" to decipher if a user is lying is statistically flawed. Even if the human operator claims to be conducting "authorized emergency maintenance," the system must evaluate the physics of the payload, not the story.

The How — Implementation Standard

The Governor must evaluate the physical payload of the action (e.g., the JSON parameters of an API call) independently of the conversational context window that generated it.

The Proof — Continuous Attestation Evidence

State-Tuple hashes showing the rejection of prohibited actions despite the presence of "override," "authorized," or "emergency" narrative strings in the user input prompt.

HUM-5.3

Out-of-Band (OOB) Human-in-the-Loop (HITL)

The Rule — Control Statement

For Tier 3 high-impact agentic actions, the system must halt autonomous execution and require HITL approval via a mathematically distinct, Out-of-Band (OOB) channel.

The Why — Fiduciary Rationale

An attacker who has compromised the agent's context window can trivially forge a user saying "I approve" within the chat interface. Approval must happen outside the compromised vector space.

The How — Implementation Standard

Execution must trigger a cryptographic MFA push to a secure mobile app or FIDO2 hardware key, physically separated from the orchestration server and chat UI.

The Proof — Continuous Attestation Evidence

Execution logs for Tier 3 actions matched 1:1 with OOB Identity Provider (IdP) authentication timestamps.

HUM-5.4

Inter-Agent Protocol (IAP) Authentication

The Rule — Control Statement

All communication between disparate agents (e.g., Agent A delegating a sub-task to Agent B) must be mutually authenticated and cryptographically bound.

The Why — Fiduciary Rationale

Prevents "Laundering Attacks" where an untrusted external agent passes poisoned instructions directly to a highly privileged internal agent to execute a task it could not do itself.

The How — Implementation Standard

Agent-to-agent communication requires strict mTLS handshakes and must pass through a data sanitization boundary (The Governor) exactly as if it were external user input.

The Proof — Continuous Attestation Evidence

Inter-agent orchestration logs showing mTLS certificate validation for all Handoff/Delegation events paired with Governor inspection hashes.

HUM-5.5

Deepfake & Synthetic Identity Defense for HITL

The Rule — Control Statement

Voice and video-based approvals shall not be accepted as authorization for Tier 3 agentic overrides.

The Why — Fiduciary Rationale

Synthetic manipulation (audio/video deepfakes) renders biometric or visual confirmation obsolete for high-stakes override commands.

The How — Implementation Standard

Executive overrides must rely strictly on cryptographic assertions bound to hardware enclaves (e.g., FIDO2 hardware tokens, YubiKeys).

The Proof — Continuous Attestation Evidence

Identity Provider logs proving WebAuthn/FIDO2 hardware-rooted attestation for any manual override of a Tier 3 Governor block.

HUM-5.6

Approval Fatigue Mitigation (Contextual Step-Up)

The Rule — Control Statement

The system shall deterministically manage human alert fatigue by escalating approval friction dynamically based on the volume and anomaly-delta of the requests.

The Why — Fiduciary Rationale

Human operators rapidly succumb to alert fatigue, rubber-stamping malicious requests hidden in a flood of benign approvals authored by a persuasive agent.

The How — Implementation Standard

The approval prompt sent to the human must be authored deterministically by the Governor (not the Agent), highlighting the exact Delta of Anomaly. If a user approves consecutive requests rapidly, the system must trigger a "Step-Up" event (e.g., forcing manual typing of a randomized confirmation string).

The Proof — Continuous Attestation Evidence

Audit trails of the approval UI flow demonstrating dynamic Step-Up friction events following rapid succession approvals.

HUM-5.7

The "Confused Deputy" DLP Check

The Rule — Control Statement

All outbound communications and external tooling executed by an agent must be routed through traditional corporate Data Loss Prevention (DLP) engines.

The Why — Fiduciary Rationale

Employees will intentionally use agents as "Confused Deputies" to bypass corporate controls (e.g., asking an agent to summarize a classified internal doc and email it to a personal Gmail account).

The How — Implementation Standard

The enterprise network must treat the agent exactly as if it were a human employee, proxying its egress traffic directly into the corporate DLP/CASB inspection pipeline.

The Proof — Continuous Attestation Evidence

Corporate DLP system logs demonstrating the scanning, tagging, and blocking of anomalous agentic traffic targeting unauthorized external domains.

HUM-5.8

Insider Threat Collusion Monitoring

The Rule — Control Statement

Internal, authenticated user prompts shall be subjected to the exact same rigorous Governor evaluations, Glass Box logging, and alerting as unauthenticated external inputs.

The Why — Fiduciary Rationale

The greatest threat to an aligned model is a rogue employee continuously prompting the agent with role-playing/CTF scenarios to map its boundaries and discover a bypass payload. Trust is not a control.

The How — Implementation Standard

High frequencies of blocked internal prompts must trigger automated alerts to Human Resources and Insider Threat teams.

The Proof — Continuous Attestation Evidence

Governor alert routing configurations proving that repeated blocks from internal corporate IAM identities trigger automated SIEM alerts to Security Operations.

Ready to implement this domain?

See how Trinitite delivers continuous cryptographic attestation for Human Factors controls out of the box.

Book a Demo