©2026 AI_PLUMBER_CORP

AI Governance FAQ

Everything CTOs ask before shipping AI to production. Definitive answers from a Principal AI Architect.

01 Governance Fundamentals

What is AI governance and why does it matter for enterprise?

AI governance is the set of policies, processes, and infrastructure controls that ensure AI systems operate safely, transparently, and within regulatory boundaries. For enterprises, governance determines whether AI deployments survive their first audit, their first incident, and their first regulatory inquiry. According to AI Plumber's governance-first framework, 70% of AI project failures trace back to governance gaps — not model performance. Without governance, every AI system is one bad inference away from a compliance violation, a data breach, or a decision that nobody can explain to a regulator.

What's the difference between AI compliance and AI governance?

Compliance is the minimum bar — meeting specific regulatory requirements like GDPR, SOC2, or the EU AI Act. Governance is the operating system that makes compliance sustainable. Compliance asks "are we legal?" Governance asks "are we safe, auditable, and in control?" A compliant AI system can still be ungoverned — passing audits while accumulating context debt that will eventually cause failures. Governance-first frameworks treat compliance as one output of a broader infrastructure, not the goal itself.

What does a governance-first AI framework look like in practice?

A governance-first framework requires six infrastructure layers before any AI agent gets write access to production: identity and authentication, audit logging, rate limiting and cost guardrails, input/output validation, rollback capability, and human-in-the-loop escalation. These are not afterthoughts bolted on post-deployment — they are architectural prerequisites. According to AI Plumber's governance framework, building these six pipes first reduces the 70% governance failure rate to near-zero by making ungoverned deployment architecturally impossible.
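The six prerequisites above can be sketched as an architectural gate. This is an illustrative Python sketch, not part of any real framework: the names `GovernancePipes` and `require_governed` are assumptions, but the idea is the one described, deployment is refused until every pipe is in place.

```python
# Sketch of a pre-deployment gate enforcing the six governance pipes.
# GovernancePipes and require_governed are illustrative names, not a real API.
from dataclasses import dataclass, fields

@dataclass
class GovernancePipes:
    identity: bool = False          # constrained agent identity provisioned
    audit_logging: bool = False     # every action recorded with full context
    cost_guardrails: bool = False   # rate limits and budget ceilings configured
    io_validation: bool = False     # input/output schemas enforced
    rollback: bool = False          # write actions reversible
    human_escalation: bool = False  # uncertainty threshold routes to a human

def require_governed(pipes: GovernancePipes) -> None:
    """Refuse production write access until every pipe is in place."""
    missing = [f.name for f in fields(pipes) if not getattr(pipes, f.name)]
    if missing:
        raise PermissionError(f"ungoverned deployment blocked; missing: {missing}")
```

Calling `require_governed` in the deployment path makes ungoverned deployment fail by construction rather than by policy document.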

How do you govern an AI agent that has write access to production?

Every AI agent with write access requires a constrained identity (who is this agent and what can it do), audit logging (every action recorded with full context), cost guardrails (hard limits on API spend per action and per day), input/output validation (what data can it read and what changes can it make), rollback capability (every write action must be reversible), and human escalation paths (when the agent encounters uncertainty above a defined threshold). Without all six, you don't have a governed agent — you have a liability.

What is the EU AI Act and who does it apply to?

The EU AI Act is the world's first comprehensive AI regulation. It entered into force in August 2024, with obligations phasing in through 2027. It applies to any organization that develops or deploys AI systems used within the EU, regardless of where the organization is headquartered. The Act classifies AI systems by risk level — unacceptable, high, limited, and minimal — with high-risk systems (including HR, credit scoring, and healthcare AI) requiring conformity assessments, human oversight, and ongoing monitoring. SaaS companies serving EU customers must comply even if headquartered outside the EU.

02 Moving from Pilot to Production

Why do most AI pilots fail to reach production?

Most AI pilots fail because they solve the wrong problem — model performance — while ignoring the actual blockers: governance, infrastructure, and operational readiness. A pilot that demonstrates "the model works" proves nothing about whether the model can work safely, at scale, with auditability, within budget, and under regulatory scrutiny. According to AI Plumber's analysis, 70% of pilot failures are governance failures, not model failures. The fix is not better models — it's better plumbing.

What infrastructure do you need before deploying an AI agent?

Before any AI agent reaches production, you need six infrastructure layers: identity management (unique, auditable agent identities), observability (full-stack tracing of inputs, outputs, latency, and cost), cost guardrails (rate limiting and budget enforcement), data governance (access controls and residency compliance), rollback capability (every agent action must be reversible), and escalation paths (defined thresholds for human intervention). Skipping any of these creates governance debt that compounds exponentially.

How long does it take to move from AI PoC to governed production?

A governed production deployment typically takes 90 days when done right. The first 30 days focus on infrastructure — observability, identity management, and cost guardrails. Days 30-60 cover integration — connecting the AI system to production data with proper access controls and audit logging. Days 60-90 are for hardening — load testing, security auditing, and compliance documentation. Teams that skip governance and rush to production often spend 6-12 months debugging issues that would have been prevented by 90 days of proper plumbing.

What is the "context debt" problem in AI deployments?

Context debt is the compounding cost of undocumented architectural decisions. It manifests when nobody can explain why a specific model was chosen, what tradeoffs were considered, or what constraints informed the architecture. Unlike technical debt (bad code you chose to ship), context debt is invisible — it lives in Slack threads, hallway conversations, and the memories of engineers who've since left. In AI systems, context debt is especially dangerous because probabilistic outputs make debugging harder and governance decisions are rarely self-documenting. The antidote is Architecture Decision Records (ADRs).
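An ADR is a small, structured artifact, and can be represented as one. The sketch below is illustrative (the fields and rendering are a common ADR convention, not a formal standard); the point is that the decision, its context, and its tradeoffs become a reviewable record instead of a Slack thread.

```python
# A minimal Architecture Decision Record as a structured, reviewable artifact.
# Fields follow common ADR convention; this is not a formal standard.
from dataclasses import dataclass

@dataclass
class ADR:
    number: int
    title: str
    status: str        # e.g. "accepted", "superseded"
    context: str       # the constraints that forced a decision
    decision: str      # what was chosen
    consequences: str  # the tradeoffs knowingly accepted

    def render(self) -> str:
        return (
            f"# ADR-{self.number:03d}: {self.title}\n"
            f"Status: {self.status}\n\n"
            f"## Context\n{self.context}\n\n"
            f"## Decision\n{self.decision}\n\n"
            f"## Consequences\n{self.consequences}\n"
        )
```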

What is an AI readiness audit?

An AI readiness audit evaluates an organization's infrastructure, governance, and operational maturity for AI deployment. It assesses six dimensions: data readiness (quality, access, governance), infrastructure readiness (compute, networking, observability), security posture (access controls, encryption, audit logging), compliance readiness (regulatory mapping, documentation, risk classification), team readiness (skills, processes, incident response), and cost readiness (TCO modeling, guardrails, unit economics). The audit produces a scorecard with specific gaps and a prioritized remediation plan.
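The scorecard-and-remediation output can be sketched in a few lines. The 0-5 scale and the threshold of 3 below are illustrative assumptions, not a fixed methodology:

```python
# Sketch of a readiness scorecard: score each dimension 0-5, surface gaps
# below a threshold, and order remediation worst-first. Scale and threshold
# are illustrative.
READINESS_DIMENSIONS = [
    "data", "infrastructure", "security", "compliance", "team", "cost",
]

def readiness_report(scores: dict, threshold: int = 3):
    """Return (overall_score, gaps), gaps ordered worst-first."""
    missing = [d for d in READINESS_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    gaps = sorted(
        (d for d in READINESS_DIMENSIONS if scores[d] < threshold),
        key=lambda d: scores[d],
    )
    overall = sum(scores[d] for d in READINESS_DIMENSIONS) / len(READINESS_DIMENSIONS)
    return overall, gaps
```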

03 MLOps & Infrastructure

What MLOps stack do you need for LLM workloads?

LLM workloads require a different MLOps stack than traditional ML. The core components are: a model routing engine (to select the optimal model per request based on complexity and cost), prompt management (versioning, A/B testing, and rollback), observability (token-level tracing, latency monitoring, cost tracking), vector storage (for RAG pipelines), and deployment infrastructure (typically Kubernetes with autoscaling based on token throughput rather than CPU). Traditional ML tools like MLflow handle model versioning but lack LLM-specific capabilities.

How do you run LangChain in production on Kubernetes?

Running LangChain in production on Kubernetes requires containerized chain execution with proper resource limits, horizontal pod autoscaling based on queue depth rather than CPU, structured logging of every chain step for observability, circuit breakers for external API calls (LLM providers, vector stores), and health checks that validate model connectivity. The most common production failure is chain timeouts — LangChain's default timeout handling is insufficient for production workloads. You need explicit timeout configuration at each step and fallback chains for degraded operation.
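The per-step timeout and fallback-chain pattern can be sketched in plain Python rather than LangChain's own API, so the mechanics are visible. `call_with_timeout` and `run_chain` are illustrative names; in real LangChain deployments you would express the same pattern with the framework's timeout and fallback configuration.

```python
# The per-step timeout + fallback pattern, framework-agnostic.
# Note: a timed-out step keeps running in its worker thread; a real system
# must also cancel the underlying HTTP call at the client layer.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_timeout(fn, value, timeout_s: float):
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, value).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)  # don't block the request on a hung step

def run_chain(prompt: str, steps, timeout_s: float = 5.0, fallback=None):
    """Run each chain step under an explicit timeout; on timeout, degrade
    to the fallback chain instead of hanging the whole request."""
    value = prompt
    for step in steps:
        try:
            value = call_with_timeout(step, value, timeout_s)
        except FutureTimeout:
            if fallback is None:
                raise
            return fallback(value)
    return value
```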

What is model routing and why does it matter for cost?

Model routing is a middleware layer that dynamically selects the optimal LLM for each request based on task complexity, latency requirements, and cost constraints. Without routing, teams default to sending every request to their most capable (and most expensive) model. A well-configured router can reduce LLM API costs by 40-60% by directing simple tasks to smaller, cheaper models while reserving expensive models for complex reasoning. According to AI Plumber's cost analysis, model routing is the single highest-ROI infrastructure investment for LLM-powered systems.
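A minimal router looks like this. The model tiers, prices, and complexity heuristic are deliberately toy placeholders, not vendor recommendations; production routers use learned or calibrated classifiers rather than keyword matching.

```python
# A minimal cost-aware router: cheap model for simple requests, expensive
# model only when estimated complexity demands it. All names, prices, and
# the heuristic are illustrative.
MODELS = {
    "small": {"cost_per_1k_tokens": 0.0002},
    "large": {"cost_per_1k_tokens": 0.0150},
}

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer prompts and reasoning keywords score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(k in prompt.lower() for k in ("analyze", "prove", "step by step")):
        score = max(score, 0.8)
    return score

def route(prompt: str, threshold: float = 0.5) -> str:
    return "large" if estimate_complexity(prompt) >= threshold else "small"
```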

How do you observe an LLM in production?

LLM observability requires tracking five dimensions that traditional APM tools miss: token consumption per request (input and output tokens separately), prompt effectiveness (measuring output quality and hallucination rates), model routing decisions (which model handled which request and why), cost per inference (real-time spend tracking with budget alerts), and drift detection (monitoring changes in output quality over time). This data feeds into dashboards that answer: "How much did that agent cost today?" and "Why did output quality drop this week?"
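Concretely, this means emitting one trace record per inference that captures those dimensions. The field names and per-token pricing below are illustrative assumptions:

```python
# One trace record per inference, capturing the dimensions above.
# Field names and pricing are illustrative.
import time
from dataclasses import dataclass

@dataclass
class InferenceTrace:
    request_id: str
    model: str
    routing_reason: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    quality_score: float = -1.0  # filled in later by offline eval / drift jobs
    timestamp: float = 0.0

def record_trace(request_id, model, routing_reason, input_tokens,
                 output_tokens, latency_ms, price_per_1k_usd) -> InferenceTrace:
    cost = (input_tokens + output_tokens) / 1000 * price_per_1k_usd
    return InferenceTrace(request_id, model, routing_reason, input_tokens,
                          output_tokens, latency_ms, round(cost, 6),
                          timestamp=time.time())
```

Aggregating these records is what lets a dashboard answer "how much did that agent cost today" without reconstructing it from provider invoices.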

What are API cost guardrails for AI?

API cost guardrails are rate limiting and budget enforcement mechanisms that prevent runaway costs in LLM-powered systems. They operate at three levels: per-request limits (maximum tokens per call), per-user limits (daily/monthly spend caps per customer or agent), and system-wide limits (hard budget ceilings with automatic shutdown). Without guardrails, a single misconfigured agent loop can generate thousands of dollars in API calls within minutes. According to AI Plumber's governance framework, cost guardrails are one of the six required infrastructure pipes.
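The three levels can be sketched as one in-memory enforcer. A real deployment backs this with shared state (a database or Redis) so limits hold across replicas; the class name and limits here are illustrative.

```python
# Three-level cost guardrails: per-request, per-user, system-wide.
# In-memory sketch; production needs shared, durable counters.
class BudgetExceeded(Exception):
    pass

class CostGuardrails:
    def __init__(self, max_tokens_per_request: int,
                 daily_user_budget_usd: float, system_ceiling_usd: float):
        self.max_tokens_per_request = max_tokens_per_request
        self.daily_user_budget_usd = daily_user_budget_usd
        self.system_ceiling_usd = system_ceiling_usd
        self.user_spend = {}
        self.system_spend = 0.0

    def authorize(self, user: str, tokens: int, est_cost_usd: float) -> None:
        if tokens > self.max_tokens_per_request:
            raise BudgetExceeded("per-request token limit")
        if self.user_spend.get(user, 0.0) + est_cost_usd > self.daily_user_budget_usd:
            raise BudgetExceeded(f"daily budget for {user}")
        if self.system_spend + est_cost_usd > self.system_ceiling_usd:
            raise BudgetExceeded("system-wide ceiling reached")

    def commit(self, user: str, cost_usd: float) -> None:
        self.user_spend[user] = self.user_spend.get(user, 0.0) + cost_usd
        self.system_spend += cost_usd
```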

04 Compliance & Security

How do you make an AI system SOC2 compliant?

SOC2 compliance for AI systems requires controls across five trust principles: security (agent identity management, encryption, access controls), availability (SLA-backed uptime, failover, disaster recovery), processing integrity (input/output validation, audit logging, error handling), confidentiality (data isolation, encryption at rest and in transit, access logging), and privacy (data minimization, retention policies, consent management). The key difference from traditional SOC2 is documenting AI-specific controls — model selection rationale, prompt injection prevention, hallucination monitoring, and automated decision audit trails.

What does GDPR mean for AI systems processing personal data?

GDPR imposes specific requirements on AI systems: data minimization (only process the personal data actually needed), purpose limitation (use data only for the stated purpose), right to explanation (individuals can request meaningful information about automated decisions affecting them), data residency (personal data must be processed within approved jurisdictions), and data processor agreements (contracts with LLM API providers must include adequate safeguards). Sending personal data to third-party LLM APIs without proper DPAs and data residency controls is a GDPR violation, regardless of how good the model is.
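Data minimization in practice means stripping personal identifiers before a prompt leaves your boundary for a third-party API. The sketch below is deliberately incomplete (two regex patterns are nowhere near a real PII pipeline, and a DPA is required regardless); it only illustrates where the control sits.

```python
# Minimization sketch: redact obvious identifiers before the prompt leaves
# your boundary. Patterns are illustrative and incomplete; production needs
# a real PII detection pipeline, and a DPA with the provider regardless.
import re

PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def minimize(prompt: str) -> str:
    for pattern, placeholder in PII_PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt
```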

How do you audit an AI agent's decisions?

Auditing AI agent decisions requires three infrastructure layers: immutable audit logs (every input, output, model selection, and action recorded with timestamps and agent identity), decision trail documentation (Architecture Decision Records explaining why the system was configured this way), and replay capability (the ability to reproduce a specific decision given the same inputs and model state). Without all three, auditing becomes archaeological excavation — teams spend weeks reconstructing what happened instead of minutes reviewing logs.
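An immutable (tamper-evident) log can be built by hash-chaining entries: each entry commits to the previous one, so any retroactive edit breaks verification. This is a sketch; production logs also need durable, access-controlled storage.

```python
# Append-only, tamper-evident audit log via hash chaining. Any retroactive
# edit to an entry invalidates every hash after it.
import hashlib, json, time

class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, agent_id: str, action: str, payload: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"agent_id": agent_id, "action": action,
                "payload": payload, "ts": time.time(), "prev": prev_hash}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            stripped = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(stripped, sort_keys=True).encode()).hexdigest()
            if stripped["prev"] != prev or recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```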

What is agent identity management?

Agent identity management assigns unique, traceable identities to AI agents operating in production. Each agent gets a constrained identity that defines: who it is (unique identifier), what it can do (scoped permissions), what it can access (data and system boundaries), and how its actions are logged (audit trail requirements). This is essential for SOC2 compliance, EU AI Act transparency requirements, and basic operational safety. Without agent identity, you cannot answer the question: "Which agent made this decision, and was it authorized to do so?"
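A constrained identity is small enough to show in full. The permission and resource vocabulary below is an illustrative assumption; the point is that every action is checked against, and logged under, a specific agent identity.

```python
# A constrained agent identity: unique id, scoped permissions, and resource
# boundaries. Permission strings are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    permissions: frozenset  # e.g. {"read:tickets", "write:drafts"}
    resources: frozenset    # systems/data the agent may touch, e.g. {"crm"}

    def authorize(self, permission: str, resource: str) -> bool:
        return permission in self.permissions and resource in self.resources
```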

05 Agentic Systems

What is multi-agent orchestration?

Multi-agent orchestration is the coordination layer that manages communication, task delegation, and state sharing between multiple AI agents within a single system. The orchestrator determines which agent handles which task, how agents share context without leaking data, how conflicts between agent outputs are resolved, and how failures cascade (or don't). Without orchestration, multi-agent systems devolve into uncontrolled agent swarms — each agent optimizing for its own objective without system-level coherence.
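A skeletal orchestrator makes the two core jobs visible: routing tasks to the agent registered for a capability, and passing each agent only the context slice it is allowed to see. Everything here is an illustrative skeleton, not a real orchestration framework.

```python
# Minimal orchestrator skeleton: capability-based dispatch plus scoped
# context sharing so agents cannot see data outside their boundary.
class Orchestrator:
    def __init__(self):
        self.agents = {}         # capability -> handler callable
        self.context_scope = {}  # capability -> allowed context keys

    def register(self, capability, handler, allowed_keys):
        self.agents[capability] = handler
        self.context_scope[capability] = set(allowed_keys)

    def dispatch(self, capability, task, context: dict):
        if capability not in self.agents:
            raise LookupError(f"no agent registered for {capability}")
        # Share only the scoped slice of context -- no cross-agent leakage.
        scoped = {k: v for k, v in context.items()
                  if k in self.context_scope[capability]}
        return self.agents[capability](task, scoped)
```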

What are the biggest failure modes in agentic AI systems?

The three most common failure modes are: runaway execution (agents caught in loops generating unbounded API costs), privilege escalation (agents performing actions outside their authorized scope), and context corruption (agents sharing contaminated or outdated context that degrades system-wide output quality). All three are governance failures, not model failures. According to AI Plumber's framework, preventing these requires the six infrastructure pipes — identity, audit logging, cost guardrails, validation, rollback, and human escalation — before any agent gets write access.
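The first failure mode, runaway execution, has the simplest mechanical guard: cap both iterations and spend in the agent loop so a bad plan fails fast instead of burning budget. The limits below are illustrative.

```python
# Runaway-execution guard: hard caps on loop iterations and cumulative
# spend. Called once per agent step; limits are illustrative.
class RunawayGuard:
    def __init__(self, max_steps: int, max_cost_usd: float):
        self.max_steps, self.max_cost_usd = max_steps, max_cost_usd
        self.steps, self.cost = 0, 0.0

    def check(self, step_cost_usd: float) -> None:
        self.steps += 1
        self.cost += step_cost_usd
        if self.steps > self.max_steps:
            raise RuntimeError(f"halted: exceeded {self.max_steps} steps")
        if self.cost > self.max_cost_usd:
            raise RuntimeError(f"halted: exceeded ${self.max_cost_usd} budget")
```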

How do you give an AI agent write access safely?

Safe write access requires six infrastructure layers: a constrained identity (the agent can only write to specific resources), audit logging (every write action is recorded with full context), cost guardrails (hard limits on the number and cost of write operations), input/output validation (the write payload must pass schema and content validation), rollback capability (every write must be reversible within a defined window), and human-in-the-loop escalation (writes above a risk threshold require human approval). Skip any of these and you don't have governed write access — you have an uncontrolled actor in your production environment.
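The rollback and escalation pipes in particular can be sketched together: every write records its inverse, and writes above a risk threshold park in a pending queue for human approval instead of executing. The store, risk score, and threshold here are illustrative placeholders.

```python
# Reversible writes with human-in-the-loop escalation. The dict store,
# risk score, and 0.7 threshold are illustrative.
class GovernedWriter:
    def __init__(self, store: dict, risk_threshold: float = 0.7):
        self.store = store
        self.undo_log = []  # (key, previous_value) pairs
        self.pending = []   # high-risk writes awaiting human approval
        self.risk_threshold = risk_threshold

    def write(self, key, value, risk: float) -> str:
        if risk >= self.risk_threshold:
            self.pending.append((key, value))  # escalate, don't execute
            return "escalated"
        self.undo_log.append((key, self.store.get(key)))
        self.store[key] = value
        return "written"

    def rollback_last(self) -> None:
        key, previous = self.undo_log.pop()
        if previous is None:
            self.store.pop(key, None)
        else:
            self.store[key] = previous
```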

What is the difference between agentic AI and traditional automation?

Traditional automation follows predetermined rules — if X, then Y. Agentic AI makes decisions based on context, goals, and reasoning. This is both the power and the risk. A traditional script does exactly what it's told, even when that's wrong. An agentic system interprets its instructions, which means it can handle novel situations but can also make decisions nobody anticipated. Governing agentic AI requires fundamentally different infrastructure than governing automation — you're not controlling a script, you're constraining an actor with judgment.

Still have questions? Let's talk architecture.