AI Governance Security Leadership

AI Governance Security Leadership | NIST AI RMF Series

A practitioner's deep dive into building a real generative AI governance program — from policy to controls to board reporting

If you read my earlier post, Generative AI Governance: Using the NIST Framework to Build Trust, Reduce Risk, and Lead Secure AI Adoption, you got a solid introduction to why the NIST AI Risk Management Framework (AI RMF) matters and how its four core functions — Govern, Map, Measure, and Manage — provide a structure for responsible AI adoption. That post was intentionally high-level. This one is not.

Over the past two-plus decades in security leadership, I have watched organizations repeatedly make the same mistake with emerging technology: they adopt first and govern later. We did it with cloud. We did it with mobile. We are doing it right now with generative AI — and the consequences are more significant than most leadership teams realize.

Generative AI is not just another SaaS tool your employees are using without IT approval. It is a fundamentally different class of technology. It processes natural language, which means sensitive data flows through it in ways that do not look like traditional data movement. It generates probabilistic outputs, which means errors and hallucinations can propagate into decisions, documents, and customer-facing communications without any obvious signal that something went wrong. And it is moving faster than any previous technology cycle I have experienced — including the early days of the internet.

This post is the playbook. We are going to go deep on each function of the NIST AI RMF, translate it into actionable governance architecture, and cover what actually works in practice — including the pitfalls that will quietly undermine your program if you do not watch for them.

Why the NIST AI RMF Is the Right Foundation

Before we go function by function, it is worth spending a moment on why NIST's AI RMF is the right foundation for this work — because I have heard the pushback: "We already have NIST CSF. Do we really need another framework?"

The answer is yes, and here is why. The NIST CSF was designed around traditional information systems. It is excellent for governing technology with deterministic behavior — systems that do what they are programmed to do, fail in predictable ways, and can be tested against a defined specification. Generative AI does not work that way. Its outputs are probabilistic, context-dependent, and influenced by training data in ways that are often opaque even to the organizations deploying it.

The NIST AI RMF was specifically designed to address this. It incorporates dimensions like trustworthiness, fairness, explainability, and human oversight that simply do not appear in traditional security frameworks — because they were not needed before. They are needed now.

Importantly, the AI RMF is not a replacement for your existing framework. It is a complement. Organizations running NIST CSF, ISO 27001, or SOC 2 should layer the AI RMF on top, mapping AI-specific risks into their existing risk register and governance structures. The four functions are designed to be integrated with existing practice, not to stand alone.

There is also a regulatory dimension worth naming directly. The EU AI Act, state-level AI legislation in the US, and emerging guidance from financial and healthcare regulators are all converging on the same expectation: organizations that deploy AI systems must demonstrate they have assessed the risks, implemented proportionate controls, and established clear accountability structures. The NIST AI RMF gives you a defensible foundation for that demonstration. Organizations that cannot show structured AI governance will find themselves on the wrong side of regulatory examinations within the next two to three years. That timeline is not speculative — it is already unfolding in the EU.

The GOVERN Function: Building the Foundation That Everything Else Runs On

The Govern function is where most organizations underinvest, and it is the most consequential mistake they make. Technical controls applied without governance are just controls — disconnected from accountability, inconsistently enforced, and invisible to leadership. Governance is what turns a collection of security measures into a program.

For generative AI, governance has to answer five questions clearly and in writing before your organization deploys anything at scale.

Who approves AI use cases? This is not a rhetorical question. In most organizations I have observed, the honest answer is that nobody formally does. Individual teams evaluate tools, make purchasing decisions, and begin using AI in production workflows without a structured review process. That is not a hypothetical risk — it is the default state of most enterprises right now. Your governance structure must define a specific approval authority and a review process that happens before deployment, not after.

Who owns AI risk? Risk ownership for AI is legitimately complex because AI cuts across multiple domains — technology, legal, privacy, operations, and increasingly finance and HR. The CISO does not own all of this. The right model is a shared accountability structure with a named AI risk owner at the enterprise level — often a Chief AI Officer or a senior executive in the business — with the CISO owning the security-specific dimensions.

What policies govern AI use? If your organization's AI policy is a paragraph appended to your acceptable use policy, you do not have an AI policy. A functioning AI governance policy framework should address at minimum: permissible and prohibited use cases, data classification rules for what can and cannot be input into AI systems, third-party AI tool approval requirements, output validation requirements for high-stakes use cases, and human oversight requirements. Each of these deserves its own dedicated policy section, not a bullet point.

How does AI governance connect to existing frameworks? Your AI governance structure should not exist in isolation. Every AI use case should be reviewable within your existing risk register. AI-related controls should map to your existing control framework. AI incidents should flow through your existing incident response process, with AI-specific playbooks as extensions. Organizations that build AI governance as a parallel, siloed structure create coordination problems that compound over time.

How are AI governance decisions documented and communicated? Board-level AI risk reporting is no longer optional for most organizations. Directors are asking about AI risk, and the regulatory trend is toward formal disclosure of material AI risks. Your governance structure must produce documentation that is defensible to regulators, auditors, and board members — not just internal tracking that lives in a spreadsheet someone maintains manually.

Building the AI Governance Committee

The governance committee is the operational body of your AI governance program, and its composition matters more than its name. The effective AI governance committee includes representation from Security, Legal and Compliance, Privacy, Engineering or Technology, the Business Lines deploying AI, and HR for workforce implications. Executive sponsorship at the C-suite level is not optional — without it, the committee becomes an advisory body with no authority to slow down business unit AI adoption when risk thresholds are exceeded.

The committee should meet at a defined cadence — monthly is appropriate for most organizations in active AI adoption phases — and its decision-making authority should be explicitly documented. Who can approve a new AI use case? What risk level requires escalation beyond the committee? What constitutes a prohibited use case that cannot be approved regardless of business case? These boundaries must be defined in writing before the committee faces its first real decision.

 Pro Tip

When you establish your AI governance committee, resist the temptation to make it purely a compliance function. The committee's value is highest when it operates as an enabler — helping business units deploy AI faster by providing a clear, fast approval process for lower-risk use cases while applying rigorous review to high-risk ones. A committee perceived as a barrier will be worked around. A committee perceived as a resource will be engaged early, which is exactly when you want to be involved.

The MAP Function: You Cannot Govern What You Cannot See

The Map function is where governance becomes operational. Its purpose is to build a complete, accurate picture of how AI is being used in your organization, what data flows through those systems, and what the risk context looks like for each use case. This sounds straightforward. In practice, it is one of the most operationally challenging things a security leader will undertake — because the answer to "where is generative AI being used in our organization?" is almost always more complicated than the business units have reported.

Shadow AI is real, it is widespread, and it is more dangerous than shadow IT in most respects. When an employee connects to an unauthorized SaaS application, the primary risk is typically unauthorized data storage. When an employee uses an unauthorized generative AI tool with corporate data in the prompt, the risk includes unauthorized data storage, potential training data exposure, intellectual property disclosure, and outputs that may influence business decisions without appropriate validation. The attack surface of shadow AI is broader and harder to detect than traditional shadow IT.

Building Your AI Asset Inventory

Your AI asset inventory is the foundational artifact of the Map function. It should capture every AI system in use across the organization — sanctioned and unsanctioned — along with key attributes for each: the business function it serves, the data it processes, who has access, what outputs it produces, and how those outputs are used in decision-making.

Building this inventory requires a multi-channel approach. Procurement and vendor management data will surface sanctioned AI tools. Network traffic analysis and proxy logs will begin to reveal unsanctioned usage patterns. Employee surveys, conducted as part of a positive awareness campaign rather than an audit, will surface use cases that do not generate network traffic — local model deployments, for example, or AI features embedded in desktop applications. Business unit interviews are often the most valuable source: department leaders know what their teams are using, and a direct conversation in the context of enabling safe AI adoption yields significantly better information than a top-down audit.

Data Flow Mapping for AI Systems

Once you know what AI systems are in use, the next step is mapping data flows through each system. For generative AI, this is more complex than traditional data flow mapping because inputs are often unstructured — natural language prompts that may contain sensitive information without any formal data classification attached. An employee asking an AI assistant to summarize a customer contract and identify the key renewal terms has just passed potentially confidential commercial information into the model. That interaction does not look like a file upload or a database query — it looks like a text input — but the data sensitivity is identical.

For each AI system in your inventory, document: what categories of data are likely to appear in prompts based on the use case, where the model is hosted and what the vendor's data handling policies are for prompt content, whether prompt content is used for model training, and what logging and retention policies apply to interactions.

 Example: Mapping a Copilot Deployment

A legal team deploys an AI writing assistant to help draft contracts and correspondence. The Map function analysis reveals: prompt inputs regularly include confidential client information and draft agreement terms; the model is cloud-hosted by a third-party vendor; the vendor's default terms include a clause permitting use of prompt data for model improvement unless enterprise licensing with data protection addenda is in place; and interaction logs are retained by the vendor for 30 days by default.

This mapping immediately surfaces two governance actions: verify enterprise data protection terms are in place, and implement a data classification reminder at the prompt entry point so users understand what they are inputting and to whom.

 Pro Tip

When conducting your AI inventory, look specifically for AI features embedded in tools your organization already uses. Microsoft Copilot features in Office 365, AI assistants in collaboration platforms, AI-powered features in HR systems and CRM tools — these are frequently deployed by IT or business units as part of broader platform upgrades without specific security review of the AI capabilities. The AI risk in your environment is almost certainly larger than what has been formally reviewed, and a significant portion of it lives inside products you already approved for non-AI purposes.

The MEASURE Function: Turning AI Risk Into a Number the Board Understands

The Measure function is where AI risk becomes defensible. Its purpose is to evaluate AI risk using consistent, documented methodologies that produce outputs you can track over time, report to leadership, and use to prioritize remediation. This is where security leaders often struggle — because traditional risk measurement methodologies do not translate cleanly to generative AI's unique risk profile.

The core challenge is that generative AI introduces risk dimensions that do not have direct analogs in traditional security risk frameworks. Hallucination risk — the probability that the model produces plausible-sounding but incorrect output — is not a concept that exists in vulnerability management. Bias risk — the possibility that model outputs systematically disadvantage certain groups — falls outside the scope of most security programs but has real legal and reputational consequences. Explainability risk — the difficulty of understanding why a model produced a particular output — affects both incident response and regulatory compliance in ways that require entirely new measurement approaches.

The AI Risk Dimensions You Need to Measure

Accuracy and Reliability: For each AI use case, what is the acceptable error rate? What testing methodology validates that the model is performing within that tolerance? How are errors detected when they occur in production? This dimension is particularly important for use cases where AI outputs influence security decisions — alert triage, threat intelligence summarization, vulnerability prioritization — because errors here have direct operational consequences.

Data Privacy and Confidentiality: What sensitive data passes through AI systems? What are the contractual and technical controls that prevent that data from being exposed to third parties or used in model training? Has a formal Privacy Impact Assessment been conducted for high-risk AI use cases? This dimension connects directly to your existing privacy program and should be evaluated in partnership with your privacy function.

Security Vulnerability: AI systems introduce novel attack surfaces. Prompt injection — the manipulation of model behavior through crafted inputs — is a real and actively exploited attack vector. Model inversion attacks, where adversaries attempt to extract training data from a model, are a demonstrated capability. Data poisoning, where adversaries influence model behavior by corrupting training data, is a concern for organizations that fine-tune models on internal data. Each of these requires specific testing approaches well beyond traditional application security assessment.

Bias and Fairness: For AI use cases that influence decisions about people — hiring, performance management, customer service, credit decisions — bias risk is a legal exposure as well as a reputational one. Security leaders are not always positioned to conduct bias assessments independently, but they should ensure that bias risk is evaluated as part of the overall AI risk profile and that the results are documented.

Vendor and Supply Chain Risk: Most enterprise generative AI deployments rely on third-party foundation models and cloud-hosted AI services. The security posture of those vendors — their data handling practices, security certifications, incident history, and contractual commitments — is a direct input to your AI risk profile. Your existing vendor risk management process should be extended to cover AI vendors, with additional evaluation criteria specific to AI risk dimensions.

Red Teaming for Generative AI

Red teaming is one of the most valuable measurement techniques for generative AI, and it is significantly different from traditional penetration testing. AI red teaming involves deliberately attempting to elicit harmful, incorrect, or policy-violating outputs from the model — testing its behavior against edge cases, adversarial inputs, and scenarios that fall outside its intended use envelope.

For internal AI deployments, red teaming should be conducted before go-live and periodically thereafter, because model behavior can change as underlying foundation models are updated by vendors. For AI features embedded in commercial products, you should request vendor documentation of their red teaming practices and evaluate it as part of your vendor risk assessment.

 Example: Prompt Injection Testing

Your organization deploys an AI assistant that processes customer emails and drafts response suggestions for your support team. A red teaming exercise reveals that a carefully crafted customer email can cause the assistant to include confidential internal process instructions in its draft response — effectively leaking internal documentation to customers. This is a prompt injection vulnerability.

The finding drives two remediation actions: input validation to detect and block injection attempts, and mandatory output review requirements for the support team before any response is sent. Without red teaming, this vulnerability would have been discovered by an attacker or a customer — not by your security team.

 Pro Tip

When you conduct AI risk assessments, document not just the risk ratings but the methodology behind them. Regulators and auditors examining your AI governance program will ask how you assessed risk, not just what the ratings are. A risk rating produced by a documented, repeatable methodology is defensible. A risk rating produced by someone's judgment call is not. The investment in documented methodology pays dividends every time you face external scrutiny.

The MANAGE Function: Operationalizing Controls and Closing the Loop

The Manage function is where governance becomes operational security. It encompasses the technical and procedural controls you implement based on the risk profile established in the Measure function, along with the monitoring, incident response, and continuous improvement processes that keep the program current as AI systems and threats evolve.

Technical Controls for Generative AI

Data Loss Prevention for AI: Traditional DLP tools were not designed with AI prompt inputs in mind, and most organizations are running DLP that cannot inspect or enforce policy on the natural language content that flows into AI systems. This is a significant gap. Modern AI-aware DLP solutions can analyze prompt content, classify data sensitivity, and either block policy-violating inputs or alert on them for review. If your DLP strategy does not address AI prompt content, it has a material blind spot.

API Monitoring and Access Controls: Most enterprise AI deployments use API-based access to foundation models. API calls should be logged, attributed to specific users or service accounts, and monitored for anomalous usage patterns — unusual volumes, off-hours access, inputs that match patterns associated with data exfiltration. Access to AI APIs should follow the same least-privilege principles as any other sensitive system: access provisioned based on demonstrated business need, reviewed periodically, and revoked promptly when no longer required.

Human-in-the-Loop Requirements: For high-risk AI use cases — those where AI outputs influence consequential decisions — mandatory human review should be a control requirement, not an optional step. This means designing workflows so that AI output is explicitly labeled as AI-generated, the human reviewer is required to positively affirm their review before the output is acted upon, and that affirmation is logged. This creates both an accountability record and a dataset that can be used to measure the accuracy of AI outputs over time.

Output Watermarking and Attribution: For organizations that publish AI-generated content or use AI outputs in external-facing documents, watermarking or metadata tagging of AI-generated content provides both accountability and auditability. This is increasingly an expectation in regulated industries where the provenance of analytical outputs matters for compliance purposes.

AI Incident Response

Your existing incident response process handles incidents where something bad happens to your systems. AI incidents are often different: they may involve something bad that the AI system did — generated harmful content, produced an incorrect output that influenced a significant decision, or leaked sensitive information through a prompt injection attack. Your incident response playbooks need to account for this category specifically.

An AI incident playbook should address at minimum: criteria for declaring an AI-related incident, the process for isolating or disabling the AI system while the incident is investigated, the methodology for assessing what outputs were affected and who was impacted, notification requirements when AI incidents affect customers or regulated data, and post-incident review to identify whether the root cause was in the model, the integration, the input data, or the governance controls.

Model drift — the gradual degradation of model performance as real-world data shifts away from the training distribution — is an operational risk that requires monitoring even in the absence of a discrete incident. Establish baseline performance metrics for each AI system at deployment and monitor them continuously. Significant drift should trigger a model review, and depending on the risk profile of the use case, may require disabling the AI feature until the model is retrained or replaced.

Continuous Monitoring and Program Maturity

AI governance is not a point-in-time exercise. It is a continuous operational function, because both AI technology and the threat landscape are evolving faster than any other domain in enterprise security. Your Manage function needs to include a defined review cadence: AI use case inventory reviewed quarterly, vendor risk assessments updated annually or when significant vendor changes occur, AI-specific red teaming conducted at least annually and after major model updates, and AI risk ratings updated to reflect new threats and control improvements.

 Key Takeaways

A formal AI use case approval process must be in place before any AI system reaches production, with documented criteria for approval, conditional approval, and rejection.
An AI asset inventory needs to be maintained and reviewed quarterly — capturing both sanctioned and detected unsanctioned AI usage.
Data flow maps should exist for every AI system that processes sensitive data, with documented vendor data handling commitments.
Risk assessments for high-risk AI use cases should cover AI-specific dimensions — hallucination, bias, prompt injection, and model drift — not just traditional security risk factors.
AI-aware DLP controls must cover prompt input content for cloud-hosted AI systems. Traditional DLP has a significant blind spot here.
Human-in-the-loop requirements need to be implemented and enforced for consequential AI use cases, not just recommended in policy.
AI incident response playbooks should exist and be tabletop-tested before you need them.
Board-level AI risk reporting should be produced on a regular cadence with metrics that track program maturity over time.

Pitfalls to Watch Out For

Treating AI governance as a one-time project. Organizations frequently approach AI governance as a project with a start date and an end date — get the policy written, complete the inventory, build the committee, declare victory. AI governance is an operational discipline. The threat landscape, the technology, and your organization's AI usage profile will all change continuously. Build for sustainability from the beginning: defined owners, defined cadences, defined metrics. A governance program that peaks at launch and atrophies over the following year is worse than no program, because it creates a false sense of coverage.

Building AI governance in isolation from the business. Security-only AI governance programs fail because they generate friction without generating value. The business will work around a governance program it perceives as purely obstructive. Build your program in partnership with the business lines adopting AI, and design it to answer their questions — what can we safely use, how do we get approval, what are the guardrails — rather than simply imposing restrictions. Governance that enables faster, safer AI adoption has stakeholders who defend it. Governance that only slows things down gets defunded.

Underestimating the data exposure surface in AI prompts. Security leaders who have dealt with DLP for years understand data classification and structured data movement. AI prompts are different — they are natural language, they are conversational, and employees do not think of them as data transmissions. The employee who pastes a draft board presentation into an AI tool to clean up the language has just transmitted board-level confidential information to a third party. This happens constantly, in every organization, and it will not stop through policy alone. It requires technical controls that can detect sensitive data in unstructured text inputs.

Ignoring the embedded AI in products you already approved. The AI governance review that focused on ChatGPT and Copilot may have missed the AI features in your HR platform that now automatically generate performance review language, the AI-powered threat intelligence summarization in your SIEM, and the AI writing assistant built into your customer service platform. Every major software vendor is embedding AI capabilities into their products on rapid release cycles. Your governance program needs a mechanism to detect new AI features in products you have already approved, not just to evaluate new AI tool requests.

Failing to document the reasoning behind governance decisions. When your AI governance committee declines to approve a use case or requires additional controls, document the reasoning specifically. "Security concerns" is not documentation. The specific risk dimension, the specific control gap, and the specific threshold that needs to be met for approval — those are documentation. When the business comes back with a revised proposal, or when a regulator asks why a particular AI system was not deployed, you need the reasoning on record.

Assuming your AI vendor's security is sufficient. Cloud AI vendors have security programs. They also have breach histories, data handling practices that may not align with your requirements, and contractual terms that may or may not provide meaningful protection for your data. The fact that a vendor is large and well-known is not a substitute for vendor risk assessment. Conduct formal AI vendor risk assessments, review their data handling terms, ensure data protection addenda are in place for any vendor handling sensitive data, and track your vendor risk findings over time.

 Final Thought

After more than two decades in this field, I have learned that the technologies that move the fastest are the ones that demand the most governance discipline — not because governance slows things down, but because without it, you eventually face a crisis that slows everything down. Generative AI is the fastest-moving technology I have seen in my career, and organizations treating governance as an afterthought are accumulating risk that will materialize in ways they are not prepared for. The NIST AI RMF gives you the structure. The work is in the execution — building the committee, conducting the inventory, doing the assessments, implementing the controls, producing the reporting, and doing it all again on the next cycle. That work is not glamorous. But it is the work that determines whether your organization's AI adoption becomes a competitive advantage or a liability event. Govern it deliberately. Measure it rigorously. Manage it continuously.

Continue the Conversation

This post is part of the InfoSec Made Easy series on AI governance and security leadership. For the introductory post in this series, see Generative AI Governance: Using the NIST Framework to Build Trust, Reduce Risk, and Lead Secure AI Adoption.

Search This Blog