10 Key Insights into the Battle Over AI Model Evaluation Leadership


As artificial intelligence reshapes industries and national security, a quiet but fierce turf war has erupted within the U.S. government. At the heart of the conflict: which agency should lead the evaluation of AI models for cybersecurity risks? The White House's Office of the National Cyber Director (ONCD) and the Commerce Department's Center for AI Standards and Innovation (CAISI) are locked in a struggle for control, while intelligence officials push for greater influence. Here are ten critical things you need to know about this bureaucratic showdown.

1. The Core Dispute: Who Evaluates AI Models?

The primary disagreement centers on AI model evaluations—tests to identify vulnerabilities in large language models and other AI systems. ONCD, which coordinates federal cybersecurity strategy, argues it should lead because AI threats directly impact national cyber defense. In contrast, CAISI, housed under Commerce, contends that its expertise in AI standards and industry partnerships makes it the natural fit. This clash reflects deeper questions about how the U.S. government will regulate a technology that evolves faster than policy.

2. The White House's Office of the National Cyber Director (ONCD)

Established in 2021, ONCD works under the Executive Office of the President to develop national cybersecurity strategy. It oversees incident response, threat intelligence, and interagency coordination. ONCD fears that handing AI evaluation to a commerce-focused agency will prioritize economic growth over security. Officials argue that only a cybersecurity-led approach can anticipate adversarial uses of AI, such as automated hacking or disinformation. The office sees this as an existential issue for federal cyber resilience.

3. Commerce's Center for AI Standards and Innovation (CAISI)

CAISI, part of the National Institute of Standards and Technology (NIST), spearheads AI risk management. Its parent agency, NIST, published the AI Risk Management Framework, and CAISI runs pilot programs for model evaluation. Commerce officials insist that CAISI's industry ties and technical rigor make it best suited to set evaluation benchmarks. They argue that ONCD's focus on threat intelligence is too narrow—missing the broader societal impacts of AI, like bias and transparency. CAISI wants to build consensus with tech companies, not just issue security mandates.

4. Intelligence Community's Uneasy Role

Behind the scenes, intelligence officials are angling for sway in AI policy. Agencies like the CIA and NSA have unique access to adversary capabilities and can assess how nation-states might weaponize open-source AI models. They worry that a purely cyber-focused evaluation might overlook intelligence-gathering applications—or, conversely, share sensitive findings too broadly. Their involvement adds a third dimension to the power struggle, complicating any quick resolution.

5. Why This Fight Matters for National Security

The outcome will determine how the U.S. defends against AI-enabled cyberattacks. If ONCD leads, evaluation protocols will likely be classified and tightly controlled. A CAISI-led approach would be more transparent, allowing companies to self-assess. The wrong choice could leave critical gaps: an overly secretive system may alienate industry, while a public one might tip off adversaries about detection methods. The stakes are high—AI models are already being used to write malicious code and craft convincing phishing emails.

6. Industry Reactions and Concerns

Tech firms like OpenAI, Google, and Microsoft are watching closely. Many prefer voluntary guidelines from CAISI over mandatory rules from ONCD. However, some cybersecurity startups worry that Commerce's approach lacks teeth to stop real threats. A survey of AI developers cited in policy debates shows that 60% believe federal leadership is unclear—slowing investment in safety tools. The uncertainty could push companies to adopt foreign standards, such as the EU's AI Act, weakening U.S. influence.

7. Historical Precedent: The Cybersecurity vs. Commerce Rivalry

This isn't the first bureaucratic turf war. During the 2016 election hacking crisis, the Department of Homeland Security and the FBI clashed over attributing attacks. More recently, the Commerce Department's Bureau of Industry and Security squared off against State over export controls on AI chips. In each case, the resolution required a presidential directive. Analysts predict a similar top-down decision here, likely after the next White House cybersecurity posture review.

8. Potential Compromise: A Joint Task Force

To break the impasse, some in Washington propose a joint task force that includes ONCD, CAISI, and intelligence agencies. This would mirror the AI Safety and Security Board established by the Department of Homeland Security. Under this model, ONCD would handle threat modeling, CAISI would manage testing standards, and the intelligence community would contribute adversary assessments. However, such arrangements often suffer from bureaucratic inertia—each agency still protects its own budget and authority.

9. Legislative Interest and Congressional Pressure

Congress is starting to weigh in. Bipartisan bills in the Senate and House propose creating an independent AI evaluation office, bypassing the executive branch fight. Lawmakers are concerned that delays in setting standards will leave the U.S. vulnerable. Senator Mark Warner (D-VA) has called the squabbling "counterproductive." Hearings are expected in early 2025, which could force ONCD and CAISI to publicly justify their positions—or risk losing the portfolio altogether.

10. What This Means for the Future of AI Governance

The battle over model evaluation is a microcosm of a larger debate: who in the U.S. government is responsible for AI safety? The answer will shape everything from data privacy rules to military AI deployment. A win for ONCD would centralize power in the White House, signaling a security-first posture. A win for CAISI would reinforce a multistakeholder, consensus-driven approach. And the intelligence community's role will test how much secrecy can coexist with public trust. One thing is clear: the era of ad hoc AI regulation is over—the U.S. must pick a leader.

Conclusion: A Pivotal Moment for U.S. AI Policy

As the ONCD-CAISI stalemate continues, the federal government risks falling behind both adversaries and allies. Without clear leadership, AI model evaluations will be fragmented—leaving vulnerabilities unaddressed. The choices made in the next six months will set a precedent for how Washington manages emerging technologies. Whether through executive directive or legislative intervention, a decision is inevitable. For now, the question remains: which agency will take the helm, and will it be ready for the challenge?
