---
title: "US Government Pushes Major AI Companies to Submit Models for Pre-Release Safety Testing"
slug: us-government-pre-release-ai-model-testing
category: policy
story_number: "16"
date: 2026-05-17
---

# US Government Pushes Major AI Companies to Submit Models for Pre-Release Safety Testing

The federal government has quietly assembled one of the most significant AI oversight frameworks in American history — and it arrived not through legislation, but through a series of voluntary handshake agreements between Washington and Silicon Valley.

On May 5, 2026, the Commerce Department's Center for AI Standards and Innovation (CAISI), housed inside the National Institute of Standards and Technology (NIST), announced landmark deals with Google DeepMind, Microsoft, and Elon Musk's xAI. Under the agreements, all three companies will provide the US government with early access to their most powerful, unreleased AI models — before any member of the public ever sees them.

The announcement expanded a framework already in place with OpenAI and Anthropic, both of which struck similar arrangements roughly two years ago with CAISI's predecessor, the US AI Safety Institute. Taken together, five of the most consequential AI developers in the United States have now formally invited the federal government to scrutinize their frontier models ahead of launch.

## What "Pre-Release Testing" Actually Means

The mechanics of the program are worth examining closely. CAISI evaluates models not just in their finished, consumer-ready form — complete with content filters and safety guardrails — but in stripped-down versions where those protections have been deliberately removed. The goal is to measure what the raw underlying system is actually capable of, not just what it will do when properly constrained.

Testing is conducted through the TRAINS Taskforce — short for Testing Risks of AI for National Security — a cross-agency body convened by CAISI in November 2024. The taskforce pulls evaluators from more than ten federal agencies, each assigned to a specific threat domain. The National Institutes of Health handles biosecurity assessments. National laboratories assess chemical and nuclear proliferation risks. The Department of Defense and the Department of Homeland Security lead the cybersecurity work.

The scope of what examiners are looking for reflects the current threat landscape as the government perceives it: Could a hostile actor use this model to design a biological weapon? Could it be used to compromise critical infrastructure? Does it offer meaningful "uplift" — technical assistance beyond what someone could easily find elsewhere — for those seeking to build chemical or radiological weapons?

Microsoft said in a statement that it will work with US government scientists to test AI systems "in ways that probe unexpected behaviors" — language that underscores just how open-ended the evaluation process remains.

## 40 Evaluations and Counting

CAISI has now completed more than 40 evaluations of frontier AI models, including systems that have not yet been publicly released. That figure represents a significant acceleration of the program and suggests the government is building genuine institutional knowledge about the capabilities of cutting-edge AI — not simply conducting symbolic checkbox reviews.

The agreements were renegotiated to align with the White House's AI Action Plan, the current administration's governing framework for AI policy. That document, which emphasizes American competitiveness alongside national security, reflects a pragmatic rather than precautionary approach: the goal is not to slow AI development, but to ensure the government can identify and manage the highest-risk capabilities before they reach the open market.

## A Wrinkle: The Page That Disappeared

The story took a strange turn just days after the announcement. By May 11, the Commerce Department's dedicated webpage describing the testing agreements with Google, Microsoft, and xAI had vanished without explanation. Visitors to the original URL encountered a standard 404 message before being redirected to the general CAISI landing page.

No public statement was issued. No replacement notice appeared. The department has not explained why the page was removed, and as of this writing, the original announcement is no longer accessible through official government channels.

The agreements themselves appear to still be active — the redirect to CAISI suggests the testing program continues as planned — but the episode highlights a persistent tension in the current administration's approach to AI governance: a willingness to build meaningful oversight structures alongside an apparent reluctance to publicize them too prominently.

## Voluntary, For Now

The critical word in all of this is "voluntary." None of the five companies are legally required to submit their models for review. There is no statute mandating pre-release testing, no regulatory framework with enforcement teeth, and no penalty for a company that decides to walk away from its agreement.

That distinguishes the US approach sharply from the European Union's AI Act, which imposes binding obligations on developers of high-risk systems and is now being enforced across member states. Critics of the American framework argue that voluntary agreements provide the appearance of oversight without the substance — that companies retain full control over what they share, when they share it, and how they interpret the results.

Proponents counter that voluntary cooperation is achievable quickly, avoids the risk of chilling innovation through heavy-handed regulation, and allows the government to build the technical expertise needed to eventually write informed rules. They also note that CAISI's practice of testing models with safety guardrails removed demonstrates a level of genuine scrutiny that goes beyond optics.

## What Comes Next

The expansion to Google, Microsoft, and xAI suggests the model is working well enough to attract new participants rather than drive them away. Whether that momentum translates into something more durable, such as legislated requirements, standardized evaluation criteria, or public disclosure of results, remains an open question that Congress has yet to seriously engage with.

What is clear is that a small agency inside the Commerce Department has become the de facto front line of the US government's engagement with the most powerful AI systems in the world. For now, Washington at least gets to look before anyone else does.

---

Sources: CNN Business, Al Jazeera, CNBC, Washington Post, Cybersecurity Dive, The Hill, Euronews, Let's Data Science, Cognativ, technology.org