Codify — Article

TEST AI Act: Pilot AI testbeds for evaluation standards

Creates a NIST-led pilot using testbeds to develop measurement standards for evaluating AI systems used by federal agencies.

The Brief

This Act requires the Director of the National Institute of Standards and Technology (NIST) to establish a pilot program of AI testbeds to develop measurement standards for evaluating AI systems used by federal agencies. It sets up interagency coordination with the Department of Energy and the Department of Commerce, and creates an AI Testing Working Group to advise on standards development.

The bill also requires a strategy for measurement standards within one year, a set of testbeds within two years, and a report to Congress within 180 days after the first testbed demonstration. The overarching aim is to advance measurement science for AI, and to provide a transparent, collaborative process involving government, industry, and academia.

At a Glance

What It Does

Directs NIST, in coordination with DOE and a new AI Testing Working Group, to implement a pilot program using testbeds to develop measurement standards for evaluating AI systems used by federal agencies.

Who It Affects

Federal agencies deploying AI, DOE facilities, researchers in academia, industry stakeholders developing government AI tools, and private sector standard-setting bodies.

Why It Matters

Establishes a formal, transparent process to measure AI systems, guiding procurement, risk management, and accountability for federal use while coordinating across multiple agencies and sectors.


What This Bill Actually Does

The TEST AI Act would put NIST in charge of a pilot program that uses dedicated testbeds to figure out how to measure AI systems used by federal agencies. It defines testbeds as facilities or mechanisms for rigorous, transparent, and replicable testing of AI tools to assess reliability, performance, security, and other important attributes.

The bill also creates the Artificial Intelligence Testing Working Group to advise on what standards are needed and how to implement them. The Working Group would include senior representatives from the Department of Commerce, the Department of Energy, and NIST, along with private sector and academic participants, but may not include citizens of covered foreign countries (China, Russia, North Korea, and Iran).

The Five Things You Need to Know

1

The bill requires a pilot of AI testbeds to develop measurement standards for evaluating federal AI systems.

2

A new AI Testing Working Group (up to 10 members) will guide standards development and may not include citizens of covered foreign countries.

3

A one-year deadline requires a strategy for measurement standards, including metrics and a blueprint for development.

4

A two-year timeline requires development of testbeds for federal AI evaluation, with input from academia, industry, and standards bodies.

5

A report to Congress is due within 180 days after the first testbed demonstration, with findings and recommended actions.

Section-by-Section Breakdown

Every bill we cover gets an analysis of its key sections.

Section 2

Definitions

This section defines key terms used throughout the Act: AI system, covered foreign country (China, Russia, North Korea, and Iran), Director (head of NIST), Institute (NIST itself), testbed (facilities or mechanisms for rigorous, transparent, and replicable AI testing), and Working Group (AI Testing Working Group). These definitions establish the scope of what is being measured and who may participate in the governance structure.

Section 3(a)

Pilot program on testbeds for measurement standards

Section 3(a) requires the Director to lead a pilot program, in coordination with the Secretary of Energy and after consulting the Working Group, to test the feasibility of developing measurement standards for evaluating AI systems used by federal agencies. The process must be iterative, advancing measurement science through testbeds and reviewing results with government, private sector, and academic stakeholders.

Section 3(b)

Memorandum of Understanding

Section 3(b) mandates a memorandum of understanding within 180 days between the Secretary of Commerce and the Secretary of Energy to coordinate access to DOE resources, personnel, and facilities to support the pilot. It also requires periodic renegotiation (every two years) to ensure the arrangement continues to meet agency needs.

Section 3(c)

Artificial Intelligence Testing Working Group

Section 3(c) establishes the AI Testing Working Group to advise on measurement standards. It must include senior officials from Commerce and Energy, the Director of NIST (or a designee), and representatives from the private sector and academia. No citizen of a covered foreign country may serve. The Working Group must develop a strategy for measurement standards within one year, detailing necessary standards, a blueprint for development, initial applications, and metrics.

Section 3(d)

Development of testbeds for measurement standards

Section 3(d) tasks the Director, with DOE collaboration and in line with the Working Group strategy, to develop testbeds within two years to demonstrate measurement standards for federal AI evaluation. The Director may hire external experts from academia, industry, and standards organizations to support this work.

Section 3(e)

Reporting

Section 3(e) requires a report to Congress within 180 days after the first testbed demonstration. The report reviews initial findings, recommends revisions to the pilot plan (including resources and personnel needs), and suggests legislative or administrative actions to advance measurement standards for AI evaluation.


Who Benefits and Who Bears the Cost

Every bill creates winners and losers. Here's who stands to gain and who bears the cost.

Who Benefits

  • Federal agencies deploying AI systems in procurement and operations can rely on standardized evaluation criteria to inform decisions and risk management.
  • NIST gains a structured mandate to develop and coordinate measurement standards across agencies.
  • The Department of Energy and its national laboratories provide facilities and expertise essential to building and testing AI measurement testbeds.
  • AI developers, vendors, and tool providers for government customers benefit from common standards and clearer acceptance criteria.
  • Academic researchers in AI measurement and evaluation gain access to government-led testbeds and collaboration opportunities with industry.

Who Bears the Cost

  • Federal agencies must participate in pilot activities and potentially adjust procurement practices to align with new standards.
  • NIST and DOE will need funding and staffing to operate testbeds and manage cross-agency coordination.
  • Private sector and academic participants contribute time and resources to the Working Group and testbed testing without guaranteed reimbursement.
  • Industry participants may incur costs to adapt to new measurement standards and participate in demonstrations and data sharing.
  • Potential overlap with existing standards efforts could add administrative burden and coordination costs across multiple stakeholders.

Key Issues

The Core Tension

The bill must balance rigorous, transparent development of measurement standards against the fast pace of AI innovation. Excluding certain foreign nationals from governance pits inclusivity of expertise against national-security concerns, while ambitious deadlines strain practical resource constraints.

The act creates a formal, government-wide effort to define how AI should be measured before federal adoption. This requires interagency coordination, including a memorandum of understanding with the Department of Energy to access facilities and cross-cutting R&D programs.

It also imposes a governance constraint—no citizens of covered foreign countries may serve on the AI Testing Working Group—while asking for a careful, staged approach to strategy development and implementation. The timeline is tight: a one-year strategy and a two-year testbed development window, followed by a 180-day reporting period after the first demonstration.

These deadlines raise questions about funding, logistics, and the ability to harmonize this effort with other ongoing AI standards initiatives across government and industry.
