Codify — Article

CLEAR Act requires Copyright Office notices for copyrighted works used to train generative AI

Mandates filing detailed summaries and public indexing of registered copyrighted works used in training datasets, and creates a private right of action backed by civil penalties.

The Brief

The CLEAR Act requires people who use training datasets to build or deploy generative AI models to notify the Register of Copyrights with a detailed summary of each copyrighted work in the dataset and, when available, the dataset's URL. The bill limits the covered set of works to those protected under title 17 and either registered under section 408 or scheduled under section 1401, and directs the Copyright Office to publish a public online database of submitted notices.

The statute creates a private enforcement mechanism: copyright owners can sue for failures to notify, seeking civil penalties (floor of $5,000 per missing notice), injunctions to stop use until a notice is filed, and attorneys’ fees, subject to a $2.5 million annual cap on monetary penalties per defendant. The Register must issue implementing regulations within 180 days of the law taking effect (the law itself takes effect 180 days after enactment).

At a Glance

What It Does

Requires filing with the Register a detailed summary of each registered copyrighted work used in training datasets for generative AI models and the dataset URL if publicly available. The Copyright Office must publish filings in a publicly accessible database and issue implementing regulations.

Who It Affects

Developers, researchers, and organizations that train or commercially use generative AI models on datasets containing works protected under title 17 and either registered under section 408 or scheduled under section 1401. It also affects copyright owners, who gain a new private right to sue for noncompliance.

Why It Matters

This bill creates the first federal disclosure regime tying copyright registration status to AI training transparency, shifting some compliance burden onto model builders and creating a civil-enforcement pathway that could alter how datasets are assembled, documented, and licensed.


What This Bill Actually Does

The CLEAR Act builds a disclosure-first compliance regime around the use of copyrighted material in training generative AI systems. It defines a covered ‘‘generative AI model’’ as a combination of code and numerical values that produces expressive outputs and defines covered copyrighted works narrowly as those protected under title 17 and either registered under section 408 or scheduled under section 1401.

The practical result is that only registered or scheduled copyrights trigger the statute’s notice requirement.

Under the Act, any person who uses a training dataset when training or releasing a generative AI model must provide the Register with a sufficiently detailed summary of each covered copyrighted work in the dataset and the dataset’s URL if it is publicly accessible on the web at filing time. The bill tasks the Register with issuing regulations, within 180 days of the law’s effective date, that set the form, content, and filing procedures for those notices, and requires the Register to maintain a public online database containing every submitted notice.

Enforcement is civil and owner-driven: a copyright owner may sue a person who used a covered work without filing the required notice.

Available remedies include civil penalties (with a per-instance floor and an annual cap), injunctions stopping use of the work until a notice is filed, and recovery of attorneys’ fees. Penalties are routed to the Copyright Office to offset operating costs.

The Act takes effect 180 days after enactment, and the Register must act promptly to issue implementing regulations so entities can comply within the required windows.

The Five Things You Need to Know

1

The notice must include a ‘‘sufficiently detailed summary’’ of each covered copyrighted work in the training dataset and the dataset’s URL if publicly available at filing time.

2

Covered copyrighted works are limited to works protected under title 17 that are registered under 17 U.S.C. §408 or scheduled under 17 U.S.C. §1401.

3

A copyright owner may sue for failure to file; courts may impose civil penalties of at least $5,000 per instance, subject to a $2,500,000 annual cap per defendant.

4

The Register must issue regulations within 180 days of the law’s effective date and maintain a publicly available online database of all submitted notices.

5

The Act’s effective date is 180 days after enactment; for models first used before that date, notices are due within 30 days after the Register issues its regulations.

Section-by-Section Breakdown

Every bill we cover gets an analysis of its key sections.

Section 1

Short title

Names the statute the ‘‘Copyright Labeling and Ethical AI Reporting Act’’ or the ‘‘CLEAR Act’’.

Section 2(a)

Key definitions

Provides operational definitions for terms that determine scope: ‘‘artificial intelligence’’ (an automated system performing human-like tasks), ‘‘generative AI model’’ (code plus numerical parameters producing expressive outputs), ‘‘training dataset’’ (collections of materials and annotations), ‘‘copyrighted work’’ (works protected under title 17 that are registered under §408 or scheduled under §1401), and ‘‘Register’’ (the Register of Copyrights). The narrow definition of ‘‘copyrighted work’’ focuses enforcement and disclosure obligations on works with formal registration or scheduling status, not on unregistered works.

Section 2(b)

Notice filing requirement and timing

Requires persons who use training datasets in training or releasing generative AI models to file a notice with the Register containing a detailed summary of each covered copyrighted work and the training dataset URL if publicly available. The bill sets a primary filing deadline of 30 days before a model’s first commercial use or release for models first used or released on or after the Act’s effective date; for models already in use or released before enactment, notices are due within 30 days after the Register issues regulations. The Register has 180 days after the effective date to issue rules specifying form and filing procedures.
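The timing rules above stack several windows on top of one another. A minimal sketch makes the sequencing concrete; the enactment and launch dates below are purely illustrative assumptions, not dates fixed by the bill:

```python
from datetime import date, timedelta

# Hypothetical dates for illustration only; the bill sets the offsets,
# not the calendar dates.
enactment = date(2025, 1, 1)                      # assumed enactment date
effective = enactment + timedelta(days=180)       # Act takes effect
regs_deadline = effective + timedelta(days=180)   # Register must issue rules by

# New model: notice due 30 days BEFORE first commercial use or release.
first_commercial_use = date(2026, 3, 1)           # assumed launch date
new_model_notice_due = first_commercial_use - timedelta(days=30)

# Existing model (in use before the Act applies): notice due 30 days AFTER
# the Register actually issues regulations; assume rules land on the deadline.
existing_model_notice_due = regs_deadline + timedelta(days=30)

print(effective)                  # 2025-06-30
print(regs_deadline)              # 2025-12-27
print(new_model_notice_due)       # 2026-01-30
print(existing_model_notice_due)  # 2026-01-26
```

Note that under these assumptions the retrospective deadline for existing models can land a full year after enactment, since it is keyed to when the Register actually issues rules rather than to enactment itself.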

Section 2(c)

Enforcement: private right of action and remedies

Grants copyright owners a private cause of action to sue users who failed to file the required notice. Remedies the court may order include civil monetary penalties (not less than $5,000 per failure), injunctions stopping use of the work until a notice is filed, and award of attorneys’ fees and costs. Monetary penalties collected are paid to the Register and used to offset Copyright Office operating costs. The provision also caps annual civil penalties at $2.5 million per defendant but preserves the availability of injunctions and fee awards regardless of the cap.

Section 2(d)

Public online database

Directs the Register to establish and maintain a publicly accessible online database containing every notice filed under the statute. The database requirement creates a searchable public record intended to increase transparency about which registered works were included in training datasets and which datasets are publicly accessible.

Section 2(e)

Effective date

Sets the Act’s effective date at 180 days after enactment, which triggers the 180-day window for the Register to issue implementing regulations and the deadlines for retrospective notice filings for existing models.


Who Benefits and Who Bears the Cost

Every bill creates winners and losers. Here's who stands to gain and who bears the cost.

Who Benefits

  • Registered copyright owners — gain a statutory private right to sue when their registered works are used in training datasets without the required notice, plus access to injunctions and fee-shifting to enforce compliance.
  • Researchers and practitioners favoring transparency — obtain a public, centralized database that can be searched to learn which registered works fed particular models or datasets, supporting provenance and auditability.
  • Users and consumers — benefit indirectly from increased traceability about training sources, which may make downstream provenance claims and attribution more reliable.
  • Copyright Office — receives penalty receipts earmarked to offset operating costs and gains a defined role running a public registry that elevates its oversight function over AI training disclosures.

Who Bears the Cost

  • AI developers and dataset curators — must inventory and summarize every registered copyrighted work in training datasets, adapt data management practices, and potentially face sizable penalties for failures to file.
  • Organizations using large-scale or mixed datasets — face operational burdens identifying which items are registered works and producing summaries at scale, with attendant legal and compliance costs.
  • Small research groups and startups — may confront disproportionate compliance costs for drafting notices, developing filing workflows, and legal review, even where licensing status is unclear.
  • Copyright Office operations — must implement and maintain a public database and issue regulations within a statutory deadline, creating workload and resource planning challenges despite penalty-offset language.

Key Issues

The Core Tension

The central dilemma is between transparency and practicability: the bill advances public traceability of registered copyrights used in AI training, but it requires developers to perform costly, potentially infeasible identification and summary work across massive datasets—and leaves the critical details of what constitutes an adequate notice to agency rulemaking, creating operational and legal uncertainty.

The CLEAR Act ties its disclosure obligation to the formal registration or scheduling status of works under title 17, which narrows the statute’s reach but creates a compliance hinge: model builders must identify which items in massive, mixed-source datasets are registered works. That requires provenance systems and legal review; without machine-readable registration metadata, the burden will fall to developers to perform rights assessments at scale.
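To make that compliance hinge concrete, here is a minimal sketch of the inventory triage a dataset curator would need to perform. The `registration` field is hypothetical: the bill assumes no machine-readable registration feed exists, which is exactly why the burden falls on developers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    """One record in a hypothetical dataset inventory."""
    title: str
    registration: Optional[str]  # e.g. a §408 registration number, else None

# Illustrative inventory; real datasets would hold millions of such records.
inventory = [
    Item("novel excerpt", "TX0009012345"),
    Item("forum post", None),           # unregistered: outside the statute
    Item("news photo", "VA0002233445"),
]

# Only items with a registration (or §1401 schedule) trigger the notice duty.
covered = [i for i in inventory if i.registration is not None]
uncovered = [i for i in inventory if i.registration is None]

print(len(covered), "items require a summary in the notice")  # 2 items ...
```

The hard part is not this filter but populating `registration` in the first place, which today requires manual Copyright Office lookups or legal review for each item.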

The bill requires ‘‘sufficiently detailed summaries’’ of each covered work but leaves the content and granularity to forthcoming regulations, creating uncertainty about how granular those summaries must be and whether they will be feasible for datasets containing millions of items.

Private enforcement gives copyright owners a direct route to penalties and injunctions, but the statutory penalties regime contains competing design choices: a $5,000-per-instance floor plus a $2.5 million annual cap per defendant. That structure can produce very different outcomes depending on how ‘‘instance’’ is defined (per work, per dataset, per model, or per use).
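Because the statute leaves ‘‘instance’’ undefined, the interaction of the $5,000 floor and the $2.5 million cap can be sketched arithmetically. The per-work versus per-dataset readings below are illustrative assumptions, not definitions drawn from the bill:

```python
FLOOR = 5_000           # statutory minimum penalty per instance
ANNUAL_CAP = 2_500_000  # per-defendant annual cap on monetary penalties

def penalty(instances: int, per_instance: int = FLOOR) -> int:
    """Total monetary penalty for a year, applying floor and cap."""
    return min(instances * max(per_instance, FLOOR), ANNUAL_CAP)

# One model trained on a dataset containing 10,000 registered works:
print(penalty(instances=1))       # read as per dataset/model: 5,000
print(penalty(instances=10_000))  # read as per work: capped at 2,500,000

# At the floor, the cap binds once a defendant reaches 500 instances.
print(ANNUAL_CAP // FLOOR)        # 500
```

Under the per-work reading, any dataset with 500 or more uncovered registered works hits the cap at the statutory floor alone, so the cap, not the floor, would drive exposure for large-scale training.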

The routing of penalties to the Copyright Office to offset operating costs raises administrative questions about fee design and whether penalty receipts will be sufficient or stable for the sustained administrative workload the database and rulemaking require.
