The TRAIN Act adds a new section (§514) to Title 17 that lets a copyright owner (or an authorized agent) ask a U.S. district court clerk to issue a subpoena to an AI developer for copies of, or records sufficient to identify, the copyrighted works the developer used to train a generative artificial intelligence model. The requester must file a proposed subpoena plus a sworn declaration stating a subjective good-faith belief that the owner’s works were used and that the materials will be used only to protect the owner’s rights.
If the request and declaration are in proper form, the clerk must promptly issue the subpoena and the developer must expeditiously disclose the requested records; refusal triggers a rebuttable presumption that the developer made copies.
Practically, the bill creates a fast, statutory pathway for rights-holders to discover whether particular works were incorporated into training datasets. That lowers the evidentiary barrier for copyright investigations but also raises operational, confidentiality, trade-secret, and privacy issues for developers, dataset curators, and public agencies involved in model training.
At a Glance
What It Does
Creates §514 in the Copyright Act allowing a copyright owner to file a proposed subpoena and sworn declaration with a district court clerk to compel a developer to produce training material copies or records identifying works used to train generative AI models. The clerk must issue the subpoena if it is in proper form and the developer must disclose expeditiously.
Who It Affects
Developers of generative AI models (including private companies, third‑party dataset curators, and state or local government agencies), copyright owners and their agents, courts handling subpoena enforcement, and entities that host or supply training datasets.
Why It Matters
This is the first federal statutory subpoena mechanism targeted at AI training datasets under the Copyright Act; it materially lowers the cost and evidentiary hurdle for rights-holders to investigate model training, while pushing confidentiality, trade‑secret, and procedural conflicts into judicial enforcement and protective-order practice.
What This Bill Actually Does
The TRAIN Act creates a new statutory tool inside the Copyright Act for rights‑holders who believe a generative AI model was trained on their copyrighted works. It defines key terms — including “generative artificial intelligence model,” “training material,” and “developer” — and then authorizes a legal or beneficial copyright owner (or an authorized representative) to submit a proposed subpoena and a sworn declaration to a federal district court clerk.
The declaration must say the requester has a subjective good‑faith belief the works were used, that the subpoena aims to verify that use, and that the disclosed material will be used only to protect the owner’s rights.
If the paperwork is in proper form, the clerk has a ministerial duty to issue the subpoena expeditiously. The subpoena can only seek copies of, or records sufficient to identify with certainty, works likely owned or controlled by the requester; it cannot be used to sweep up other parties’ copyrighted materials.
A developer who receives the subpoena must promptly provide the requested copies or identifying records. The statute incorporates Federal Rules of Civil Procedure subpoena mechanics and remedies “to the greatest extent practicable,” so enforcement disputes will mostly be litigated under familiar subpoena duces tecum procedures. The bill builds in two enforcement mechanisms that cut in opposite directions: a statutory rebuttable presumption that favors the requester if a developer fails to comply, and a sanctions regime that borrows Rule 11 procedures to punish bad‑faith subpoena requests.
The recipient of a subpoena may move for sanctions against a rights‑holder who sought the subpoena in bad faith. Finally, the statute imposes a confidentiality duty on the requester: received copies or records cannot be disclosed to others without proper authorization, though the bill leaves the contours of protective measures and cost allocation to litigation practice.
The Five Things You Need to Know
- The statute defines “developer” broadly to include private entities, third‑party training dataset curators, and state or local government agencies that design, substantially modify, curate, or use generative AI models; it expressly excludes noncommercial end users.
- A requester must file a proposed subpoena plus a sworn declaration attesting to a subjective good‑faith belief the requester’s copyrighted works were used, that the subpoena’s purpose is rights protection, and that disclosed materials will be used only for that purpose.
- If the proposed subpoena and declaration are in proper form, the district court clerk must promptly issue and sign the subpoena; the statute imposes a ministerial issuance duty rather than requiring a pre‑issuance merits hearing.
- If a developer fails to comply with an issued subpoena, the statute creates a rebuttable presumption that the developer made copies of the copyrighted work, giving the requester evidentiary leverage in downstream litigation.
- The bill makes bad‑faith subpoena requests sanctionable and applies Rule 11’s procedures to those sanctions, allowing subpoena recipients to seek penalties against rights‑holders who abused the process.
Section-by-Section Breakdown
Definitions — model, developer, training material, substantial modification
This subsection anchors the statute by defining who and what the law covers. “Generative artificial intelligence model” is framed to include models that synthesize content (text, images, audio, video) and any later variations of those models, which means derived or forked versions can fall within the subpoena’s reach. “Developer” covers entities that design, substantially modify, own, or supervise training and explicitly includes dataset curators and State or local government agencies; importantly, it excludes noncommercial end users. “Training material” expressly includes not just raw files but annotations and other expressive components used during training. Those choices expand the universe of potential targets and make it more likely that annotations and curated metadata will be discoverable.
Who may request a subpoena and scope limitation
Only a legal or beneficial copyright owner, or someone authorized to act for that owner, can seek a subpoena under this section. The statutory text limits requests to materials likely owned or controlled by the requester; the requester may not demand records identifying works owned by third parties. The operative evidentiary threshold to request a subpoena is subjective: the requester must have a good‑faith belief that the owner’s works were used. That low threshold makes it easier to seek information but also invites disputes over the sufficiency of requests and the motives behind them.
Filing requirements and what a subpoena must order
A valid filing requires a proposed subpoena plus a sworn declaration containing three attestations: (1) the requester’s subjective good‑faith belief, (2) the subpoena’s purpose to identify training materials used to train the model, and (3) a promise to use the materials only to protect the requester’s rights. The subpoena may demand either copies of the training material or records sufficient to identify with certainty the works used. The developer is ordered to disclose the items expeditiously once served; the recipient’s obligations are practical and time‑sensitive, which has operational implications for locating archived datasets or provenance records.
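For developers thinking about the operational side, the following is a minimal sketch of how a provenance manifest might be queried so that a response covers only the requester's works. It is an illustration, not anything the bill prescribes: the `ProvenanceRecord` fields, the `records_for_requester` helper, the example URLs, and the use of SHA-256 digests as identifiers are all assumptions. Exact-hash matching only finds byte-identical copies, so a real pipeline would likely need additional matching techniques.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """One entry in a hypothetical training-data manifest."""
    sha256: str   # hex digest of the exact bytes that were ingested
    source: str   # where the file was obtained (illustrative URL)
    dataset: str  # dataset or shard the file was assigned to


def digest(data: bytes) -> str:
    """Hex SHA-256 digest used here as a stand-in work identifier."""
    return hashlib.sha256(data).hexdigest()


def records_for_requester(manifest, claimed_digests):
    """Return only the manifest entries whose digest the requester claimed.

    Entries for works the requester did not claim are never returned,
    mirroring the scope limit described in the text.
    """
    claimed = set(claimed_digests)
    return [r for r in manifest if r.sha256 in claimed]


# Hypothetical manifest with one work from the requester and one from a third party.
manifest = [
    ProvenanceRecord(digest(b"requester's essay"), "https://example.com/essay", "web-text-v1"),
    ProvenanceRecord(digest(b"someone else's novel"), "https://example.org/novel", "books-v2"),
]

matches = records_for_requester(manifest, [digest(b"requester's essay")])
for record in matches:
    print(record.dataset, record.source)
```

Keeping a manifest of this kind at ingestion time is one way a developer could answer quickly; reconstructing it after the fact from archived datasets is the harder case the article flags.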
Clerk issuance, enforcement framework, and confidentiality duty
If the proposed subpoena and declaration are in proper form, the clerk must promptly issue and sign it and return it to the requester for service. The statute instructs that, unless otherwise provided, issuance, service, and enforcement should follow the Federal Rules of Civil Procedure for subpoenas duces tecum “to the greatest extent practicable.” The statute also imposes a duty of confidentiality on requesters who receive disclosed materials, prohibiting further dissemination without proper authorization — but it leaves the procedural mechanics (protective orders, in‑camera review) to courts and the established rules.
Consequences for noncompliance and sanctions for abuse; effective date
Failure to comply with an issued subpoena triggers a statutory rebuttable presumption that the developer made copies of the copyrighted work, giving requesters an evidentiary advantage in litigation unless the developer rebuts that inference. The statute also enables recipients to seek sanctions if a subpoena was requested in bad faith, and it directs courts to apply Rule 11 procedures to those sanctions. The new section takes effect on enactment and appears as §514 in the Title 17 table of sections.
Who Benefits and Who Bears the Cost
Every bill creates winners and losers. Here's who stands to gain and who bears the cost.
Who Benefits
- Copyright owners and publishers — gain a streamlined, statutory path to discover whether and how their works were used in training generative models, reducing investigative cost and time to evidence.
- Independent creators and small rights‑holders — the clerk‑issued subpoena lowers the barrier to find out whether infringement occurred, making it plausible for smaller rightsholders to pursue enforcement without expensive discovery fights.
- Litigation counsel for plaintiffs — receive a new, predictable procedural tool that creates early leverage (including the rebuttable presumption) to support claims or force settlements.
- Regulators and enforcement agencies — the statute creates clearer legal authority to seek training dataset evidence when investigating potential widespread or systemic uses of copyrighted material.
- Rights management services — organizations that catalog and enforce copyrights can use the subpoena mechanism to verify dataset composition and negotiate licensing or takedown remedies.
Who Bears the Cost
- AI developers, model owners, and dataset curators (including cloud hosts) — face operational and compliance costs to locate, extract, and produce archived training materials, annotations, and provenance logs, and risk exposing proprietary data.
- State and local government agencies involved in model development — the inclusion of public entities as developers creates potential obligations to produce datasets, raising budgetary and sovereignty concerns.
- Startups and small AI vendors — may incur disproportionate compliance burdens and legal costs responding to subpoenas, and face business risk if proprietary pipelines or supplier contracts are exposed.
- Data providers and third‑party licensors — could confront contract enforcement conflicts or demands to produce materials they supplied under confidentiality agreements.
- Courts and litigators — increased motion practice over issuance, protective orders, cost allocation, and enforcement will add docket pressure and time spent resolving trade‑secret, privacy, and foreign‑law conflicts.
Key Issues
The Core Tension
At its core, the TRAIN Act balances two legitimate objectives that pull in different directions: enabling copyright owners to learn whether their works were used in model training, and protecting developers’ operational secrecy, trade secrets, and privacy. The statute solves one problem, evidentiary access, by shifting burdens and uncertainty onto developers and courts, leaving little statutory guidance about how to protect proprietary or sensitive information once a subpoena is issued.
The TRAIN Act trades investigatory friction for swift disclosure, but the practical implementation raises knotty questions. The statute uses a subjective good‑faith filing standard and gives clerks a ministerial duty to issue subpoenas if the forms and declarations look proper; that combination lowers the barrier to issuance and puts the onus on developers to resist overbroad demands.
Recipients will therefore face frequent motions over scope, burdensomeness, trade‑secret protection, and cost allocation — issues the statute defers to Federal Rules of Civil Procedure practice rather than resolving on the statute’s face.
The statute’s efficacy depends on how courts interpret several imprecise phrases. “Records sufficient to identify with certainty” is an operationally demanding standard that could require developers to surface provenance metadata, annotation logs, and archived checkpoints — materials that may be decentralized, ephemeral, or governed by third‑party contracts. The rebuttable presumption for noncompliance grants strong evidentiary leverage to requesters but may penalize developers who legitimately cannot locate historical training artifacts (because of model retraining, distributed pipelines, or supplier deletion policies).
Finally, the confidentiality duty on requesters is necessary but thin: enforcement mechanisms and cost‑sharing for protective measures (e.g., in‑camera review, redaction, or retained-counsel arrangements) are left to case law and court rules, creating uncertainty for both rights‑holders and developers.