The TRAIN Act adds a new Section 514 to chapter 5 of Title 17 to give copyright owners a statutory route to obtain copies of, or records identifying, materials used to train generative artificial intelligence models. A rights holder (or an authorized agent) can ask the clerk of a U.S. district court to issue a subpoena to a ‘developer’—a broadly defined category that includes creators, owners, substantial modifiers, and dataset curators—if the requester has a subjective good-faith belief that the developer used the requester’s copyrighted works in training.
If the request complies with the bill’s formalities (a proposed subpoena plus a sworn declaration), the clerk must promptly issue the subpoena; the developer must then “expeditiously” disclose the copies or identifying records. Noncompliance triggers a rebuttable presumption that the developer copied the work.
The statute also imposes a confidentiality duty on recipients of disclosed materials and applies Rule 11 sanctions to bad-faith subpoena requests. The measure creates an administrative discovery-style tool that prioritizes copyright owners’ ability to investigate training datasets while leaving open important questions about trade secrets, privileged material, and precise procedural protections.
At a Glance
What It Does
Creates 17 U.S.C. §514 permitting a copyright owner (or authorized agent) to request a clerk-issued subpoena forcing a developer of a generative AI model to produce copies of, or records identifying, training material if the requester has a subjective good-faith belief the requester’s works were used. If the request is in proper form the clerk must issue the subpoena; the developer must then expeditiously comply.
Who It Affects
Generative AI developers (including state and local government agencies, third‑party dataset curators, and entities that substantially modify models), authors and copyright owners seeking to verify training use, and courts/clerk offices that must process and enforce the subpoenas.
Why It Matters
The bill creates a low-friction investigatory path that shifts compliance costs toward model developers and dataset curators and erects procedural presumptions that can influence later infringement litigation. It is one of the first statutory efforts to mandate disclosure of AI training datasets at scale, with potential consequences for trade secrets, privacy, and data governance.
What This Bill Actually Does
The TRAIN Act adds a single new section to the Copyright Act aimed at discovery of AI training material. It defines key terms for the statute’s purposes: ‘generative artificial intelligence model’ covers models that generate synthetic content from input data; ‘developer’ covers the parties who design, own, or substantially modify a model or who curate training datasets (explicitly including state and local governments) and excludes noncommercial end users. ‘Training material’ is defined to include individual works or components (text, images, audio) and related annotations.
Under the new process, a copyright owner (or someone authorized to act for them) who believes in good faith that their work was used can file a proposed subpoena plus a sworn declaration with a district court clerk. The declaration must state the requester’s subjective good-faith belief and that the subpoena’s purpose is to determine whether the developer used the requester’s copyrighted material; it must also promise limited use of any disclosures.
If the paperwork is in proper form, the clerk must promptly sign and issue the subpoena and the requester delivers it to the developer. The statute makes the issuance administrative and clerical rather than adjudicative: there is no textual requirement for a judge to review the merits before issuance.
Once a developer receives the subpoena, the bill requires them to expeditiously produce the requested copies or records sufficient to identify the training material “with certainty.” The statute borrows enforcement mechanisms from the Federal Rules of Civil Procedure for service and sanctions, but it adds two notable statutory backstops: (1) a failure to comply creates a rebuttable presumption that the developer made copies of the copyrighted work, and (2) the court may impose Rule 11-style sanctions on requesters who pursue subpoenas in bad faith.
Recipients of disclosed material must keep it confidential absent proper authorization. The statute takes effect on enactment and otherwise relies on standard court rules for implementation.
The Five Things You Need to Know
A district court clerk must promptly sign and return a proposed subpoena if the filing is in proper form and the requester provides a sworn declaration asserting subjective good faith.
The statutory subpoena can seek either copies of training material or records sufficient to identify with certainty the copyrighted works used in training.
If a developer fails to comply with a §514 subpoena, that noncompliance creates a rebuttable presumption that the developer made copies of the copyrighted work.
The bill explicitly defines ‘developer’ to include dataset curators and state or local government agencies that design, own, substantially modify, or supervise training datasets, while excluding noncommercial end users.
A recipient of a subpoena can move for sanctions if the requester acted in bad faith; Rule 11(c) procedures apply to penalties imposed under the statute.
Section-by-Section Breakdown
Short title — TRAIN Act
This short-title clause names the statute the ‘Transparency and Responsibility for Artificial Intelligence Networks Act’ and has no substantive effect on rights or remedies. It fixes the bill’s public name but does not alter the legal mechanisms introduced in the substantive section that follows.
Who and what the statute covers
The definitions set the scope: generative AI models (those that create synthetic content) are singled out; training material explicitly includes component works and annotations; and a ‘developer’ is broadly drawn to include entities that design, own, or substantially modify models or curate training datasets, and it expressly covers state and local governments. The statute’s definition of ‘substantially modify’ reaches retraining and fine‑tuning, so entities doing routine model updates can fall within the developer label. By excluding noncommercial end users the bill narrows targets, but it still captures most commercial teams, third‑party dataset curators, and cloud-based operators involved in dataset curation or training.
How a rights holder starts the process
A copyright owner or an authorized agent must file with the district court clerk a proposed subpoena and a sworn declaration stating a subjective good-faith belief that the requester’s works were used and that the subpoena’s sole purpose is rights protection. The ‘subjective good-faith’ threshold is lower than probable cause or a judicial finding; compliance with form and declaration is the trigger for clerical issuance. The statute also limits who may be investigated: a requester may only subpoena records relating to their own copyrighted works, preventing multi-owner fishing in a single request.
Clerk-issued subpoenas and developer compliance
If the proposed subpoena and declaration are in proper form, the clerk must promptly sign and return the subpoena for delivery to the developer; the text gives the clerk an administrative role rather than a gatekeeping adjudicatory one. Upon receiving the subpoena the developer must ‘expeditiously’ disclose the requested copies or identifying records. The statute references the Federal Rules of Civil Procedure for service and enforcement ‘to the greatest extent practicable,’ but leaves open what ‘expeditiously’ means in practice and whether protective procedures (in‑camera review, redaction, or protective orders) are mandatory or discretionary.
Limits on use of produced material and consequences of noncompliance
The requester must keep any disclosed training material confidential unless authorized. The statute ties subpoena mechanics and remedies to existing Federal Rules of Civil Procedure but adds a statutory evidentiary benefit to the requester: a developer’s failure to comply creates a rebuttable presumption that the developer copied the work. That presumption can materially shift litigation dynamics by increasing the evidentiary burden on a noncompliant developer.
Penalty tools for overreaching requesters
If a developer alleges the requester sought a subpoena in bad faith, the court may impose sanctions on the requester; the statute makes Rule 11(c) procedures applicable. This provision is the bill’s attempt to deter abusive discovery—consequences include fees and other penalties—but it conditions sanctions on a successful motion by the subpoena recipient and thus may not be an immediate restraint on initial filings.
Immediate effect and statutory placement
The new section takes effect on enactment and the bill updates the chapter 5 table of sections. There is no phased implementation or regulatory rulemaking period, so the procedure would be immediately available once the statute is in force.
Who Benefits and Who Bears the Cost
Every bill creates winners and losers. Here's who stands to gain and who bears the cost.
Who Benefits
- Individual authors and visual artists — Gives them a concrete, statutory investigatory tool to verify whether their works were used to train generative models, which can be difficult and expensive to discover under existing doctrines.
- Publishers and rights-holders with large catalogs — Enables targeted inquiries across large, dispersed datasets by compelling dataset curators and model owners to produce identifying records.
- Copyright enforcement counsel — Lowers the barrier to obtaining evidence needed to evaluate infringement claims and to decide whether to file suit or seek settlements.
Who Bears the Cost
- AI developers and model owners (including startups and cloud providers acting as dataset curators) — Must respond ‘expeditiously’ to subpoenas, assemble or search potentially massive training datasets, and risk disclosure of sensitive data; they may incur significant forensic, legal, and compliance costs.
- Dataset curators and third‑party data suppliers — Are likely to face subpoenas even if they did not build the final model, creating contractual friction and compliance expenses, plus potential exposure of commercially valuable curation practices.
- Courts and clerk offices — Are assigned an administrative issuance role and may see increased filings; they will also handle enforcement motions, confidentiality disputes, and potentially numerous Rule 11 challenges.
Key Issues
The Core Tension
The central dilemma is balancing rights holders’ legitimate need to discover whether their works helped train commercial generative models against developers’ countervailing interests in protecting trade secrets, contractual confidentiality, and user privacy, and in avoiding the operational burden of searching massive datasets. The TRAIN Act resolves that tension in favor of facilitated disclosure and investigatory access, but leaves open whether its procedural protections and carve-outs are adequate to prevent misuse or undue cost.
The bill pushes an investigatory remedy into a domain dense with competing legal protections—trade secrets, contracts, privacy laws, and national security rules—without spelling out how conflicts should be resolved. While it requires confidentiality of produced material, the statute does not create a detailed protective-order regime, an in‑camera review process, or explicit carve-outs for trade-secret or personally identifiable information.
Developers will confront hard choices about how much to produce, whether to seek court protection, and how to logistically comply with requests that could implicate terabytes of training data.
Procedurally, the clerk’s ministerial duty to issue the subpoena upon a proper showing shifts initial gatekeeping away from judicial scrutiny. That lowers the barrier for rights holders but raises the risk of fishing expeditions and coercive leverage: the rebuttable presumption for noncompliance is particularly strong leverage in subsequent litigation because it flips an evidentiary burden toward the developer.
The statute’s reliance on vague terms—‘expeditiously,’ ‘records sufficient to identify with certainty,’ and the ‘subjective good faith belief’ standard—creates uncertainty that will generate litigation over the meaning of those terms. Finally, including state and local governments as possible ‘developers’ raises potential sovereign‑immunity or public‑records complications that the bill does not address.