This chapter of the California AI Transparency Act is a definitions section that sets the boundaries for the rest of the law: what counts as “artificial intelligence,” which generative-AI systems are in scope, who is a covered provider, what counts as a large online platform, and how provenance information is categorized. It also defines capture devices and manufacturers, enumerates a short list of "minor modifications," and ties provenance to interoperable standards and cryptographic digital signatures.
Those definitional choices matter because they do the heavy lifting of inclusion and exclusion. The bill hooks coverage to numeric thresholds (1,000,000 monthly users for GenAI creators; 2,000,000 unique monthly users for large platforms), establishes a residency-based reach for hosting platforms that make model weights or source code downloadable, and distinguishes “personal” provenance from “system” provenance — a distinction that will determine when personal data protections apply.
Because the text provided contains no operative obligations, these definitions are the critical levers that will determine which entities must later report, disclose, or embed provenance information, and how California law will interact with cross-border platforms and open-source repositories.
At a Glance
What It Does
The chapter supplies precise definitions that determine scope: it defines generative AI, provenance categories (personal vs system), capture devices and manufacturers, and two numeric thresholds for covered entities. It also requires provenance to be in formats interoperable with widely adopted standards and recognizes cryptographic digital signatures.
Who It Affects
The definitions target generative-AI developers with large user bases (over 1,000,000 monthly users), big platforms that distribute third-party content (over 2,000,000 unique monthly users), hosts that make model source code or weights downloadable by California residents, capture-device manufacturers, and anybody responsible for producing or preserving provenance metadata.
Why It Matters
By locking in thresholds, residency hooks, and a standards-first approach to provenance, the chapter shapes the universe of future obligations: who California can regulate, what information must travel with synthetic content, and how privacy-sensitive provenance will be carved out from system-level provenance.
What This Bill Actually Does
The text is a focused definitions section rather than an operative compliance regime. Its most consequential moves are threefold: it draws a bright-line definition of generative AI as systems that produce synthetic content by emulating the structure and characteristics of their training data; it erects numeric thresholds that decide which entities become "covered"; and it builds a two-track model for provenance — one track that can contain personal data and another limited to system-level signals.
The chapter treats hosting and distribution as enforcement anchors. A "GenAI hosting platform" is any site or app that makes model weights or source code available for download to a state resident, regardless of whether the model is monetized.
That residency-based hook is novel: it targets the act of making models downloadable into California rather than only looking at where a company is headquartered or where its servers sit. Meanwhile, a "covered provider" is the creator or coder of a GenAI system that reaches over 1,000,000 monthly visitors or users and is publicly accessible in the state — a reach metric that will be key to compliance decisions.

Provenance gets technical treatment.
The bill requires provenance data to be embedded in content or included in metadata in formats interoperable with "widely adopted specifications" from an "established standards-setting body." It then splits provenance into "personal" (containing personal information or unique device/service identifiers) and "system" (information about device type or content authenticity that cannot be tied to an individual). Notably, the statute excludes information in a digital signature from the personal-provenance bucket, signaling that cryptographic attestations may be treated differently from device identifiers or PII.

Other definitional choices create both limits and gaps.
The bill lists specific image and audio transformations it calls "minor modifications" — common edits like cropping, resizing, format conversion, and denoising — which suggests those operations won't defeat provenance signals or trigger extra obligations. The law also defines "capture device manufacturer" to include producers of devices sold in California but excludes entities that only assemble devices, leaving a potential compliance gap for contract manufacturers or white-label assemblers.
The Five Things You Need to Know
A "covered provider" is a person that creates or codes a GenAI system with more than 1,000,000 monthly visitors or users that is publicly accessible in California.
A "large online platform" is any public-facing social media, file-sharing, mass messaging, or search service that exceeded 2,000,000 unique monthly users in the prior 12 months, with two narrow telecom exceptions.
A "GenAI hosting platform" catches websites or apps that make model source code or model weights available for download to a California resident, and it applies regardless of whether the download is paid or free.
The statute separates "personal provenance data" (PII or unique device/service identifiers) from "system provenance data" (device-type or content-authenticity signals that cannot be linked to a person) and explicitly excludes data inside a digital signature from the personal category.
The bill lists nine "minor modifications" (e.g., resizing, cropping, denoising, format conversion) that are treated as non-substantive edits for provenance purposes.
Section-by-Section Breakdown
Definition of Artificial Intelligence
The bill defines AI broadly as engineered systems that infer from inputs how to generate outputs that can influence environments. That phrasing covers both decision-making models and content-generation models, and it is intentionally agnostic about internal architecture. Practically, the scope will capture systems from rule-based automation to deep-learning models so long as they infer outputs that influence physical or virtual environments.
Covered Provider and Generative AI
A "covered provider" is any person who creates or codes a GenAI system that has over 1,000,000 monthly visitors or users and is publicly accessible within California. "Generative AI" is defined by its ability to produce synthetic content that emulates its training data. Together, these clauses create a usage-driven threshold: small projects and narrowly deployed models fall out of scope, while high-traffic public tools fall in. The public-accessibility and monthly-usage metrics are decisive — but the bill leaves methods for calculating those metrics undefined here.
GenAI Hosting Platform (Residency Hook)
The hosting-platform definition reaches platforms that make model weights or source code downloadable by a California resident, regardless of compensation. This is a jurisdictional design that focuses on the act of distribution into the state rather than the registrant's domicile. It potentially captures open-source repositories and model hubs when downloads are performed by California residents, expanding regulatory reach into decentralized and free software ecosystems.
Large Online Platform Threshold and Exclusions
The bill treats large online platforms separately, setting a 2,000,000 unique monthly user threshold and excluding broadband ISPs and telecommunications services. That draws a line between content-distributing services (social media, search, file sharing) and core connectivity providers, signaling the legislature's intent to target distribution channels rather than pipes.
Provenance: Personal vs System and Standards Requirement
Provenance must be embedded or included in metadata in a format interoperable with widely adopted standards from an established standards body. The statute then bifurcates provenance into "personal" (contains personal info or unique device/service identifiers) and "system" (non-identifying data about device type or authenticity). Excluding digital signatures from the personal category suggests the bill favors cryptographic attestations for authenticity while treating other device-level identifiers as personal data requiring privacy safeguards.
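The personal/system bifurcation described above can be expressed as a simple classification rule. The sketch below is purely illustrative: the statute prescribes no schema, so the field names and category lists are assumptions, not statutory text. The one rule the sketch does take from the statute is the digital-signature carve-out.

```python
# Hypothetical sketch of the statute's personal/system provenance split.
# Field names and category sets are illustrative assumptions, not statutory text.

PERSONAL_MARKERS = {"user_name", "email", "device_serial", "service_account_id"}

def classify_provenance(record: dict) -> dict:
    """Split a provenance record into 'personal' and 'system' buckets.

    Per the statute's carve-out, a digital signature lands in the system
    bucket even though it is cryptographically unique.
    """
    buckets = {"personal": {}, "system": {}}
    for key, value in record.items():
        if key == "digital_signature":      # explicit statutory carve-out
            buckets["system"][key] = value
        elif key in PERSONAL_MARKERS:       # PII or unique device/service IDs
            buckets["personal"][key] = value
        else:                               # device-type / authenticity signals
            buckets["system"][key] = value
    return buckets

record = {
    "device_type": "smartphone",
    "device_serial": "SN-123456",
    "digital_signature": "3045ae...",
    "content_hash": "9f86d0...",
}
split = classify_provenance(record)
```

Under this sketch, only the device serial is "personal"; the device type, content hash, and (by carve-out) the signature are all "system" provenance.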
Capture Devices, Manufacturers, and Minor Modifications
The law defines capture devices broadly (phones, cameras, voice recorders) and calls out capture device manufacturers as those who produce devices for sale in California, excluding entities that only assemble devices. The "minor modification" list enumerates common image/audio edits (brightness, cropping, denoising, format changes), likely signaling which edits should not be treated as altering provenance in a way that defeats disclosure or authenticity mechanisms.
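One practical consequence of enumerating "minor modifications" is that a provenance pipeline can gate on the edit chain: if every edit is on the list, provenance signals should survive. A minimal sketch, assuming hypothetical edit names (the statute's enumerated list controls, not this set):

```python
# Sketch of an "is this edit chain minor?" gate. Edit names are illustrative
# stand-ins; the statute's enumerated list is the authoritative source.

MINOR_MODIFICATIONS = {
    "crop", "resize", "rotate", "format_conversion",
    "brightness", "contrast", "denoise", "compress", "sharpen",
}

def preserves_provenance(edits: list[str]) -> bool:
    """True if every edit in the chain is an enumerated minor modification,
    i.e. the chain should not defeat provenance or authenticity signals."""
    return all(edit in MINOR_MODIFICATIONS for edit in edits)

preserves_provenance(["crop", "resize"])     # → True (minor edits only)
preserves_provenance(["crop", "face_swap"])  # → False (substantive edit present)
```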
Who Benefits and Who Bears the Cost
Who Benefits
- California residents who consume or are depicted in synthetic content — the distinction between personal and system provenance creates a pathway for provenance tied to privacy protections and clearer attribution when content is generated or altered.
- Digital forensics and media-verification services — the bill’s standards-oriented provenance model and the explicit acceptance of digital signatures make it easier for forensic tools and verification platforms to design interoperable attestations.
- State regulators and compliance teams — numeric thresholds and clear terminology reduce legal ambiguity about who falls under California’s regulatory reach, enabling targeted enforcement and guidance drafting.
Who Bears the Cost
- Large GenAI creators that exceed 1,000,000 monthly users — they will fall into the covered-provider bucket and will need to track exposure metrics, provenance packaging, and potentially conform to standards that are yet to be specified.
- GenAI hosting platforms and open-source repositories — because the hosting definition is triggered by downloads by California residents and applies regardless of payment, these platforms may need to assess residency-based access and apply controls or disclosures.
- Platform compliance and privacy teams at services exceeding 2,000,000 users — they must audit how content is labeled, how provenance flows with shared content, and how minor modifications are treated across their stacks.
Key Issues
The Core Tension
The central tension is between transparency and privacy/compliance burden: the statute aims to make synthetic content verifiable and accountable by embedding provenance and setting clear coverage rules, but doing so risks exposing device-level identifiers, forcing intrusive logging to determine residency, and imposing heavy technical and operational costs on platforms and hosts — all before a single implementation standard exists.
The definitions leave key operational questions unanswered and create trade-offs that will shape compliance costs and privacy outcomes. Measuring "monthly visitors or users" and "unique monthly users" across distributed services is technically fraught: does the bill mean authenticated users, unique IPs, device fingerprints, or some other metric?
Absent measurement rules, providers will face uncertainty and likely conservative approaches that inflate compliance scope.
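The measurement ambiguity is easy to make concrete: the same month of traffic yields different "unique user" totals depending on which identifier controls. A toy sketch, with entirely invented log fields and events:

```python
# Toy illustration: one month of access events counted three ways.
# The statute does not say which identifier controls, so the totals diverge.

events = [
    {"account": "a1", "ip": "10.0.0.1", "device": "d1"},
    {"account": "a1", "ip": "10.0.0.2", "device": "d1"},  # same user, new IP
    {"account": None, "ip": "10.0.0.3", "device": "d2"},  # anonymous visitor
    {"account": "a2", "ip": "10.0.0.3", "device": "d3"},  # shared IP (NAT)
]

def unique_count(events: list[dict], key: str) -> int:
    """Count distinct non-null values of one identifier field."""
    return len({e[key] for e in events if e[key] is not None})

by_account = unique_count(events, "account")  # authenticated users: 2
by_ip      = unique_count(events, "ip")       # unique IP addresses: 3
by_device  = unique_count(events, "device")   # device fingerprints: 3
```

A provider near the 1,000,000 or 2,000,000 line could be in or out of scope depending solely on which column it counts, which is why conservative operators will likely assume the highest plausible figure.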
The residency-based hook for hosting platforms is enforcement-forward but administratively heavy. Determining whether a download was performed by a California resident requires either reliable geolocation, user attestations, or logging that itself may implicate privacy laws.
Requiring hosts to track that could push some operators to geoblock California or impose friction on downloads. Likewise, treating device or service identifiers as personal provenance data risks turning provenance into a vehicle for tracking; the explicit carve-out for digital signatures mitigates this, but it also raises questions about how to preserve content authenticity without exposing device-level identifiers.
Finally, the standards-first approach to provenance points to a dependence on external standards bodies. If those bodies move slowly, the statute’s interoperability objective could create long periods of regulatory uncertainty.
Conversely, if California endorses a particular standard quickly, implementers might face lock-in to a technical approach that other jurisdictions do not follow, creating cross-border friction for multi-jurisdictional platforms.