Codify — Article

California AI Transparency Act: Definitions that shape scope and compliance

AB 853 sets the vocabulary: who counts as an AI maker, a hosting site, or a platform, and what counts as provenance or personal data. Those definitional choices decide who falls inside future transparency rules.

The Brief

AB 853 (California AI Transparency Act) supplies the statutory definitions that will determine who must follow California’s forthcoming AI transparency rules and what evidence those rules will rely on. The section analyzed here focuses on core terms: what counts as an AI or generative AI system, how providers and platforms are categorized, what provenance data is, and how capture devices and signatures are treated.

Why this matters: definitions are law. The choices in AB 853 draw the borders of regulatory reach—who is obligated, what data must be tracked or embedded in content, and which technical practices (embedding provenance, distinguishing identifiable device data, handling downloads of model artifacts) will create compliance workstreams for developers, platforms, and device makers.

Close reading of these definitions reveals likely friction points and enforcement challenges even before any operative duties appear elsewhere in the statute.

At a Glance

What It Does

This section defines key terms that will determine the statute’s scope, including categories for AI systems, generative systems, hosting sites, large platforms, capture devices, and different kinds of provenance data. It separates provenance that is personally identifying from provenance intended to convey system or authenticity details.

Who It Affects

The definitions target creators of generative AI, websites and apps that make models or code available, operators of large public platforms, and manufacturers of cameras and recording devices—alongside any downstream entities that would be required to collect or embed provenance metadata.

Why It Matters

Because these are definitional rules, small changes would expand or shrink the universe of regulated actors. The bill ties regulatory reach to measures like user/visitor thresholds and to technical concepts like embedded provenance and 'manifest' vs 'latent' recognizability, which will shape both legal compliance and engineering design.


What This Bill Actually Does

AB 853 is primarily a definitions section that lays the groundwork for later operative obligations. It describes AI and generative AI in functional terms—systems that produce outputs influencing environments and systems that generate synthetic content resembling their training data.

The statute also introduces provenance concepts: metadata or embedded markers intended to establish authenticity, origin, or modification history.

The bill draws a distinction between provenance that can be tied to a person and provenance that cannot. Personal provenance covers either personal information broadly or device/system identifiers that can reasonably be linked to an individual; nonpersonal or system provenance covers technical details about the generating device or indicators meant to vouch for content authenticity.

The text also excludes information placed inside a digital signature from the definition of personal provenance, signaling a narrow role for cryptographic attestation separate from other embedded provenance markers.

AB 853 separately defines categories of actors and distribution channels. It specifies which kinds of websites or apps will be considered hosts for generative models and which public-facing platforms will count as large distribution systems.

Capture devices are defined broadly (cameras, phones, microphones, recorders) and manufacturers of those devices are included, with a carve-out for entities that only assemble devices. Finally, the statute clarifies terminology—latent versus manifest—for use in later assessments of whether content characteristics are obvious to a human viewer.

Read together, these definitions push the policy debate into technical territory: how to measure user reach, how to implement provenance embedding that respects privacy, how to treat cryptographic attestations, and how to apply human-recognition standards in automated contexts.

The section doesn't create duties by itself, but it sets the legal vocabulary that any compliance program will need to follow and gives regulators a set of bright-line and operational concepts to enforce.

The Five Things You Need to Know

1

The bill defines a “covered provider” as a creator of a generative AI system that has over 1,000,000 monthly visitors or users and is publicly accessible in California.

2

A “large online platform” is defined by a distribution-based test: platforms that distributed content to users who did not create it and that exceeded 2,000,000 unique monthly users in the prior 12 months, with narrow exclusions for broadband and telecommunications services.

3

A “GenAI hosting platform” reaches sites or apps that make a model’s source code or model weights available for download by a state resident, regardless of whether the download is paid or unpaid.

4

The statute treats “personal provenance data” as either personal information or unique device/system identifiers reasonably capable of being associated with a particular user, but explicitly excludes data contained within a cryptographic “digital signature.”

5

“Capture device” is defined broadly to include cameras, mobile phones with cameras or microphones, and voice recorders, and the category “capture device manufacturer” excludes entities that only assemble devices.
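The two numeric gateways above lend themselves to a simple eligibility check. The thresholds mirror the bill's stated figures, but the data model, field names, and function names in this sketch are hypothetical, an illustration rather than legal guidance:

```python
# Hypothetical sketch of the bill's numeric gateways; the Service data model
# and its field names are illustrative assumptions, not statutory text.
from dataclasses import dataclass

COVERED_PROVIDER_THRESHOLD = 1_000_000   # monthly visitors/users ("covered provider")
LARGE_PLATFORM_THRESHOLD = 2_000_000     # unique monthly users ("large online platform")

@dataclass
class Service:
    monthly_users: int
    publicly_accessible_in_ca: bool
    distributes_third_party_content: bool
    is_broadband_or_telecom: bool

def is_covered_provider(s: Service) -> bool:
    """Creator of a generative AI system with over 1,000,000 monthly
    visitors or users, publicly accessible in California."""
    return s.publicly_accessible_in_ca and s.monthly_users > COVERED_PROVIDER_THRESHOLD

def is_large_online_platform(s: Service) -> bool:
    """Distribution-based test: distributes content users did not create,
    exceeds 2,000,000 unique monthly users, excluding broadband/telecom."""
    return (s.distributes_third_party_content
            and not s.is_broadband_or_telecom
            and s.monthly_users > LARGE_PLATFORM_THRESHOLD)

svc = Service(monthly_users=1_500_000, publicly_accessible_in_ca=True,
              distributes_third_party_content=True, is_broadband_or_telecom=False)
print(is_covered_provider(svc))       # True: over the 1M covered-provider line
print(is_large_online_platform(svc))  # False: under the 2M platform line
```

Note how the two thresholds diverge: a mid-sized service can be a covered provider without being a large online platform, which is exactly the kind of gap compliance teams will have to map.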

Section-by-Section Breakdown

Every bill we cover gets an analysis of its key sections.

22757.1(a), (f)

Working definitions for AI and generative systems

These clauses anchor the statute in functionality: 'AI' is defined by its ability to infer from inputs to influence environments, and 'generative AI' is tied to systems that create synthetic content reflecting their training data. That framing prepares the ground to regulate outputs (synthetic images, text, audio) rather than narrowly regulating particular architectures. For compliance teams, this means product features that generate content—regardless of model design—will likely fall within the regulatory perimeter if other thresholds are met.

22757.1(d), (g), (h)

Actor categories and distribution channels

The statute separates creators of generative systems from platforms that host or distribute them, and it uses user/visitor thresholds to differentiate covered actors from smaller players. It also creates a category for hosting sites that make models or their binaries available for download by residents of the state. These distinctions will be central to any future duties: whether you are the originator of a model, a host of model artifacts, or a distributor of AI-generated content will determine different compliance obligations and legal exposure.

22757.1(n), (o), (p), (e)

Provenance taxonomy and the role of digital signatures

Provenance is split into 'personal' and 'system' flavors: the former contains data that can identify a person, the latter focuses on non-identifying technical markers and authenticity signals. The separate definition of digital signature—and its explicit exclusion from personal provenance—signals that cryptographic attestations are intended to be handled differently from embedded metadata that may carry personal identifiers. Practically, organizations will need to plan for both privacy-safe system provenance markers and for separate cryptographic signing workflows.
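That split can be sketched in code. The metadata key names and the personal/system heuristic below are assumptions for illustration only; the one statutory element reflected is the exclusion of digital-signature data from personal provenance:

```python
# Illustrative only: the key names and the identifier heuristic are
# assumptions, not drawn from the statute's text.
PERSONAL_KEYS = {"user_id", "account_email", "device_serial"}          # reasonably linkable to a person
SYSTEM_KEYS = {"generator_name", "generator_version", "content_hash"}  # authenticity/system details

def split_provenance(metadata: dict) -> tuple[dict, dict]:
    """Partition embedded provenance into personal vs. system provenance,
    skipping anything carried inside a digital signature (the carve-out)."""
    personal, system = {}, {}
    for key, value in metadata.items():
        if key == "digital_signature":
            continue  # excluded from "personal provenance data" by definition
        elif key in PERSONAL_KEYS:
            personal[key] = value
        elif key in SYSTEM_KEYS:
            system[key] = value
    return personal, system

meta = {"generator_name": "genmodel", "device_serial": "SN-123",
        "digital_signature": "MEUCIQexample", "content_hash": "a3f1c9"}
personal, system = split_provenance(meta)
# personal -> {"device_serial": "SN-123"}
# system   -> {"generator_name": "genmodel", "content_hash": "a3f1c9"}
```

The hard interpretive work hides in `PERSONAL_KEYS`: deciding which identifiers are 'reasonably capable' of being associated with a user is precisely the judgment the statute defers.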

22757.1(b), (c)

Capture devices and manufacturer carve‑outs

Capture devices are defined expansively to include everyday consumer hardware; manufacturers of those devices are captured unless their role is limited to assembly. That carve-out creates a narrow manufacturing safe harbor but leaves original device makers within scope. If later provisions require provenance embedding at the point of capture or hardware‑level support for authenticity metadata, device makers (not assemblers) will have the primary engineering and compliance burden.

22757.1(i), (j), (m), (l)

Terms for observability and data types

The section defines 'latent' vs 'manifest' to create a human-recognizability standard and reuses standard privacy vocabulary for 'personal information' and 'metadata.' These definitional tools will be used to set disclosure triggers or labeling standards later: for example, whether a synthetic attribute is 'manifest' to an ordinary person could determine whether a transparency notice is required, while metadata language sets the baseline for what counts as embedded provenance.


Who Benefits and Who Bears the Cost

Every bill creates winners and losers. Here's who stands to gain and who bears the cost.

Who Benefits

  • Privacy-conscious consumers and recipients of content — Clear definitions of personal provenance and the exclusion of digital signatures provide a framework that can limit inadvertent disclosure of identifying device data, helping users keep sensitive identifiers out of embedded provenance markers.
  • Journalists, researchers, and verification organizations — A statutory concept of provenance and system-level authenticity signals gives these actors a clearer legal basis to demand and interpret provenance information when investigating synthetic content.
  • State regulators and policymakers — The definitions give regulators concrete levers (actor categories, provenance taxonomies, observability standards) to design enforceable duties without having to invent technical terms from scratch.

Who Bears the Cost

  • Developers and companies creating generative models — Being categorized as a covered provider or host triggers future compliance obligations (recordkeeping, provenance embedding, reporting) and imposes engineering and governance costs to track training data and outputs.
  • Hosting platforms and major social platforms — Platforms that distribute or enable downloads of models will face operational costs measuring reach, policing availability to state residents, and implementing provenance mechanisms; they may also need new legal and compliance teams.
  • Capture device manufacturers (non‑assemblers) — If later rules require provenance support at capture time, original equipment manufacturers will need to design firmware or hardware features to tag media, increasing manufacturing complexity and after‑sale support obligations.
  • Privacy officers and legal teams — The split between personal and system provenance and the digital signature carve-out will create interpretive work: teams must decide what counts as 'reasonably capable of being associated' and draft policies balancing transparency with privacy.

Key Issues

The Core Tension

The bill balances two legitimate goals: transparency and authenticity of synthetic content on one side, and protection of individual privacy and innovation on the other. It pursues that balance through definitions that tighten enforcement on some actors while leaving subjective standards (like 'reasonably capable' or 'manifest') for later interpretation. The central dilemma is how to require meaningful provenance without forcing companies either to over-collect personal identifiers or to choke off open innovation through heavy technical obligations.

The definitions are precise in places and deliberately elastic in others. They supply numeric and categorical gateways for regulatory reach while leaving ambiguous standards—like what it means for data to be 'reasonably capable' of association with a user or for content characteristics to be 'manifest'—for future interpretation.

That design keeps enforcement flexible but shifts a lot of rule‑making into implementation: agencies, industry consortia, or courts will have to translate these phrases into technical tests, measurement windows, and detection methods.

Technical integration raises thorny operational problems. Embedding provenance in media that is cropped, transcoded, or compressed across platforms is nontrivial; relying on metadata alone is fragile, and excluding digital signatures from 'personal provenance' may create mixed signals about reliance on cryptographic attestations versus persistent metadata.
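A toy demonstration of that fragility, which assumes nothing about any real provenance standard: an attestation bound to exact bytes fails the moment the bytes change, and changing the bytes is exactly what cropping, recompression, and transcoding do.

```python
# Toy illustration (not any real provenance scheme): a keyed digest over
# exact content bytes does not survive re-encoding, which is why
# metadata-bound provenance is fragile across platform pipelines.
import hashlib
import hmac

SIGNING_KEY = b"demo-key"  # stand-in for a real signing credential

def attest(content: bytes) -> bytes:
    """Produce a keyed digest over the exact content bytes."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).digest()

def verify(content: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(attest(content), tag)

original = b"\x89PNG synthetic image bytes"
tag = attest(original)

transcoded = original + b"\x00"   # any recompression or crop alters the bytes
print(verify(original, tag))      # True: an untouched copy still verifies
print(verify(transcoded, tag))    # False: the attestation breaks downstream
```

This is the engineering pressure behind the statute's taxonomy: byte-exact attestations and strippable metadata each fail in different ways, so regulators will have to decide which failure mode the duties tolerate.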

The hosting‑by‑download construct also sweeps in cross‑border questions: a platform outside the state that allows a California resident to download model artifacts may be drawn into compliance even if the operator never intended a California market. Finally, the capture device manufacturer carve‑out for assemblers creates a narrow technical liability line that could be litigated in disputes about who must provide provenance support in complex supply chains.
