Towards compromise: A concrete two-tier proposal for foundation models in the EU AI Act

Authors: Rishi Bommasani and Tatsunori Hashimoto and Daniel E. Ho and Marietje Schaake and Percy Liang


The European Union is in the final days of negotiating the AI Act, which would be the world’s first comprehensive legislation on artificial intelligence. Negotiations have stalled as the path forward on foundation models has become murky. The Parliament argues that requirements are necessary, the Commission has proposed a two-tiered approach, and the Council remains divided. Within the Council, the Spanish presidency is pushing for a two-tiered approach, while the trio of France, Germany, and Italy advocates for “mandatory self-regulation” with minimal requirements.

Beyond the negotiators, we highlight positions across stakeholders:

  • Industry. Foundation model developer Mistral AI has loudly raised concerns about the compliance burden of the AI Act. Other foundation model developers, as part of DigitalEurope, have argued against a two-tiered regime. In contrast, the broad ecosystem of European small and medium enterprises represented by the European DIGITAL SME Alliance has advocated for requirements on foundation models.
  • Academia. Computer scientist Yoshua Bengio has made clear that requirements are essential, saying “exemption [of foundation models] … goes completely against the European values of protecting the public and even the spirit of doing the AI Act in the first place”. Legal scholar Philipp Hacker has raised similar points in his previous three-layer proposal for foundation models as has philosopher Luciano Floridi.
  • Civil society. Across the ideological spectrum, groups like the Ada Lovelace Institute, the AI Now Institute, Open Future, the Future of Life Institute, and the European creative community all have firmly argued for requirements on foundation models. Several groups favor a tiered approach. Former Estonian president Kersti Kaljulaid writes that a “tiered approach … allows for targeting so that competitors to major AI companies can emerge without onerous restrictions”.

Our position. A comprehensive AI regulatory scheme cannot ignore foundation models: they are at the epicenter of AI, the very basis for public awareness of AI, and pivotal to the future of AI. Given this significance, we argue that pervasive opacity compromises accountability. Foundation models and the surrounding ecosystem are insufficiently transparent, with recent evidence showing that transparency is deteriorating further. Without sufficient transparency, the EU cannot implement meaningful accountability mechanisms: we cannot govern what we cannot see. Placing all burdens on downstream application developers is myopic: foundation models play an outsized role in the AI value chain, and their providers often wield asymmetric bargaining power to shape contracts. With that said, requirements for foundation model providers should provide value without imposing unnecessary burden.

Therefore, we put forth a concrete proposal for how foundation models should be governed under the AI Act. In line with Spain, we designate two tiers: in the generic tier, foundation models are subject to disclosure requirements that improve transparency at minimal marginal compliance cost for companies. In the high-impact tier, which is triggered once foundation models demonstrate significant societal impact, more stringent requirements are imposed. Our proposal interpolates between the positions of different EU negotiators to work towards compromise. As such, the proposal represents a potential common ground between the negotiators: it is not our view on what constitutes ideal regulation absent the current political realities of the AI Act negotiation.

The generic tier for foundation models

For all foundation models,1 we recommend transparency-centric requirements for five reasons.

  1. Demonstrated consensus. In spite of some disagreement, we see transparency as the overarching focus for mandatory requirements across the Parliament position, the Spanish proposal, and the French, German, and Italian proposal.
  2. Current opacity. The Foundation Model Transparency Index confirms that the foundation model ecosystem suffers from pervasive opacity.
  3. Historic opacity. The history of digital technologies is riddled with harms born of opacity (ghost work, dark patterns, scandals). Years after those harms materialized, the EU took action via the Digital Services Act and the establishment of the DSA Transparency Database.
  4. Product safety. The EU AI Act’s legislative purpose and legal basis is in product safety regulation under the EU’s new legislative framework; transparency aligns directly with these objectives.
  5. Downstream coordination. Foundation models are general-purpose technologies that power myriad downstream applications across market sectors; the EU recognizes that transparency on these critical intermediary technologies is essential for compliance, safety, and innovation along the value chain.

We recommend the following requirements.2 These are proportional disclosure requirements for any commercial model provider or other well-resourced entity. We additionally view exemptions for very low-resource entities (e.g. students, hobbyists, non-commercial academic groups) as important to ensuring compliance burdens remain proportionate to impact. We do not discuss those exemptions in detail here, as this proposal is aimed at entities with substantial societal impact.

  1. Registration. Foundation models, when placed on the market, put into service, or made available in the Union, shall be registered in a public EU database alongside high-risk AI systems.
  2. Transparency Report. Foundation model providers shall disclose the following Transparency Report3 on their public website, across all distribution channels for the foundation model, and in the EU database:
    1. Name. Trade name and any additional unambiguous reference allowing the identification of the foundation model.
    2. Data sources. Description of the data sources used in the development of the foundation model.
    3. Data summary. Summary of the data, including the size of data, used in the development of the foundation model.
    4. Compute. The amount of computing power, hardware, and time used in the development of the foundation model.
      • Source: This requirement clarifies the requirement under Amendment 771 item 7 in Annex VIII – Section C of the Parliament position.
      • Note: The amount of computing power should be reported in FLOPs. The amount of hardware should be reported in relevant hardware units (e.g. number of NVIDIA A100-80GB GPUs). Standards-setting bodies should be tasked with establishing how training time should be measured.
    5. Environment. The energy used and emissions emitted in the development of the foundation model OR the owner/provider and location of hardware.
      • Source: This requirement clarifies and expands the requirement under Amendment 771 item 7 in Annex VIII – Section C of the Parliament position.
      • Note: Directly measuring energy and emissions will likely require information about the specific data center. Therefore, if providers do not have this information (e.g. it is not provided by their cloud provider), to facilitate compliance, they can instead disclose who operates the hardware and where it is located. From this information, along with the amount of hardware used and the training duration, reasonable estimates of energy and emissions can be computed (see the illustrative sketch after this list).
    6. Model. The size, input modality, and output modality of the foundation model.
    7. Model properties. A description of the significant capabilities and limitations of the foundation model.
    8. Risks and mitigations. The reasonably foreseeable risks and the mitigations implemented for each of those risks. For any unmitigated risks, an explanation of why they are not mitigated.
      • Source: This requirement, along with the item above, is identical to the requirement under Amendment 771 item 6 in Annex VIII – Section C of the Parliament position, with the noted adjustment.
      • Note: Relative to the Parliament position, this phrasing clarifies the correspondence between risks and associated mitigations. It also weakens “cannot be mitigated” to “are not mitigated”.
    9. Evaluations. Description of the model’s performance, including on public benchmarks and state-of-the-art industry benchmarks.
      • Source: This requirement modifies the requirement under Amendment 771 item 8 in Annex VIII – Section C of the Parliament position, replacing the “or” with an “and”.
      • Note: The AI Office and standards-setting bodies, in coordination with concurrent international efforts (e.g. the UK and US AI Safety Institutes), should actively track and incentivize evaluation development, determining minimal standards.
    10. Distribution channels. The distribution channels by which the foundation model is or has been knowingly placed on the market, put into service, or made available in the Union.
      • Source: Indicator 68 in the Foundation Model Transparency Index.
      • Note: Some of the disclosures below may depend on the distribution channel. If so, the foundation model provider should make this clear.
    11. Member states. The Member States in which the foundation model is or has been placed on the market, put into service, or made available in the Union.
    12. Legal documents. Links to any licenses, terms-of-service, or acceptable use policies that apply to the foundation model.
    13. Uses. The intended, permitted, restricted, and prohibited uses of the foundation model.
    14. Users. The permitted and prohibited users for the foundation model.
  3. In the event that the foundation model is distributed via a distribution channel with a user-facing interface:
    1. User interaction. The user should be aware they are interacting with an AI system.
    2. Machine-generated content. The user should be aware of what content, if any, is machine-generated.
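
To make the Environment note above concrete, the sketch below shows one way reasonable energy and emissions estimates could be derived from the disclosed amount of hardware, the training duration, and knowledge of who operates the hardware and where. It is a minimal, illustrative Python calculation: the function name and all numeric constants (per-GPU power draw, data-center overhead, grid carbon intensity) are assumptions chosen for the example, not values prescribed by this proposal.

# Back-of-the-envelope estimate of training energy and emissions from the
# hardware disclosures (Compute and Environment items above). All constants
# are illustrative assumptions; real estimates depend on the specific
# hardware, data center, and regional grid.

def estimate_energy_and_emissions(
    num_gpus: int,
    training_hours: float,
    gpu_power_kw: float = 0.4,         # assumed average draw per NVIDIA A100-80GB, in kW
    pue: float = 1.2,                  # assumed data-center power usage effectiveness
    grid_kgco2_per_kwh: float = 0.25,  # assumed carbon intensity of the local grid
):
    """Return (energy in kWh, emissions in kg CO2e) for one training run."""
    energy_kwh = num_gpus * gpu_power_kw * training_hours * pue
    emissions_kgco2 = energy_kwh * grid_kgco2_per_kwh
    return energy_kwh, emissions_kgco2

# Hypothetical disclosure: 1,024 GPUs used for 30 days of training.
energy, emissions = estimate_energy_and_emissions(num_gpus=1024, training_hours=30 * 24)
print(f"~{energy:,.0f} kWh, ~{emissions / 1000:,.1f} t CO2e")

Under these assumed values, the run would use roughly 354,000 kWh and emit roughly 88 t CO2e; the point is not the specific numbers but that the disclosed hardware and location information suffices for such estimates.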

Compliance costs. Critical to the recent discourse around the AI Act has been the compliance burden for foundation model developers. The costs of compliance will depend on the developer, their specific operating environment, and several other factors. In short, we encourage developers arguing that compliance is costly to be precise: which specific line items are especially onerous?

As a third party, we provide our own assessment of compliance costs. We make these judgments based on our significant expertise on foundation models, while acknowledging that we may lack certain practical context. Our assessment primarily addresses the technical costs of compliance: what work is required to acquire the relevant information to be disclosed? We do not account for the costs of legal personnel. In turn, these cost assessments may neglect factors that arise in practice for some developers, and therefore may be inaccurate. Nonetheless, we provide them to help ground the discourse on compliance burden from perspectives beyond model developers, whose views on cost cannot be decoupled from their self-interest in minimizing regulatory burden.

We argue the aforementioned requirements impose fairly minimal marginal compliance cost: the cost to a foundation model provider, over and above the basic practices most already follow for their own internal purposes or for their clients, is small in our judgment. To make this argument crisp, we informally “price” each requirement for its marginal compliance cost. However, we cannot provide a fully precise assessment, because the costs of compliance will depend on legal interpretation: how implementing acts and subsequent standard-setting clarify the minimal expectations for compliance will substantially shape those costs.

  1. Registration. The costs of registration are largely determined by the complexity of the registration form and any processing time/bureaucracy. We recommend a digitized process using a structured form with no subsequent approvals to keep costs very low.
  2. Transparency Report.
    1. Name. Minimal cost. Note: some foundation model providers do not rigorously version models at present.
    2. Data sources. Some cost. To develop the foundation model, developers must select data sources and should already record this information. We acknowledge disclosure of this information may require legal review given the potential exposure to liability.
    3. Data summary. Some cost. We expect the cost of producing such summaries will go down considerably by the time the AI Act goes into force due to the development of more robust data summarization tooling.
    4. Compute. Minimal cost. Compute information is likely tracked by model providers to make core business decisions (e.g. to price the costs of computation).
    5. Environment. Minimal cost. Model providers know their compute providers, though there could be challenges for hardware location in some cases.
    6. Model. Minimal cost. Very basic information that all developers know.
    7. Model properties. Minimal cost. Very basic information that all developers know and should already report for sensible downstream use.
    8. Risks and mitigations. Some cost. Developers should report this information to coordinate with downstream actors to ensure risks are mitigated across the value chain.
    9. Evaluations. Minimal cost. We expect the cost of evaluations will go down considerably by the time the AI Act goes into force due to the development of third-party solutions for low-cost evaluations. In particular, we emphasize that the model developers already conduct evaluations internally to make business decisions and evaluations are expected by downstream actors to sensibly choose between models.
    10. Distribution channels. Minimal cost. Very basic information.
    11. Member states. Minimal cost. Very basic information.
    12. Legal documents. Minimal cost. Very basic information.
    13. Uses. Minimal cost. Very basic information.
    14. Users. Minimal cost. Very basic information.
  3. In the event that the foundation model is distributed via a distribution channel with a user-facing interface:
    1. User interaction. Minimal cost. The exact cost may depend upon the precise standards for this disclosure.
    2. Machine-generated content. Minimal cost. The exact cost may depend upon the precise standards for this disclosure.

The high-impact tier for the most regulated models

In addition to the requirements for the generic tier, foundation models that demonstrate significant societal impact4 warrant greater scrutiny and should meet higher standards.

We recommend the following requirements:

  1. Expanded Transparency Report. Providers of high-impact foundation models shall disclose the following additional information in Transparency Reports on their public website, across all distribution channels for the foundation model, and in the EU database:
    1. Model behavior policy. The model behaviors that are permitted, restricted, and prohibited.
    2. Risk evaluations. Description of the model’s performance on a set of risk categories determined by the AI Office.
      • Source: Indicators 52 and 54 in the Foundation Model Transparency Index.
      • Note: The AI Office should be instructed to create templates for disclosing this information, including the specification of risk categories and acceptable evaluations for each category.
    3. Risk management. Risk management and governance policies, including for example accountability and governance processes to identify, assess, prevent, and address risks, where feasible throughout the AI lifecycle.
  2. Data governance. Providers of high-impact foundation models shall process and incorporate only datasets that are subject to appropriate data governance measures for foundation models, in particular measures to examine the suitability of the data sources and possible biases and appropriate mitigation.
  3. Energy efficiency. Providers of high-impact foundation models shall design and develop the foundation model, making use of applicable standards to reduce energy use, resource use and waste, as well as to increase energy efficiency, and the overall efficiency of the system. This shall be without prejudice to relevant existing Union and national law and this obligation shall not apply before the standards referred to in Article 40 are published. They shall be designed with capabilities enabling the measurement and logging of the consumption of energy and resources, and, where technically feasible, other environmental impacts the deployment and use of the systems may have over their entire lifecycle.
  4. Cybersecurity. Providers of high-impact foundation models shall implement operational security measures for information security and appropriate cyber/physical access controls to secure model weights, algorithms, servers, and datasets. Operational security measures include monitoring dependencies on external code to protect against cybersecurity vulnerabilities.
  5. Internal red-teaming. Providers of high-impact foundation models shall adversarially evaluate models and report results of the foundation model’s performance in relevant red-team testing to the AI Office.
  6. External auditing. Providers of high-impact foundation models shall provide sufficient model access, as determined by the AI Office, to approved independent third-party auditors.
    • Source: The Spanish position on the tiered approach.
    • Note: The AI Office should create and maintain a certification process for determining if auditors are sufficiently independent and qualified to audit high-impact foundation models. The Office should consult with both the foundation model providers and third-party auditors in determining the appropriate level of model access required on a case-by-case basis.
  7. Law-abiding generated content. Providers of high-impact foundation models shall train, and where applicable, design and develop the foundation model in such a way as to ensure adequate safeguards against the generation of content in breach of Union law in line with the generally acknowledged state of the art, and without prejudice to fundamental rights, including the freedom of expression.
  8. Adverse event reporting. Providers of high-impact foundation models shall report adverse events or other harms to a centralized government reporting platform and provide affordances for third parties or users to similarly report such events.

Relationship with scientific literature on foundation models and AI. The transparency requirements proposed draw inspiration from several resources across the scientific literature, but most directly three in particular: model cards, ecosystem cards, and the Foundation Model Transparency Index. These three works are natural sources of inspiration: model cards as defined by Mitchell et al. (2018)5 pioneered the literature on transparency for machine learning models, ecosystem cards refined the approach to the setting of foundation models, and the Transparency Index characterized the current market state as of October 2023. The requirements beyond transparency draw inspiration from the scientific literature on data governance, energy efficiency, evaluations, model access, auditing, and adverse event reporting.

Relationship with concurrent governance approaches to foundation models. The requirements stated draw direct inspiration from the Parliament position, the G7’s voluntary code of conduct, the US Executive Order, and scientific research. For the generic tier, the transparency requirements are the transparency requirements from the Parliament position, with slight modifications to improve clarity and precision. For the high-impact tier, the requirements adjust the Parliament position, replacing some of the more substantive requirements in the position with other high-value matters that increase overall understanding of risk: internal red-teaming, third-party auditing, and adverse event reporting. In particular, the focus on these three matters can be directly traced to the US Executive Order (internal red-teaming) and Guha et al. (third-party auditing and adverse event reporting).

Compliance costs. At present, we cannot confidently price the compliance costs for this tier. However, we highlight that high-impact foundation models are themselves quite costly to build and deploy at scale. Consequently, we expect the compliance burdens for this tier to be small relative to the costs of building and maintaining high-impact foundation models.

How to tier

As we discuss in our previous post, many approaches can be considered for designing tiers. Our fundamental beliefs are that (i) the core basis for governments to apply scrutiny is impact and (ii) the immense uncertainty for foundation models points to legislative caution.

To that end, we recommend the following as the approach to tiers:

  1. The legislative text should indicate that the threshold for the high-impact tier should be based on measures of demonstrated societal/systemic impact.
  2. The legislative text should defer the determination of the threshold to the AI Office, which should be empowered to consult external experts and stakeholders. We note that deferring to standards-setting bodies should proceed with caution, given the standards-setting process may be largely driven by a small cadre of industry actors.
  3. The legislative text should empower the AI Office to rapidly change thresholds over time.
  4. The EU should ensure the AI Office is sufficiently well-resourced to conduct this work, including ensuring sufficient technical expertise.
  5. The legislative text may acknowledge other non-impact measures as alternatives in the event no satisfactory impact quantity can be effectively measured by the time the EU AI Act would go into force around 2025-2026.

At present, we describe some concrete quantities that are surrogates for impact:

  1. The number of customers or entities paying for the foundation model
  2. The number of downstream AI applications
  3. The number of downstream high-risk AI systems
  4. The aggregate number of users across all downstream AI systems
  5. The aggregate number of users across all downstream high-risk AI systems
  6. The number of high-risk areas (e.g. categories under Annex III) covered by downstream high-risk AI systems
  7. The number of queries to the foundation model

Of these, 1 should be directly known by all foundation model providers. 3, 5, and 6 could be tracked by linking the registration requirements for foundation models and high-risk AI systems: every high-risk AI system provider would have to declare which, if any, foundation models their high-risk AI system depends upon. The remainder would require either coordination between foundation model providers and distribution channels, or more active market surveillance (e.g. akin to the UK CMA’s efforts) by bodies like the EU AI Office.

We acknowledge that some of these quantities are more difficult to track for open foundation models at present, but we believe societal infrastructure can correct for this. In particular, if downstream developers are required to declare dependencies on (all) foundation models, this would enable the foundation model providers, the EU government, and the public to easily track their downstream impact. As an instructive example, consider scientific papers. Scientific papers are released openly: the author of a scientific paper would find it very difficult, if not impossible, to directly track the use and uptake of their work. However, scientific papers declare (via citation) which papers they depend upon, allowing for centralized tracking (e.g. by Google Scholar) to publicly record the downstream impact (measured in citations) for all papers.
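
To make this dependency-declaration mechanism concrete, the sketch below illustrates how a registration database recording declared foundation-model dependencies could be aggregated into surrogate quantities 3 and 6 above (the number of downstream high-risk AI systems and the number of Annex III areas they cover). It is a minimal Python illustration: the record fields and example entries are hypothetical, chosen only to show the aggregation, not a specification of the EU database.

from collections import defaultdict

# Hypothetical registration records: each high-risk AI system declares the
# foundation model(s), if any, it depends upon, plus its Annex III area.
registrations = [
    {"system": "hiring-screener-A", "foundation_models": ["fm-alpha"], "annex_iii_area": "employment"},
    {"system": "credit-scorer-B",   "foundation_models": ["fm-alpha"], "annex_iii_area": "essential services"},
    {"system": "exam-grader-C",     "foundation_models": ["fm-beta"],  "annex_iii_area": "education"},
]

downstream_systems = defaultdict(set)  # foundation model -> dependent high-risk systems
covered_areas = defaultdict(set)       # foundation model -> Annex III areas covered downstream

for record in registrations:
    for fm in record["foundation_models"]:
        downstream_systems[fm].add(record["system"])
        covered_areas[fm].add(record["annex_iii_area"])

for fm in sorted(downstream_systems):
    print(f"{fm}: {len(downstream_systems[fm])} high-risk systems, "
          f"{len(covered_areas[fm])} Annex III areas")

The same aggregation, applied to declarations from all downstream AI applications rather than only high-risk systems, would yield surrogate quantity 2.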

Finally, we do not make a precise judgment of what current level of impact would make sense for the high-impact tier. In particular, we note that the current opacity around the impact of different foundation model providers makes it difficult to be precise. With that said, we remind the EU of the DSA: grounding tiers in the way foundation models shape the lives of EU citizens, and in the scale and nature of that impact, is precisely how tiers should be drawn.

Conclusion

We provide a concrete proposal to ground the discourse in the AI Act negotiations. Too often, AI Act discourse devolves into speculation on phantom societal risks and phantom compliance costs. At this critical juncture, there is no time to waste: we need careful cost-benefit analyses. Finalizing the AI Act will require thoughtful political negotiation, weighing the interest of different stakeholders. We are hopeful the EU will achieve political compromise on the AI Act, setting a powerful precedent for the world on how to govern AI.

Authors

Rishi Bommasani is the Society Lead at the Stanford Center for Research on Foundation Models (CRFM). He co-led the report that first introduced and defined foundation models: his research addresses the societal impact of foundation models spanning evaluations, supply chain monitoring, transparency, tiers, open models, policy, and the EU AI Act.

Tatsunori Hashimoto is an Assistant Professor of Computer Science at Stanford University.

Daniel E. Ho is the Director of the Stanford Regulation, Evaluation, and Governance Lab (RegLab), Senior Fellow at the Stanford Institute for Human-Centered Artificial Intelligence, and the William Benjamin Scott and Luna M. Scott Professor of Law and a Professor of Political Science at Stanford University. He serves on the US’s National Artificial Intelligence Advisory Committee.

Marietje Schaake is the International Policy Director at the Stanford Cyber Policy Center and the International Policy Fellow at the Stanford Institute for Human-Centered Artificial Intelligence. She served as a Member of European Parliament from 2009 to 2019, where she focused on trade, foreign affairs, and technology policies. She serves on the UN’s AI Advisory Body.

Percy Liang is the Director of the Stanford Center for Research on Foundation Models (CRFM), Senior Fellow at the Stanford Institute for Human-Centered Artificial Intelligence, and an Associate Professor of Computer Science at Stanford University. He co-led the report that first introduced and defined foundation models.

Acknowledgements

We thank Arvind Narayanan, Daniel Zhang, Russell Wald, and Sayash Kapoor for their comments on this piece, as well as Ashwin Ramaswami, Aviv Ovadya, Christie Lawrence, Connor Dunlop, Helen Toner, Florence G’Sell, Irene Solaiman, Judy Shen, Kevin Klyman, Markus Anderljung, Neel Guha, Owen Larter, Peter Cihon, Peter Henderson, Risto Uuk, Rob Reich, Sanna Ali, Shayne Longpre, Steven Cao, Yacine Jernite, and Yo Shavit for discussions on this matter.

Citation

@misc{bommasani2023eu-compromise, 
    author = {Rishi Bommasani and Tatsunori Hashimoto and Daniel E. Ho and Marietje Schaake and Percy Liang},
    title  = {Towards compromise: A concrete two-tier proposal for foundation models in the EU AI Act}, 
    url    = {https://crfm.stanford.edu/2023/12/01/ai-act-compromise.html}, 
    year   = {2023}
}

Footnotes

  1. Definitions of and updates to foundation models. The AI Office should be instructed to clarify and publish the criteria for determining (i) if a model is a foundation model and (ii) if a model derived from a foundation model is still a foundation model. 

  2. Omitted details. We note that we deliberately omit certain mechanical details to emphasize the substance. For example, Amendment 771 of the Parliament position requires “Name, address and contact details of the provider”, which we also recommend but elide for simplicity.

  3. Strengthening the G7’s transparency reports. The G7 Code of Conduct indicates that transparency reports should be prepared; our proposal strengthens this commitment by making such reports mandatory and specifying their required contents.

  4. Relevance under alternative tiering approaches. In the event that greater scrutiny is placed on foundation models for a different reason than demonstrated impact, these requirements should be revisited. With that said, the principles of meaningful forms of additional scrutiny may generalize: these requirements are likely appropriate for several alternative two-tier schemes. 

  5. Model cards have become a vacuous concept. We note that model cards are now used more loosely to describe any form of documentation; many model cards today do not include all the fields in the original proposal of Mitchell et al.