Beyond “Release” vs. “Not Release”

Author: Girish Sastry

In response to “On the Opportunities and Risks of Foundation Models” (Bommasani et al., 2021)

Much of the discussion around release options for foundation models has relied on a distinction between “release” and “not release.”1 It’s worth asking what the different options are and whether society can capitalize on the benefits of openness while maintaining advantages associated with being more closed. Put another way: are there ways to have our cake and eat it too?

There may be a vast and underexplored design space of release options, and this seems like a socio-technical design problem that benefits from diverse expertise. Academia could be well-positioned to make progress on it.


Centralized cloud-based AI APIs are one general way to allow users to access the capabilities of models while maintaining some control over potential hazards.2 Even within the narrow goal of exposing capabilities, there are many design and implementation options. The OpenAI API, for example, allows users to interact with large generative language models in a standard “text in - text out” fashion. Other APIs might expose different capabilities: for example, the Cohere API allows users to retrieve embeddings of text and operate from there; the idea being that embeddings open up wider use cases than generating text from a prompt.

It may be, then, that further exposing the guts of a model could enable even more use cases. For researchers, that could mean access to the raw weights, empowering them to study, critique, and improve models. For auditors, access to the weights could mean the ability to verify claims about what the raw model can and can’t do. The downside is that the more capable models become, the more consequential misuse and accidental misbehavior become. Today, accidental misbehavior includes generating hate speech. In the future, such accidents could involve code generation tools creating subtle but exploitable security holes, or systems deceiving users in high-stakes situations.

If models become more capable and the costs of accidents and misuse rise, then exposing the guts of the model requires the user to be trusted. Furthermore, the more that model internals are available to end-users, the more the risk of model stealing attacks may rise,3 which may defeat the initial goal of constraining misuse. These considerations point towards a research goal of designing technical and non-technical systems that have layers of trust, like the idea of role-based access controls in computer security.
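As a rough illustration of what "layers of trust" could look like in code, here is a minimal sketch in the spirit of role-based access control. The tier names, the capability names, and the mapping between them are all hypothetical, chosen only to mirror the kinds of access discussed above (text generation, embeddings, model internals). A real system would need a much richer policy.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust tiers, ordered from least to most trusted."""
    PUBLIC = 0      # ordinary API users
    RESEARCHER = 1  # vetted researchers
    AUDITOR = 2     # approved auditors


# Hypothetical policy: the minimum tier required for each capability.
MIN_TIER = {
    "generate_text": Trust.PUBLIC,
    "get_embeddings": Trust.PUBLIC,
    "read_activations": Trust.RESEARCHER,
    "read_weights": Trust.AUDITOR,
}


def is_allowed(tier: Trust, capability: str) -> bool:
    """Return True if a user at `tier` may use `capability`.

    Capabilities missing from the policy are denied by default.
    """
    required = MIN_TIER.get(capability)
    return required is not None and tier >= required


def allowed_capabilities(tier: Trust) -> list[str]:
    """List every capability available at `tier`, sorted by name."""
    return sorted(c for c in MIN_TIER if is_allowed(tier, c))
```

Under these assumptions, `allowed_capabilities(Trust.PUBLIC)` contains only the two output-level capabilities, while higher tiers progressively unlock access to internals. Denying unknown capabilities by default is one conservative design choice among several possible ones.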

Enabling access to capabilities while preventing misuse and accidents is just one design goal. To create healthy governance foundations it’s important to enable mechanisms for auditing – and ways to audit the audits. Should there be white-listed auditors with full access to model internals? How would we ensure that they don’t misuse access while opening this up to a broad pool of people? Perhaps legal mechanisms such as licenses, or strong social norms around responsibility can help. To ensure that people follow those mechanisms, red teams could be employed to poke holes in these systems and assumptions. Another option is to treat model internals like we do healthcare datasets – allowing researchers to study these by remotely submitting analysis jobs, which could make models difficult to misuse.
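The remote-analysis idea can be sketched in a few lines. This is only an illustration under strong simplifying assumptions: the "weights" are a made-up list of numbers, and the set of approved analyses is a hypothetical allow-list of aggregate statistics. The point it tries to show is that the researcher names a job and receives a result, while the internals themselves stay on the host.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-in for model internals; these values never leave the host.
_PRIVATE_WEIGHTS: Sequence[float] = (0.12, -0.40, 0.33, 0.05, -0.18)

# Hypothetical allow-list of aggregate analyses the host is willing to run.
APPROVED_ANALYSES: dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "max_abs": lambda w: max(abs(x) for x in w),
    "count": lambda w: float(len(w)),
}


def run_analysis_job(name: str) -> float:
    """Run an approved analysis on the host and return only its scalar result."""
    if name not in APPROVED_ANALYSES:
        raise PermissionError(f"analysis {name!r} is not approved")
    return float(APPROVED_ANALYSES[name](_PRIVATE_WEIGHTS))
```

A researcher could call `run_analysis_job("max_abs")`, but a request such as `run_analysis_job("dump_weights")` would be refused. An allow-list of aggregates does not by itself prevent someone from reconstructing internals through many queries, so a real deployment would likely also need rate limits, logging, and review of submitted jobs.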

Zooming out further, the system that serves the API does not necessarily need to be hosted centrally. Perhaps the highest-trust and most isolated part of the system could be deployed on the end-user’s hardware, allowing them to adapt the model to their needs while protecting against security risks (for example, interference with other users’ data). If models are decomposable into different expert models – as the interpretability section of the foundation models paper speculates – then that could enable a principle of restricted access: users could be guaranteed to access only certain experts, while still retaining the knowledge and capabilities of the general model.

There is a vast design space

The foundation models paper opens by distinguishing between research on foundation models and deployment of foundation models. But there’s also room for novel and important research about deployment for these models. This research can draw upon many different fields of expertise – law, political science, economics, computer science, and so on. Part of this project is about articulating the design goals for deployment – such as serving customers, enabling auditing, and supporting reproducibility – but it is also about designing mechanisms to achieve those goals and having assurance in those mechanisms.

Academia is excellent at solving novel problems, and it has an important role to play as foundation models are built and released. Consider, for example, the history of systems research or cybersecurity. Prototype systems or ideas in academia could serve as the basis for new and different modes of deployment and access.

It’s still early days in the paradigm of foundation models. Down the line, as AI capabilities become more powerful, ideas to govern access could be important for maximizing the upside of AI while avoiding risks. And so they are well worth researching.


  1. An example quote from the report: “Some models (e.g., GPT-3) are not released at all (only API access to a limited pool of people). Even datasets (e.g., for GPT-2) are not released. While trained models may be available (e.g., BERT), the actual training of foundation models is unavailable to the vast majority of AI researchers, due to the much higher computational cost and the complex engineering requirements.” 

  2. By “models” we mean to include foundation models and models derived from them. 

  3. Section 4.7 of the foundation models paper, on Security and Privacy, discusses this possibility more. 


  3. Structured Access to AI capabilities; Toby Shevlane. Forthcoming in the Oxford Handbook on AI Governance. Oxford University Press.


Thanks to Toby Shevlane, Miles Brundage, Gretchen Krueger, and Rosie Campbell for feedback.