Analyzing the benefits and risks of foundation models with widely available weights
Developers of closed foundation models exercise unilateral control in determining what is and is not acceptable model behavior. Given that foundation models increasingly intermediate critical societal processes, much as social media platforms do today, the definition of what is acceptable model behavior is a consequential decision that should take into account the views of stakeholders and the context where the model is applied. In contrast, while the developer of an open foundation model may initially specify how it responds to user queries, downstream developers can modify the model to specify alternative behavior. Open foundation models allow for greater diversity in defining what model behavior is acceptable, whereas closed foundation models implicitly impose a monolithic view that is determined unilaterally by the foundation model developer.
Since open foundation models can be more aggressively customized, they better support
innovation across a range of applications.
In particular, since adaptation and inference can be performed locally, application developers can more easily adapt or fine-tune models on large proprietary datasets without sharing that data with a third-party model provider, reducing data protection and privacy concerns. Similarly, the customizability of open models enables improvements such as advancing the state of the art across different languages.
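As an illustration of such local adaptation, the following minimal sketch fine-tunes an openly released model on a private dataset entirely within the developer's own infrastructure, so the proprietary data is never sent to a third-party API. It assumes the Hugging Face transformers, peft, and datasets libraries; the checkpoint name and data file are hypothetical placeholders rather than references from this paper.

```python
# Minimal sketch: parameter-efficient fine-tuning of an open model on local,
# proprietary data. The checkpoint name and data path are hypothetical.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "example-org/open-model-7b"  # hypothetical openly released checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(MODEL)
# Train only small low-rank adapter weights on top of the frozen base model.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Proprietary records stay on local disk; nothing is uploaded to a model provider.
data = load_dataset("json", data_files="proprietary_records.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-model",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapted-model")  # adapter weights also remain local
```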
While some developers of closed foundation models provide mechanisms for users to opt
out of data collection, the data storage, sharing, and usage practices of foundation
model developers are not always transparent.
However, the benefits of open foundation models for innovation may have limits due to potential comparative disadvantages in improving these models over time. For example, open foundation model developers generally lack the user feedback and interaction logs that closed model developers can use to improve their models over time. Further, because open foundation models are generally more heavily customized, model usage becomes more fragmented, which lessens the potential for strong economies of scale.
Still, new research directions such as model merging might allow open foundation model developers to reap some of these benefits (akin to open source software).
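One simple family of merging techniques operates directly in weight space. The sketch below averages the parameters of several fine-tuned variants of the same base model (in the spirit of "model soups"); it assumes the variants share an identical architecture and floating-point parameter tensors, and the checkpoint paths are hypothetical.

```python
# Minimal sketch of weight-space merging: average the parameters of several
# fine-tuned variants of one open base model. Paths are hypothetical.
import torch

def merge_checkpoints(paths, weights=None):
    """Return the (optionally weighted) average of the state dicts at `paths`."""
    weights = weights or [1.0 / len(paths)] * len(paths)
    merged = None
    for path, w in zip(paths, weights):
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: w * v.float() for k, v in state.items()}
        else:
            for k, v in state.items():
                merged[k] += w * v.float()
    return merged

# Hypothetical variants contributed by different downstream developers.
soup = merge_checkpoints(["variant_legal.pt", "variant_medical.pt", "variant_code.pt"])
torch.save(soup, "merged_model.pt")
```

In this way, improvements made independently by downstream developers could be pooled back into a single checkpoint, partially standing in for the centralized feedback loops available to closed developers.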
More generally, the usability of foundation models strongly influences innovation: factors beyond whether a model is released openly, such as the model's capabilities and the quality of available inference APIs, shape usability.
Foundation models are critical to modern scientific research, within and beyond the
field of artificial intelligence.
Broader access to foundation models enables greater inclusion in scientific research,
and model weights are essential for several forms of research across AI
interpretability, security, and safety.
Ensuring ongoing access to specific models is essential for the scientific reproducibility of research, which has been undermined to date by closed model developers' practice of regularly retiring models. And because developers often instrument closed foundation models with safety measures, these measures can complicate some forms of research or render them impossible.
However, model weights alone are insufficient for several forms of scientific research.
Other assets, especially the data used to build the model, are necessary.
For example, understanding how biases propagate, and are potentially amplified, requires comparing data biases to model biases, which in turn requires access to the training data.
Access to data and other assets, such as model checkpoints, has already enabled
wide-ranging downstream
research.
While some projects, such as BLOOM and Pythia, prioritize access to such assets with the stated goal of advancing scientific research on foundation models, this level of access is not common for open models in general.
Indeed, while evaluations have received widespread attention for their potential to clarify the capabilities and risks of foundation models, correctly interpreting evaluation results requires understanding the relationship between the evaluation data and a model's training data (for example, whether evaluation data leaked into the training data, i.e., contamination).
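For instance, when a model's training corpus is accessible, researchers can run simple train-test overlap ("contamination") checks. The sketch below flags evaluation examples whose n-grams also appear in training documents; the n-gram length and whitespace tokenization are illustrative assumptions rather than a fixed standard.

```python
# Minimal sketch of a train-test overlap (contamination) check: flag evaluation
# examples that share long n-grams with the training corpus.
def ngrams(text, n=13):
    """Return the set of whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contaminated(eval_examples, training_docs, n=13):
    """Return evaluation examples that overlap with the training documents."""
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)
    return [ex for ex in eval_examples if ngrams(ex, n) & train_ngrams]

# Usage: contaminated(benchmark_items, open_training_documents) lists the
# benchmark items whose scores should be interpreted with caution.
```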
Transparency is a vital precondition for responsible innovation and public
accountability.
Yet digital technologies are plagued by problematic opacity. Widely available model
weights enable external researchers, auditors, and journalists to investigate and
scrutinize foundation models more deeply. Such inclusion is especially valuable given that the teams developing foundation models often underrepresent the marginalized communities most likely to be subject to the harms of foundation models.
The history of digital technology demonstrates that broader scrutiny, including by those
belonging to marginalized groups that experience harm most acutely, reveals concerns
missed by developers. The 2023 Foundation Model Transparency Index indicates that
developers of major open foundation models tend to be more transparent than their closed
counterparts.
Still, model weights make only some types of transparency (such as evaluations of risk) possible; they do not guarantee that such transparency will materialize.
More generally, model weights do not guarantee transparency on the upstream resources
used to build the foundation model (e.g., data sources, labor practices, energy
expenditure) nor transparency on the downstream impact of the foundation model (e.g.,
affected markets, adverse events, usage policy enforcement).
Such transparency can help address prominent societal concerns surrounding bias,
privacy, copyright, labor, usage practices, and demonstrated harms.
Foundation models function as infrastructure for building downstream applications,
spanning market sectors.
By design, they contribute to the rise of algorithmic monoculture: many downstream
applications depend on the same foundation model.
Monocultures often yield poor societal resilience and are susceptible to widespread systemic risk: consider the Meltdown and Spectre vulnerabilities, which created massive security risk because of the widespread dependence on Intel and ARM-based microprocessors.
Further, foundation model monocultures have been conjectured to lead to correlated
failures
and cultural homogenization. Since open foundation models are more easily and deeply
customized,
they may yield more diverse downstream model behavior, thereby reducing the severity of
homogeneous outcomes.
Broad access to model weights and greater customizability further enable greater competition in downstream markets, helping to prevent market concentration at the foundation model level from cascading vertically into downstream markets. In the foundation model market itself, the significant capital costs of developing foundation models create barriers to entry for low-resource actors.
Further, while open foundation models may increase competition in some regions of the AI
supply chain, they are unlikely to reduce market concentration in the highly
concentrated upstream markets of computing and specialized hardware providers.
All misuse analyses should systematically identify and characterize the potential threats being analyzed. In the context of open foundation models, this would involve naming the misuse vector, such as spear-phishing scams or influence operations, as well as detailing the manner in which the misuse would be executed. To present clear assumptions, this step should clarify the potential malicious actors and their resources: individual hackers are likely to employ different methods and wield different resources relative to state-sponsored entities.
Given a threat, misuse analyses should clarify the existing misuse risk in society. For example, Seger et al. (2023) outline the misuse potential of open foundation models via disinformation on social media, spear-phishing scams over email, and cyberattacks on critical infrastructure. Each of these misuse vectors already carries risk absent open foundation models, so understanding the pre-existing level of risk contextualizes, and provides a baseline for, any new risk introduced by open foundation models.
Assuming that risks exist for the misuse vector in question, misuse analyses should clarify how society (or specific entities or jurisdictions) defends against these risks. Defenses can include technical interventions (e.g., spam filters to detect and remove spear-phishing emails) and regulatory interventions (e.g., laws punishing the distribution of child sexual abuse material). Understanding the current defensive landscape informs the efficacy, and sufficiency, with which new risks introduced by open foundation models will be addressed.
The threat identification, paired with an analysis of existing risks and defenses, provides the conceptual foundation for reasoning about the risks of open foundation models. Namely, subject to the status quo, we can evaluate the marginal risk of open foundation models. Being aware of existing risk clarifies instances where open foundation models simply duplicate existing risk (e.g., an open language model providing biological information available via Wikipedia). Similarly, being aware of existing defenses clarifies instances where open foundation models introduce concerns that are well-addressed by existing measures. Conversely, we can identify critical instances where new risks are introduced (e.g., fine-tuning models to create non-consensual intimate imagery of specific people) or where existing defenses will be inadequate (e.g., AI-generated child sexual abuse material may overwhelm existing law enforcement resources). Further, the marginal risk analysis need not be conducted only relative to the status quo; it can also be conducted relative to other (possibly hypothetical) baselines. For example, understanding the marginal risk of open release relative to a more restricted release (e.g., API release of a closed foundation model) requires reasoning about the relevant existing defenses for that restricted release. This perspective ensures greater care is taken not to assume that closed releases are intrinsically safer and, instead, to interrogate the quality of existing defenses.
While existing defenses provide a baseline for addressing new risks introduced by open foundation models, they do not fully clarify the marginal risk. In particular, new defenses can be implemented or existing defenses can be modified to address the increase in overall risk. Therefore, characterizations of the marginal risk should anticipate how defenses will evolve in reaction to risk: for example, (open) foundation models may also contribute to such defenses (e.g., the creation of better disinformation detectors or code fuzzers).
Finally, it is imperative to articulate the uncertainties and assumptions that underpin the risk assessment framework for any given misuse risk. This may encompass assumptions related to the trajectory of technological development, the agility of threat actors in adapting to new technologies, and the potential effectiveness of novel defense strategies. For example, forecasts of how model capabilities will improve or how the costs of model inference will decrease would influence assessments of misuse efficacy and scalability.
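To make the framework concrete, the minimal sketch below records its components as a structured assessment. The field names mirror the steps described above; the example values for the spear-phishing vector are hypothetical illustrations, not findings of this paper.

```python
# Minimal sketch: the marginal-risk framework captured as a structured record,
# so that analyses of different misuse vectors report the same fields.
from dataclasses import dataclass, field

@dataclass
class MisuseRiskAssessment:
    threat: str                      # named misuse vector and how it is executed
    malicious_actors: list[str]      # assumed actors and their resources
    existing_risk: str               # risk already present absent open models
    existing_defenses: list[str]     # technical and regulatory defenses today
    marginal_risk_evidence: str      # evidence of added risk vs. the baseline
    baseline: str = "status quo"     # e.g., "status quo" or "API-only release"
    ease_of_defense: str = ""        # how readily defenses can adapt
    uncertainties: list[str] = field(default_factory=list)

# Hypothetical example for the spear-phishing vector discussed above.
spear_phishing = MisuseRiskAssessment(
    threat="Personalized spear-phishing emails generated at scale",
    malicious_actors=["individual scammers", "organized fraud groups"],
    existing_risk="Phishing is already widespread via manual and templated email",
    existing_defenses=["spam filters", "anti-fraud law enforcement"],
    marginal_risk_evidence="Lower per-email cost; unclear change in success rate",
    uncertainties=["future model capabilities", "declining inference costs"],
)
```

The table below surveys existing studies of specific misuse risks along these dimensions.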
Misuse risk | Paper | Threat identification | Existing risk | Existing defenses | Marginal risk evidence | Ease of defense | Uncertainty / assumptions |
---|---|---|---|---|---|---|---|
Spear-phishing scams | Hazell (2023) | ||||||
Cybersecurity risk | Seger et al. (2023) | ||||||
Disinformation | Musser (2023) | ||||||
Biosecurity risk | Gopal et al. (2023) | ||||||
Voice-cloning | Ovadya et al. (2019) | ||||||
Non-consensual intimate imagery | Lakatos (2023) | ||||||
Child sexual abuse material | Thiel et al. (2023) |
Name | Affiliation |
---|---|
Sayash Kapoor * | Princeton University |
Rishi Bommasani * | Stanford University |
Kevin Klyman | Stanford University |
Shayne Longpre | Massachusetts Institute of Technology |
Ashwin Ramaswami | Georgetown University |
Peter Cihon | GitHub |
Aspen Hopkins | Massachusetts Institute of Technology |
Kevin Bankston | Center for Democracy and Technology, Georgetown University |
Stella Biderman | EleutherAI
Miranda Bogen | Center for Democracy and Technology, Princeton University |
Rumman Chowdhury | Humane Intelligence |
Alex Engler | Work done while at Brookings Institution |
Peter Henderson | Princeton University |
Yacine Jernite | Hugging Face |
Seth Lazar | Australian National University |
Stefano Maffulli | Open Source Initiative |
Alondra Nelson | Institute for Advanced Study |
Joelle Pineau | Meta |
Aviya Skowron | EleutherAI
Dawn Song | University of California, Berkeley |
Victor Storchan | Mozilla AI |
Daniel Zhang | Stanford University |
Daniel E. Ho | Stanford University |
Percy Liang | Stanford University |
Arvind Narayanan | Princeton University |