Now we discuss the technology behind foundation models: building better model architectures, designing training and adaptation procedures, and, of course, scaling up the systems.
One crucial but often overlooked topic is data—where does it come from and
what is its composition?
In addition, we want foundation models to be robust to distribution shifts
and secure against attackers.
Finally, we wish to understand why foundation models work from both a mathematical and an empirical perspective.
Modeling
To learn more, see this section in the report.
Authors: Drew A. Hudson, Antoine Bosselut, Alex Tamkin, Omar Khattab, Jared Quincy Davis, Jiaxuan You, Trevor Gale
What structural properties give rise to a foundation model? In the modeling section, we explore the underlying architectures behind foundation models and identify five key attributes. We begin with expressivity of the computational model, to capture and assimilate real-world information, and scalability, to adeptly handle large quantities of high-dimensional data. These properties are successfully realized by existing architectures such as the transformer network that underpins most foundation models to date. We then proceed to attributes that may be essential for the next generation of models, including: multimodality, to consume, process, and potentially produce content from different sources and domains; memory capacity, to effectively store and retrieve the acquired knowledge; and finally, compositionality, to foster successful generalization to novel settings and environments. We believe that realizing the full potential envisioned for foundation models will hinge on modeling advances that fulfill these desiderata.
Training
Training objectives mathematically specify how models should learn and acquire capabilities from their training data.
The status quo for training foundation models involves modality-specific objectives (e.g., masked language modeling for text and SimCLR for images) that are often chosen heuristically.
We envision that future training objectives for foundation models will reflect two changes: principled selection derived from systematic evidence and evaluation, and domain-generality to provide rich, scalable, and unified training signal across data sources and modalities. We also discuss important design trade-offs, including generative vs. discriminative training, the choice of input data representation, and the potential of future training objectives that involve explicit representations of goals.
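As a concrete illustration of one such modality-specific objective, the sketch below implements a masked language modeling training step; the `model`, tensor shapes, and masking rate are illustrative assumptions rather than details taken from the report.

```python
# A minimal sketch of masked language modeling: a fraction of tokens is replaced
# by a [MASK] id and the model is trained to recover the originals at those
# positions. `model` is assumed to map token ids to per-token vocabulary logits.
import torch
import torch.nn.functional as F

def mlm_step(model, token_ids, mask_token_id, mask_prob=0.15):
    # token_ids: (batch, seq_len) integer tensor of input tokens
    mask = torch.rand(token_ids.shape, device=token_ids.device) < mask_prob
    corrupted = token_ids.clone()
    corrupted[mask] = mask_token_id                 # replace chosen tokens with [MASK]

    logits = model(corrupted)                       # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(
        logits[mask],                               # predictions at masked positions
        token_ids[mask],                            # original tokens as targets
    )
    return loss
```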
Adaptation
To learn more, see this section in the report.
Authors: Xiang Lisa Li*, Eric Mitchell*, Sang Michael Xie, Xuechen Li, Tatsunori Hashimoto
Foundation models are intermediary assets; they are unfinished and generally should not be used directly, instead requiring adaptation for specific downstream tasks.
The de facto approach for adaptation has been fine-tuning, with recent work suggesting that lightweight fine-tuning alternatives and prompting-based methods may achieve favorable accuracy-efficiency tradeoffs.
Moving forward, we envision a more expansive view of adaptation that goes beyond just specializing foundation models to perform the task of interest: adaptation can also alleviate deficiencies of stand-alone foundation models (e.g., temporal adaptation to reflect changes in the world over time) or introduce constraints (e.g., GDPR compliance relating to the right to be forgotten). This broader perspective on adaptation coincides with a need for new evaluation protocols that systematically evaluate adaptation methods while controlling for the resources (e.g., runtime, memory) and access requirements involved in adaptation.
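To make the contrast between full fine-tuning and lightweight alternatives concrete, the sketch below freezes a pretrained backbone and trains only a small task-specific head; the `backbone`, dimensions, and optimizer choice are illustrative assumptions, not a recipe from the report.

```python
# A minimal sketch of lightweight adaptation: every pretrained parameter is
# frozen and only a newly added linear head receives gradient updates, which
# cuts the number of trained parameters by orders of magnitude.
import torch
import torch.nn as nn

def lightweight_adaptation(backbone, feature_dim, num_classes, lr=1e-3):
    for param in backbone.parameters():
        param.requires_grad = False               # keep the foundation model fixed
    head = nn.Linear(feature_dim, num_classes)    # small task-specific module
    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    return head, optimizer

def adaptation_step(backbone, head, optimizer, x, y):
    with torch.no_grad():                         # features come from the frozen backbone
        features = backbone(x)
    loss = nn.functional.cross_entropy(head(features), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Prompting-based methods go further still, steering behavior without updating any parameters at all.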
Evaluation
Evaluation offers context to foundation models by providing a means to track progress, understand models, and document their capabilities and biases.
Foundation models challenge the ability of standard evaluation paradigms in machine learning to achieve these goals since they are one step removed from specific tasks.
To envision new evaluation paradigms that suit foundation models, we discuss (i) evaluating foundation models directly to measure their inherent capabilities and inform how foundation models are trained, (ii) evaluating task-specific models while controlling for adaptation resources and access, and (iii) broader evaluation design to provide richer context beyond measures of accuracy (e.g., robustness, fairness, efficiency, environmental impact).
Reforming evaluation practices will allow evaluation to adequately serve the diverse goals and stakeholders involved in the foundation model paradigm.
Systems
To learn more, see this section in the report.
Authors: Deepak Narayanan, Trevor Gale, Keshav Santhanam, Omar Khattab, Tianyi Zhang, Matei Zaharia
While the training data determines the theoretical information available for foundation models, and model architectures and training objectives determine how much of this information can be extracted, computer systems determine what is practically achievable.
Systems are a key bottleneck for scaling in terms of data and model size, both of which appear to reliably track with improvements in capabilities.
To ensure that we can train the next generation of foundation models efficiently (with respect to time and cost), we will require the co-design of algorithms, models, software, and hardware.
This co-design is already starting to happen in various forms, from carefully tuned parallelism strategies to new architectures such as retrieval-based and mixture-of-experts models.
Beyond training, we consider what will be required to deploy applications on top of foundation models (e.g., efficient inference).
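As a rough illustration of the kind of architecture mentioned above, the sketch below implements a top-1 routed mixture-of-experts layer in which only one expert's parameters are active per token; the layer sizes, number of experts, and routing scheme are illustrative assumptions rather than a prescribed design.

```python
# A minimal sketch of a top-1 mixture-of-experts layer: a learned router sends
# each token to a single expert, so compute per token stays roughly constant
# even as the total parameter count grows with the number of experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopOneMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)   # routing logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                       # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        gate, expert_idx = probs.max(dim=-1)    # top-1 gate value and expert index
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i              # tokens routed to expert i
            if mask.any():
                out[mask] = gate[mask, None] * expert(x[mask])
        return out
```

Production mixture-of-experts systems typically add load-balancing losses and per-expert capacity limits, which are omitted here for brevity.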
Data
To learn more, see this section in the report.
Authors: Laurel Orr, Simran Arora, Karan Goel, Avanika Narayan, Michael Zhang, Christopher Ré
Data is the lifeblood of foundation models; the training data of these models largely determines what capabilities these models can acquire.
The centrality of data is not unique to foundation models; recent calls for data-centric AI indicate the pervasive importance of managing, understanding, and documenting the data used to train machine learning models.
For foundation models specifically, the current modus operandi is for training data to be selected using unspecified or unclear principles, with a general lack of transparency regarding the nature of the training data.
We believe an alternative approach is needed to re-imagine the data ecosystem surrounding foundation models: we draw upon work on data visualization and management to propose a data hub for foundation models.
We articulate how this proposal relates to many of the relevant data-centric considerations for foundation models: selection, curation, documentation, access, visualization and inspection, quality assessment, and legal regulation.
Security and privacy
Security and privacy for foundation models is largely uncharted at present.
Fundamentally, foundation models are a high-leverage single point of failure, making them a prime target for attack: existing work demonstrates a variety of security vulnerabilities (e.g., adversarial triggers that generate undesirable outputs) and privacy risks (e.g., memorization of training data) for these models.
Further, the generality of foundation models compounds these concerns, intensifying the risk of function creep or dual use (i.e., use for unintended purposes).
For security, we view foundation models as akin to operating systems in traditional software systems; we discuss steps towards secure foundation models which, if achieved, would provide a strong abstraction layer to build upon for reliable ML applications.
For privacy, by leveraging knowledge transfer from public data, foundation models may enable more sample-efficient adaptation to sensitive data distributions, i.e., privacy-preserving applications may incur less degradation in accuracy when built using foundation models.
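One way such a privacy-preserving application could look in practice is sketched below: a small head is adapted on sensitive data over frozen foundation-model features using DP-SGD-style per-example gradient clipping and noise. The `encoder`, `head`, and all hyperparameters are illustrative assumptions, and the sketch omits privacy accounting.

```python
# A minimal sketch of differentially private adaptation on top of a frozen
# pretrained encoder: each example's gradient (for the head only) is clipped,
# summed, noised, and averaged before the parameter update.
import torch
import torch.nn.functional as F

def dp_sgd_step(head, encoder, batch_x, batch_y, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    grads = [torch.zeros_like(p) for p in head.parameters()]
    for x, y in zip(batch_x, batch_y):            # per-example gradients (slow but simple)
        with torch.no_grad():
            feats = encoder(x.unsqueeze(0))       # frozen foundation-model features
        loss = F.cross_entropy(head(feats), y.unsqueeze(0))
        per_ex = torch.autograd.grad(loss, list(head.parameters()))
        norm = torch.sqrt(sum(g.pow(2).sum() for g in per_ex))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)   # clip to clip_norm
        for acc, g in zip(grads, per_ex):
            acc += g * scale
    n = len(batch_x)
    with torch.no_grad():
        for p, g in zip(head.parameters(), grads):
            noise = torch.randn_like(g) * noise_mult * clip_norm   # Gaussian noise
            p -= lr * (g + noise) / n             # noisy averaged update
```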
Robustness to distribution shifts
To learn more, see this section in the report.
Authors: Sang Michael Xie, Ananya Kumar, Rohan Taori, Tony Lee, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto
A major limitation of standard machine learning is that it produces models that are not robust to distribution shifts, where the training distribution does not match the test distribution (for the downstream task).
Existing work shows that adapting a foundation model trained on a broad range of unlabeled data improves the robustness of adapted models across a wide variety of shifts.
This opens a new set of promising directions for improving training and adaptation of foundation models for robustness.
However, we do not believe that foundation models are a panacea for robustness—challenges such as extrapolation across time and spurious correlations are not likely to be fully addressed.
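A minimal sketch of the evaluation protocol implied here is shown below: an adapted model is scored on in-distribution test data and on a shifted test set, and the gap between the two summarizes its (lack of) robustness. The model and data loaders are placeholders, not artifacts from the report.

```python
# A minimal sketch of distribution-shift evaluation: compare an adapted model's
# accuracy on an in-distribution (ID) test set vs. an out-of-distribution (OOD)
# test set drawn from a shifted distribution.
import torch

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        pred = model(x).argmax(dim=-1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

def robustness_gap(adapted_model, id_test_loader, ood_test_loader):
    id_acc = accuracy(adapted_model, id_test_loader)
    ood_acc = accuracy(adapted_model, ood_test_loader)
    return id_acc, ood_acc, id_acc - ood_acc      # smaller gap = more robust
```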
AI safety and alignment
Ensuring foundation models are reliable, robust, and interpretable is increasingly important when considering the potential real-world applications of these models.
In addition to critical and immediate considerations, we also consider the relationship between foundation models and larger-scale risks, hazards, and harms that have the potential for increased relevance as model capabilities continue to advance.
For example, we consider the importance of aligning foundation models such that they are not deployed with misspecified goals or values. We also discuss the relevance of forecasting the emergent behaviors of foundation models (e.g., the ability to deceive or plan strategically), which may complicate attempts to adapt them to particular tasks, and may require new approaches for interpretability or evaluation.
Theory
To learn more, see this section in the report.
Authors: Aditi Raghunathan, Sang Michael Xie, Ananya Kumar, Niladri Chatterji, Rohan Taori, Tatsunori Hashimoto, Tengyu Ma
Learning theory provides a broad foundation for the variety of contexts encountered in applied machine learning; theory offers understanding, principles, and guarantees to complement empirical findings.
At present, the study of foundation models is largely empirical: the theory of standard supervised learning, while relatively mature, is inadequate to fully explain foundation models.
Specifically, the discrepancy between the training phase and the adaptation phase within the foundation model regime pinpoints the insufficiency of existing theory, since these phases correspond to (potentially) completely different tasks and data distributions.
Nevertheless, we believe that advances in theory to address this discrepancy, even in simple, limited settings, will provide useful insights.
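As one way to make this discrepancy concrete, the simplified notation below contrasts the pretraining objective with the downstream risk of the adapted model; the distributions, losses, and adaptation map are our illustrative shorthand, not notation from the report.

```latex
% Pretraining fits parameters to a self-supervised loss on the pretraining distribution:
\hat{\theta} \in \arg\min_{\theta}\; \mathbb{E}_{x \sim p_{\text{pre}}}\big[\ell_{\text{pre}}(x;\theta)\big]
% What matters downstream is the risk of the adapted model \gamma(\hat{\theta})
% on a (potentially very different) task distribution and loss:
R_{\text{task}}\big(\gamma(\hat{\theta})\big) =
  \mathbb{E}_{(x,y) \sim p_{\text{task}}}\big[\ell_{\text{task}}\big(f_{\gamma(\hat{\theta})}(x),\, y\big)\big]
% Standard supervised learning theory assumes a single distribution and loss, so it
% does not by itself explain why minimizing the first quantity controls the second.
```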
Interpretability
To learn more, see this section in the report.
Authors: John Hewitt*, Armin W. Thomas*, Pratyusha Kalluri, Rodrigo Castellon, Christopher D. Manning
Interpretability provides clarity to foundation models: the opacity of the deep neural networks that underpin foundation models, alongside the expected ubiquity of foundation models, heightens the need to understand these models and their capabilities.
Interpretability methods are at present generally designed for interpreting and explaining the behavior of task-specific models; the nature of foundation models (i.e., the wide array of tasks these models are beneficial for and the unexpected emergent properties they acquire) introduces new challenges for interpretability research.
To frame the discussion of interpretability for foundation models, we propose the one model-many models paradigm, which aims to determine the extent to which the one model (the foundation model) and its many models (its adapted derivatives) share decision-making building blocks.
In addition to interpreting the decision-making components involved, we further discuss explainability in the context of foundation models (e.g., the validity of post hoc explanations generated by models) as well as the mechanisms that drive model behavior (which may clarify the extent to which understanding foundation models can extend to understanding their adapted derivatives).
Given the critical role we ascribe to interpretability in the study of foundation models, we conclude with an assessment of the societal impact of interpretability and non-interpretability.