Context. Foundation models like GPT-4 and Llama 2 are used by millions of people. While
the societal impact of these models is rising, transparency is on the decline.
If this trend continues, foundation models could become just as opaque as social media platforms
and other previous technologies, replicating their failure modes.
Design. We introduce the Foundation Model Transparency Index to assess the transparency
of foundation model developers. We design the Index around 100 transparency indicators, which
codify transparency for foundation models, the resources required to build them, and their use
in the AI supply chain.
Execution. For the 2023 Index, we score 10 leading developers against our 100 indicators.
This provides a snapshot of transparency across the AI ecosystem. All developers have room
for improvement that we will aim to track in future versions of the Index.
The top score is only 54 out of 100. No major foundation model developer
comes close to providing adequate transparency, revealing a fundamental lack of transparency in
the AI industry.
The mean score is just 37%. Yet 82 of the indicators are satisfied by at least
one developer, meaning that developers can significantly improve transparency by adopting
best practices from their competitors.
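To make these aggregate figures concrete, here is a minimal sketch of how such summary statistics can be computed from a binary developer-by-indicator score matrix. The data below is a random placeholder, not our actual scores.

```python
import numpy as np

# Hypothetical binary score matrix: rows are developers, columns are
# indicators (1 = indicator satisfied, 0 = not satisfied).
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(10, 100))  # placeholder data only

# Each developer's total is the number of indicators it satisfies (out of 100).
totals = scores.sum(axis=1)
print("Top score:", totals.max())
print("Mean score:", totals.mean())

# An indicator is "achieved somewhere" if at least one developer satisfies it;
# such indicators show that transparency is feasible for some developer.
satisfied_by_someone = (scores.max(axis=0) == 1).sum()
print("Indicators satisfied by at least one developer:", satisfied_by_someone)
```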
Open foundation model developers lead the way. Two of the three open foundation model
developers get the two highest scores; both allow
their model weights to be downloaded. Stability AI, the third open foundation model developer,
is a close fourth, behind OpenAI.
We define 100 indicators that comprehensively characterize
transparency for foundation model developers.
We divide our indicators into three broad domains:
Upstream. The upstream indicators specify the ingredients and processes involved in building a
foundation model, such as the computational resources, data, and labor required.
Model. The model indicators specify the properties and function of the foundation model, such as
the model's architecture, capabilities, and risks.
Downstream. The downstream indicators specify how the foundation model is distributed and used,
such as the model's impact on users, any updates to the model, and the policies that govern its use.
Scores by subdomain
In addition to the top-level domains (upstream, model, and downstream), we also group indicators together
into subdomains. Subdomains provide a more granular and incisive analysis, as shown in the figure below.
Each of the subdomains in the figure includes three or more indicators.
Data, labor, and compute are blind spots across developers.
Developers are least transparent with respect to the resources required to build foundation models.
This stems from low performance on the data, labor, and compute subdomains.
Summed across all developers, scores amount to just 20%, 17%, and 17% of the total available points
for the data, labor, and compute subdomains, respectively.
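To illustrate the arithmetic, a subdomain's percentage is the points awarded across all 10 developers divided by the total points available (10 developers times the number of indicators in the subdomain). The sketch below uses hypothetical tallies chosen to reproduce these percentages, assuming, for illustration only, seven indicators per subdomain.

```python
# Hypothetical subdomain tallies: points awarded across all 10 developers and
# the number of indicators in the subdomain. Values are illustrative only.
NUM_DEVELOPERS = 10
subdomains = {
    "data":    {"awarded": 14, "indicators": 7},
    "labor":   {"awarded": 12, "indicators": 7},
    "compute": {"awarded": 12, "indicators": 7},
}

for name, s in subdomains.items():
    available = NUM_DEVELOPERS * s["indicators"]  # total points on offer
    pct = 100 * s["awarded"] / available
    print(f"{name}: {pct:.0f}% of available points")
```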
Developers are more transparent about user data protection and the basic functionality of their products.
Developers score well on indicators related to user data protection (67%),
basic details about how their foundation models are developed (63%),
the capabilities of their models (62%), and their limitations (60%).
This reflects some baseline level of transparency across developers regarding how they process user data
and the basic functionality of their products.
There is room for improvement even in subdomains where developers are most transparent.
No developer discloses the process by which it grants access to usage data.
Only a handful of developers demonstrate the limitations of their models
or have third parties evaluate their models' capabilities.
While every developer describes the input and output modality of its model, only three disclose the
model components and only two disclose the model size.
Open vs. Closed models
One of the most contentious policy debates in AI today is whether AI models should be open or closed.
While release strategies for AI models are not
binary, for the analysis below we label models whose weights are
broadly downloadable as open. Open models lead the way: we find that two of the three open models (Meta's
Llama 2 and Hugging Face's
BLOOMZ) score greater than or equal to the best closed model (as shown in the
figure on the left), with Stability AI's Stable Diffusion 2 right behind OpenAI's GPT-4. Much of this
disparity is driven by the lack of transparency of closed developers on
issues such as the data, labor, and compute used to build the model (as shown in the figure on the right).
Targets. We selected 10 major foundation model developers based on their influence
and their status as established companies. We assessed each company on the basis of its most salient
and capable foundation model.
Information gathering. We systematically gathered information made publicly available by the
developer as of September 15, 2023.
Initial scoring. For each developer, two researchers scored the 100 indicators, assessing
whether the developer satisfied the indicator on the basis of public information. We compared scores and
resolved disagreements through discussion.
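Our reconciliation was done through discussion, but as a rough sketch, the comparison step can be expressed programmatically: flag every indicator where the two researchers' initial binary scores differ. Indicator names below are illustrative placeholders.

```python
# Hypothetical ratings from two researchers for one developer, keyed by
# indicator name (True = satisfied). Names are illustrative placeholders.
rater_a = {"data_size": True, "compute_usage": False, "model_size": True}
rater_b = {"data_size": True, "compute_usage": True,  "model_size": True}

# Flag every indicator where the two initial scores differ; these are the
# cases resolved through discussion before scores are finalized.
disagreements = [k for k in rater_a if rater_a[k] != rater_b[k]]
print("Indicators needing discussion:", disagreements)
```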
Company response. We shared the initial scores with leaders at each company, encouraging them to
contest scores they disagreed with. We addressed their reviews, finalizing scores along with
justifications and sources.
Acknowledgments. We thank Alex Engler, Anna Lee Nabors, Anna-Sophie Harling, Arvind Narayanan, Ashwin
Ramaswami, Aspen Hopkins, Aviv Ovadya, Benedict Dellot, Connor Dunlop, Conor Griffin, Dan Ho, Dan Jurafsky,
Deb Raji, Dilara Soylu, Divyansh Kaushik, Gerard de Graaf, Iason Gabriel, Irene Solaiman, John Hewitt,
Joslyn Barnhart, Judy Shen, Madhu Srikumar, Marietje Schaake, Markus Anderljung, Mehran Sahami, Peter Cihon,
Peter Henderson, Rebecca Finlay, Rob Reich, Rohan Taori, Rumman Chowdhury, Russell Wald, Seliem El-Sayed,
Seth Lazar, Stella Biderman, Steven Cao, Toby Shevlane, Vanessa Parli, Yann Dubois, Yo Shavit, and Zak
Rogoff for discussions on the topics of foundation models, transparency, and/or indexes that informed the
Foundation Model Transparency Index.
We especially thank Loredana Fattorini for her extensive work on the visuals for this project, as well as
Shana Lynch for her work in publicizing this effort.