The Foundation Model Transparency Index

A comprehensive assessment of the transparency of foundation model developers

Context. Foundation models like GPT-4 and Llama 2 are used by millions of people. While the societal impact of these models is rising, transparency is on the decline. If this trend continues, foundation models could become just as opaque as social media platforms and other previous technologies, replicating their failure modes.
Design. We introduce the Foundation Model Transparency Index to assess the transparency of foundation model developers. We design the Index around 100 transparency indicators, which codify transparency for foundation models, the resources required to build them, and their use in the AI supply chain.
Execution. For the 2023 Index, we score 10 leading developers against our 100 indicators. This provides a snapshot of transparency across the AI ecosystem. All developers have significant room for improvement, which we will aim to track in future versions of the Index.
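To make the scoring arithmetic concrete, here is a minimal sketch, assuming each indicator is assessed as a binary 0/1 judgment per developer; the developer and indicator names below are placeholders, not entries from the Index's actual data.

```python
# Illustrative only: tally overall transparency scores from binary
# indicator assessments (1 = indicator satisfied, 0 = not satisfied).
from statistics import mean

# Hypothetical assessments: developer -> {indicator: 0 or 1}.
# The real Index uses 100 indicators; three are shown here for brevity.
assessments = {
    "Developer A": {"data-sources": 1, "compute-usage": 0, "model-evaluations": 1},
    "Developer B": {"data-sources": 0, "compute-usage": 0, "model-evaluations": 1},
}

def overall_score(indicator_results):
    """A developer's overall score is the number of indicators it satisfies."""
    return sum(indicator_results.values())

scores = {dev: overall_score(results) for dev, results in assessments.items()}
print(scores)                 # per-developer totals
print(mean(scores.values()))  # mean score across developers
```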


Key Findings

  • The top-scoring model scores only 54 out of 100. No major foundation model developer is close to providing adequate transparency, revealing a fundamental lack of transparency in the AI industry.
  • The mean score is just 37 out of 100. Yet, 82 of the 100 indicators are satisfied by at least one developer, meaning that developers can significantly improve transparency by adopting best practices from their competitors.
  • Open foundation model developers lead the way. Two of the three open foundation model developers get the two highest scores. Both allow their model weights to be downloaded. Stability AI, the third open foundation model developer, is a close fourth, behind OpenAI.
Overall Scores for the 10 foundation model providers


Indicators

We define 100 indicators that comprehensively characterize transparency for foundation model developers. We divide our indicators into three broad domains: upstream (the resources required to build the model, such as data, labor, and compute), model (the properties and capabilities of the model itself), and downstream (how the model is distributed and used).

Scores for the 10 foundation model providers, broken down by domain.


Scores by subdomain

In addition to the top-level domains (upstream, model, and downstream), we also group indicators together into subdomains. Subdomains provide a more granular and incisive analysis, as shown in the figure below. Each of the subdomains in the figure includes three or more indicators.

Scores for the 10 foundation model providers, broken down by the 13 subdomains that each contain three or more indicators. Analysis at the level of major subdomains reveals actionable insight into what types of transparency or opacity lead to the above findings.
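As a rough sketch of how such a breakdown can be computed, the snippet below groups binary indicator results by an assumed (domain, subdomain) taxonomy; the mapping and names are illustrative, not the Index's actual taxonomy.

```python
from collections import defaultdict

# Hypothetical mapping from indicator to (domain, subdomain); the real Index
# defines 100 indicators spanning upstream, model, and downstream domains.
INDICATOR_TAXONOMY = {
    "data-sources": ("upstream", "data"),
    "compute-usage": ("upstream", "compute"),
    "model-evaluations": ("model", "evaluations"),
}

def subdomain_scores(indicator_results):
    """Fraction of satisfied indicators within each (domain, subdomain) group."""
    satisfied = defaultdict(int)
    totals = defaultdict(int)
    for indicator, result in indicator_results.items():
        group = INDICATOR_TAXONOMY[indicator]
        satisfied[group] += result
        totals[group] += 1
    return {group: satisfied[group] / totals[group] for group in totals}

example = {"data-sources": 1, "compute-usage": 0, "model-evaluations": 1}
print(subdomain_scores(example))
# {('upstream', 'data'): 1.0, ('upstream', 'compute'): 0.0, ('model', 'evaluations'): 1.0}
```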


Open vs. Closed models

One of the most contentious policy debates in AI today is whether AI models should be open or closed. While release strategies for AI models are not binary, for the analysis below, we label models whose weights are broadly downloadable as open. Open models lead the way: we find that two of the three open models (Meta's Llama 2 and Hugging Face's BLOOMZ) score greater than or equal to the best closed model (as shown in the figure on the left), with Stability AI's Stable Diffusion 2 right behind OpenAI's GPT-4. Much of this disparity is driven by the lack of transparency of closed developers on upstream issues such as the data, labor, and compute used to build the model (as shown in the figure on the right).

Open models (Meta's Llama 2, Hugging Face's BLOOMZ, and Stability AI's Stable Diffusion 2) lead the way.
The disparity between open and closed models is driven by upstream indicators, such as details about the data, labor, and compute used to develop the model.
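A comparison of this kind can be reproduced, in rough form, by grouping developers by release strategy and averaging their domain scores; the developers and numbers below are placeholders rather than the Index's published results.

```python
from statistics import mean

# Hypothetical per-domain scores; not the Index's published figures.
domain_scores = {
    "Open Developer A":   {"open": True,  "upstream": 40, "model": 60, "downstream": 55},
    "Open Developer B":   {"open": True,  "upstream": 35, "model": 50, "downstream": 45},
    "Closed Developer C": {"open": False, "upstream": 10, "model": 55, "downstream": 50},
}

def mean_domain_score(domain, open_weights):
    """Average score on one domain across developers with the given release strategy."""
    return mean(s[domain] for s in domain_scores.values() if s["open"] == open_weights)

for domain in ("upstream", "model", "downstream"):
    print(f"{domain}: open={mean_domain_score(domain, True):.1f}, "
          f"closed={mean_domain_score(domain, False):.1f}")
```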


Methodology



Board


The FMTI advisory board will work directly with the Index team, advising on the design, execution, and presentation of subsequent iterations of the Index. Concretely, the Index team will meet regularly with the board to discuss key decision points: How is transparency best measured? How should companies disclose the relevant information publicly? How should scores be computed and presented? And how should findings be communicated to companies, policymakers, and the public? The Index aims to measure transparency in order to bring about greater transparency in the foundation model ecosystem: the board's collective wisdom will guide the Index team in achieving these goals.

Board members


Arvind Narayanan is a professor of computer science at Princeton University and the director of the Center for Information Technology Policy. He co-authored a textbook on fairness and machine learning and is currently co-authoring a book on AI snake oil. He led the Princeton Web Transparency and Accountability Project to uncover how companies collect and use our personal information. His work was among the first to show how machine learning reflects cultural stereotypes, and his doctoral research showed the fundamental limits of de-identification. Narayanan is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE).



Daniel E. Ho is the William Benjamin Scott and Luna M. Scott Professor of Law, professor of political science, professor of computer science (by courtesy), senior fellow at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), senior fellow at the Stanford Institute for Economic Policy Research, and director of the Regulation, Evaluation, and Governance Lab (RegLab). Ho serves on the National Artificial Intelligence Advisory Committee (NAIAC), advising the White House on AI policy, as senior advisor on Responsible AI at the U.S. Department of Labor and as special advisor to the ABA Task Force on Law and Artificial Intelligence. His scholarship focuses on administrative law, regulatory policy, and antidiscrimination law. With the RegLab, his work has developed high-impact demonstration projects of data science and machine learning in public policy.



Danielle Allen is James Bryant Conant University Professor at Harvard University. She is a professor of political philosophy, ethics, and public policy and director of the Democratic Knowledge Project and of the Allen Lab for Democracy Renovation. She is also a seasoned nonprofit leader, democracy advocate, national voice on AI and tech ethics, distinguished author, and mom. A past chair of the Mellon Foundation and Pulitzer Prize Board, and former Dean of Humanities at the University of Chicago, she is a member of the American Academy of Arts and Sciences and American Philosophical Society. Her many books include the widely acclaimed Talking to Strangers: Anxieties of Citizenship Since Brown v Board of Education; Our Declaration: A Reading of the Declaration of Independence in Defense of Equality; Cuz: The Life and Times of Michael A.; Democracy in the Time of Coronavirus; and Justice by Means of Democracy. She writes a column on constitutional democracy for the Washington Post. She is also a co-chair for the Our Common Purpose Commission and founder and president for Partners In Democracy, where she advocates for democracy reform to create greater voice and access in our democracy, and to drive progress toward a new social contract that serves and includes us all.



Daron Acemoglu is an Institute Professor of Economics in the Department of Economics at the Massachusetts Institute of Technology and also affiliated with the National Bureau of Economic Research, and the Center for Economic Policy Research. His research covers a wide range of areas within economics, including political economy, economic development and growth, human capital theory, growth theory, innovation, search theory, network economics and learning. He is an elected fellow of the National Academy of Sciences, the British Academy, the American Philosophical Society, the Turkish Academy of Sciences, the American Academy of Arts and Sciences, the Econometric Society, the European Economic Association, and the Society of Labor Economists.



Rumman Chowdhury is the CEO and co-founder of Humane Intelligence, a tech nonprofit that creates methods for the public evaluation of AI models, as well as a Responsible AI affiliate at Harvard’s Berkman Klein Center for Internet and Society. She is also a research affiliate at the Minderoo Center for Democracy and Technology at Cambridge University and a visiting researcher at the NYU Tandon School of Engineering. Previously, Dr. Chowdhury was the director of the META (ML Ethics, Transparency, and Accountability) team at Twitter, leading a team of applied researchers and engineers to identify and mitigate algorithmic harms on the platform. She was named one of BBC’s 100 Women, recognized as one of the Bay Area’s top 40 under 40, and is a member of the British Royal Society of the Arts (RSA). She has also been named by Forbes as one of Five Who are Shaping AI.



About us

The 2023 Foundation Model Transparency Index was created by a group of eight AI researchers from Stanford University's Center for Research on Foundation Models (CRFM) and Institute on Human-Centered Artificial Intelligence (HAI), MIT Media Lab, and Princeton University's Center for Information Technology Policy. The shared interest that brought the group together is improving the transparency of foundation models. See author websites below.