General-Purpose AI Needs Coordinated Flaw Reporting

Authors: Shayne Longpre and Ruth Appel


Our new paper, In House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI, has 34 authors with expertise in machine learning, law, security, social science, and policy.

Introduction

Today, we are calling for AI developers to invest in the needs of third-party, independent researchers, who investigate flaws in AI systems. Our new paper advocates for a new standard of researcher protections, reporting, and coordination infrastructure. Together, these are necessary to empower the research community and make AI systems safer across the board.

Our contributions include:

  • building a standardized AI Flaw Report for researchers,
  • providing legal language developers can adopt to protect good-faith research,
  • designing flaw bounties developers can implement, and
  • recommending the creation of a centralized organization to coordinate and triage flaws across stakeholders.

AI flaws go unreported, undermining safety

Researchers face major barriers that block them from reporting flaws they discover in general-purpose AI (GPAI) systems. Independent AI evaluation lacks legal protections, and when flaws are found there is no infrastructure for responsibly distributing findings. As a result, many researchers do not report flaws at all, meaning companies cannot fix them, or they share flaws with just one AI developer while other affected firms remain unaware.

These gaps can have serious consequences for public safety. GPAI systems are causing real-world harm, from bad mental health advice to non-consensual intimate imagery. Without the help of independent researchers, AI will cause more harm to more people.

To tackle these issues, we brought together evaluators, system providers, and other stakeholders to design an improved process for coordinated flaw reporting.

  1. First, we provide a checklist for third-party AI evaluators to facilitate good-faith research and responsible flaw reporting.
  2. Second, we recommend action steps for GPAI system providers, including protections for responsible researchers and creating flaw bounties.
  3. Third, we propose the creation of a Disclosure Coordination Center to help route standardized AI Flaw Reports to impacted stakeholders.

The status quo and envisioned GPAI flaw reporting ecosystem. On the left, we provide a non-exhaustive list of GPAI flaws that may warrant reporting. We propose that users disclose flaws via standardized AI Flaw Reports to a Disclosure Coordination Center. The Disclosure Coordination Center then routes AI Flaw Reports to affected stakeholders across the supply chain.


Evaluators should share standardized AI Flaw Reports

Fixing flaws is increasingly urgent because they often transfer across similar GPAI systems: if one system is susceptible to a particular kind of flaw, similar systems frequently are as well. Developers and deployers make substantial efforts to mitigate these “transferable” flaws, but without regular reports from users they will miss important flaws and be slower to resolve them.

Third-party evaluators therefore have a central role in promoting AI safety and security. A core hurdle researchers and white-hat hackers face is a lack of clarity about what to report, when to report it, and to whom to report. As a remedy, we propose that they fill out standardized AI Flaw Reports (see figure below) that include the key information associated with a flaw and the system being examined.

Each report requires information on the relevant system, timestamps, a description of the flaw and how to reproduce it, the policies the flaw violates, and tags that help prioritize mitigations. For flaws associated with the outputs of a GPAI system, we recommend that reports be accompanied by statistical validity metrics describing how frequently undesirable outputs appear for relevant prompts. These reports, rooted in the existing literature and in software security practices, can help triage flaws and identify transferable ones. They stem from ongoing organizational efforts to do just this, including work by the AI Incident Database, the AI Vulnerability Database, MITRE, and many others.


AI Flaw Report Schema. The Flaw Report contains common elements of disclosure from software security to improve the reproducibility and triage of flaws. Green fields are automatically completed upon submission; gray fields are optional.
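To make the schema concrete, here is a minimal sketch of how such a report might be represented in code, together with a simple statistical validity metric. The field names and the estimate_flaw_frequency helper are our own illustrative assumptions for this post, not the paper's official schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import math


@dataclass
class AIFlawReport:
    """Illustrative flaw report; field names are assumptions, not the official schema."""
    system_identifier: str                   # system or model under examination
    flaw_description: str                    # what goes wrong
    reproduction_steps: list[str]            # prompts or steps needed to reproduce the flaw
    policies_violated: list[str]             # provider or regulatory policies the flaw implicates
    severity_tags: list[str] = field(default_factory=list)   # tags to help prioritize mitigations
    observed_failure_rate: float | None = None                # fraction of relevant prompts producing the flaw
    submitted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )  # auto-completed on submission, like the green fields in the figure


def estimate_flaw_frequency(num_flawed: int, num_trials: int, z: float = 1.96):
    """Simple statistical-validity metric: point estimate and a normal-approximation
    95% confidence interval for how often the undesirable output appears."""
    p = num_flawed / num_trials
    margin = z * math.sqrt(p * (1 - p) / num_trials)
    return p, max(0.0, p - margin), min(1.0, p + margin)
```

For example, if 12 of 200 relevant prompts produce the undesirable output, estimate_flaw_frequency(12, 200) gives a point estimate of 6% with an approximate 95% confidence interval of roughly 3% to 9%.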

Developers should build reporting infrastructure

GPAI system providers should work closely not only with contracted, second-party evaluators but also with third-party researchers (see figure below). Providers can take a number of steps to proactively encourage responsible third-party flaw disclosure. For instance, broad readings of providers’ terms of service can deter flaw discovery, since terms may prohibit “reverse engineering” or “automatic data collection.” To address this, providers should offer safe harbor to good-faith researchers: if there is no evidence that a company’s rules of engagement for researchers were violated, providers should commit to refraining from legal action.

The spectrum of independence in GPAI evaluations. Evaluations can be stratified by their level of independence from the provider of the GPAI system. This ranges from entirely in-house evaluation (first-party) to contracted research (second-party) and research without a contractual relationship with the system provider (third-party). There are grey areas throughout the spectrum, and we provide examples for each gradation.


We also recommend that system providers create a dedicated flaw reporting program. This entails creating an interface to report flaws and publishing a responsible disclosure policy. The reporting interface should provide a mechanism for third-party evaluators to anonymously submit flaw reports and engage with the provider throughout the process of flaw reproduction and mitigation, while enabling the provider to triage reports. Many companies simply offer a company email address, which does not support these objectives.

Platforms like HackerOne, Bugcrowd, and Humane Intelligence provide interfaces designed specifically for these purposes. A provider’s accompanying policy should detail a broad scope for GPAI flaws, the rules of engagement for testers, and a liability exception for evaluators who follow these rules. Developers that currently include AI systems in their bug bounties often restrict the scope to cover security-related flaws only, but they should consider a broader scope to include safety and trustworthiness flaws as well.
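As a rough illustration of the kind of interface we have in mind (the class and method names below are hypothetical, not a description of any existing platform's API), a minimal reporting program might accept anonymous submissions, hand back a tracking identifier, and let the provider update a triage status that the reporter can check later:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ReportingProgram:
    """Hypothetical sketch of a provider-side flaw reporting queue.
    Reporters receive an anonymous tracking ID they can use for follow-up."""
    _reports: dict = field(default_factory=dict)

    def submit(self, report: dict) -> str:
        """Accept a report without requiring reporter identity; return a tracking ID."""
        tracking_id = str(uuid.uuid4())
        self._reports[tracking_id] = {"report": report, "status": "received"}
        return tracking_id

    def triage(self, tracking_id: str, status: str) -> None:
        """Provider updates status, e.g. 'reproducing', 'mitigating', 'resolved'."""
        self._reports[tracking_id]["status"] = status

    def status(self, tracking_id: str) -> str:
        """Reporter checks progress using only the anonymous tracking ID."""
        return self._reports[tracking_id]["status"]
```

A real program would add authentication for provider staff, deduplication, and severity-based routing, but even this skeleton reflects the anonymous submission and ongoing engagement described above.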

Designing a Disclosure Coordination Center

Disclosure of AI flaws can be thorny. Transferable flaws can increase the risk of reporting prior to remediation, and complex AI supply chains involve many stakeholders that play a role in flaw mitigation. GPAI models are integrated into prominent products and services, often without the public’s advance knowledge, making it difficult to catalog all providers who might benefit from a flaw report.

An AI Disclosure Coordination Center could address these complexities through a centralized structure. It would receive flaw reports and route them to the relevant stakeholders: data providers, system developers, model hubs or hosting services, app developers, model distribution platforms, government agencies, and eventually, after a delayed disclosure period, the broader public. For sensitive flaws, the Center should set appropriate public disclosure periods and help navigate requests to extend disclosure periods in order to implement mitigations. Interventions like an AI Disclosure Coordination Center can help pave the way for productive collaborations between companies and third-party researchers.
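To illustrate the routing idea, here is a minimal sketch, entirely our own construction rather than a design from the paper, of how a coordination center might fan a report out to registered stakeholders and set a default public disclosure date. The stakeholder registry and the 90-day delay are assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical stakeholder registry keyed by the affected system; in practice the
# coordination center would maintain this mapping across the GPAI supply chain.
STAKEHOLDERS = {
    "example-gpai-model": [
        "data_provider",
        "system_developer",
        "model_hub",
        "app_developer",
        "distribution_platform",
        "government_agency",
    ],
}


def route_flaw_report(report: dict, disclosure_delay_days: int = 90) -> dict:
    """Fan a flaw report out to affected stakeholders and set a public disclosure date.
    Requests to extend the disclosure period for sensitive flaws are not modeled here."""
    affected = STAKEHOLDERS.get(report["system_identifier"], ["system_developer"])
    public_disclosure_date = date.today() + timedelta(days=disclosure_delay_days)
    return {
        "notified_stakeholders": affected,
        "public_disclosure_date": public_disclosure_date.isoformat(),
    }


# Example: route a report filed against the hypothetical system above.
print(route_flaw_report({"system_identifier": "example-gpai-model"}))
```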

Conclusion

Coordinated flaw reporting infrastructure coupled with protections for third-party AI researchers can help reshape the future of AI evaluation. There are thousands of independent researchers who can help find and fix flaws faster, all while building a reporting culture in AI much like that in software security. If developers encourage this ecosystem to flourish, they will produce systems that are more transparent, accountable, and safe.

Additional Work

For citations to the extensive prior work on this topic, please see our paper’s references as well as the work of the 24 organizations whose affiliates helped write the paper: Massachusetts Institute of Technology, Stanford University, Princeton University, OpenPolicy, UL Research Institutes, Hugging Face, AI Risk and Vulnerability Alliance, Institute for Advanced Study, Boston University, Bugcrowd, HackerOne, University of California Berkeley, Hacking Policy Council, Carnegie Mellon University Software Engineering Institute, Partnership on AI, Google, Centre for the Governance of AI, Knight First Amendment Institute at Columbia University, PRISM Eval, Mozilla, Thorn, MLCommons, Microsoft, and Humane Intelligence.