Is the amount of carbon emitted in building the model disclosed?

Disclosure:

References:

Not disclosed

Score justification:

Indicator notes:

Emissions should be reported in appropriate units, which most often will be tons of carbon dioxide emitted (tCO2), along with a description of the measurement methodology, which may involve estimation. Emissions should be reported to a precision of one significant figure (e.g. 500 tCO2). No form of decomposition into compute phases is required, but it should be clear whether the reported emissions are for a single model run or include additional runs, hyperparameter tuning, training of other models such as reward models, or other steps in the model development process that generate emissions. If the developer is unable to measure or estimate this quantity because information is not available from another party (e.g. compute provider), we will award this point if the developer explicitly discloses what information it lacks and why it lacks it. Emissions should correspond with the energy used in the previous indicator.
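Several indicators in this report ask for figures "to one significant figure". The following is a minimal sketch of that rounding in Python; the function name `round_sig` and the sample value are illustrative assumptions, not part of the index methodology.

```python
import math

def round_sig(value: float, sig: int = 1) -> float:
    """Round `value` to `sig` significant figures (hypothetical helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 2 for 487.3.
    exponent = math.floor(math.log10(abs(value)))
    # Size of the last digit we keep, e.g. 100 for 487.3 at one significant figure.
    factor = 10 ** (exponent - sig + 1)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return float(round(value / factor) * factor)

# Assumed measurement of 487.3 tCO2, reported to one significant figure.
reported_tco2 = round_sig(487.3)
print(f"{reported_tco2:.0f} tCO2")
```

The same helper with `sig=2` covers indicators that ask for two significant digits.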

Example disclosure:

Are code and prompts to allow for an external reproduction of the evaluation of post-training mitigations disclosed?

Disclosure:

References:

Not disclosed

Score justification:

The developer points to collaborations with Haize Labs and Enkrypt AI, but these collaboration summaries do not include code and prompts to allow for an external reproduction of the evaluation of post-training mitigations.

Indicator notes:

The released code and prompts need not be the same as what is used internally, but should allow the developer's results on all mitigations evaluations to be reproduced. The released code must be open-source, following the OSI definition of open-source. Alternatively, we will award this point if the developer reports that it does not mitigate risk.

Example disclosure:

We release the code and prompts for reproducing post-training mitigation evaluations at this GitHub link: [URL]

56. Model theft prevention measures (Score: 1)

Does the developer disclose the security measures used to prevent unauthorized copying (“theft”) or unauthorized public release of the model weights?

Disclosure:

Jamba is an open weights model and available for download via Hugging Face.

References:

Not disclosed

Score justification:

The developer specifies that the model is an open-weights model, which we interpret to mean that there are no safeguards against model theft implemented.

Indicator notes:

This indicator assesses the developer's disclosures regarding how it addresses the risk that malicious actors or insiders could exfiltrate or replicate proprietary weights. Security measures could include insider threat analysis and detection, in addition to external threat management. Examples of such measures include encryption at rest, key management, remote attestation, or auditing for suspicious queries. We will award a point if the developer discloses specific steps taken to safeguard the model weights or that none are implemented.

Example disclosure:

We store model weights on encrypted volumes with hardware-based key management. We monitor inference queries for suspicious patterns (like repeated attempts to reconstruct weights token-by-token), and we audit all staff access logs monthly.

57. Release stages (Score: 1)

Are the stages of the model's release disclosed?

Disclosure:

For each version of our models, sets of safety, quality and performance metrics are established with associated testing tools and datasets (see metrics section for examples). Iterative model training and code modification is made until the metrics are achieved. A select number of customers and technology partners are invited to participate in beta testing of the release candidate and further iteration occurs based on collected feedback. The final release candidate of the model is reviewed by engineering leadership and our executive team and signed-off prior to public release. Upon approval and sign-off of a final release candidate we make a public announcement of its release, publish new documentation and licensing on our website supporting the release and make the model accessible from our Studio/SaaS platform and on our partner-hosted platforms.

References:

Not disclosed

Score justification:

The developer outlines the stages of release.

Indicator notes:

Release stages include A/B testing, release on a user-facing product, GA release, open-weight release, etc. We recognize that the release of a foundation model falls along a spectrum, with many forms of partial release, and that different developers may conceptualize release differently. We will award a point if the developer provides a clear identification of the stages through which the model was released.

Example disclosure:

We began with an internal alpha test for two weeks, followed by a closed beta with selected enterprise partners for one month, then a public waitlisted preview, and finally a general availability release once thresholds on safety benchmarks were met.

58. Risk thresholds (Score: 1)

Are risk thresholds disclosed?

Disclosure:

See: https://cdn.prod.website-files.com/66f8847c37d8e0032d189a19/678dfb51cd3fe5f2e69c0341_Report%20(2).pdf For each malicious prompt, the response is judged and scored to help focus future rounds of training. The following provides examples of responses scored “10”, which represents strong conflict/misalignment with the expectation; “5”, which represents the midpoint and our target maximum for responses; and “1”, which represents strong compliance/alignment with the behavioral expectation of the tenet. See also: https://www.haizelabs.com/technology/haize-labs-and-ai21-labs-setting-new-standards-for-ethical-ai-in-business and https://www.enkryptai.com/blog/enkryptai-and-ai21-labs-deliver-safer-language-models

References:

Not disclosed

Score justification:

The developer discloses in the first link provided (i) the harmful outcomes being scored and (ii) how the scores are computed. For (iii), the developer specifies "target" scores that must be reached in various phases of alignment.

Indicator notes:

Risk thresholds determine when a risk level is unacceptably high to a developer (e.g. leading to the decision to not release a model), moderately high (e.g. triggering additional safety screening), or low enough to permit normal usage. We will award this point if the developer discloses explicit risk thresholds that clarify (i) which harmful outcomes are being scored, (ii) how the scores are computed (in general terms, not necessarily disclosing internal algorithms), and (iii) what triggers an action to block, delay, or otherwise modify a model's release. Alternatively, we will award a point if the developer discloses that it does not consider explicit risk thresholds during model release.

Example disclosure:

Our risk threshold for biorisks is the ability to autonomously create bioweapons. Current models score a medium: they don't autonomously create bioweapons but could help a skilled practitioner with access to materials in speeding up creation of bioweapons. Risk thresholds higher than medium would delay the model's release until the risk level drops to medium or below.

59. Versioning protocol (Score: 1)

Is there a disclosed protocol for versioning and deprecation of the model?

Disclosure:

Major version numbers are assigned after a full pre-training run has been conducted (J1, J2); minor version numbers are assigned after an instruct training run and/or when new APIs are added to the platform (J1.1, J2.1). The latest model version remains active in our Studio/SaaS environment for three months after release of a new major version. This versioning is communicated to users in our release notes, whitepapers, documentation and change logs on our corporate website.

References:

Not disclosed

Score justification:

The developer discloses a versioning and deprecation protocol.

Indicator notes:

We will award a point if the developer discloses how model versions are labeled, updated, deprecated, and communicated to users.

Example disclosure:

We version models based on the date of release: e.g., ModelName-11-01-2024. We additionally provide ModelName-latest, corresponding to the latest release. We deprecate versions of models when we plan to remove access to them, with six months' notice to users. Users should respond to model deprecation by switching to the newest version of the models or an equivalent non-deprecated model. Users can switch to a different model by replacing the model identifier (to e.g., ModelName-latest for the latest version) in API calls or through the Python SDK.
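To illustrate the date-based scheme in the example disclosure above, here is a small Python sketch. It assumes identifiers take the form `ModelName-MM-DD-YYYY` and that a `-latest` alias resolves to the most recent release; the function names and the release dates are hypothetical and do not describe any real SDK.

```python
from datetime import date

def version_id(model_name: str, released: date) -> str:
    """Build a date-based identifier such as ModelName-11-01-2024."""
    return f"{model_name}-{released.strftime('%m-%d-%Y')}"

def resolve(identifier: str, model_name: str, releases: list[date]) -> str:
    """Map the '-latest' alias to the newest dated identifier (assumed behavior)."""
    if identifier == f"{model_name}-latest":
        return version_id(model_name, max(releases))
    return identifier

# Hypothetical release history used only for illustration.
releases = [date(2024, 5, 1), date(2024, 11, 1)]
print(resolve("ModelName-latest", "ModelName", releases))
```

Pinning a dated identifier keeps behavior stable until that version is deprecated, while the alias follows each new release.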

60. Change log (Score: 1)

Is there a disclosed change log for the model?

Disclosure:

The change log is available on AI21's website - https://docs.ai21.com/changelog

References:

Not disclosed

Score justification:

The developer discloses a changelog on its website that lists the performance and feature changes.

Indicator notes:

We will award a point if the developer publishes a version-by-version record of new features, fixes, or performance improvements.

Example disclosure:

On 11/1/2024 (version ModelName-11-01-2024), we improved model reasoning in technical domains. This resulted in a 20-point increase on the MATH benchmark (from 62% to 82%). Past change logs can be viewed at [URL]

61. Foundation model roadmap (Score: 1)

Is a forward-looking roadmap for upcoming models, features, or products disclosed?

Disclosure:

We plan to continue improving the Jamba family with new versions and also introduce a new AI planning and orchestrator product in the near future.

References:

Not disclosed

Score justification:

The developer provides information about new features and a general timeframe for that release.

Indicator notes:

A foundation model roadmap is a transparent statement about how the developer intends to evolve or expand its LLM offerings, including upcoming models, major feature releases, or expanded products based on the model, along with approximate timelines or version milestones. It can be high-level (e.g., “new model Q2 2025”), but must exist publicly.

Example disclosure:

We plan to release ModelX2 in Q2 2025, featuring enhanced multilingual capabilities and improved retrieval. We also aim to launch an enterprise-specific product tier for regulated industries by early 2026.

62. Top distribution channels (Score: 0)

Are the top-5 distribution channels for the model disclosed?

Disclosure:

Jamba is available directly from the (1) AI21 SaaS platform and through a growing network of technology partners including (2) Amazon Web Services (Bedrock and Sagemaker), (3) Google Cloud Platform Vertex and Launchpad, (4) Microsoft Azure, (5) Hugging Face. see - https://docs.ai21.com/docs/model-availability-across-platforms

References:

Not disclosed

Score justification:

The developer appears to have more than five distribution channels specified in the linked documentation (AWS, GCP, Azure, HF, Kaggle, AI21 SaaS), which requires the developer to provide the top-5 (along with the ranking metric used to determine that top-5). The developer specifies five platforms but does not seem to suggest that these are the top-5 and does not specify a ranking metric used to determine the top-5.

Indicator notes:

We define distribution channels to be either an API provider (a pathway by which users can query the model with inputs and receive outputs) or a model distributor (a pathway by which model weights are released). We recognize that distribution channels may arise without the knowledge of the model developer. For example, the weights of a model may be released through one distribution channel and then be distributed through other channels. Distribution channels can be ranked by any reasonable metric (e.g., number of queries, number of downloads, number of users, revenue). A description of the metric should be provided. API providers and model distributors may be ranked separately using different metrics as long as the total number of distribution channels equals five (if five distribution channels exist). For example, the developer may choose to disclose the top-3 API providers (ranked by the number of queries) and the top-2 model distributors (ranked by the number of downloads).

Example disclosure:

We provide API access to the model through A, B, and C. We distribute model weights through D and E. We pick the top-3 API providers based on the average number of queries per month and the top-2 model weight providers based on the average number of downloads per month.
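The ranking described in the indicator notes can be sketched as follows: API providers and model distributors are ranked separately, each by its own metric, and the two lists together make up the top five. The channel names and counts below are made-up placeholders, and `top_channels` is a hypothetical helper.

```python
def top_channels(usage: dict[str, int], k: int) -> list[str]:
    """Return the k channel names with the highest metric value."""
    return sorted(usage, key=usage.get, reverse=True)[:k]

# Placeholder data: average queries per month for API providers.
api_queries = {"A": 50_000_000, "B": 10_000_000, "C": 30_000_000, "F": 5_000_000}
# Placeholder data: average downloads per month for model weight distributors.
weight_downloads = {"D": 200_000, "E": 80_000, "G": 10_000}

top_five = top_channels(api_queries, 3) + top_channels(weight_downloads, 2)
print(top_five)
```

Whichever metric is used, the disclosure should name it, since the same channels can rank differently by queries, downloads, users, or revenue.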

63. Quantization (Score: 1)

Is the quantization of the model served to customers in the top-5 distribution channels disclosed?

Disclosure:

Like all models in its size class, Jamba 1.6 Large can’t be loaded in full (FP32) or half (FP16/BF16) precision on a single node of 8 GPUs. Dissatisfied with currently available quantization techniques, we developed ExpertsInt8, a novel quantization technique tailored for MoE models. With ExpertsInt8, we only quantize weights that are parts of the MoE (or MLP) layers, which for many MoE models account for over 85% of the model weights. In our implementation, we quantize and save these weights in INT8, an 8-bit precision format, and dequantize them at runtime directly inside the MoE GPU kernel. This is available in all distribution channels.

References:

Not disclosed

Score justification:

The developer provides a description of the quantization of the model applied to all distribution channels.

Indicator notes:

We will award this point for a disclosure of the model precision in each of the top-5 distribution channels.

Example disclosure:

We serve the model at 16-bit precision on all distribution channels.
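The disclosure for this indicator describes storing MoE/MLP weights in INT8 and dequantizing them at runtime. The sketch below shows generic symmetric INT8 quantization of one weight vector in plain Python, to make the precision trade-off concrete. It is not AI21's ExpertsInt8 implementation: the per-vector scale, the function names, and the sample weights are assumptions, and the real technique dequantizes inside a GPU kernel.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights at runtime."""
    return [q * scale for q in quantized]

# Made-up weights for illustration only.
weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
# Each restored weight differs from the original by at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing one byte per weight instead of two (FP16/BF16) or four (FP32) is what reduces the memory footprint; the cost is the bounded rounding error computed above.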

64. Terms of use (Score: 1)

Are the terms of use of the model disclosed?

Disclosure:

https://lp.ai21.com/hubfs/resources/AI21-Models-Terms-of-Service.pdf?_gl=1*wirk8d*_gcl_au*NDI4NzM2OTk5LjE3NDQ5MDcxODA.

References:

Not disclosed

Score justification:

The developer discloses its terms of service, which appear to apply to the bulk of the model's distribution channels.

Indicator notes:

We define terms of use to include terms of service and model licenses. We will award this point for a pointer to the terms of service or model license. In the event that the model's licenses are written more generally, it should be clear which assets they apply to. We recognize that different developers may adopt different business models and therefore have different types of model licenses. Examples of model licenses include responsible AI licenses, open-source licenses, and licenses that allow for commercial use. Terms of service should be disclosed for each of the top-5 distribution channels. However, we will award this point if there are terms of service that appear to apply to the bulk of the model’s distribution channels.

Example disclosure:

Our terms of service are published at https://ourcompany.com/model-tos - these terms cover both our API and all distribution channels for model weights.

65. Distribution channels with usage data (Score: 0)

What are the top-5 distribution channels for which the developer has usage data?

Disclosure:

Jamba is available directly from the AI21 SaaS platform and through a growing network of technology partners including Amazon Web Services (Bedrock and Sagemaker), Google Cloud Platform Vertex and Launchpad, Microsoft Azure, Hugging Face. We have access to usage data. see - https://docs.ai21.com/docs/model-availability-across-platforms

References:

Not disclosed

Score justification:

The developer does not fully disclose if it has access to usage data from its distribution channels - it states that it has access to usage data, but does not specify if this is true for all of the channels listed.

Indicator notes:

We define distribution channels to be either an API provider (a pathway by which users can query the model with inputs and receive outputs) or a model distributor (a pathway by which model weights are released). We recognize that distribution channels may arise without the knowledge of the model developer. For example, the weights of a model may be released through one distribution channel and then be distributed through other channels. Distribution channels can be ranked by any reasonable metric (e.g., number of queries, number of downloads, number of users, revenue). A description of the metric should be provided. We define usage data as any form of developer-exclusive data collected from any of a developer's distribution channels. A developer has access to usage data from a distribution channel if it is able to use that data for downstream purposes (e.g., analytics, training, etc.). Usage data may be shared outside of the developer, but it is initially collected by the distribution channel and shared with the developer.

Example disclosure:

We have access to usage data through the distribution channels: A, B, and C.

66. Amount of usage (Score: 0)

For each of the top-5 distribution channels, how much usage is there?

Disclosure:

Not Disclosed

References:

Not disclosed

Score justification:

The developer acknowledges no disclosure.

Indicator notes:

Usage should be reported as the number of queries over the span of a month, reported to the precision of one significant figure (e.g., 50 million queries).

Example disclosure:

Distribution channel A: 50 million queries. Distribution channel B: 10 million queries. Distribution channel C: 10 million queries.

67. Classification of usage data (Score: 0)

Is a representative, anonymized dataset classifying queries into usage categories disclosed?

Disclosure:

Not Disclosed

References:

Not disclosed

Score justification:

The developer acknowledges no disclosure.

Indicator notes:

Developers may either share a fully public dataset or a partially restricted dataset (e.g., under a research license). We will award this point if there is a clear, aggregated or sample dataset that reveals categories of tasks/queries.

Example disclosure:

We provide quarterly releases of an anonymized dataset that classifies user queries into 20 broad job-related categories. Researchers can request access via [URL]. We ensure no PII is included.

68. Data retention and deletion policy (Score: 1)

Is a policy for data retention and deletion disclosed?

Disclosure:

We honor verified user requests to delete personal data from our training corpus by removing it from any subsequent scheduled retraining.

References:

Not disclosed

Score justification:

The developer discloses a policy regarding data retention and deletion via the disclosure provided to FMTI.

Indicator notes:

A data retention and deletion policy is a policy for removing particular data from the training set and/or preventing it from being used if there is a user or external request (e.g., “right to be forgotten”) that also covers internal data governance. This includes whether there is a formal process to delete or retract data from future training runs and how long raw data is retained. It also clarifies how quickly deletions propagate to the model (e.g., “only in subsequent major model releases”).

Example disclosure:

We honor verified user requests to delete personal data from our training corpus by removing it from any subsequent scheduled retraining. Our data retention policy ensures chat logs are purged after 90 days.

69. Geographic statistics (Score: 0)

Across all forms of downstream use, are statistics of model usage across geographies disclosed?

Disclosure:

There are thousands of applications built using AI21Studio and our APIs that are used by millions of people every day. These applications exist in a wide variety of industry segments including retail, financial services, healthcare, education, e-commerce, hi-tech, media/communications and entertainment/gaming. Jamba models are used by customers in more than 30 countries around the world, with predominant usage in the U.S., the U.K. and the E.U. Popular usage scenarios include Language Modeling and Completion, Instruction Following, Sentiment Analysis, Paraphrasing, Summarization and Question Answering.

References:

Not disclosed

Score justification:

The developer discloses a description of geographic statistics, but it is not adequately tied to usage statistics because per-country usage data is not shared.

Indicator notes:

We will award this point if there is a meaningful, though potentially incomplete or vague, disclosure of geographic usage statistics at the country-level.

Example disclosure:

We share anonymized per-country usage metrics in a publicly accessible dashboard, updated monthly, on this link: [link]

70. Internal products and services (Score: 1)

What are the top-5 internal products or services using the model?

Disclosure:

The model is used by Wordtune with tens of millions of users - Wordtune is the only internal application.

References:

Not disclosed

Score justification:

The developer discloses the sole internal product/service that makes use of the model.

Indicator notes:

An internal product or service is a product or service built by the developer. Products or services can be ranked by any reasonable metric (e.g., number of users, queries, revenue). A description of the metric should be provided.

Example disclosure:

The model is used in products A, B, C, D, and E. We choose products based on the number of monthly active users.

71. External products and services (Score: 0)

What are the top-5 external products or services using the model?

Disclosure:

Not Disclosed

References:

Not disclosed

Score justification:

The developer acknowledges no disclosure.

Indicator notes:

An external product or service is a product or service built by a party external to the developer. Products or services can be ranked by any reasonable metric (e.g., number of users, queries, revenue). A description of the metric should be provided. We will award a point if the developer discloses that it does not have access to such metrics about external products or services.

Example disclosure:

The model is used in products A, B, C, D, and E. We choose products based on the number of monthly active users.

72. Users of internal products and services (Score: 0)

How many monthly active users are there for each of the top-5 internal products or services using the model?

Disclosure:

The model is used by Wordtune with tens of millions of users - Wordtune is the only internal application.

References:

Not disclosed

Score justification:

The developer discloses the number of users of Wordtune, but not to one significant figure.

Indicator notes:

An internal product or service is a product or service built by the developer. The number of users refers to users who engaged or interacted with the model through the internal product or service over the last month or averaged over the last X months (this should be specified). Number of users should be specified to one significant figure (e.g. 100,000).

Example disclosure:

Over the last 6 months, the total monthly active users for our top-5 products using model Y are: Product A: 100,000 users; Product B: 30,000 users; Product C: 10,000 users; Product D: 10,000 users; Product E: 10,000 users.

73. Consumer/enterprise usage (Score: 1)

Across all distribution channels for which the developer has usage data, what portion of usage is consumer versus enterprise?

Disclosure:

100% of our licensed/SaaS Jamba usage is from enterprise clients via AI21's enterprise offering.

References:

Not disclosed

Score justification:

The developer discloses that 100% of its licensed/SaaS usage is from enterprise clients.

Indicator notes:

Consumer usage refers to usage by individual consumers. Enterprise usage refers to usage by enterprise customers (including government use). Consumer and enterprise usage should be calculated in terms of the number of queries by or the amount of revenue from consumer or enterprise users. Percentages should be specified to two significant digits (e.g., 12% consumer, 88% enterprise).

Example disclosure:

12% of the usage of model A across all distribution channels is from consumers, 88% is from enterprise users. Of this 88%, 6% is from users at governments. Usage is calculated based on number of queries.
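The split in the example disclosure above can be computed from query counts as in the following sketch. The function name and the query counts are hypothetical; rounding to a whole percent yields two significant digits only for shares between 10% and 99%.

```python
def usage_split(consumer_queries: int, enterprise_queries: int) -> tuple[int, int]:
    """Return (consumer %, enterprise %) of total queries, rounded to whole percents."""
    total = consumer_queries + enterprise_queries
    if total == 0:
        raise ValueError("no usage recorded")
    consumer_pct = round(100 * consumer_queries / total)
    # Derive the second share from the first so the two always sum to 100.
    return consumer_pct, 100 - consumer_pct

# Placeholder monthly query counts.
consumer_pct, enterprise_pct = usage_split(12_000, 88_000)
print(f"{consumer_pct}% consumer, {enterprise_pct}% enterprise")
```

The same calculation applies if revenue is used instead of query counts, as long as the disclosure states which basis was chosen.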

74. Enterprise users (Score: 1)

Across all distribution channels for which the developer has usage data, what are the top-5 enterprises that use the model?

Disclosure:

We are not allowed to disclose these contracts due to NDAs and contractual privacy requirements. We do, however, publish case studies when customers allow it (see #77).

References:

Not disclosed

Score justification:

The developer discloses that it cannot identify its top 5 enterprise users due to contractual prohibitions.

Indicator notes:

Enterprises should be ranked by the number of queries made or the amount of revenue from usage since the model's release. We will also award this point if the developer indicates it does not have access to enterprise usage data.

Example disclosure:

The top-5 enterprises are A, B, C, D, and E. The enterprises are selected based on the number of queries.

75. Government use (Score: 1)

What are the 5 largest government contracts for use of the model?

Disclosure:

We are not allowed to disclose these contracts due to NDAs and contractual privacy requirements.

References:

Not disclosed

Score justification:

The developer acknowledges it cannot identify its top 5 government contracts due to contractual prohibitions.

Indicator notes:

This includes known government contracts of enterprise or government-specific products and services that use the model. We will award this point if the developer discloses its top five government contracts ranked by monetary value, though the developer may omit contracts where it is under NDA regarding the existence of the contract.

Example disclosure:

The five largest government users of our service, along with their use cases, are: 1. County A is using our product to improve access to internal resources. 2. National Lab B is using our model to advance bioscientific research. 3. Federal agency C is using our product to deliver faster, more accurate translation services. 4. City D is participating in a pilot program, which found that our product helped reduce the time spent on routine tasks. 5. Country E is using our product to summarize legal documents in its lower courts.

76. Benefits Assessment (Score: 1)

Is an assessment of the benefits of deploying the model disclosed?

Disclosure:

Fnac, a multinational retail chain using Jamba for data classification, saw a 26% improvement in output quality with Jamba 1.6 Mini, allowing them to move from Jamba 1.5 Large to Jamba 1.6 Mini—maintaining high quality while recouping a ~40% improvement in latency. In grounded question answering, Jamba 1.6 is powering personalized chatbots for online education provider Educa Edtech with more than 90% retrieval accuracy and citation reliability, ensuring their community of learners get trustworthy answers. A digital banking pioneer, advancing an assistant that delivers grounded answers to customer questions, found that Jamba Mini 1.6 scored 21% higher on precision than its predecessor on their own internal tests—and matched the quality of OpenAI’s GPT-4o. see more at - https://www.ai21.com/blog/introducing-jamba-1-6/

References:

Not disclosed

Score justification:

The developer provides case studies from clients showing quantified improvements in performance.

Indicator notes:

We will award this point for any quantitative assessment of the benefits or potential benefits of deploying the model.

Example disclosure:

We analyze the impact of using the model on education outcomes using a randomized controlled trial in third grade math assignments, and find that use in the classroom improves standardized test outcomes by 26%. [Link to report.]

77. AI bug bounty (Score: 1)

Does the developer operate a public bug bounty or vulnerability reward program under which the model is in scope?

Disclosure:

We do not offer bug bounties.

References:

Not disclosed

Score justification:

The developer acknowledges that it does not offer a bug bounty.

Indicator notes:

We will award this point for a publicly documented bug bounty or vulnerability reward program describing (i) in-scope vulnerabilities (e.g., prompt bypasses, data leaks), (ii) out-of-scope items, (iii) submission process, and (iv) reward tiers or recognition if applicable. We will award a point if the developer discloses it has no AI bug bounty that encourages external researchers to report security, privacy, or adversarial vulnerabilities in the model.

Example disclosure:

We run a bug bounty program with HackerOne. We award up to $5,000 for critical vulnerabilities, such as discovering a major exploit that circumvents our content filters or reveals private data. [link to bug bounty]

78. Responsible disclosure policy (Score: 1)

Does the developer clearly define a process by which external parties can disclose model vulnerabilities or flaws?

Disclosure:

While we invite and encourage engagement from safety researchers we do not have a formal responsible disclosure policy. https://docs.ai21.com/docs/safety-research-1

References:

Not disclosed

Score justification:

The developer acknowledges that it does not maintain a responsible disclosure policy.

Indicator notes:

We will award this point for a description of the process external parties can use for responsibly disclosing model vulnerabilities and flaws, which should include (i) what mechanism external parties can use to disclose vulnerabilities or flaws (e.g., a form, an email) and (ii) what process follows a disclosure (e.g., how much time must parties wait until public release). This is often included with a bug bounty, but can also be standalone. We will award a point if the developer discloses it has no responsible disclosure policy.

Example disclosure:

We maintain a responsible disclosure policy at [URL] that describes how external parties can disclose vulnerabilities and flaws in Model A, including a 45-day disclosure window and an official contact for urgent security vulnerabilities.

79. Safe harbor (Score: 1)

Does the developer disclose its policy for legal action against external evaluators conducting good-faith research?

Disclosure:

AI safety is an important challenge with a large surface area, which we believe can be addressed most effectively by working together. We invite anyone interested in conducting research or otherwise promoting AI safety to contact us at safety@ai21.com and explore opportunities for collaboration. We encourage members of the community to contact us at the same address to report bad experiences, vulnerabilities and suspected misuse of our products or to voice any other safety-related concerns. While we invite and encourage engagement from safety researchers we do not have a formal responsible disclosure policy or a formal safe harbor policy.

References:

Not disclosed

Score justification:

The developer discloses it has no formal safe harbor policy.

Indicator notes:

We will award this point if the developer discloses whether it has a policy committing it to not pursue legal action against external evaluators conducting good-faith research. This should not be only for software security vulnerabilities, but also AI flaws, and it should be based on researcher conduct standards, not at the sole discretion of the company. We will award this point if the developer provides a clear description of its policy regarding such protections for external researchers, or lack thereof.

Example disclosure:

We do not have a policy for researcher protections for good-faith safety research. OR Our policy ensures no legal action against good‐faith researchers who follow our disclosure guidelines, see: [link]

80. Security incident reporting protocol (Score: 1)

Are major security incidents involving the model disclosed?

Disclosure:

Users and researchers can report incidents regarding our flagship foundation model via safety@ai21.com or via trust.ai21.com, and we commit to an initial acknowledgment within 72 hours. Results are not disclosed publicly.

References:

Not disclosed

Score justification:

The developer discloses a security incident reporting protocol that covers the criteria in the notes via the FMTI report.

Indicator notes:

A security incident reporting protocol provides post-deployment transparency about serious incidents or breaches. Security incidents refer to incidents where external security threats affect the model (e.g., data breaches or DDoS attacks on the service). We will award this point if the developer states (i) how to submit a security incident report, (ii) how quickly it will respond, and (iii) when and whether results are disclosed. Every incident need not be reported publicly, but the developer must disclose a policy determining how incidents are reported and disclosed.

Example disclosure:

We publish a public ‘Security Incident Report’ on our website for any confirmed security incident affecting the model within 7 days of a patch being implemented. Users and researchers can report incidents via security@ourcompany.com, and we commit to an initial acknowledgment within 48 hours.

What commitments has the developer made to government bodies?

Disclosure:

We have committed to the OECD principles and framework - https://oecd.ai/en/ai-principles

References:

Not disclosed

Score justification:

The developer has committed to the OECD AI Principles.

Indicator notes:

We will award this point if the company provides an exhaustive list of commitments it has made to government bodies in the jurisdictions where it offers its model.

Example disclosure:

We have committed to the White House Voluntary Commitments and the Seoul Commitments.