14. Data processing purpose (Score: 1)

For each data processing method, what is its primary purpose?

Disclosure:

References:

Not disclosed

Score justification:

The general methods and purposes for data processing in pre-training are made clear.

Indicator notes:

Data processing refers to any method that substantively changes the content of the data. For example, compression or changing the data file format is generally not in the scope of this indicator.

Example disclosure:

Examples of primary purposes for a data processing method could include: (i) removes low quality data, (ii) removes potentially personal/copyrighted data, (iii) removes product-irrelevant data, (iv) removes toxic data, (v) improves evaluation integrity, or (vi) prepares the data for training the model.

Is the amount of compute used in the model's final training run disclosed?

Disclosure:

No information is provided about training FLOPs, though they can be estimated from the 2.8M H800 GPU hours published in the DeepSeek-V3 paper, given assumed conversion factors, together with the information about the R1 RL phase in the R1 paper.
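
The kind of estimate described above can be sketched as follows. This only illustrates the arithmetic: the peak throughput and utilization values are hypothetical conversion factors chosen for the example, not figures published by the developer, and the function name is made up for this sketch. The GPU-hour input is the roughly 2.8M H800 GPU hours reported for DeepSeek-V3; the R1 RL phase is not included.

```python
# Hypothetical conversion factors (assumptions, not disclosed by the developer).
ASSUMED_PEAK_FLOPS_PER_GPU = 1e15  # assumed peak throughput of one GPU, in FLOP/s
ASSUMED_UTILIZATION = 0.4          # assumed fraction of peak sustained in training


def estimate_training_flops(gpu_hours: float,
                            peak_flops_per_gpu: float = ASSUMED_PEAK_FLOPS_PER_GPU,
                            utilization: float = ASSUMED_UTILIZATION) -> float:
    """Convert GPU hours into an approximate FLOP count."""
    gpu_seconds = gpu_hours * 3600
    return gpu_seconds * peak_flops_per_gpu * utilization


# Roughly 2.8M H800 GPU hours, as reported for DeepSeek-V3.
v3_gpu_hours = 2.8e6
v3_flops_estimate = estimate_training_flops(v3_gpu_hours)
print(f"Estimated V3 training compute: {v3_flops_estimate:.0e} FLOPs")
```

The result scales linearly with both assumed factors, so a different utilization or throughput assumption changes the estimate proportionally.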

References:

https://arxiv.org/pdf/2412.19437

Score justification:

No information provided directly about compute usage.

Indicator notes:

Example disclosure:

Our model was trained using 5 x 10^25 FLOPs, measured according to the Frontier Model Forum guidance provided at this URL: https://www.frontiermodelforum.org/updates/issue-brief-measuring-training-compute/

22. Compute usage including R&D (Score: 0)

Is the amount of compute used to build the model, including experiments, disclosed?

Disclosure:

No information provided about cumulative training FLOPs.

References:

Not disclosed

Score justification:

No information provided directly about compute usage.

Indicator notes:

Compute should be reported in appropriate units, which most often will be floating point operations (FLOPs), along with a description of the measurement methodology, which may involve estimation. Compute should be reported to a precision of one significant figure (e.g. 7 x 10^26 FLOPs). Compared to the previous indicator, this indicator should include an estimation of the total compute used across experiments used towards the final training run for the model (such as including hyperparameter optimization or other experiments), and not just the final training run itself.
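
As an illustration of the one-significant-figure convention in these notes, the following minimal sketch formats a raw figure in the "7 x 10^26 FLOPs" style. The helper name `format_one_sig_fig` is hypothetical, and the sample values are used only to exercise the formatting.

```python
def format_one_sig_fig(value: float, unit: str) -> str:
    """Format a value to one significant figure as 'm x 10^e unit'."""
    # Python's 'e' presentation type with precision 0 keeps one significant digit.
    mantissa, exponent = f"{value:.0e}".split("e")
    return f"{mantissa} x 10^{int(exponent)} {unit}"


print(format_one_sig_fig(7.3e26, "FLOPs"))
print(format_one_sig_fig(2.788e6, "H800 GPU hours"))
```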

Example disclosure:

Our cumulative compute usage involved in building the model was 7 x 10^26 FLOPs, measured according to the Frontier Model Forum guidance provided at this URL: https://www.frontiermodelforum.org/updates/issue-brief-measuring-training-compute/

23. Development duration for final training run (Score: 1)

Is the amount of time required to build the model disclosed?

Disclosure:

The most relevant document is the DeepSeek-V3 paper, whose passage on pre-training reads: "During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pretraining stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training". The Nature paper on R1 notes: "For the training of DeepSeek-R1-Zero, we employed 64*8 H800 GPUs, and the process required approximately 198 hours. Additionally, during the training phase of DeepSeek-R1, we utilized the same 64*8 H800 GPUs, completing the process in about 4 days, or roughly 80 hours. To create the SFT datasets, we use 5K GPU hours".

References:

https://arxiv.org/pdf/2412.19437

Score justification:

Development duration for V3 pretraining is adequately reported in terms of GPU hours, and in days if interpreted as 55 days (2664K / 180K * 3.7 days), which is less than the two months mentioned. Post-training to produce R1 adds 5K + (64*8*198) + (64*8*80) = 147K GPU hours and 12 days. This yields estimates of 2.8M GPU hours and 67 days.
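
The arithmetic in this justification can be reproduced with a short sketch. All inputs are the figures quoted in the disclosure. Two choices here are assumptions rather than statements from the source: the R1 phase is counted as the quoted "about 4 days" rather than 80 hours, and the 2.8M total is read as V3 pretraining hours plus R1 post-training hours.

```python
# Figures quoted from the DeepSeek-V3 paper.
PRETRAIN_GPU_HOURS = 2_664_000
GPU_HOURS_PER_TRILLION_TOKENS = 180_000
DAYS_PER_TRILLION_TOKENS = 3.7

# Figures quoted from the Nature paper on R1.
R1_GPU_COUNT = 64 * 8
R1_ZERO_WALL_HOURS = 198
R1_WALL_HOURS = 80
R1_WALL_DAYS = 4            # "about 4 days", used for the day count (assumption)
SFT_DATA_GPU_HOURS = 5_000

# V3 pretraining duration in days.
pretrain_days = (PRETRAIN_GPU_HOURS / GPU_HOURS_PER_TRILLION_TOKENS
                 * DAYS_PER_TRILLION_TOKENS)

# Additional GPU hours and days for post-training that produces R1.
r1_gpu_hours = (SFT_DATA_GPU_HOURS
                + R1_GPU_COUNT * R1_ZERO_WALL_HOURS
                + R1_GPU_COUNT * R1_WALL_HOURS)
r1_days = R1_ZERO_WALL_HOURS / 24 + R1_WALL_DAYS

# Combined totals (assumed reading: pretraining plus R1 post-training).
total_gpu_hours = PRETRAIN_GPU_HOURS + r1_gpu_hours
total_days = pretrain_days + r1_days

print(f"Pretraining: about {pretrain_days:.0f} days")
print(f"R1 post-training: {r1_gpu_hours:,} GPU hours, about {r1_days:.0f} days")
print(f"Total: about {total_gpu_hours / 1e6:.1f}M GPU hours, {total_days:.0f} days")
```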

Indicator notes:

The amount of time should be specified in terms of both the continuous duration of time required and the number of hardware hours used. The continuous duration of time required to build the model should be reported in weeks, days, or hours to a precision of one significant figure (e.g. 3 weeks). The number of hardware hours should be reported to a precision of one significant figure and include the type of hardware hours. No form of decomposition into phases of building the model is required for this indicator, but it should be clear what the duration refers to (e.g. training the model, or training and subsequent evaluation and red teaming).

Example disclosure:

Our model was trained over a period of 90 days using 4x10^4 NVIDIA H100 GPU-days.

24. Compute hardware for final training run (Score: 1)

For the primary hardware used to build the model, is the amount and type of hardware disclosed?

Disclosure:

References:

https://arxiv.org/pdf/2412.19437

Score justification:

2048 Nvidia H800 GPUs throughout V3 training and 512 Nvidia H800 GPUs throughout R1 training.

Indicator notes:

In most cases, this indicator will be satisfied by information regarding the number and type of GPUs or TPUs used to train the model. The number of hardware units should be reported to a precision of one significant figure (e.g. 800 NVIDIA H100 GPUs). We will not award this point if (i) the training hardware generally used by the developer is disclosed, but the specific hardware for the given model is not, or (ii) the training hardware is disclosed, but the amount of hardware is not. We will award this point even if information about the interconnects between hardware units is not disclosed.

Example disclosure:

Our model was trained using 1000 NVIDIA H100 GPUs.

For all stages that are described, is there a clear description of the associated learning objectives or a clear characterization of the nature of this update to the model?

Disclosure:

References:

https://arxiv.org/pdf/2412.19437; https://arxiv.org/pdf/2501.12948

Score justification:

Model stages have clearly described objectives, including description of token prediction for initial pretraining step in V3 report.

Indicator notes:

We recognize that different developers may use different terminology for these stages, or conceptualize the stages differently. We will award this point if there is a clear description of the update to the model related to each stage, whether that is the intent of the stage (e.g. making the model less harmful), a mechanistic characterization (e.g. minimizing a specific loss function), or an empirical assessment (e.g. evaluation results conducted before and after the stage).

Example disclosure:

During unsupervised pre-training, the objective is next-token prediction. During supervised instruction tuning, we optimize for correctness and helpfulness on labeled tasks. RLHF aligns model outputs with human preference judgments. Domain-specific fine-tuning focuses on improving in-domain capabilities using specialized data (e.g., code or legal text). Final safety alignment reduces disallowed or harmful responses.

32. Code access (Score: 0)

Does the developer release code that allows third-parties to train and run the model?

Disclosure:

No information is provided about code for building DeepSeek-R1, though inference code is released.

References:

https://github.com/deepseek-ai/DeepSeek-R1/tree/main

Score justification:

No training code provided.

Indicator notes:

The released code does not need to match the code used internally.

Example disclosure:

We release training and inference code under an Apache 2.0 license at https://github.com/..., enabling others to replicate our core pipeline.

33. Organization chart (Score: 0)

How are employees developing and deploying the model organized internally?

Disclosure:

The DeepSeek V3 report notes contributors for Research & Engineering, Data Annotation, and Business & Compliance. The DeepSeek R1 report notes core contributors and contributors.

References:

https://arxiv.org/pdf/2412.19437; https://arxiv.org/pdf/2501.12948

Score justification:

No specific information about organization structure relevant for R1.

Indicator notes:

To receive a point, the developer should provide both the internal organization chart for the team developing the model as well as the headcounts (or a proportion of headcounts) by the team.

Example disclosure:

The model team comprises 63 people, organized as follows:
- CEO
- Managing Director (Safety): 24 people
- Managing Director (Pre-training): 12 people
- Managing Director (Post-training): 11 people
- Managing Director (API): 6 people
- Director (Infrastructure and reliability): 7 people
- Director (PR and marketing): 4 people
- Director (hiring): 7 people

34. Model cost (Score: 0)

What is the cost of building the model?

Disclosure:

The most relevant document is the DeepSeek-V3 report with the passage on training costs: "Consequently, our pretraining stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M." For DeepSeek-R1, the developer reports $294,000 USD cost.

References:

https://arxiv.org/pdf/2412.19437 ; https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-025-09422-z/MediaObjects/41586_2025_9422_MOESM1_ESM.pdf

Score justification:

No information provided about total model cost, only an estimate for V3 training compute based on market rates and similarly for R1, i.e. non-compute costs are not accounted for.
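
The compute-only cost figures in the disclosure follow from simple multiplication, sketched below. The V3 inputs and the $2 per GPU hour rental price are quoted from the DeepSeek-V3 report. Applying the same rental price to the 147K GPU-hour estimate for R1 (from the development duration indicator) is an assumption made here for illustration; it happens to match the reported $294,000. The function name is hypothetical.

```python
RENTAL_PRICE_USD_PER_GPU_HOUR = 2.0  # assumed rental price stated in the V3 report

# GPU hours by stage, as quoted from the DeepSeek-V3 report.
V3_GPU_HOURS = {
    "pre-training": 2_664_000,
    "context length extension": 119_000,
    "post-training": 5_000,
}

# Rounded R1 post-training estimate; pairing it with the same price is an assumption.
R1_GPU_HOURS_ESTIMATE = 147_000


def compute_cost_usd(gpu_hours: float,
                     price: float = RENTAL_PRICE_USD_PER_GPU_HOUR) -> float:
    """Compute-only cost at a flat rental price per GPU hour."""
    return gpu_hours * price


v3_total_gpu_hours = sum(V3_GPU_HOURS.values())
v3_cost = compute_cost_usd(v3_total_gpu_hours)
r1_cost = compute_cost_usd(R1_GPU_HOURS_ESTIMATE)

print(f"V3: {v3_total_gpu_hours:,} GPU hours, ${v3_cost:,.0f}")
print(f"R1: {R1_GPU_HOURS_ESTIMATE:,} GPU hours, ${r1_cost:,.0f}")
```

Neither figure covers data, personnel, or R&D compute, which is why the indicator is not satisfied.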

Indicator notes:

Monetary cost should be reported in appropriate currency (e.g. USD), along with the measurement methodology, which may involve estimation. Cost should be reported to a precision of one significant figure (e.g. 200 million USD).

Example disclosure:

We spent approximately 200 million USD on building the model: 50 million for data acquisition, 10 million for data processing, 20 million for personnel, 80 million for compute for R&D priced at market rates, and 40 million for compute for the final training run priced at market rates.

35. Basic model properties (Score: 1)

Are all basic model properties disclosed?

Disclosure:

"We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing." The DeepSeek Hugging Face documentation, notes that the number of parameters associated with the model is 671 billion. "Previous work has heavily relied on large amounts of supervised data to enhance model performance. In this study, we demonstrate that reasoning capabilities can be significantly improved through large-scale reinforcement learning (RL), even without using supervised fine-tuning (SFT) as a cold start. Furthermore, performance can be further enhanced with the inclusion of a small amount of cold-start data. In the following sections, we present: (1) DeepSeek-R1-Zero, which applies RL directly to the base model without any SFT data, and (2) DeepSeek-R1, which applies RL starting from a checkpoint fine-tuned with thousands of long Chain-of-Thought (CoT) examples. 3) Distill the reasoning capability from DeepSeek-R1 to small dense models."

References:

DeepSeek-R1 Paper

Score justification:

The developer discloses the modalities, the model components, the model size, and the model architecture.

Indicator notes:

Basic model properties include: the input modality, output modality, model size, model components, and model architecture. To receive a point, all model properties should be disclosed. Modalities refer to the types or formats of information that the model can accept as input. Examples of input modalities include text, image, audio, video, tables, graphs. Model components refer to distinct and identifiable parts of the model. We recognize that different developers may use different terminology for model components, or conceptualize components differently. Examples include: (i) For a text-to-image model, components could refer to a text encoder and an image encoder, which may have been trained separately. (ii) For a retrieval-augmented model, components could refer to a separate retriever module. Model size should be reported in appropriate units, which generally is the number of model parameters, broken down by named component. Model size should be reported to a precision of one significant figure (e.g. 500 billion parameters for text encoder, 20 billion parameters for image encoder). Model architecture is the overall structure and organization of a foundation model, which includes the way in which any disclosed components are integrated and how data moves through the model during training or inference. We recognize that different developers may use different terminology for model architecture, or conceptualize the architecture differently; a sufficient disclosure includes any clear, though potentially incomplete, description of the model architecture.

Example disclosure:

Input modality: Text
Output modality: Text
Model components: Decoder-only model trained using self-supervised learning, followed by supervised fine tuning and RLHF that are used to align the language model to follow users' instructions and be helpful, harmless, and honest.
Model size: 70B parameters
Model architecture: Autoregressive (causal, decoder only) transformer language model with rotary position embeddings, trained on the next token prediction task.

36. Deeper model properties (Score: 1)

Is a detailed description of the model architecture disclosed?

Disclosure:

Configuration file released on Hugging Face

References:

https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/config.json

Score justification:

The developer releases a configuration file that allows for the architecture to be reproduced.

Indicator notes:

To receive a point, the model architecture should be described in enough detail to allow for an external entity to fully implement the model. Publicly available code or a configuration file for a model training library (e.g., GPT-NeoX) would be a sufficiently detailed description.

Example disclosure:

The configuration file for training our model using a public model training library A can be found at [URL].

37. Model dependencies (Score: 1)

Are the model or models from which this model is derived disclosed?

Disclosure:

"Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO (Shao et al., 2024) as the RL framework to improve model performance in reasoning."

References:

DeepSeek-R1 Paper

Score justification:

The developer discloses that the model is derived from DeepSeek-V3-Base.

Indicator notes:

We will award this point for a comprehensive disclosure of the model or models on which the foundation model directly depends or from which it is derived, as well as the method by which it was derived (e.g., through fine tuning, model merging, or distillation). Additionally, we will award a point if the developer discloses that the model is not dependent on or derived from any model.

Example disclosure:

This model is a fine tune of Camel-70B. We used the methods described in [PAPER URL] for distillation.

38. Benchmarked inference (Score: 0)

Is the compute and time required for model inference disclosed for a clearly-specified task on clearly-specified hardware?

Disclosure:

Not disclosed

References:

Not disclosed

Score justification:

No information is provided about the inference compute or time.

Indicator notes:

The duration should be reported in seconds to a precision of one significant figure (e.g. 0.002 seconds). Compute usage for inference should be reported in FLOPs/second to a precision of one significant figure (e.g. 5 x 10^21 FLOPs/second). The hardware in this evaluation need not be the hardware the developer uses for inference. The developer can report this figure over some known or public dataset.

Example disclosure:

It takes 0.002 seconds and 5 x 10^21 FLOPs/second to generate 100,000 tokens as 5,000 sequences of length 20 given inputs of length 40 from [DATASET URL]. The fixed set of hardware is 8 NVIDIA A100s.

39. Researcher credits (Score: 0)

Is a protocol for granting external entities API credits for the model disclosed?

Disclosure:

Not disclosed

References:

Not disclosed

Score justification:

No information is provided about a model credit access protocol.

Indicator notes:

A model credit access protocol refers to the steps, requirements, and considerations involved in granting credits to external entities. We will award this point if the developer discloses key details of its protocol, including (i) where external entities can request access to credits (e.g. via an access request form); (ii) explicit criteria for selecting external entities; and (iii) its policy on granting a transparent decision on whether access has been granted within a specified, reasonable period of time. Additionally, we will award a point if the developer discloses that it does not grant external entities API credits.

Example disclosure:

We implement a researcher access program: (i) Access can be requested from [URL]. (ii) Any researcher at an accredited research institution is eligible to apply. Decisions are made based on the alignment between the applicant's project description and our target research directions (as described here: [URL]). (iii) Decision notifications are sent within three weeks of the application receipt.

40. Specialized access (Score: 0)

Does the developer disclose if it provides specialized access to the model?

Disclosure:

Not disclosed

References:

Not disclosed

Score justification:

No information is provided about specialized access.

Indicator notes:

Specialized access could include several categories, such as early access, subsidized access, or deeper access (e.g., to model weights or checkpoints, that are not publicly available). We will award this point if the developer discloses (i) if it provides specialized access and (ii) statistics on the number of users granted access across academia, industry, non-profits, and governments, to one significant figure.

Example disclosure:

We provide early access to the model via API to: (1) 250 academics vetted by our program; (2) 0 industry affiliates; (3) 0 non-profit affiliates; (4) 2 government entities with whom we have signed MoUs. We provide no other specialized research access.

41. Open weights (Score: 1)

Are the model's weights openly released?

Disclosure:

"🔄 DeepSeek-R1 is now MIT licensed for clear open access"

References:

DeepSeek-R1 Paper, DeepSeek API Docs, DeepSeek Hugging Face Doc

Score justification:

The model developer makes model weights publicly available.

Indicator notes:

To receive this point, model weights need to be publicly available at no cost. Developers may receive this point even if there are some restrictions on the external entities that are permitted access (e.g. geographic restrictions), insofar as these restrictions are transparent (e.g. via a license or some high-level description of who has been granted access to the foundation model).

Example disclosure:

Model weights are available on HuggingFace by following this link: [URL]

42. Agent Protocols (Score: 1)

Are the agent protocols supported for the model disclosed?

Disclosure:

The AI Agent frameworks table in GitHub includes a number of frameworks such as smolagents, YoMo, and superagentx

References:

https://github.com/deepseek-ai/awesome-deepseek-integration/tree/main?tab=readme-ov-file#ai-agent-frameworks

Score justification:

The developer discloses the agent protocols supported for this model.

Indicator notes:

Agent protocols are specifications that define how autonomous agents exchange messages, context, or function calls with other agents, tools, or services (e.g., Anthropic’s Model Context Protocol (MCP) and Google’s Agent‑to‑Agent (A2A) spec). To earn this point, documentation must enumerate each protocol and describe any deviations or proprietary extensions.

Example disclosure:

We support MCP and A2A for agents built using model A

43. Capabilities taxonomy (Score: 1)

Are the specific capabilities or tasks that were optimized for during post-training disclosed?

Disclosure:

"We directly apply RL to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeekR1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area. We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model’s reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models."

References:

DeepSeek-R1 Paper, DeepSeek API Docs, DeepSeek Hugging Face Doc

Score justification:

The developer lists the capabilities optimized for during post-training.

Indicator notes:

Capabilities refer to the specific and distinctive functions that the model can perform. We recognize that different developers may use different terminology for capabilities, or conceptualize capabilities differently. We will award this point for a list of capabilities specifically optimized for in the post-training phase of the model, even if some of the capabilities are not reflected in the final model.

Example disclosure:

We focus on the following capabilities during post-training: (1) Coding ability (2) Retrieval of information and factuality (3) Multilingual language proficiency on non-English languages (4) Tool-use

44. Capabilities evaluation (Score: 1)

Does the developer evaluate the model's capabilities prior to its release and disclose them concurrent with release?

Disclosure:

"We evaluate models on MMLU (Hendrycks et al., 2020), MMLU-Redux (Gema et al., 2024), MMLU-Pro (Wang et al., 2024), C-Eval (Huang et al., 2023), and CMMLU (Li et al., 2023), IFEval (Zhou et al., 2023), FRAMES (Krishna et al., 2024), GPQA Diamond (Rein et al., 2023), SimpleQA (OpenAI, 2024c), C-SimpleQA (He et al., 2024), SWE-Bench Verified (OpenAI, 2024d), Aider 1 , LiveCodeBench (Jain et al., 2024) (2024-08 – 2025-01), Codeforces 2 , Chinese National High School Mathematics Olympiad (CNMO 2024)3 , and American Invitational Mathematics Examination 2024 (AIME 2024) (MAA, 2024). In addition to standard benchmarks, we also evaluate our models on open-ended generation tasks using LLMs as judges. Specifically, we adhere to the original configurations of AlpacaEval 2.0 (Dubois et al., 2024) and Arena-Hard (Li et al., 2024), which leverage GPT-4-Turbo-1106 as judges for pairwise comparisons. Here, we only feed the final summary to evaluation to avoid the length bias. For distilled models, we report representative results on AIME 2024, MATH-500, GPQA Diamond, Codeforces, and LiveCodeBench."

References:

DeepSeek-R1 Paper, DeepSeek API Docs, DeepSeek Hugging Face Doc

Score justification:

The developer discloses evaluations of the capabilities specified in the previous indicator.

Indicator notes:

The evaluations must contain precise quantifications of the model's behavior in relation to the capabilities specified in the capabilities taxonomy. We will award this point for any clear, but potentially incomplete, evaluation of multiple capabilities.

Example disclosure:

We evaluate capabilities using the following benchmarks: (1) Coding: HumanEval (2) Retrieval: HotPotQA (3) Multilingual performance: MMMLU (4) Tool use: UltraTool

Are mechanisms that are used for detecting content generated by this model disclosed?

Disclosure:

References:

https://cdn.deepseek.com/policies/en-US/deepseek-terms-of-use.html

Score justification:

A score of 0 is granted as no mechanism for detecting machine-generated content is disclosed.

Indicator notes:

A mechanism for detecting machine-generated content might include storing a copy of all outputs generated by the model to compare against, implementing a watermark on model outputs, adding cryptographic metadata (such as C2PA), or training a detector post-hoc to identify such content. We will award this point if any such mechanism is disclosed or if the developer reports that it does not have or use any such mechanism.

Example disclosure:

We train a classifier using model generations and human-written text to identify machine-generated content from Model A and our other models.

92. Documentation for responsible use (Score: 0)

Does the developer provide documentation for responsible use by downstream developers?

Disclosure:

HuggingFace provides usage recommendations

References:

https://huggingface.co/deepseek-ai/DeepSeek-R1#usage-recommendations

Score justification:

No responsible-use documentation is provided by the developer, nor does the developer acknowledge the lack of such documentation.

Indicator notes:

To receive a point, the developer should provide documentation for responsible use. This might include details on how to adjust API settings to promote responsible use, descriptions of how to implement mitigations, or guidelines for responsible use. We will also award this point if the developer states that it does not provide any such documentation. For example, the developer might state that the model is offered as is and downstream developers are accountable for using the model responsibly.

Example disclosure:

Our Developer Documentation Hub consolidates integration guides, responsible‐use guidelines, and best practices: [link]

93. Permitted and prohibited users (Score: 1)

Is a description of who can and cannot use the model on the top-5 distribution channels disclosed?

Disclosure:

You represent and warrant that Services may not be used in or for the benefit of, or exported, re-exported, or transferred (a) to or within any country subject to comprehensive sanctions under Export Control and Sanctions Laws; (b) to any party on any restricted party lists under any applicable Export Control and Sanctions Laws that would prohibit your use of Services.

References:

https://cdn.deepseek.com/policies/en-US/deepseek-open-platform-terms-of-service.html

Score justification:

A score of 1 is granted as the terms of use clarify that sanctioned entities are prohibited users

Indicator notes:

We will award this point for a description of the company's policies for permitted and prohibited users on its top-5 distribution channels. We will award this point if the developer has a more general acceptable use policy that it confirms applies across these distribution channels. We will award this point if there are no restrictions on users.

Example disclosure:

We allow usage by individuals 13 years of age or older who accept our Terms of Service. We prohibit use by export controlled entities or persons on denied-parties lists or in countries under U.S. embargo. We also reserve the right to restrict use if users engage in targeted harassment. For example, we only permit users over 13 with valid credentials, and prohibit usage from OFAC-sanctioned regions. We do not allow state-sponsored disinformation agencies to access our services.

94. Permitted, restricted, and prohibited uses (Score: 1)

Which uses are explicitly allowed, conditionally permitted, or strictly disallowed under the acceptable use policy for the top-5 distribution channels?

Disclosure:

R1 is released on Hugging Face under an MIT license with no prohibited uses. Section 3.4 of the terms of use states: "3.4 You will not use the Services to generate, express or promote content or a chatbot that:..."

References:

https://cdn.deepseek.com/policies/en-US/deepseek-terms-of-use.html

Score justification:

A score of 1 is granted as the restricted and prohibited uses are disclosed across key distribution channels.

Indicator notes:

We will award this point for a rough characterization of two or more of permitted, restricted, and prohibited uses across the top-5 distribution channels. We will award this point if the developer has a more general acceptable use policy that it confirms applies across these distribution channels. We will award this point if there are no restrictions on users.

Example disclosure:

Permitted uses include general conversational queries, brainstorming, and coding assistance. Restricted uses include adult or violent content that requires caution or additional review. Prohibited uses include facilitating illicit activity, disinformation campaigns, or harassment. For example, we permit typical user requests like Q&A, text generation, and educational uses. We restrict content that depicts graphic violence or sexual content by applying additional filters. We prohibit any use aiming to conduct unlawful surveillance, promote extremist violence, or defraud others.

95. AUP enforcement process (Score: 1)

What are the methods used by the developer to enforce the acceptable use policy?

Disclosure:

For users who violate regulations, DeepSeek has the right, according to its reasonable judgment and without notice, to take actions such as warnings, functionality restrictions, service suspensions or terminations, account bans, content deletions, and to save related records and report to competent authorities

References:

https://chat.deepseek.com/downloads/DeepSeek%20User%20Agreement.pdf

Score justification:

A score of 1 is granted as the user agreement provides a reasonable best-effort description of the processes used to respond to potential AUP violations.

Indicator notes:

We will award this point if the developer discloses the processes (automated or manual) it uses to detect, review, and respond to potential acceptable use policy violations. We will award this point for a reasonable best-effort attempt to provide the bulk of this information, though one line indicating the developer reserves the right to terminate accounts is insufficient. Alternatively, we will award this point if the developer reports that it does not use such methods to enforce its acceptable use policy.

Example disclosure:

We combine automated checks with human review for severe or repeated violations, issuing warnings or suspensions after repeat violations.

96. AUP enforcement frequency (Score: 0)

Are statistics on the developer's AUP enforcement disclosed?

Disclosure:

The Open Platform Terms describe enforcement actions and record retention for violations.

References:

https://cdn.deepseek.com/policies/en-US/deepseek-open-platform-terms-of-service.html

Score justification:

No information provided on statistics regarding frequency of enforcement.

Indicator notes:

We will award this point if the developer discloses enforcement statistics (e.g., violation counts or actions taken) from its enforcement of its acceptable use policy. Alternatively, we will award this point if the developer reports that it does not enforce its acceptable use policy.

Example disclosure:

We publish a quarterly enforcement report detailing violation counts by prohibited use category and the corresponding actions taken at [LINK]

What commitments has the developer made to government bodies?

Disclosure:

We also participated in the China AI Safety and Security Commitments Framework. Moreover, China’s AI regulatory framework differs significantly from that of the U.S., relying on formal legislation and regulatory oversight rather than solely on voluntary corporate commitments. As such, participation in commitments should not be used as a direct basis for comparison.

References:

https://aihub.caict.ac.cn/ai_security_and_safety_commitments

Score justification:

DeepSeek discloses its participation in the China AI Safety and Security Commitments Framework

Indicator notes:

We will award this point if the company provides an exhaustive list of commitments it has made to government bodies in the jurisdictions where it offers its model.

Example disclosure:

We have committed to the White House Voluntary Commitments and the Seoul Commitments.