general: CONTRIBUTING.md - guidelines for quantization schemes (#19762)

author Piotr Wilkin (ilintar) <redacted>

Fri, 13 Mar 2026 11:21:33 +0000 (12:21 +0100)

committer GitHub <redacted>

Fri, 13 Mar 2026 11:21:33 +0000 (12:21 +0100)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 996f34ed8206c2e7434f4ec4227f851b8650f1d0..fc26289aecfea9b350bd33d20e9911f17c479263 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -30,14 +30,19 @@ Before submitting your PR:
  - Search for existing PRs to prevent duplicating efforts
  - llama.cpp uses the ggml tensor library for model evaluation. If you are unfamiliar with ggml, consider taking a look at the [examples in the ggml repository](https://github.com/ggml-org/ggml/tree/master/examples/). [simple](https://github.com/ggml-org/ggml/tree/master/examples/simple) shows the bare minimum for using ggml. [gpt-2](https://github.com/ggml-org/ggml/tree/master/examples/gpt-2) has minimal implementations for language model inference using GPT-2. [mnist](https://github.com/ggml-org/ggml/tree/master/examples/mnist) demonstrates how to train and evaluate a simple image classifier
  - Test your changes:
-    - Execute [the full CI locally on your machine](ci/README.md) before publishing
-    - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
-    - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
-    - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
+  - Execute [the full CI locally on your machine](ci/README.md) before publishing
+  - Verify that the perplexity and the performance are not affected negatively by your changes (use `llama-perplexity` and `llama-bench`)
+  - If you modified the `ggml` source, run the `test-backend-ops` tool to check whether different backend implementations of the `ggml` operators produce consistent results (this requires access to at least two different `ggml` backends)
+  - If you modified a `ggml` operator or added a new one, add the corresponding test cases to `test-backend-ops`
  - Create separate PRs for each feature or fix:
-    - Avoid combining unrelated changes in a single PR
-    - For intricate features, consider opening a feature request first to discuss and align expectations
-    - When adding support for a new model or feature, focus on **CPU support only** in the initial PR unless you have a good reason not to. Add support for other backends like CUDA in follow-up PRs
+  - Avoid combining unrelated changes in a single PR
+  - For intricate features, consider opening a feature request first to discuss and align expectations
+  - When adding support for a new model or feature, focus on **CPU support only** in the initial PR unless you have a good reason not to. Add support for other backends like CUDA in follow-up PRs
+  - In particular, adding new data types (extension of the `ggml_type` enum) carries with it a disproportionate maintenance burden. As such, to add a new quantization type you will need to meet the following *additional* criteria *at minimum*:
+    - convert a small model to GGUF using the new type and upload it to HuggingFace
+    - provide [perplexity](https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity) comparisons to FP16/BF16 (whichever is the native precision) as well as to types of similar size
+    - provide KL divergence data calculated vs. the FP16/BF16 (whichever is the native precision) version for both the new type as well as types of similar size
+    - provide [performance data](https://github.com/ggml-org/llama.cpp/tree/master/tools/llama-bench) for the new type in comparison to types of similar size on pure CPU
  - Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
  - If you are a new contributor, limit your open PRs to 1.
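
Note on the metrics requested for new quantization types: the criteria above ask for perplexity and KL divergence measured against the native-precision (FP16/BF16) model. The following is a minimal Python sketch of what those two numbers mean, assuming you already have per-position logits from both models. It is not how `llama-perplexity` computes them; the function names (`log_softmax`, `perplexity`, `mean_kl_divergence`) and the toy logits are hypothetical and exist only for illustration.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one vocabulary-sized logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def perplexity(logits_per_pos, targets):
    # exp of the mean negative log-likelihood of the reference tokens.
    nll = -sum(log_softmax(l)[t] for l, t in zip(logits_per_pos, targets))
    return math.exp(nll / len(targets))

def mean_kl_divergence(base_logits, quant_logits):
    # Mean over positions of KL(P_base || P_quant), where P_base comes from
    # the native-precision model and P_quant from the quantized model.
    total = 0.0
    for b, q in zip(base_logits, quant_logits):
        lp, lq = log_softmax(b), log_softmax(q)
        total += sum(math.exp(p) * (p - r) for p, r in zip(lp, lq))
    return total / len(base_logits)

# Toy data (made up): two positions, vocabulary of three tokens.
base = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
quant = [[1.8, 0.1, -0.9], [0.4, 1.6, 0.1]]
targets = [0, 1]

print("perplexity (base): ", perplexity(base, targets))
print("perplexity (quant):", perplexity(quant, targets))
print("mean KL divergence:", mean_kl_divergence(base, quant))
```

Perplexity depends on the reference text, whereas KL divergence compares the two models' full output distributions directly, which is presumably why the guidelines request both and ask for the same figures on existing types of similar size.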