Fix __dp4a documentation (#2348)

author Johannes Gäßler <redacted>

Sun, 23 Jul 2023 15:49:06 +0000 (17:49 +0200)

committer GitHub <redacted>

Sun, 23 Jul 2023 15:49:06 +0000 (17:49 +0200)
diff --git a/README.md b/README.md

index c9fe6187bf9d184db6a3ce5fc17b4ce0eb3e44e0..a0e0ea2e01262685174b3c715ab7e71d0398fc52 100644 (file)
--- a/README.md
+++ b/README.md
@@ -401,7 +401,7 @@ Building the program with BLAS support may lead to some performance improvements
  
    | Option                  | Legal values           | Default | Description |
    |-------------------------|------------------------|---------|-------------|
-  | LLAMA_CUDA_FORCE_DMMV   | Boolean                |   false | Force the use of dequantization + matrix vector multiplication kernels instead of using kernels that do matrix vector multiplication on quantized data. By default the decision is made based on compute capability (MMVQ for 7.0/Turing/RTX 2000 or higher). Does not affect k-quants. |
+  | LLAMA_CUDA_FORCE_DMMV   | Boolean                |   false | Force the use of dequantization + matrix vector multiplication kernels instead of using kernels that do matrix vector multiplication on quantized data. By default the decision is made based on compute capability (MMVQ for 6.1/Pascal/GTX 1000 or higher). Does not affect k-quants. |
    | LLAMA_CUDA_DMMV_X       | Positive integer >= 32 |      32 | Number of values in x direction processed by the CUDA dequantization + matrix vector multiplication kernel per iteration. Increasing this value can improve performance on fast GPUs. Power of 2 heavily recommended. Does not affect k-quants. |
    | LLAMA_CUDA_MMV_Y       | Positive integer       |       1 | Block size in y direction for the CUDA mul mat vec kernels. Increasing this value can improve performance on fast GPUs. Power of 2 recommended. Does not affect k-quants. |
    | LLAMA_CUDA_DMMV_F16     | Boolean                |   false | If enabled, use half-precision floating point arithmetic for the CUDA dequantization + mul mat vec kernels. Can improve performance on relatively recent GPUs. |
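The commit title refers to `__dp4a`, the CUDA intrinsic that computes a four-way dot product of packed 8-bit integers and adds a 32-bit accumulator. It requires compute capability 6.1 or higher, which matches the corrected threshold in the table. As a rough illustration of that arithmetic only, here is a host-side C++ sketch of the signed-integer variant. The names `dp4a_reference` and `pack_int8x4` are hypothetical helpers introduced for this example and are not part of the repository. The sketch assumes the quantized kernels depend on this operation, which the diff itself does not show.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: pack four signed 8-bit values into one 32-bit word.
static int pack_int8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    const int8_t v[4] = {x0, x1, x2, x3};
    int out;
    std::memcpy(&out, v, sizeof(out));
    return out;
}

// Hypothetical host-side reference for the arithmetic of the signed
// __dp4a(a, b, c) intrinsic: a and b each hold four packed int8 values;
// the result is c plus the sum of the four pairwise products.
static int dp4a_reference(int a, int b, int c) {
    int8_t va[4];
    int8_t vb[4];
    std::memcpy(va, &a, sizeof(va));
    std::memcpy(vb, &b, sizeof(vb));
    int acc = c;
    for (int i = 0; i < 4; ++i) {
        acc += static_cast<int>(va[i]) * static_cast<int>(vb[i]);
    }
    return acc;
}
```

On a GPU the whole loop is a single instruction, which is why a kernel built on it cannot run below compute capability 6.1 and the dequantization path is used there instead.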