readme : update hot topics
author    Georgi Gerganov <redacted>
Wed, 21 Feb 2024 13:39:54 +0000 (15:39 +0200)
committer GitHub <redacted>
Wed, 21 Feb 2024 13:39:54 +0000 (15:39 +0200)
README.md

index 225db8e49ce39fefcfa25790329670eaef2b5725..ce5dec7caeac9e145558931365b45de151610623 100644
--- a/README.md
+++ b/README.md
@@ -10,13 +10,8 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
 
 ### Hot topics
 
-- Remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD: https://github.com/ggerganov/llama.cpp/pull/5240
-- Incoming backends: https://github.com/ggerganov/llama.cpp/discussions/5138
-  - [SYCL backend](README-sycl.md) is ready (1/28/2024), supporting Linux/Windows on Intel GPUs (iGPU, Arc/Flex/Max series)
-- New SOTA quantized models, including pure 2-bit: https://huggingface.co/ikawrakow
-- Collecting Apple Silicon performance stats:
-  - M-series: https://github.com/ggerganov/llama.cpp/discussions/4167
-  - A-series: https://github.com/ggerganov/llama.cpp/discussions/4508
+- Support for Gemma models: https://github.com/ggerganov/llama.cpp/pull/5631
+- Non-linear quantization IQ4_NL: https://github.com/ggerganov/llama.cpp/pull/5590
 - Looking for contributions to improve and maintain the `server` example: https://github.com/ggerganov/llama.cpp/issues/4216
 
 ----
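For context on the IQ4_NL item added above: non-linear 4-bit quantization replaces the evenly spaced levels of plain 4-bit formats with a fixed non-uniform codebook, so the 16 representable values are denser near zero, where model weights cluster. The sketch below illustrates that idea only; the `kValues` table approximates the shape of such a codebook and `quantize_block` is a hypothetical helper, not ggml's actual IQ4_NL layout or API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative non-uniform codebook: 16 levels, denser near zero.
// The exact values in ggml's IQ4_NL table may differ.
static const int8_t kValues[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10,
       1,   13,  25,  38,  53,  69,  89, 113,
};

// Quantize one block of weights: pick a per-block scale so the largest
// magnitude maps near the codebook extremes, then snap each scaled
// weight to the nearest codebook entry (stored as a 4-bit index).
void quantize_block(const float *x, int n, float &scale, std::vector<uint8_t> &codes) {
    float amax = 0.0f;
    for (int i = 0; i < n; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }
    scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    codes.resize(n);
    for (int i = 0; i < n; ++i) {
        const float v = x[i] / scale;
        int best = 0;
        float best_err = std::fabs(v - kValues[0]);
        for (int j = 1; j < 16; ++j) {
            const float err = std::fabs(v - kValues[j]);
            if (err < best_err) { best_err = err; best = j; }
        }
        codes[i] = (uint8_t) best;
    }
}

int main() {
    const float x[8] = {0.01f, -0.02f, 0.3f, -0.5f, 0.8f, -0.05f, 0.0f, 1.0f};
    float scale;
    std::vector<uint8_t> codes;
    quantize_block(x, 8, scale, codes);
    // Print each weight, its 4-bit code, and the dequantized value.
    for (int i = 0; i < 8; ++i) {
        printf("%+.3f -> code %2u -> %+.3f\n", x[i], (unsigned) codes[i], scale * kValues[codes[i]]);
    }
    return 0;
}
```

Running the sketch shows small weights landing on the finely spaced levels near zero while large ones take the coarse outer levels, which is the advantage non-linear spacing offers over uniform 4-bit grids at the same storage cost.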