readme : clarify MODEL_ENDPOINT usage (#20941)

author Adrien Gallouët <redacted>

Tue, 24 Mar 2026 09:35:07 +0000 (10:35 +0100)

committer GitHub <redacted>

Tue, 24 Mar 2026 09:35:07 +0000 (10:35 +0100)
diff --git a/README.md b/README.md

index 7d3ae6b7c275f9d75af87296ca5882a068ebb3a9..be23abcea67fb43c28e725501d8cafd33555d6b7 100644 (file)
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ LLM inference in C/C++
  
  ## Hot topics
  
-- **HuggingFace cache migration: models downloaded with `-hf` are now stored in the standard HuggingFace cache directory, enabling sharing with other HF tools.**
+- **Hugging Face cache migration: models downloaded with `-hf` are now stored in the standard Hugging Face cache directory, enabling sharing with other HF tools.**
  - **[guide : using the new WebUI of llama.cpp](https://github.com/ggml-org/llama.cpp/discussions/16938)**
  - [guide : running gpt-oss with llama.cpp](https://github.com/ggml-org/llama.cpp/discussions/15396)
  - [[FEEDBACK] Better packaging for llama.cpp to support downstream consumers 🤗](https://github.com/ggml-org/llama.cpp/discussions/15313)
@@ -242,7 +242,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
  <details>
  <summary>Tools</summary>
  
-- [akx/ggify](https://github.com/akx/ggify) – download PyTorch models from HuggingFace Hub and convert them to GGML
+- [akx/ggify](https://github.com/akx/ggify) – download PyTorch models from Hugging Face Hub and convert them to GGML
  - [akx/ollama-dl](https://github.com/akx/ollama-dl) – download models from the Ollama library to be used directly with llama.cpp
  - [crashr/gppm](https://github.com/crashr/gppm) – launch llama.cpp instances utilizing NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
  - [gpustack/gguf-parser](https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) - review/check the GGUF file and estimate the memory usage
@@ -301,13 +301,13 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
  - [Trending](https://huggingface.co/models?library=gguf&sort=trending)
  - [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
  
-You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
+You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
  
  ```sh
  llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
  ```
  
-By default, the CLI would download from Hugging Face, you can switch to other options with the environment variable `MODEL_ENDPOINT`. For example, you may opt to downloading model checkpoints from ModelScope or other model sharing communities by setting the environment variable, e.g. `MODEL_ENDPOINT=https://www.modelscope.cn/`.
+By default, the CLI would download from Hugging Face, you can switch to other options with the environment variable `MODEL_ENDPOINT`. The `MODEL_ENDPOINT` must point to a Hugging Face compatible API endpoint.
  
  After downloading a model, use the CLI tools to run it locally - see below.
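
As a rough illustration of the clarified wording, the sketch below points `-hf` downloads at an alternative endpoint. The host `hf-mirror.example.com` is a hypothetical placeholder, not a real service; the only assumption taken from the README text is that the variable must name a Hugging Face compatible API endpoint.

```sh
# Hypothetical endpoint: replace with a real Hugging Face compatible API endpoint.
export MODEL_ENDPOINT="https://hf-mirror.example.com/"

# Run the same command as in the README; the download now targets MODEL_ENDPOINT.
# The guard only keeps this sketch harmless on machines without llama-cli installed.
if command -v llama-cli >/dev/null 2>&1; then
  llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
else
  echo "llama-cli not found; MODEL_ENDPOINT is set to $MODEL_ENDPOINT"
fi
```

Leaving `MODEL_ENDPOINT` unset keeps the default behaviour of downloading from Hugging Face.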