cmake --build build --config Release -j $(nproc)
```
-- By default, NNPA is enabled when available. To disable it (not recommended):
+- NNPA is disabled by default. To enable it:
```bash
cmake -S . -B build \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS \
- -DGGML_NNPA=OFF
+ -DGGML_NNPA=ON
cmake --build build --config Release -j $(nproc)
```

- You can find popular models pre-converted and verified at [s390x Ready Models](https://huggingface.co/collections/taronaeo/s390x-ready-models-672765393af438d0ccb72a08).
+ You can find popular models pre-converted and verified at [s390x Verified Models](https://huggingface.co/collections/taronaeo/s390x-verified-models-672765393af438d0ccb72a08) or [s390x Runnable Models](https://huggingface.co/collections/taronaeo/s390x-runnable-models-686e951824198df12416017e).
- These models have already been converted from `safetensors` to `GGUF Big-Endian` and their respective tokenizers verified to run correctly on IBM z15 and later system.
+ These models have already been converted from `safetensors` to `GGUF` Big-Endian, and their respective tokenizers have been verified to run correctly on IBM z15 and later systems.
2. **Convert safetensors model to GGUF Big-Endian directly (recommended)**
The model you are trying to convert must be in `safetensors` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct)). Make sure you have downloaded the model repository for this case.
+ Ensure that you have installed the required packages in advance:
+
+ ```bash
+ pip3 install -r requirements.txt
+ ```
+
+ Convert the `safetensors` model to `GGUF`:
+
```bash
python3 convert_hf_to_gguf.py \
--outfile model-name-be.f16.gguf \

- The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
+ The model you are trying to convert must be in `gguf` file format (for example [IBM Granite 3.3 2B GGUF](https://huggingface.co/ibm-granite/granite-3.3-2b-instruct-GGUF)). Make sure you have downloaded the model file for this case.
```bash
python3 gguf-py/gguf/scripts/gguf_convert_endian.py model-name.f16.gguf BIG
### 2. NNPA Vector Intrinsics Acceleration
-Only available in IBM z16 or later system with the `-DGGML_NNPA=ON` (turned on when available) compile flag. No hardware acceleration is possible with llama.cpp with older systems, such as IBM z15/arch13. In such systems, the APIs can still run but will use a scalar implementation.
+Only available on IBM z16 or later systems, and only when built with the `-DGGML_NNPA=ON` compile flag (turned off by default). No hardware acceleration is possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs can still run but will use a scalar implementation.
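
Before enabling the flag, it can help to check whether the running system reports NNPA at all. The following is a minimal sketch, not part of llama.cpp: `has_nnpa` is a hypothetical helper, and it assumes the Linux kernel lists `nnpa` on the `features` line of `/proc/cpuinfo` on systems that have the facility.

```bash
#!/usr/bin/env bash
# Sketch only: has_nnpa is a hypothetical helper, not part of llama.cpp.
# Assumption: on s390x, the kernel lists "nnpa" on the "features" line
# of /proc/cpuinfo when the facility is present.
has_nnpa() {
    # An optional first argument overrides the file that is inspected.
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "features" line and look for "nnpa" as a whole word.
    grep -m1 '^features' "$cpuinfo" | grep -qw 'nnpa'
}

if has_nnpa; then
    echo "NNPA reported: -DGGML_NNPA=ON can be tried"
else
    echo "NNPA not reported: leave -DGGML_NNPA at its default (OFF)"
fi
```

A negative result only means the feature string was not found; it is not a substitute for checking the machine generation.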
### 3. zDNN Accelerator
-_Only available in IBM z16 or later system. No direction at the moment._
+_Only available on IBM z16 / LinuxONE 4 or later systems. No support currently available._
### 4. Spyre Accelerator
-_No direction at the moment._
+_Only available on IBM z17 / LinuxONE 5 or later systems. No support currently available._
## Performance Tuning
Answer: Please ensure that your GCC compiler is at least version 15.1.0 and that `binutils` is updated to the latest version. If this does not fix the problem, kindly open an issue.
+4. Failing to install the `sentencepiece` package using GCC 15+
+
+ Answer: The `sentencepiece` team are aware of this as seen in [this issue](https://github.com/google/sentencepiece/issues/1108).
+
+ As a temporary workaround, please run the installation command with the following environment variable set:
+
+ ```bash
+ export CXXFLAGS="-include cstdint"
+ ```
+
+ For example,
+
+ ```bash
+ CXXFLAGS="-include cstdint" pip3 install -r requirements.txt
+ ```
+
+5. `-DGGML_NNPA=ON` generates gibberish output
+
+ Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`.
+
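
To try the thread-reduction workaround for item 5, a lower thread count can be passed with `-t`. The sketch below makes several assumptions: `half_threads` is a hypothetical helper, halving the core count is an arbitrary starting point rather than a known fix, and the binary path `build/bin/llama-cli` and model name `model-name-be.f16.gguf` are placeholders for your own build and model.

```bash
#!/usr/bin/env bash
# Sketch only: half_threads is a hypothetical helper, not part of llama.cpp.
# Halving the core count is an arbitrary starting point, not a known fix.
half_threads() {
    # An optional first argument overrides the detected core count.
    local total="${1:-$(nproc)}"
    local half=$(( total / 2 ))
    # Never go below one thread.
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# Print (rather than run) the command so it can be reviewed first.
# The binary path and model file name are placeholders.
echo "build/bin/llama-cli -m model-name-be.f16.gguf -t $(half_threads) -p \"Hello\""
```

If output is still garbled at low thread counts, rebuilding with `-DGGML_NNPA=OFF` remains the fallback described above.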
## Getting Help on IBM Z & LinuxONE
1. **Bugs, Feature Requests**
- ✅ - acceleration available
- 🚫 - acceleration unavailable, will still run using scalar implementation
- ❓ - acceleration unknown, please contribute if you can test it yourself
+
+Last Updated by **Aaron Teo (aaron.teo1@ibm.com)** on July 25, 2025.