### Intel GPU
-**Verified devices**
+SYCL backend supports Intel GPU Family:
+
+- Intel Data Center Max Series
+- Intel Flex Series, Arc Series
+- Intel Built-in Arc GPU
+- Intel iGPU in Core CPU (11th Generation Core CPU and newer, refer to [oneAPI supported GPU](https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-toolkit-system-requirements.html#inpage-nav-1-1)).
+
+#### Verified devices
| Intel GPU | Status | Verified Model |
|-------------------------------|---------|---------------------------------------|
| Intel Data Center Flex Series | Support | Flex 170 |
| Intel Arc Series | Support | Arc 770, 730M, Arc A750 |
| Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake |
-| Intel iGPU | Support | iGPU in i5-1250P, i7-1260P, i7-1165G7 |
+| Intel iGPU | Support | iGPU in 13700k, i5-1250P, i7-1260P, i7-1165G7 |
*Notes:*
### II. Build llama.cpp
#### Intel GPU
+
+```
+./examples/sycl/build.sh
+```
+
+or
+
```sh
# Export relevant ENV variables
source /opt/intel/oneapi/setvars.sh
### III. Run the inference
-1. Retrieve and prepare model
+#### Retrieve and prepare model
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
-2. Enable oneAPI running environment
+##### Check device
+
+1. Enable oneAPI running environment
```sh
source /opt/intel/oneapi/setvars.sh
```
-3. List devices information
+2. List devices information
Similar to the native `sycl-ls`, available SYCL devices can be queried as follow:
```sh
./build/bin/llama-ls-sycl-device
```
+
This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *intel GPU* it would look like the following:
```
found 2 SYCL devices:
| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
```
+#### Choose level-zero devices
+
+|Chosen Device ID|Setting|
+|-|-|
+|0|`export ONEAPI_DEVICE_SELECTOR="level_zero:1"` or no action|
+|1|`export ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
+|0 & 1|`export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
+
+#### Execute
+
+Choose one of following methods to run.
+
+1. Script
+
+- Use device 0:
+
+```sh
+./examples/sycl/run_llama2.sh 0
+```
+- Use multiple devices:
+
+```sh
+./examples/sycl/run_llama2.sh
+```
-4. Launch inference
+2. Command line
+Launch inference
There are two device selection modes:
-- Single device: Use one device target specified by the user.
+- Single device: Use one device assigned by user. Default device id is 0.
- Multiple devices: Automatically choose the devices with the same backend.
In two device selection modes, the default SYCL backend is level_zero, you can choose other backend supported by SYCL by setting environment variable ONEAPI_DEVICE_SELECTOR.
```sh
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
```
-or run by script:
-
-```sh
-./examples/sycl/run_llama2.sh 0
-```
- Use multiple devices:
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
```
-Otherwise, you can run the script:
-
-```sh
-./examples/sycl/run_llama2.sh
-```
-
*Notes:*
- Upon execution, verify the selected device(s) ID(s) in the output log, which can for instance be displayed as follow:
In the oneAPI command line, run the following to print the available SYCL devices:
```
-sycl-ls
+sycl-ls.exe
```
There should be one or more *level-zero* GPU devices displayed as **[ext_oneapi_level_zero:gpu]**. Below is example of such output detecting an *intel Iris Xe* GPU as a Level-zero SYCL device:
### II. Build llama.cpp
+You could download the release package for Windows directly, which including binary files and depended oneAPI dll files.
+
+Choose one of following methods to build from source code.
+
+1. Script
+
+```sh
+.\examples\sycl\win-build-sycl.bat
+```
+
+2. CMake
+
On the oneAPI command line window, step into the llama.cpp main directory and run the following:
```
cmake --build build --config Release -j
```
-Otherwise, run the `win-build-sycl.bat` wrapper which encapsulates the former instructions:
-```sh
-.\examples\sycl\win-build-sycl.bat
-```
-
Or, use CMake presets to build:
+
```sh
cmake --preset x64-windows-sycl-release
cmake --build build-x64-windows-sycl-release -j --target llama-cli
cmake --build build-x64-windows-sycl-debug -j --target llama-cli
```
-Or, you can use Visual Studio to open llama.cpp folder as a CMake project. Choose the sycl CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
+3. Visual Studio
+
+You can use Visual Studio to open llama.cpp folder as a CMake project. Choose the sycl CMake presets (`x64-windows-sycl-release` or `x64-windows-sycl-debug`) before you compile the project.
*Notes:*
### III. Run the inference
-1. Retrieve and prepare model
+#### Retrieve and prepare model
-You can refer to the general [*Prepare and Quantize*](README#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
+You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
-2. Enable oneAPI running environment
+##### Check device
+
+1. Enable oneAPI running environment
On the oneAPI command line window, run the following and step into the llama.cpp directory:
```
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
```
-3. List devices information
+2. List devices information
Similar to the native `sycl-ls`, available SYCL devices can be queried as follow:
```
-build\bin\ls-sycl-device.exe
+build\bin\llama-ls-sycl-device.exe
```
This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *intel GPU* it would look like the following:
| 0|[level_zero:gpu:0]| Intel(R) Arc(TM) A770 Graphics| 1.3| 512| 1024| 32| 16225243136|
| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
+```
+#### Choose level-zero devices
+
+|Chosen Device ID|Setting|
+|-|-|
+|0|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"` or no action|
+|1|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
+|0 & 1|`set ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
+
+#### Execute
+
+Choose one of following methods to run.
+
+1. Script
+
+```
+examples\sycl\win-run-llama2.bat
```
+2. Command line
-4. Launch inference
+Launch inference
There are two device selection modes:
```
build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm layer
```
-Otherwise, run the following wrapper script:
-```
-.\examples\sycl\win-run-llama2.bat
-```
Note:
use 1 SYCL GPUs: [0] with Max compute units:512
```
+
## Environment Variable
#### Build
| Name | Value | Function |
|--------------------|-----------------------------------|---------------------------------------------|
-| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path. |
+| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA | Set the SYCL target device type. |
| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
-| CMAKE_C_COMPILER | icx | Set *icx* compiler for SYCL code path. |
-| CMAKE_CXX_COMPILER | icpx *(Linux)*, icx *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |
+| CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
+| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |
#### Runtime
```
Otherwise, please double-check the GPU driver installation steps.
+- Can I report Ollama issue on Intel GPU to llama.cpp SYCL backend?
+
+ No. We can't support Ollama issue directly, because we aren't familiar with Ollama.
+
+ Sugguest reproducing on llama.cpp and report similar issue to llama.cpp. We will surpport it.
+
+ It's same for other projects including llama.cpp SYCL backend.
+
+
### **GitHub contribution**:
Please add the **[SYCL]** prefix/tag in issues/PRs titles to help the SYCL-team check/address them without delay.
## TODO
-- Support row layer split for multiple card runs.
+- NA