]> git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158)
authorSigbjørn Skjæret <redacted>
Sun, 29 Jun 2025 09:04:10 +0000 (11:04 +0200)
committerGeorgi Gerganov <redacted>
Tue, 1 Jul 2025 14:54:53 +0000 (17:54 +0300)
commit8bc638ee5644411d84058edff1cda4b0abe0b2a1
treee48f3f9df29a9cbd7512017b636d5ae3ea395afd
parent00b36237ba7b7569995bff1ced92f0f796ed8b5c
ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158)

* implement unary REGLU/GEGLU/SWIGLU cpu ops

* relax constraints

* duplicate shape of source

* fix ggml_vec_geglu_f16

* special case gated ops

* implement unary REGLU/GEGLU/SWIGLU cuda ops

* tighten constraints again

* refactor into GGML_GLU_OP

* metal : add glu kernels

ggml-ci

* add CUDA_GLU_BLOCK_SIZE [no ci]

* more constraints and use 64bit ints

ggml-ci

* 64bit multiplication [no ci]

* implement swapped variants (cpu/cuda)

* update comment [no ci]

ggml-ci

* Vulkan: Add GLU ops and shaders

* SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate

* ggml : implement GLU for split up/gate (llama/14181)

* implement GLU for split up/gate

* add tests for ggml_glu_split

* Vulkan: Implement glu_split logic and shader support

* add split to logging [no ci]

* SYCL: refactor element_size ops and add split up and gate support to gated kernels

* SYCL: switch GEGLU to use tanh approximation

---------

Co-authored-by: 0cc4m <redacted>
Co-authored-by: Akarshan <redacted>
* GGML: increase OP count in assertion

* Refactor: Optimize SYCL element-wise operations with unary function inlining

This commit refactors the SYCL element-wise operations to improve performance by:

- Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead.
- Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic.
- Replacing direct kernel calls with calls to these inlined functions.
- Using `__dpct_inline__` to encourage compiler inlining.
- Minor code cleanup and consistency improvements.

The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices.

* vulkan: Increase workgroup size for GLU, for performance (llama/14345)

* vulkan: Increase workgroup size for GLU, for performance

* vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup

* merge fix

* metal : add support for split and swap

ggml-ci

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: 0cc4m <redacted>
Co-authored-by: Akarshan <redacted>
Co-authored-by: Jeff Bolz <redacted>
23 files changed:
ggml/include/ggml.h
ggml/src/ggml-cpu/ggml-cpu.c
ggml/src/ggml-cpu/ops.cpp
ggml/src/ggml-cpu/ops.h
ggml/src/ggml-cpu/vec.cpp
ggml/src/ggml-cpu/vec.h
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/unary.cu
ggml/src/ggml-cuda/unary.cuh
ggml/src/ggml-metal/ggml-metal-impl.h
ggml/src/ggml-metal/ggml-metal.m
ggml/src/ggml-metal/ggml-metal.metal
ggml/src/ggml-sycl/element_wise.cpp
ggml/src/ggml-sycl/element_wise.hpp
ggml/src/ggml-sycl/ggml-sycl.cpp
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/geglu.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/glu_head.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/glu_main.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/reglu.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/swiglu.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
ggml/src/ggml.c