git.djapps.eu Git - pkg/ggml/sources/ggml/log
2 months ago ggml: backward pass for split swiglu (llama/14483)
Johannes Gäßler [Thu, 3 Jul 2025 15:05:18 +0000 (17:05 +0200)]
ggml: backward pass for split swiglu (llama/14483)

2 months ago Fix conditional enabling following arch checks for ggml-sycl (llama/14504)
Nicolò Scipione [Thu, 3 Jul 2025 09:00:03 +0000 (11:00 +0200)]
Fix conditional enabling following arch checks for ggml-sycl (llama/14504)

Signed-off-by: nscipione <redacted>
2 months ago kv-cache : use ggml_set_rows (llama/14285)
Georgi Gerganov [Thu, 3 Jul 2025 07:53:35 +0000 (10:53 +0300)]
kv-cache : use ggml_set_rows (llama/14285)

* kv-cache : use ggml_set_rows

ggml-ci

* graph : separate k and v indices

ggml-ci

* cont : remove redundant ifs

ggml-ci

* kv-cache : improve find_slot impl

* kv-cache : bounds-check when accessing slot_info indices

* kv-cache : add comments

ggml-ci

* ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends

ggml-ci

2 months ago ggml : fix FA mask dim 2 and 3 (llama/14505)
Georgi Gerganov [Thu, 3 Jul 2025 07:46:57 +0000 (10:46 +0300)]
ggml : fix FA mask dim 2 and 3 (llama/14505)

* ggml : fix FA mask dim 2 and 3

ggml-ci

* backends : unsupport batched FA in CUDA and Vulkan

ggml-ci

* vulkan : disable FA for mask->ne[2] != 1

2 months ago ggml : remove kompute backend (llama/14501)
Georgi Gerganov [Thu, 3 Jul 2025 04:48:32 +0000 (07:48 +0300)]
ggml : remove kompute backend (llama/14501)

ggml-ci

2 months ago CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497)
Aman Gupta [Wed, 2 Jul 2025 23:45:11 +0000 (07:45 +0800)]
CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497)

2 months ago llama : initial Mamba-2 support (llama/9126)
compilade [Wed, 2 Jul 2025 17:10:24 +0000 (13:10 -0400)]
llama : initial Mamba-2 support (llama/9126)

* llama : initial Mamba-2 support

* ggml : SIMD ggml_ssm_scan for Mamba-2

* ggml : improve ggml_mul speed when masking recurrent states

* llama : support running Mamba-Codestral-7B-v0.1

* llama : fix Mamba-2 conv state saving

* ggml : make the ggml_mul fast broadcast path more consistently formatted

* llama : remove unused variable

* llama : add missing break

* convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present

The tokenizer.json of Mamba-Codestral-7B-v0.1 otherwise requires
workarounds to work correctly.

* llama : avoid redundant state copy for Mamba 1 and 2

* metal : attempt to adapt SSM_SCAN for Mamba-2

* metal : fix SSM_SCAN pipeline scope

* metal : use log and exp instead of log1pf and expf in SSM_SCAN

* metal : remove unused arguments for SSM_SCAN

The max index is 31, so trimming the arguments is necessary.

* metal : add back n_seqs to SSM_SCAN args

Whoops, this is needed for the offset in the concatenated output.

* metal : fix SSM_SCAN state head offset

* metal : fix wrong number of tokens per sequence in SSM_SCAN

* ggml : remove unused fast broadcast path in GGML_MUL

This was initially added because states were masked with ggml_mul,
but this is no longer done and so this "optimisation" is no longer
necessary, or at least not worth the additional code complexity.

* ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks

* convert : fix flake8 lint

* metal : fix confusion between ; and ,

* metal : add missing args for nb references in ssm_scan_f32_group

* metal : single-user mamba2 inference works

* kv-cache : remove const_cast when setting inputs for s_copy

And also fix multi-user inference for recurrent models
by using cell_id instead of i as the kv cell index
when populating s_copy.

* convert : avoid AutoConfig for Mamba and Mamba2 hparams

* kv-cache : allow context shift for recurrent models

* graph : fix recurrent state copies when avoiding copies

Works, but using lambda functions might not be that clean.

* ggml : fix mamba2 ssm scan when compiled with SVE

* ggml-cpu : reorder SVE FMA for consistency with other SIMD arches

* cuda : implement ssm scan for Mamba2

There is still room for improvement, but it works!

* cuda : adapt Mamba1 ssm scan to shape changes from Mamba2

* mamba : fix mismatched new and delete size for llm_build_mamba

Subclasses of llm_graph_context cannot have extra fields,
because the called destructor is not the one from the subclass.
This would otherwise cause problems when running Mamba-(1|2) inference
when compiled with -DGGML_SANITIZE_ADDRESS=ON (see the sketch after this entry).

* cuda : graceful fallback for Mamba-1 models with weird embd size
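
The new/delete size mismatch described in the llm_build_mamba fix above is a general C++ pitfall; a minimal illustration with hypothetical types (not the actual llama.cpp classes):

```cpp
#include <vector>

struct base {
    int x; // note: no virtual destructor
};

struct derived : base {
    std::vector<int> extra; // extra field that exists only in the subclass
};

int main() {
    base * p = new derived(); // allocates sizeof(derived)
    delete p;                 // deletes through base: ASan reports a new-delete size mismatch
}
```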

2 months ago CUDA: add softmax broadcast (llama/14475)
Aman Gupta [Wed, 2 Jul 2025 12:34:24 +0000 (20:34 +0800)]
CUDA: add softmax broadcast (llama/14475)

* CUDA: add softmax broadcast

* Pass by const ref

* Review: Use blockDims for indexing, remove designated initializers

* Add TODO for noncontiguous input/output

2 months ago CUDA: broadcasting for FlashAttention mask (llama/14500)
Johannes Gäßler [Wed, 2 Jul 2025 11:42:12 +0000 (13:42 +0200)]
CUDA: broadcasting for FlashAttention mask (llama/14500)

2 months ago vulkan: support softmax/FA batch and broadcast (llama/14449)
Jeff Bolz [Tue, 1 Jul 2025 08:32:56 +0000 (03:32 -0500)]
vulkan: support softmax/FA batch and broadcast (llama/14449)

2 months ago sync : llama.cpp
Georgi Gerganov [Sat, 12 Jul 2025 11:35:19 +0000 (14:35 +0300)]
sync : llama.cpp

2 months ago ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (llama/14435)
Georgi Gerganov [Sat, 12 Jul 2025 11:33:49 +0000 (14:33 +0300)]
ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (llama/14435)

2 months ago opencl : fix possible buffer overflow in dump_tensor (llama/14490)
zhouwg [Wed, 2 Jul 2025 12:38:10 +0000 (20:38 +0800)]
opencl : fix possible buffer overflow in dump_tensor (llama/14490)

2 months ago opencl : skip empty nodes on cgraph compute (llama/14491)
Eric Zhang [Wed, 2 Jul 2025 11:00:04 +0000 (19:00 +0800)]
opencl : skip empty nodes on cgraph compute (llama/14491)

2 months ago opencl : update upscale to support align corners (llama/14488)
lhez [Wed, 2 Jul 2025 07:07:42 +0000 (00:07 -0700)]
opencl : update upscale to support align corners (llama/14488)

2 months ago ggml : Callback before abort (llama/14481)
Björn Ganster [Wed, 2 Jul 2025 05:19:31 +0000 (07:19 +0200)]
ggml : Callback before abort (llama/14481)

* Add a callback that will be called just before abort. This allows apps without a console to display a message to the user and save data if needed.

* Return previous callback to allow callback chaining

* style fixes

---------

Co-authored-by: Diego Devesa <redacted>
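
A sketch of the chaining pattern this entry describes, assuming a setter that installs the pre-abort handler and returns the previously installed one (the exact ggml function name and signature are not shown in this log, so the call is left as a comment):

```cpp
#include <cstdio>

typedef void (*abort_handler_t)(const char * message); // assumed callback shape
static abort_handler_t g_prev_handler = nullptr;

static void my_abort_handler(const char * message) {
    // App-specific: show a dialog and save data, since there may be no console.
    std::fprintf(stderr, "fatal: %s\n", message);
    if (g_prev_handler) {
        g_prev_handler(message); // chain to the previously installed callback
    }
}

// At startup (hypothetical setter name):
// g_prev_handler = ggml_set_abort_callback(my_abort_handler);
```
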
2 months ago ci : disable fast-math for Metal GHA CI (llama/14478)
Georgi Gerganov [Tue, 1 Jul 2025 15:04:08 +0000 (18:04 +0300)]
ci : disable fast-math for Metal GHA CI (llama/14478)

* ci : disable fast-math for Metal GHA CI

ggml-ci

* cont : remove -g flag

ggml-ci

2 months ago CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (llama/14411)
Chenguang Li [Tue, 1 Jul 2025 08:47:30 +0000 (16:47 +0800)]
CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (llama/14411)

* [CANN] update to aclnnGroupedMatmulV2

Signed-off-by: noemotiovon <redacted>
* Support MUL_MAT_ID on 310p

Signed-off-by: noemotiovon <redacted>
* fix editorconfig

Signed-off-by: noemotiovon <redacted>
---------

Signed-off-by: noemotiovon <redacted>
2 months ago vulkan: Split large mul_mat_id to fit in shared memory (llama/14451)
Jeff Bolz [Tue, 1 Jul 2025 08:43:08 +0000 (03:43 -0500)]
vulkan: Split large mul_mat_id to fit in shared memory (llama/14451)

2 months ago add GELU_ERF (llama/14455)
Sigbjørn Skjæret [Tue, 1 Jul 2025 08:14:21 +0000 (10:14 +0200)]
add GELU_ERF (llama/14455)
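
For reference, GELU_ERF is the exact, erf-based form of GELU, as opposed to the common tanh approximation; a reference-only formula, not the ggml kernel itself:

```cpp
#include <cmath>

// GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
static inline float gelu_erf_ref(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```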

2 months ago ci : simplify, switch to ninja (#1295)
Kai Pastor [Fri, 11 Jul 2025 14:47:57 +0000 (16:47 +0200)]
ci : simplify, switch to ninja (#1295)

* CI: Move GGML_N_THREADS to env

* CI: Move macos-13 into matrix

* CI: Build with ninja

* CI: Remove env

2 months ago examples : Test installed CMake config package (#1294)
Kai Pastor [Thu, 10 Jul 2025 06:57:51 +0000 (08:57 +0200)]
examples : Test installed CMake config package (#1294)

* Add test-cmake example

* CI: Run test for installed cmake config

3 months ago vulkan : implement bilinear interpolation for ggml_upscale/ggml_interpolate (#1291)
Acly [Thu, 3 Jul 2025 17:58:12 +0000 (19:58 +0200)]
vulkan : implement bilinear interpolation for ggml_upscale/ggml_interpolate (#1291)

* supports GGML_SCALE_MODE_BILINEAR and GGML_SCALE_FLAG_ALIGN_CORNERS

3 months ago vulkan : implement ggml_roll (#1290)
Acly [Thu, 3 Jul 2025 17:47:15 +0000 (19:47 +0200)]
vulkan : implement ggml_roll (#1290)

* vulkan : implement ggml_roll

* vulkan : refactor vk_op_unary_push_constants initialization

3 months ago ggml : add version function to get lib version (#1286)
Daniel Bevenius [Wed, 2 Jul 2025 11:55:32 +0000 (13:55 +0200)]
ggml : add version function to get lib version (#1286)

* ggml : add version function to get lib version

This commit adds a function `ggml_version()` to the ggml library that
returns the version of the library as a string.

The motivation for this is that it can be useful to be able to
programmatically check the version of the ggml library being used.

Usage:
```c
printf("GGML version: %s\n", ggml_version());
```
Output:
```console
GGML version: 0.0.2219
```

* ggml : add ggml_commit()

---------

Co-authored-by: Georgi Gerganov <redacted>
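
By analogy with the ggml_version() usage shown above, the companion ggml_commit() can presumably be queried the same way (a hedged sketch, assuming it returns a string like ggml_version() does):

```cpp
#include <cstdio>
#include "ggml.h"

int main() {
    std::printf("GGML commit: %s\n", ggml_commit());
}
```
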
3 months ago sync : whisper.cpp
Georgi Gerganov [Wed, 2 Jul 2025 05:07:23 +0000 (08:07 +0300)]
sync : whisper.cpp

3 months ago sync : llama.cpp
Georgi Gerganov [Tue, 1 Jul 2025 08:10:25 +0000 (11:10 +0300)]
sync : llama.cpp

ggml-ci

3 months ago ggml : remove trailing whitespace (llama/0)
Georgi Gerganov [Tue, 1 Jul 2025 08:05:48 +0000 (11:05 +0300)]
ggml : remove trailing whitespace (llama/0)

3 months ago opencl : add GEGLU, REGLU, SWIGLU (llama/14456)
lhez [Tue, 1 Jul 2025 07:19:16 +0000 (00:19 -0700)]
opencl : add GEGLU, REGLU, SWIGLU (llama/14456)

3 months ago Add Conv2d for CPU (llama/14388)
Aman Gupta [Mon, 30 Jun 2025 15:57:04 +0000 (23:57 +0800)]
Add Conv2d for CPU (llama/14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

3 months ago metal : disable fast-math for some cpy kernels (llama/14460)
Georgi Gerganov [Mon, 30 Jun 2025 14:04:05 +0000 (17:04 +0300)]
metal : disable fast-math for some cpy kernels (llama/14460)

* metal : disable fast-math for some cpy kernels

ggml-ci

* cont : disable for q4_1

ggml-ci

* cont : disable for iq4_nl

ggml-ci

3 months ago ggml-cpu: sycl: Re-enable exp f16 (llama/14462)
Romain Biessy [Mon, 30 Jun 2025 12:52:02 +0000 (14:52 +0200)]
ggml-cpu: sycl: Re-enable exp f16 (llama/14462)

3 months ago test-backend-ops : disable llama test (llama/14461)
Diego Devesa [Mon, 30 Jun 2025 10:43:15 +0000 (03:43 -0700)]
test-backend-ops : disable llama test (llama/14461)

3 months ago cmake : Remove redundant include path in CMakeLists.txt (llama/14452)
xiaobing318 [Mon, 30 Jun 2025 09:48:24 +0000 (17:48 +0800)]
cmake : Remove redundant include path in CMakeLists.txt (llama/14452)

* Update docker.yml

Modify the contents of docker.yml so that this workflow stops running on a schedule; if you want to run the workflow, it can be started manually.

* Remove redundant include path in CMakeLists.txt

The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths.

* Enable scheduled Docker image builds

Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.

3 months ago scripts : make the shell scripts cross-platform (llama/14341)
Vedran Miletić [Mon, 30 Jun 2025 08:17:18 +0000 (10:17 +0200)]
scripts : make the shell scripts cross-platform (llama/14341)

3 months ago SYCL: disable faulty fp16 exp kernel (llama/14395)
Akarshan Biswas [Sun, 29 Jun 2025 15:37:58 +0000 (21:07 +0530)]
SYCL: disable faulty fp16 exp kernel (llama/14395)

* SYCL: disable faulty fp16 CPU exponent for now

* Revert "SYCL: disable faulty fp16 CPU exponent for now"

This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202.

* SYCL: disable faulty fp16 CPU exponent for now

* Fix logic of disabling exponent kernel

3 months ago ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (llama/14443)
Sigbjørn Skjæret [Sun, 29 Jun 2025 12:38:10 +0000 (14:38 +0200)]
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (llama/14443)

3 months ago ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158)
Sigbjørn Skjæret [Sun, 29 Jun 2025 09:04:10 +0000 (11:04 +0200)]
ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158)

* implement unary REGLU/GEGLU/SWIGLU cpu ops

* relax constraints

* duplicate shape of source

* fix ggml_vec_geglu_f16

* special case gated ops

* implement unary REGLU/GEGLU/SWIGLU cuda ops

* tighten constraints again

* refactor into GGML_GLU_OP

* metal : add glu kernels

ggml-ci

* add CUDA_GLU_BLOCK_SIZE [no ci]

* more constraints and use 64bit ints

ggml-ci

* 64bit multiplication [no ci]

* implement swapped variants (cpu/cuda)

* update comment [no ci]

ggml-ci

* Vulkan: Add GLU ops and shaders

* SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate

* ggml : implement GLU for split up/gate (llama/14181)

* implement GLU for split up/gate

* add tests for ggml_glu_split

* Vulkan: Implement glu_split logic and shader support

* add split to logging [no ci]

* SYCL: refactor element_size ops and add split up and gate support to gated kernels

* SYCL: switch GEGLU to use tanh approximation

---------

Co-authored-by: 0cc4m <redacted>
Co-authored-by: Akarshan <redacted>
* GGML: increase OP count in assertion

* Refactor: Optimize SYCL element-wise operations with unary function inlining

This commit refactors the SYCL element-wise operations to improve performance by:

- Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead.
- Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic.
- Replacing direct kernel calls with calls to these inlined functions.
- Using `__dpct_inline__` to encourage compiler inlining.
- Minor code cleanup and consistency improvements.

The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices.

* vulkan: Increase workgroup size for GLU, for performance (llama/14345)

* vulkan: Increase workgroup size for GLU, for performance

* vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup

* merge fix

* metal : add support for split and swap

ggml-ci

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: 0cc4m <redacted>
Co-authored-by: Akarshan <redacted>
Co-authored-by: Jeff Bolz <redacted>
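
For reference, each GLU variant in this entry applies an activation to a gate value and multiplies elementwise with an "up" value; a reference-only sketch of the per-element math (not the actual kernels):

```cpp
#include <cmath>

static inline float silu(float x) { return x / (1.0f + std::exp(-x)); }
static inline float gelu(float x) { return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f))); }

// g (gate) and u (up) come either from one interleaved tensor (the unary ops)
// or from two separate tensors (the split variants from llama/14181).
static inline float reglu (float g, float u) { return (g > 0.0f ? g : 0.0f) * u; } // ReLU(g) * u
static inline float geglu (float g, float u) { return gelu(g) * u; }               // GELU(g) * u
static inline float swiglu(float g, float u) { return silu(g) * u; }               // SiLU(g) * u
```
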
3 months ago vulkan: Add fusion support for RMS_NORM+MUL (llama/14366)
Jeff Bolz [Sun, 29 Jun 2025 07:43:36 +0000 (02:43 -0500)]
vulkan: Add fusion support for RMS_NORM+MUL (llama/14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow
for computing the whole graph and just testing one node's results. Add rms_norm_mul tests
and enable a llama test.

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups

style fixes

* last node doesn't need single use.
fix type.
handle mul operands being swapped.

* remove redundant parameter

---------

Co-authored-by: slaren <redacted>
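
A condensed sketch of the fusion precondition this entry describes, with hypothetical types standing in for the real ggml graph structures:

```cpp
#include <unordered_map>

enum op_t { OP_RMS_NORM, OP_MUL, OP_OTHER };
struct node { op_t op; const node * src0; const node * src1; };

// RMS_NORM feeding a MUL can be fused only if the norm result is consumed
// exactly once; the MUL's operands may be swapped, so check both sources.
static bool can_fuse_rms_norm_mul(const node & norm, const node & mul,
                                  const std::unordered_map<const node *, int> & use_count) {
    return norm.op == OP_RMS_NORM && mul.op == OP_MUL &&
           (mul.src0 == &norm || mul.src1 == &norm) &&
           use_count.at(&norm) == 1;
}
```
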
3 months ago CUDA: add bf16 and f32 support to cublas_mul_mat_batched (llama/14361)
Aman Gupta [Sat, 28 Jun 2025 17:30:53 +0000 (01:30 +0800)]
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (llama/14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools

3 months ago vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (llama/14378)
Jeff Bolz [Sat, 28 Jun 2025 15:36:40 +0000 (10:36 -0500)]
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipeline (llama/14378)

3 months ago vulkan: lock accesses of pinned_memory vector (llama/14333)
Jeff Bolz [Sat, 28 Jun 2025 15:17:09 +0000 (10:17 -0500)]
vulkan: lock accesses of pinned_memory vector (llama/14333)

3 months ago fix async_mode bug (llama/14432)
Xinpeng Dou [Sat, 28 Jun 2025 09:35:41 +0000 (17:35 +0800)]
fix async_mode bug (llama/14432)

3 months ago vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (llama/14427)
Jeff Bolz [Sat, 28 Jun 2025 03:35:30 +0000 (22:35 -0500)]
vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (llama/14427)

This setting needs to be passed through to vulkan-shaders-gen

3 months ago ggml : add ggml_set_rows (llama/14274)
Radoslav Gerganov [Fri, 27 Jun 2025 13:41:40 +0000 (16:41 +0300)]
ggml : add ggml_set_rows (llama/14274)

* ggml : add ggml_set_rows

Add ggml_set_rows(a, b, c), which copies rows from 'b' into 'a' using
indices from 'c' (a usage sketch follows at the end of this entry).

ref: #8366

* use I64 for indices

* ggml : add repeat impl for i64

* ggml : add ggml_is_contiguous_rows

* ggml : ggml_set_rows support broadcast

* ggml : ggml_set_rows support quantized dst

ggml-ci

* ggml : support GGML_TYPE_F32 ".from_float" trait

* ggml : ggml_set_rows update comment + better index name

* tests : add ggml_set_rows

* metal : add ggml_set_rows implementation

ggml-ci

* ggml : simplify forward_dup_f32

* ggml : fix supports_op

* tests : add comment to set_rows

* ggml : leave the repeat_i64 for a separate PR

ggml-ci

* ggml : set_rows use std::min instead of MIN

* ggml : better error message for set_rows unsupported type

* metal : perform op->type check only once

* tests : more consistent implementation + more tests

ggml-ci

---------

Co-authored-by: Georgi Gerganov <redacted>
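
A usage sketch for the operator described above, assuming the ggml_set_rows(ctx, a, b, c) signature implied by this entry, with 'c' holding one I64 destination-row index per row of 'b':

```cpp
#include "ggml.h"

int main() {
    ggml_init_params params = { /*mem_size =*/ 16u*1024*1024, /*mem_buffer =*/ nullptr, /*no_alloc =*/ false };
    ggml_context * ctx = ggml_init(params);

    ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 16); // dst: 16 rows of 8
    ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8,  4); // src:  4 rows of 8
    ggml_tensor * c = ggml_new_tensor_1d(ctx, GGML_TYPE_I64, 4);     // one dst row index per src row

    ggml_tensor * r = ggml_set_rows(ctx, a, b, c); // scatter rows of b into a
    (void) r;
    ggml_free(ctx);
}
```
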
3 months ago cmake: regen vulkan shaders when shaders-gen sources change (llama/14398)
bandoti [Thu, 26 Jun 2025 16:46:53 +0000 (13:46 -0300)]
cmake: regen vulkan shaders when shaders-gen sources change (llama/14398)

* Add shaders-gen sources as target deps

3 months ago metal : add special-case mat-vec mul for ne00 == 4 (llama/14385)
Georgi Gerganov [Thu, 26 Jun 2025 12:51:19 +0000 (15:51 +0300)]
metal : add special-case mat-vec mul for ne00 == 4 (llama/14385)

ggml-ci

3 months ago metal : batch rows copy in a single threadgroup (llama/14384)
Georgi Gerganov [Thu, 26 Jun 2025 12:50:15 +0000 (15:50 +0300)]
metal : batch rows copy in a single threadgroup (llama/14384)

* metal : batch rows copy in a single threadgroup

ggml-ci

* metal : handle some edge cases when threadgroup size is not a power of 2

ggml-ci

3 months ago musa: enable fp16 mma (all) and cublas on qy2 (llama/13842)
R0CKSTAR [Thu, 26 Jun 2025 04:11:59 +0000 (12:11 +0800)]
musa: enable fp16 mma (all) and cublas on qy2 (llama/13842)

* musa: enable fp16 mma (all) and cublas on qy2

Signed-off-by: Xiaodong Ye <redacted>
* Update src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <redacted>
* Address review comments

Signed-off-by: Xiaodong Ye <redacted>
* Address review comments

Signed-off-by: Xiaodong Ye <redacted>
* musa: disable MUL_MAT_ID (q2_k × f32) due to precision issues

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
Co-authored-by: Johannes Gäßler <redacted>
3 months ago ggml-cpu: enable IBM NNPA Vector Intrinsics (llama/14317)
Aaron Teo [Wed, 25 Jun 2025 21:49:04 +0000 (05:49 +0800)]
ggml-cpu: enable IBM NNPA Vector Intrinsics (llama/14317)

* ggml-cpu: add nnpa compile flag

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 4a9f60c201573128f73a65999b3e5cc497fae5c1)

* ggml-cpu: add fp16->fp32 nnpa first

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 8d4a7987f9c1887f716be96250f2caeee0253929)

* ggml-cpu: add fp32->fp16

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 0ff0d6516247a41d2ade42b42cf0d676a4dd1627)

* ggml-cpu: better variable names

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 2f58bbcbb89c183340e252362b2a40651f573f1f)

* docs: update s390x docs

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 01b929491b50071a5d0572235dcf5a449da70aa7)

* ggml-cpu: add debugging prints to see if dlf16 is correct

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix print vs printf

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix float placeholder

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: ensure fp16 and fp32 load and stores are called

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fp16 load ensured to hit

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: remove sigint from fp16 store

For some reason, the function is not getting a hit when debugged with
gdb. We will need to investigate further.

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: nnpa switch to vec_xst test

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: switch to vec_xst for 4 element loops also

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: rework noop

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: remove noop, general code cleanup

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: clarify variable naming

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add breakpoint for debugging

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: test fix for conversion failure

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: disable fp32->fp16 nnpa conversions for now

There are some conversion failures in NNPA that require the eyes of an
IBM STSM. Will create a separate PR to introduce the fp32->fp16 change.

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: switch to elif macro

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: reattempt fp32->fp16

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix typo

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: reattempt fp32->fp16

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix compiler types

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: change to typedef vector types

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add 4 element loops for fp32->fp16

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: clarified vector naming

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: bring back fp32->fp16 store nnpa

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add nnpa macro check in ggml-impl

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add missing __func__

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: diagnose why __NNPA__ macro is not being defined

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: import vecintrin.h to fix compiler errors

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: update macro tests

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: move s390x typedef to own header file

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml-cpu: move s390x typedef to own header file"

This reverts commit 157f856c34589566151630e294563a420702db39.

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: switch to importing ggml-cpu-impl instead

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix macro declaration

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: test more macros

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add debug prints

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: bruteforce macro definitions

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: move macro definitions

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add ggml-impl.h to cmakelists

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: switch to private macros

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: move s390x typedef to own header file

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 157f856c34589566151630e294563a420702db39)

* ggml-cpu: move things around

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: bring back compile macros

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: switch to quotes for import

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add compiler error macro

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add s390x detection in ggml-src

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: bring back compile definitions

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: undo cmakelists work

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml-cpu: move s390x typedef to own header file"

This reverts commit 18d79e1a30b39d9aaa0bd58400c5cf2c32135c9a.

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: remove typedefs.h

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: remove typedef from cmakelists

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add ggml-impl.h future notes

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: add todo comment for future reference

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: clarify naming of dlf16

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: remove unnecessary target compile definitions

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings

Signed-off-by: Aaron Teo <redacted>
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu

Signed-off-by: Aaron Teo <redacted>
* docs: update broken huggingface link for s390x

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix duplicate func names during compile

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml-cpu: fix duplicate func names during compile"

This reverts commit fbb733451f27677063b914d4f6c9a9841d45b38d.

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"

This reverts commit bd288e8fa52b5244f65cee21cb61062f1a9e0ca5.

Signed-off-by: Aaron Teo <redacted>
* ggml: refactor fp16<->fp32 simd to ggml-cpu

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix missing simd-mappings.h import in quants.c

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix missing simd-mappings.h within repack

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix amx mmq missing simd-mappings.h

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: attempt at fixing loongarch failing build

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: move nnpa together with other fp16<->fp32 simd

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: fix wrong refactor of ggml-base

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555

Signed-off-by: Aaron Teo <redacted>
* ggml: remove dependency on ggml-cpu from ggml-base

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: remove mistaken fallback macro

Fallback logic was already implemented, but I was too sleepy to realise.

Signed-off-by: Aaron Teo <redacted>
* ggml: move ggml_table_f32_f16 to ggml-cpu

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"

This reverts commit 32a3533564bdb7902cefb9c89b1c9e956a81ce29.

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"

This reverts commit 9e40d984ad27d7b60392fb2b7548885201864fe4.

Signed-off-by: Aaron Teo <redacted>
* ggml: move ggml_table_f32_f16 to ggml-cpu

ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006

Signed-off-by: Aaron Teo <redacted>
(cherry picked from commit 9e40d984ad27d7b60392fb2b7548885201864fe4)

* ggml: move ggml_table_f32_f16 to ggml-cpu.c

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h

we rely on the variable declaration in ggml-cpu.c instead

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"

This reverts commit f71b21d2f74f5e03ec0c2b4fefd3cbf395aecf16.

Signed-off-by: Aaron Teo <redacted>
* ggml-cpu: bring back ggml_table_f32_f16

Signed-off-by: Aaron Teo <redacted>
* Revert "ggml-cpu: bring back ggml_table_f32_f16"

This reverts commit 2dce119178bed5ef5c8398c4230ddd14fef80e49.

Signed-off-by: Aaron Teo <redacted>
* fix ggml time initialization

* fix f32_f16 table init

* remove extra line

---------

Signed-off-by: Aaron Teo <redacted>
Co-authored-by: slaren <redacted>
3 months ago ggml : do not output unprintable characters on GGUF load failure (llama/14381)
Sigbjørn Skjæret [Wed, 25 Jun 2025 21:26:51 +0000 (23:26 +0200)]
ggml : do not output unprintable characters on GGUF load failure (llama/14381)

3 months ago sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (llama/13973)
Anton Mitkov [Wed, 25 Jun 2025 16:09:55 +0000 (17:09 +0100)]
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (llama/13973)

3 months ago opencl: ref count `ggml_backend_opencl_context` and refactor profiling (llama/14254)
lhez [Tue, 24 Jun 2025 18:46:25 +0000 (11:46 -0700)]
opencl: ref count `ggml_backend_opencl_context` and refactor profiling (llama/14254)

* Move profiling info into `ggml_backend_opencl_context`
* Add `enqueue_ndrange_kernel` to launch kernel

3 months ago CUDA/HIP: optimize mmv paths taken for HIP devices (llama/14324)
uvos [Mon, 23 Jun 2025 23:12:56 +0000 (01:12 +0200)]
CUDA/HIP: optimize mmv paths taken for HIP devices (llama/14324)

Co-authored-by: Johannes Gäßler <redacted>
3 months ago CUDA: mul_mat_v support for batch sizes > 1 (llama/14262)
Johannes Gäßler [Mon, 23 Jun 2025 11:11:31 +0000 (13:11 +0200)]
CUDA: mul_mat_v support for batch sizes > 1 (llama/14262)

* CUDA: mul_mat_v support for batch sizes > 1

* use 64 bit math for initial offset calculation
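
The 64-bit-math bullet guards the batched offset computation against 32-bit overflow; an illustrative (non-kernel) example of the failure mode:

```cpp
#include <cstdint>

// Per-matrix offsets are batch_idx * rows * cols elements; for large
// shapes this product overflows a 32-bit int.
int64_t element_offset(int64_t batch_idx, int64_t rows, int64_t cols) {
    return batch_idx * rows * cols; // 64-bit throughout: safe
    // int off = (int) batch_idx * (int) rows * (int) cols; // may overflow
}
```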

3 months ago HIP: enable vec fattn on RDNA4 (llama/14323)
uvos [Sun, 22 Jun 2025 14:51:23 +0000 (16:51 +0200)]
HIP: enable vec fattn on RDNA4 (llama/14323)

3 months ago CUDA: add mean operation (llama/14313)
Aman Gupta [Sun, 22 Jun 2025 04:39:54 +0000 (12:39 +0800)]
CUDA: add mean operation (llama/14313)

* CUDA: add mean operation

* add back sum_rows_f32_cuda

* Review: early exit if col!=0

3 months ago Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (llama/13792)
Markus Tavenrath [Sat, 21 Jun 2025 06:17:12 +0000 (08:17 +0200)]
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (llama/13792)

* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1, compute pipelines are labeled.

* remove #ifdef for debug utils and add queue marker.

3 months ago metal : fix thread-safety (llama/14300)
Georgi Gerganov [Sat, 21 Jun 2025 05:04:18 +0000 (08:04 +0300)]
metal : fix thread-safety (llama/14300)

ggml-ci

3 months ago ggml-cpu : "align corners" for bilinear upscale/downscale (#1285)
Acly [Tue, 1 Jul 2025 07:11:00 +0000 (09:11 +0200)]
ggml-cpu : "align corners" for bilinear upscale/downscale (#1285)

* add "align corners" mode for bilinear upscale, and allow downscaling
* add ggml_interpolate, deprecate ggml_upscale_ext, pass in align-corners as bit-flag
* test-backend-ops: replace ggml_upscale_ext with ggml_interpolate, add test cases for downscale and align-corners
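
The two sampling conventions differ in how destination pixels map back to source coordinates; a reference-only sketch of the standard mappings (the helper name is illustrative):

```cpp
// Maps destination pixel index d to a (possibly fractional) source coordinate.
static inline float src_coord(int d, int dst_n, int src_n, bool align_corners) {
    if (align_corners) {
        // endpoints map exactly onto endpoints
        return dst_n > 1 ? (float) d * (float) (src_n - 1) / (float) (dst_n - 1) : 0.0f;
    }
    // half-pixel-centers convention
    const float scale = (float) src_n / (float) dst_n;
    return ((float) d + 0.5f) * scale - 0.5f;
}
```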

3 months ago build : fix build with clang-cl on Windows (#1284)
Acly [Wed, 25 Jun 2025 10:16:22 +0000 (12:16 +0200)]
build : fix build with clang-cl on Windows (#1284)

* build : fix building tests with clang-cl on Windows

- clang-cl.exe (clang with MSVC CLI) doesn't like the space in /STACK option
- cl.exe (MSVC) works either way

* build : fix MSVC compiler warnings in test-roll.cpp

3 months ago ggml-quants : rename best_mad to best_error (#1283)
Daniel Bevenius [Tue, 24 Jun 2025 04:10:16 +0000 (06:10 +0200)]
ggml-quants : rename best_mad to best_error (#1283)

This commit renames the variable `best_mad` to `best_error` in the
`make_qkx2_quants` function.

The motivation for this is that the name `best_mad` can be somewhat
confusing if mean absolute deviation (MAD) is not in use.

3 months ago tests : cleanup old tests (#1282)
Georgi Gerganov [Sat, 21 Jun 2025 06:21:28 +0000 (09:21 +0300)]
tests : cleanup old tests (#1282)

ggml-ci

3 months ago sync : llama.cpp
Georgi Gerganov [Fri, 20 Jun 2025 18:04:04 +0000 (21:04 +0300)]
sync : llama.cpp

ggml-ci

3 months ago CUDA: add conv_2d_transpose (llama/14287)
Aman Gupta [Fri, 20 Jun 2025 14:48:24 +0000 (22:48 +0800)]
CUDA: add conv_2d_transpose (llama/14287)

* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts

3 months ago sycl: add usage of enqueue_functions extension (llama/14244)
Nicolò Scipione [Fri, 20 Jun 2025 13:07:21 +0000 (15:07 +0200)]
sycl: add usage of enqueue_functions extension (llama/14244)

* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that enables the enqueue_functions extension if
it is enabled in the compiler; otherwise call the general SYCL function
to launch kernels.

---------

Signed-off-by: nscipione <redacted>
3 months ago Implement GGML_CPU_ALL_VARIANTS for PowerPC (llama/14286)
Christian Kastner [Fri, 20 Jun 2025 12:17:32 +0000 (12:17 +0000)]
Implement GGML_CPU_ALL_VARIANTS for PowerPC (llama/14286)

* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.

---------

Co-authored-by: Diego Devesa <redacted>
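
The delayed-initialization point matters because with GGML_BACKEND_DL=ON each variant's shared object is loaded just to be scored, so nothing arch-specific may run at load time; a generic sketch of the pattern (not the actual ggml code):

```cpp
#include <array>

// Build the table on first call instead of at static-init time, so merely
// loading the library never executes feature-specific instructions.
static const std::array<float, 256> & get_table() {
    static const std::array<float, 256> table = [] {
        std::array<float, 256> t{};
        for (int i = 0; i < 256; ++i) {
            t[i] = (float) i * 0.5f; // placeholder for feature-specific init
        }
        return t;
    }();
    return table;
}
```
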
3 months ago cuda : synchronize graph capture and cublas handle destruction (llama/14288)
Diego Devesa [Fri, 20 Jun 2025 11:57:36 +0000 (04:57 -0700)]
cuda : synchronize graph capture and cublas handle destruction (llama/14288)

Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread

3 months ago ggml : fix repack work size for mul_mat_id (llama/14292)
Georgi Gerganov [Fri, 20 Jun 2025 08:19:15 +0000 (11:19 +0300)]
ggml : fix repack work size for mul_mat_id (llama/14292)

ggml-ci

3 months ago ggml: Update KleidiAI to v1.9.0 (llama/14277)
Charles Xu [Fri, 20 Jun 2025 07:51:01 +0000 (09:51 +0200)]
ggml: Update KleidiAI to v1.9.0 (llama/14277)

3 months ago CUDA: add conv_2d_dw (llama/14265)
Aman Gupta [Fri, 20 Jun 2025 01:50:24 +0000 (09:50 +0800)]
CUDA: add conv_2d_dw (llama/14265)

* CUDA: add conv_2d_dw

* better naming

* simplify using template

* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const

3 months ago ggml-cpu : remove unnecessary arm feature detection (llama/14281)
Diego Devesa [Thu, 19 Jun 2025 19:24:14 +0000 (12:24 -0700)]
ggml-cpu : remove unnecessary arm feature detection (llama/14281)

Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.

3 months ago build : suppress gcc15 compile warnings (llama/14261)
fanyang [Thu, 19 Jun 2025 12:49:48 +0000 (20:49 +0800)]
build : suppress gcc15 compile warnings (llama/14261)

* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
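
A plausible reading of that change, sketched with std::string_view (illustrative, not the actual llama.cpp helper):

```cpp
#include <initializer_list>
#include <string_view>

static bool contains_any(std::string_view s, std::initializer_list<std::string_view> substrs) {
    for (std::string_view sub : substrs) {
        // the "find comparison logic": compare against npos, not a truthy value
        if (s.find(sub) != std::string_view::npos) {
            return true;
        }
    }
    return false;
}
```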

3 months ago sycl: Cleanup codepaths in Get Rows in sycl backend (llama/14215)
Anton Mitkov [Thu, 19 Jun 2025 10:40:21 +0000 (11:40 +0100)]
sycl: Cleanup codepaths in Get Rows in sycl backend (llama/14215)

Addresses unused reorder path

3 months ago llamafile : support s390x SIMD instruction set (llama/14273)
Aaron Teo [Thu, 19 Jun 2025 09:48:54 +0000 (17:48 +0800)]
llamafile : support s390x SIMD instruction set (llama/14273)

3 months ago Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (llama/14249)
0cc4m [Thu, 19 Jun 2025 07:15:42 +0000 (09:15 +0200)]
Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (llama/14249)

3 months ago metal : add mean kernel (llama/14267)
Georgi Gerganov [Thu, 19 Jun 2025 05:05:21 +0000 (08:05 +0300)]
metal : add mean kernel (llama/14267)

* metal : add mean kernel

ggml-ci

* cont : dedup implementation

ggml-ci

3 months ago ggml-cpu: reduce asm calls for hsum (llama/14037)
Aaron Teo [Wed, 18 Jun 2025 17:10:08 +0000 (01:10 +0800)]
ggml-cpu: reduce asm calls for hsum (llama/14037)

Signed-off-by: Aaron Teo <redacted>
3 months ago ggml-cpu: fix uncaught underscore terminators (llama/14023)
Aaron Teo [Wed, 18 Jun 2025 17:06:49 +0000 (01:06 +0800)]
ggml-cpu: fix uncaught underscore terminators (llama/14023)

Signed-off-by: Aaron Teo <redacted>
3 months ago ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (llama/14258)
Charles Xu [Wed, 18 Jun 2025 11:40:07 +0000 (13:40 +0200)]
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (llama/14258)

3 months agoAdd `ggml_roll` (#1274) upstream/0.0.2199
Acly [Wed, 18 Jun 2025 11:34:50 +0000 (13:34 +0200)]
Add `ggml_roll` (#1274)

* ggml : add ggml_roll

* use set/get_op_params & std::min

3 months ago sync : whisper.cpp
Georgi Gerganov [Wed, 18 Jun 2025 09:41:12 +0000 (12:41 +0300)]
sync : whisper.cpp

3 months ago sync : llama.cpp
Georgi Gerganov [Wed, 18 Jun 2025 07:00:11 +0000 (10:00 +0300)]
sync : llama.cpp

ggml-ci

3 months ago cmake: remove shader-gen step-targets from ggml-vulkan (llama/14226)
bandoti [Tue, 17 Jun 2025 20:33:25 +0000 (17:33 -0300)]
cmake: remove shader-gen step-targets from ggml-vulkan (llama/14226)

* Remove step-targets from vulkan-shaders-gen

* Unset DESTDIR when building vulkan-shaders-gen

3 months ago ggml-cpu : remove the weak alias trick (llama/14221)
xctan [Tue, 17 Jun 2025 09:58:32 +0000 (17:58 +0800)]
ggml-cpu : remove the weak alias trick (llama/14221)

3 months ago musa: fix build warning (unused variable) (llama/14231)
R0CKSTAR [Tue, 17 Jun 2025 09:48:08 +0000 (17:48 +0800)]
musa: fix build warning (unused variable) (llama/14231)

Signed-off-by: Xiaodong Ye <redacted>
3 months ago llama : add thread safety test (llama/14035)
Diego Devesa [Mon, 16 Jun 2025 15:11:43 +0000 (08:11 -0700)]
llama : add thread safety test (llama/14035)

* llama : add thread safety test

* llamafile : remove global state

* llama : better LLAMA_SPLIT_MODE_NONE logic

when main_gpu < 0 GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <redacted>
3 months ago cmake: clean up external project logic for vulkan-shaders-gen (llama/14179)
bandoti [Mon, 16 Jun 2025 13:32:13 +0000 (10:32 -0300)]
cmake: clean up external project logic for vulkan-shaders-gen (llama/14179)

* Remove install step for vulkan-shaders-gen

* Add install step to normalize msvc with make

* Regenerate modified shaders at build-time

3 months ago HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202)
uvos [Mon, 16 Jun 2025 11:47:38 +0000 (13:47 +0200)]
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202)

3 months ago ggml: Add Android support for GGML_CPU_ALL_VARIANTS (llama/14206)
Charles Xu [Mon, 16 Jun 2025 09:47:57 +0000 (11:47 +0200)]
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (llama/14206)

3 months ago vulkan: mutex around vkQueueSubmit (llama/14127)
Jeff Bolz [Mon, 16 Jun 2025 06:21:08 +0000 (00:21 -0600)]
vulkan: mutex around vkQueueSubmit (llama/14127)

This fixes the remaining crash in test-thread-safety on my system.

3 months ago ggml-cpu : rework weak alias on apple targets (llama/14146)
xctan [Mon, 16 Jun 2025 05:54:15 +0000 (13:54 +0800)]
ggml-cpu : rework weak alias on apple targets (llama/14146)

* ggml-cpu : rework weak alias on apple targets

* fix powerpc detection

* fix ppc detection

* fix powerpc detection on darwin

3 months ago CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (llama/14196)
uvos [Sun, 15 Jun 2025 15:30:13 +0000 (17:30 +0200)]
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (llama/14196)

3 months ago HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183)
uvos [Sun, 15 Jun 2025 13:45:27 +0000 (15:45 +0200)]
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183)

3 months ago sycl: Adding additional cpy dbg print output (llama/14034)
Anton Mitkov [Fri, 13 Jun 2025 07:51:39 +0000 (08:51 +0100)]
sycl: Adding additional cpy dbg print output (llama/14034)

3 months ago SYCL: Bump oneMath commit (llama/14152)
Ewan Crawford [Fri, 13 Jun 2025 07:45:37 +0000 (08:45 +0100)]
SYCL: Bump oneMath commit (llama/14152)

Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```

3 months ago sycl: Remove unneeded f16->f32 copy for dnnl mul mat (llama/14125)
Anton Mitkov [Thu, 12 Jun 2025 13:15:11 +0000 (14:15 +0100)]
sycl: Remove unneeded f16->f32 copy for dnnl mul mat (llama/14125)

3 months ago cmake : handle whitespaces in path during metal build (llama/14126)
Georgi Gerganov [Thu, 12 Jun 2025 07:14:24 +0000 (10:14 +0300)]
cmake : handle whitespaces in path during metal build (llama/14126)

* cmake : handle whitespaces in path during metal build

ggml-ci

* cont : proper fix

ggml-ci

---------

Co-authored-by: Daniel Bevenius <redacted>
3 months ago Implement GGML_CPU_ALL_VARIANTS for ARM (llama/14080)
Christian Kastner [Wed, 11 Jun 2025 19:07:44 +0000 (19:07 +0000)]
Implement GGML_CPU_ALL_VARIANTS for ARM (llama/14080)

* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT> which need to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

Like x86, however to pass around arch flags within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants.

This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch, which restores the previous behavior.

3 months ago vulkan: Better thread-safety for command pools/buffers (llama/14116)
Jeff Bolz [Wed, 11 Jun 2025 14:48:52 +0000 (09:48 -0500)]
vulkan: Better thread-safety for command pools/buffers (llama/14116)

This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.