git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
Akarshan Biswas [Fri, 7 Feb 2025 09:27:53 +0000 (14:57 +0530)]
SYCL: remove XMX info from print devices (llama/11712)
Jinyang He [Fri, 7 Feb 2025 07:38:31 +0000 (15:38 +0800)]
ggml : optimize and build warning fix for LoongArch (llama/11709)
* ggml : optimize convert f32<->f16 for loongarch_asx
* ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16
* ggml : fix warnings when running the CPU CI locally on LoongArch
Akarshan Biswas [Thu, 6 Feb 2025 11:42:35 +0000 (17:12 +0530)]
SYCL: Adjust support condition for norm operators (llama/11674)
SYCL does not support non-contiguous tensors for norm operations.
junchao-zhao [Thu, 6 Feb 2025 09:20:00 +0000 (17:20 +0800)]
ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
Jeff Bolz [Thu, 6 Feb 2025 06:15:30 +0000 (00:15 -0600)]
vulkan: optimize coopmat2 iq2/iq3 callbacks (llama/11521)
* vulkan: optimize coopmat2 iq2/iq3 callbacks
* build: trigger CI on GLSL compute shader changes
Rémy O [Thu, 6 Feb 2025 06:09:59 +0000 (07:09 +0100)]
vulkan: initial support for IQ4_XS quantization (llama/11501)
Jeff Bolz [Thu, 6 Feb 2025 06:02:18 +0000 (00:02 -0600)]
vulkan: use smaller combined allocations to avoid fragmentation (llama/11551)
Charles Duffy [Thu, 6 Feb 2025 01:52:31 +0000 (19:52 -0600)]
metal : avoid breaking build when metal API predates TARGET_OS_VISION (llama/11690)
Avoids breakage in the nix flake build introduced by b0569130c5e9c671152c913d82803b7c2f014ff9.
Georgi Gerganov [Wed, 5 Feb 2025 08:57:42 +0000 (10:57 +0200)]
metal : adjust support conditions for norm operators (llama/11671)
cont #11659
ggml-ci
Johannes Gäßler [Wed, 5 Feb 2025 07:58:31 +0000 (08:58 +0100)]
CUDA: support for mat. mul. with ne03 != ne13 (llama/11656)
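A minimal C++ sketch of the batch-index mapping implied by ne03 != ne13, following ggml's usual broadcast rule (assumption: the larger batch count is a multiple of the smaller); the actual CUDA kernel indexing differs:
```
#include <cassert>

// illustrative only: map a src1 batch index i13 onto a src0 batch index
// when src0 has fewer batches (ne03) than src1 (ne13)
int src0_batch_index(int i13, int ne03, int ne13) {
    assert(ne13 % ne03 == 0);   // src1's batch count must be a multiple of src0's
    return i13 / (ne13 / ne03); // each src0 batch serves ne13/ne03 src1 batches
}
```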
Johannes Gäßler [Tue, 4 Feb 2025 21:21:42 +0000 (22:21 +0100)]
CUDA: non-contiguous (RMS) norm support (llama/11659)
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <redacted>
fxzjshm [Tue, 4 Feb 2025 18:18:38 +0000 (02:18 +0800)]
HIP: force max threads per block to be 1024 (llama/11621)
Some old or vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.
Signed-off-by: fxzjshm <redacted>
Jhen-Jie Hong [Tue, 4 Feb 2025 11:07:18 +0000 (19:07 +0800)]
metal : use residency set for other platforms (llama/11648)
Patrick Peng [Thu, 6 Feb 2025 14:29:13 +0000 (09:29 -0500)]
rpc: fix known RCE in rpc-server (ggml/1103)
Add bounds checking in `rpc_server::copy_tensor` to prevent out-of-bounds writes: check that `(uint8_t *)dst->data + ggml_nbytes(src)` remains within the destination buffer's allocated region.
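A minimal C++ sketch of that kind of bounds check; the actual rpc_server::copy_tensor signature and buffer bookkeeping differ:
```
#include <cstddef>
#include <cstdint>
#include <cstring>

// stand-in for the per-allocation bookkeeping the server keeps
struct buffer_region {
    uint8_t * base; // start of the allocated region
    size_t    size; // total allocated bytes
};

// reject the copy instead of writing past the destination buffer
bool copy_tensor_checked(const uint8_t * src_data, size_t src_nbytes,
                         uint8_t * dst_data, const buffer_region & dst_buf) {
    const uint8_t * dst_end = dst_data + src_nbytes;
    if (dst_data < dst_buf.base || dst_end > dst_buf.base + dst_buf.size) {
        return false; // out-of-bounds write rejected
    }
    std::memcpy(dst_data, src_data, src_nbytes);
    return true;
}
```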
masahji [Tue, 25 Feb 2025 09:39:33 +0000 (01:39 -0800)]
stream : add beam size parameter (#2836)
* feat: Add beam size parameter to stream.cpp for beam search configuration
* feat: Add beam size parameter to whisper full params in stream example
* fix: Remove duplicate beam search size assignment in server.cpp
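For reference, beam search is configured through the public whisper.cpp API roughly as follows (the beam size value here is arbitrary):
```
#include "whisper.h"

int main() {
    // select beam-search sampling and set an explicit beam size,
    // as the stream example now exposes via a command-line flag
    whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);
    params.beam_search.beam_size = 5; // wider beam: slower, usually more accurate
    // ... pass `params` to whisper_full() along with a loaded context and audio
    return 0;
}
```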
Thomas Fitzsimmons [Tue, 25 Feb 2025 09:38:13 +0000 (09:38 +0000)]
whisper : restore big endian support (#2816)
* whisper : fix BYTESWAP whitespace
* whisper : make byteswap usable with C++17
* cmake : define WHISPER_BIG_ENDIAN for big-endian targets
* ci : fix (again) arm64 build fails
* docker : attempt fixing arm64 build on ci
* qemu v7.0.0-28
[imported from https://github.com/ggml-org/llama.cpp/commit/818a340ea8be55b3706e1772527cb8738e90a8c7 (#11895)]
---------
Co-authored-by: Xuan-Son Nguyen <redacted>
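A minimal C++17-compatible byteswap sketch of the technique involved; whisper.cpp's actual BYTESWAP macros differ:
```
#include <cstdint>
#include <cstring>

// swap the byte order of a 32-bit value
inline uint32_t byteswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// on big-endian targets (WHISPER_BIG_ENDIAN) values read from the
// little-endian model file must be swapped before use
inline float byteswap_f32(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u)); // type-pun without undefined behavior
    u = byteswap32(u);
    std::memcpy(&f, &u, sizeof(f));
    return f;
}
```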
Judd [Thu, 6 Feb 2025 07:37:21 +0000 (15:37 +0800)]
Fixes for Windows (#2790)
Fixes for Windows:
* MSVC defaults to UTF-8 without BOM.
* Console output code page changed to UTF-8.
---------
Co-authored-by: Judd <redacted>
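A minimal sketch of the console code-page change on Windows, using the standard Win32 calls:
```
#ifdef _WIN32
#include <windows.h>
#endif

int main() {
#ifdef _WIN32
    // switch the console to UTF-8 so multibyte output (e.g. transcribed
    // non-ASCII text) renders correctly
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);
#endif
    return 0;
}
```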
midnight [Wed, 5 Feb 2025 12:41:10 +0000 (04:41 -0800)]
cmake : fix compile assumptions for power9/etc (#2777)
* Add small comment re: VSX to readme
Co-authored-by: midnight <redacted>
Georgi Gerganov [Tue, 4 Feb 2025 11:03:40 +0000 (13:03 +0200)]
authors : update
Georgi Gerganov [Tue, 4 Feb 2025 11:03:09 +0000 (13:03 +0200)]
sync : ggml
Christian Kastner [Mon, 3 Feb 2025 23:17:15 +0000 (00:17 +0100)]
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
This makes git an optional dependency, which is useful when ggml is built not from git but from a tarball or a distribution source package.
This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be using it, though, so there doesn't seem to be much value in factoring it out, or even requiring it.
Georgi Gerganov [Tue, 4 Feb 2025 08:50:10 +0000 (10:50 +0200)]
readme : add maintenance roadmap
Georgi Gerganov [Tue, 4 Feb 2025 07:30:08 +0000 (09:30 +0200)]
ci : add stalebot
billyct [Mon, 3 Feb 2025 20:49:06 +0000 (04:49 +0800)]
node : add max_len params in node addon (#2760)
Georgi Gerganov [Mon, 3 Feb 2025 20:42:26 +0000 (22:42 +0200)]
talk-llama : sync llama.cpp
mgrachten [Mon, 3 Feb 2025 20:36:32 +0000 (21:36 +0100)]
coreml : always convert to "neuralnetwork" (#2770)
Georgi Gerganov [Mon, 3 Feb 2025 19:17:33 +0000 (21:17 +0200)]
ci : more git
Georgi Gerganov [Mon, 3 Feb 2025 18:12:37 +0000 (20:12 +0200)]
ci : install git
Georgi Gerganov [Mon, 3 Feb 2025 17:50:24 +0000 (19:50 +0200)]
ci : use ubuntu-22.04 instead of ubuntu-latest
Georgi Gerganov [Mon, 3 Feb 2025 14:24:38 +0000 (16:24 +0200)]
cmake : sync cmake scripts
Georgi Gerganov [Mon, 3 Feb 2025 14:05:34 +0000 (16:05 +0200)]
sync : ggml
Georgi Gerganov [Mon, 3 Feb 2025 14:05:27 +0000 (16:05 +0200)]
scripts : fix sync paths
Johannes Gäßler [Mon, 3 Feb 2025 12:25:56 +0000 (13:25 +0100)]
CUDA: fix Volta FlashAttention logic (llama/11615)
Johannes Gäßler [Sun, 2 Feb 2025 22:48:29 +0000 (23:48 +0100)]
HIP: fix flash_attn_stream_k_fixup warning (llama/11604)
uvos [Sun, 2 Feb 2025 21:40:09 +0000 (22:40 +0100)]
CUDA/HIP: add support for selectable warp size to mmv (llama/11519)
uvos [Sun, 2 Feb 2025 21:08:05 +0000 (22:08 +0100)]
HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (llama/11601)
This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly.
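A hypothetical illustration of such per-family checks; the real GGML_CUDA_CC_IS_* macros and ggml's numeric encoding of AMD architectures differ:
```
// hypothetical ranges, not ggml's actual encoding: AMD arch numbers
// (gfx1010, gfx1030, ...) are not ordered supersets of each other,
// so each family needs an explicit range check rather than `cc >= X`
#define EXAMPLE_CC_IS_RDNA1(cc) ((cc) >= 1010 && (cc) < 1030)
#define EXAMPLE_CC_IS_RDNA2(cc) ((cc) >= 1030 && (cc) < 1100)
#define EXAMPLE_CC_IS_RDNA3(cc) ((cc) >= 1100 && (cc) < 1200)
```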
Johannes Gäßler [Sun, 2 Feb 2025 18:31:09 +0000 (19:31 +0100)]
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <redacted>
Olivier Chafik [Fri, 31 Jan 2025 17:12:40 +0000 (17:12 +0000)]
`ci`: use sccache on windows instead of ccache (llama/11545)
* Use sccache on ci for windows
* Detect sccache in cmake
uvos [Wed, 29 Jan 2025 18:36:00 +0000 (19:36 +0100)]
HIP: require at least HIP 5.5
uvos [Wed, 29 Jan 2025 18:12:42 +0000 (19:12 +0100)]
HIP: Prepare reduction operators for wave 64
uvos [Wed, 29 Jan 2025 16:46:23 +0000 (17:46 +0100)]
CUDA/HIP: add warp_size to cuda_device_info
Rémy Oudompheng [Wed, 29 Jan 2025 17:29:39 +0000 (18:29 +0100)]
vulkan: implement initial support for IQ2 and IQ3 quantizations (llama/11360)
* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend
---------
Co-authored-by: Jeff Bolz <redacted>
Jeff Bolz [Wed, 29 Jan 2025 15:26:50 +0000 (09:26 -0600)]
vulkan: Catch pipeline creation failure and print an error message (llama/11436)
* vulkan: Catch pipeline creation failure and print an error message
Also, fix some warnings from my on-demand compile change.
* vulkan: fix pipeline creation logging
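A generic C++ sketch of the pattern (illustrative; ggml-vulkan's actual pipeline-creation code differs):
```
#include <cstdio>
#include <exception>

// wrap pipeline creation so a failure produces a clear error naming the
// pipeline instead of an unhandled exception
template <typename F>
auto create_pipeline_checked(const char * name, F && create) -> decltype(create()) {
    try {
        return create();
    } catch (const std::exception & e) {
        std::fprintf(stderr, "ggml_vulkan: failed to create pipeline '%s': %s\n",
                     name, e.what());
        throw;
    }
}
```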
uvos [Tue, 28 Jan 2025 22:06:32 +0000 (23:06 +0100)]
HIP: Suppress transformation warning in softmax.cu
Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here.
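An illustrative, clang/HIP-oriented sketch of the situation and one way to silence the warning (names are made up; the actual softmax.cu fix may differ):
```
// when the loop bound is only known at run time (ncols_template == 0),
// "#pragma unroll" cannot be honored and clang emits -Wpass-failed;
// a scoped diagnostic pragma silences the warning without changing codegen
template <int ncols_template>
void scale_row(float * row, float v, int ncols_runtime) {
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpass-failed"
#pragma unroll
#endif
    for (int i = 0; i < ncols; ++i) {
        row[i] *= v;
    }
#ifdef __clang__
#pragma clang diagnostic pop
#endif
}
```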
Nikita Sarychev [Tue, 28 Jan 2025 15:42:20 +0000 (07:42 -0800)]
HIP: Only call rocblas_initialize on rocBLAS versions with the multiple-instantiation bug (llama/11080)
This disables the workaround on fixed rocBLAS versions (>= 4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all Tensile objects.
someone13574 [Tue, 28 Jan 2025 14:15:34 +0000 (09:15 -0500)]
cmake : don't fail on `GGML_CPU=OFF` (llama/11457)
Akarshan Biswas [Tue, 28 Jan 2025 09:56:58 +0000 (15:26 +0530)]
SYCL : SOFTMAX F16 mask support and other fixes (llama/11261)
Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added during #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always considered src1 to be of fp32 type (many OP functions depend on this).
* SYCL: SOFTMAX F16 mask support and other fixes
* test-backend-ops: Add F16 mask test cases
Haus1 [Mon, 27 Jan 2025 13:58:17 +0000 (08:58 -0500)]
AMD: parse the architecture as supplied by gcnArchName (llama/11244)
The value provided by `minor` doesn't include the stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID.
Ihar Hrachyshka [Mon, 27 Jan 2025 07:41:59 +0000 (02:41 -0500)]
metal: Handle null returned from MTLCreateSystemDefaultDevice() (llama/11441)
This fixes a segmentation fault when running tests and no Metal devices are available (for example, when not linked with the Core Graphics framework).
Georgi Gerganov [Sun, 26 Jan 2025 18:06:16 +0000 (20:06 +0200)]
metal : use residency sets (llama/11427)
* metal : use residency sets
ggml-ci
* metal : restore commandBufferWithUnretainedReferences calls [no ci]
* metal : release descriptors
ggml-ci
* metal : check env GGML_METAL_NO_RESIDENCY
ggml-ci
* metal : fix build + clean-up
ggml-ci
bandoti [Sun, 26 Jan 2025 16:07:48 +0000 (12:07 -0400)]
cmake: add ggml find package (llama/11369)
* Add initial ggml cmake package
* Add build numbers to ggml find-package
* Expand variables with GGML_ prefix
* Guard against adding to cache variable twice
* Add git to msys2 workflow
* Handle ggml-cpu-* variants
* Link ggml/ggml-base libraries to their targets
* Replace main-cmake-pkg with simple-cmake-pkg
* Interface features require c_std_90
* Fix typo
* Removed unnecessary bracket from status message
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <redacted>
* Update examples/simple-cmake-pkg/README.md
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Jeff Bolz [Sat, 25 Jan 2025 21:29:57 +0000 (15:29 -0600)]
vulkan: compile shaders on-demand (llama/11406)
Reduce first-run startup time and memory consumption.
Should fix #11339.
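A generic lazy-creation sketch of the idea (not the actual ggml-vulkan code):
```
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// build each pipeline the first time it is requested instead of compiling
// everything at startup; `int` stands in for a real pipeline handle
struct pipeline_cache {
    std::mutex mtx;
    std::unordered_map<std::string, int> cache;

    int get(const std::string & name, const std::function<int()> & create) {
        std::lock_guard<std::mutex> lock(mtx);
        auto it = cache.find(name);
        if (it == cache.end()) {
            it = cache.emplace(name, create()).first; // compile on first use
        }
        return it->second;
    }
};
```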
uvos [Sat, 25 Jan 2025 20:01:12 +0000 (21:01 +0100)]
HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (llama/11420)
uvos [Fri, 24 Jan 2025 23:02:23 +0000 (00:02 +0100)]
hip : Add hipGraph and VMM support to ROCM (llama/11362)
* Add hipGraph support
* Enable VMM on rocm
Johannes Gäßler [Fri, 24 Jan 2025 20:02:43 +0000 (21:02 +0100)]
CUDA: fix FP16 cuBLAS GEMM (llama/11396)
uvos [Fri, 24 Jan 2025 16:50:49 +0000 (17:50 +0100)]
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (llama/11356)
Johannes Gäßler [Fri, 24 Jan 2025 11:38:31 +0000 (12:38 +0100)]
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (llama/11380)
Bernhard M. Wiedemann [Fri, 24 Jan 2025 11:21:35 +0000 (12:21 +0100)]
cmake : avoid -march=native when reproducible build is wanted (llama/11366)
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.
Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.
Fixes: #11317
This patch was done while working on reproducible builds for openSUSE.
amd-dwang [Thu, 23 Jan 2025 07:14:28 +0000 (15:14 +0800)]
Vulkan-run-test: fix mmq_wg_denoms (llama/11343)
There seems to be a copy-and-paste error here: *mmq_wg_denoms should be used together with *warptile_mmq, instead of wg_denoms.
Jeff Bolz [Thu, 23 Jan 2025 07:07:50 +0000 (01:07 -0600)]
vulkan: sort shaders for more deterministic binary (llama/11315)
Fixes #11306.
Jeff Bolz [Thu, 23 Jan 2025 07:01:17 +0000 (01:01 -0600)]
vulkan: fix diag_mask_inf (llama/11323)
With robustBufferAccess disabled, this shader was showing OOB stores. There
is a bounds check in the code, but the workgroup dimensions were reversed vs.
CUDA and it was running the wrong number of threads. So fix the workgroup
dimensions and disable robustness for this pipeline.
Radoslav Gerganov [Tue, 21 Jan 2025 13:06:41 +0000 (15:06 +0200)]
rpc : better caching of the base buffer pointer (llama/11331)
There is no need to use a map; just store the base pointer in the buffer context.
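A sketch of the simplification; types and field names are illustrative, not the actual rpc-server structures:
```
#include <cstddef>
#include <cstdint>

// keep the base pointer in the per-buffer context itself
struct rpc_buffer_context {
    uint64_t remote_ptr; // handle of the buffer on the server
    void *   base_ptr;   // cached base address of the buffer's data
    size_t   size;
};

void * get_base(const rpc_buffer_context & ctx) {
    return ctx.base_ptr; // no map lookup needed
}
```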
Georgi Gerganov [Tue, 21 Jan 2025 06:48:13 +0000 (08:48 +0200)]
metal : fix out-of-bounds write (llama/11314)
ggml-ci
Jeff Bolz [Mon, 20 Jan 2025 16:38:32 +0000 (10:38 -0600)]
vulkan: fix coopmat2 validation failures (llama/11284)
mul mat and flash attention shaders were loading f32 types directly into
A/B matrices, which happens to work but is technically invalid usage.
For FA, we can load it as an Accumulator matrix and convert; this
is not in the inner loop and is cheap enough. For mul mat, it's more
efficient to do this conversion in a separate pass and have the input(s)
be f16.
coopmat2 requires SPIR-V 1.6 (related to its use of LocalSizeId). LocalSizeId
requires maintenance4 to be enabled, and SPIR-V 1.6 requires Vulkan 1.3.
Nicolò Scipione [Sun, 19 Jan 2025 13:33:34 +0000 (14:33 +0100)]
SYCL: Introducing memory host pool (llama/11251)
* Implement host pool for matrix_info
Creating a new memory pool on the host to store the memory locations for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch, since it is not used in llama.cpp.
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <redacted>
Georgi Gerganov [Sat, 18 Jan 2025 14:18:15 +0000 (16:18 +0200)]
cmake : add sanitizer flags for llama.cpp (llama/11279)
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <redacted>
Jeff Bolz [Sat, 18 Jan 2025 08:26:50 +0000 (02:26 -0600)]
vulkan: fix coopmat2 flash attention for non-contiguous inputs (llama/11281)
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.
Add noncontiguous FA tests in test-backend-ops.
Fixes #11268.
Radoslav Gerganov [Fri, 17 Jan 2025 08:57:09 +0000 (10:57 +0200)]
rpc : early register backend devices (llama/11262)
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609
Jeff Bolz [Thu, 16 Jan 2025 21:47:10 +0000 (15:47 -0600)]
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (llama/11166)
* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl
Shaders are based on cpy.cu.
* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32
* ggml: copy q->f32 assumes some contiguity in the destination
Jeff Bolz [Thu, 16 Jan 2025 21:23:49 +0000 (15:23 -0600)]
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (llama/11206)
Do masking on whole dwords, fetch all scales at once.
Jeff Bolz [Thu, 16 Jan 2025 21:16:39 +0000 (15:16 -0600)]
vulkan: optimize coopmat2 q2_k dequant function (llama/11130)
Johannes Gäßler [Thu, 16 Jan 2025 15:43:38 +0000 (16:43 +0100)]
CUDA: backwards pass for misc. ops, add tests (llama/11257)
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
fj-y-saito [Thu, 16 Jan 2025 09:11:49 +0000 (18:11 +0900)]
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (llama/11227)
* Add SVE support for q4_K_q8_K
* Update ggml/src/ggml-cpu/ggml-cpu-quants.c
change to use K_SCALE_SIZE
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Eve [Wed, 15 Jan 2025 19:50:13 +0000 (19:50 +0000)]
vulkan: scale caching for k quants + misc fixes (llama/11081)
* q6_k scale caching
* 16 bit unpack
* q4_k test (slow)
* revert it
* q3_k
* q2_k
* little stuff
* try precalculating products of a and q2_k scales
* Revert "try precalculating products of a and q2_k scales"
This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b.
* unpack should be u16, add vim swap to gitignore (about time)
* better q4_k scales
* q5_k
* better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations
* q2_k better dequant
* q3_k optimizations
* q3_k use hmask simd from cpu avx version
* make the caches happy
* q3_k separate out calculation
* q2_k separate out
* little stuff
* use calc_superblock everywhere
* q2_k optimize scale calculation
* more barriers
Junil Kim [Wed, 15 Jan 2025 13:17:42 +0000 (22:17 +0900)]
fix: ggml: fix vulkan-shaders-gen build (llama/10448)
* fix: ggml: fix vulkan-shaders-gen build
The vulkan-shaders-gen target was not being built correctly
in case of cross-compilation.
Other outputs need to be built for the cross compile target,
but vulkan-shaders-gen needs to be built for the host.
* refactor: ggml: Improve vulkan-shaders-gen toolchain setup
- Add GGML_SHADERS_GEN_TOOLCHAIN CMake option.
- Auto-detect host toolchain if not set.
* refactor: ggml: Improve vulkan-shaders-gen toolchain setup
Use configure_file to generate host_toolchain.cmake from template
* fix: ggml: Fix compile error
Fix compile error not finding vulkan-shaders-gen
* fix: vulkan-shaders-gen build and path handling
Fix build issues with vulkan-shaders-gen:
- Add target dependency for correct build order
- Use CMAKE_HOST_SYSTEM_NAME for executable suffix
- Fix MSVC output directory in host toolchain
- Normalize path handling for cross-compilation
* fix: improve host compiler detection in vulkan shader build
Improve host compiler detection for vulkan shader generation:
- Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches
- Consolidate compiler detection logic
- Fix Windows-specific MSVC detection
- Ensure correct compiler search in cross-compilation
* refactor: Simplify CMake function for detecting host compiler
Simplified the CMake function to improve the process of detecting the host compiler.
* fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt
Since `vulkan-shader-gen.cpp` only requires the `glslc` executable
and not the Vulkan headers or libraries, CMakeLists.txt needs to
be corrected.
(See: ecc93d0558fc3ecb8a5af69d2ece02fae4710ade)
* refactor: Rename host_toolchain.cmake.in
- Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in
* refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
Johannes Gäßler [Wed, 15 Jan 2025 11:51:37 +0000 (12:51 +0100)]
RoPE: fix back, CUDA support for back + noncont. (llama/11240)
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]
Akarshan Biswas [Wed, 15 Jan 2025 03:20:17 +0000 (08:50 +0530)]
SYCL: Add gated linear attention kernel (llama/11175)
* SYCL: Add Gated Linear attention kernel
* glahpp: add a space at the end of file
* gla: Put the barrier inside the main logic loop
William Tambellini [Thu, 23 Jan 2025 19:59:08 +0000 (11:59 -0800)]
ggml : add option to not print stack on abort (ggml/1081)
* Add option to not print stack on abort
Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.
* Update ggml/src/ggml.c
---------
Co-authored-by: Diego Devesa <redacted>
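A sketch of such an env-var gate; the exact variable name used by ggml is an assumption here (GGML_NO_BACKTRACE):
```
#include <cstdio>
#include <cstdlib>

// skip the backtrace when the user opts out via the environment
// (GGML_NO_BACKTRACE is an assumed name for illustration)
static void print_backtrace_maybe() {
    if (std::getenv("GGML_NO_BACKTRACE") != nullptr) {
        return; // stack printing disabled
    }
    std::fprintf(stderr, "(backtrace would be printed here)\n");
}

int main() {
    print_backtrace_maybe();
    return 0;
}
```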
issixx [Fri, 17 Jan 2025 12:29:08 +0000 (21:29 +0900)]
ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065)
Some threads kept looping and failed to terminate properly after an abort during CPU execution.
Co-authored-by: issi <redacted>
Georgi Gerganov [Mon, 3 Feb 2025 14:32:48 +0000 (16:32 +0200)]
ci : dummy commit to trigger CI
KITAITI Makoto [Tue, 21 Jan 2025 07:39:54 +0000 (16:39 +0900)]
ruby : Make context accept initial parameters, API to retrieve a segment and more (#2749)
* Fix type signature for Whisper.log_set
* Use cache file for model when offline
* Extract ruby_whisper_transcribe() into a file
* Extract Whisper::Error
* Use FileList for ext/*.{c,cpp,h}
* Extract Whisper::Segment
* Extract Whisper::Model
* Extract Whisper::Params
* Extract Whisper::Context
* Extract log_callback function
* Write base code in C rather than C++
* Use chdir instead of Dir.chdir in Rakefile
* Define alloc func for Whisper::Model
* Define Whisper::Params' callback and user data reader
* Add test for Whisper::Params.new with keyword arguments
* Make Whisper::Params.new accept keyword arguments
* Update type signatures
* Update README
* Update CLEAN targets
* Fix document comment for Whisper::Params#new_segment_callback=
* Use macro to define params
* Fix dependency of build task
* Set Whisper.finalize_log_callback visibility to private
* Make Whisper::Context#full and full_parallel return self
* Add test for Whisper::Context#full_get_segment
* Add Whisper::Context#full_get_segment
* Update signatures
* Update README
* Fix signature
* Replace #initialize with .new in signature file [skip ci]
* Fix potential overflow
Corey Earwood [Sat, 18 Jan 2025 10:06:06 +0000 (03:06 -0700)]
whisper.objc : fix build and CI
Georgi Gerganov [Tue, 14 Jan 2025 07:53:50 +0000 (09:53 +0200)]
talk-llama : sync llama.cpp
Georgi Gerganov [Tue, 14 Jan 2025 07:50:06 +0000 (09:50 +0200)]
sync : ggml
Johannes Gäßler [Tue, 14 Jan 2025 07:31:07 +0000 (09:31 +0200)]
GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030)
ggml-ci
lhez [Tue, 14 Jan 2025 07:24:03 +0000 (09:24 +0200)]
ggml : add opencl backend (skip) (llama/10693)
---------
Co-authored-by: Skyler Szot <redacted>
Co-authored-by: Shangqing Gu <redacted>
Co-authored-by: Alexander Angus <redacted>
Co-authored-by: Hongqiang Wang <redacted>
Co-authored-by: Max Krasnyansky <redacted>
Andreas Kieslinger [Mon, 13 Jan 2025 15:45:53 +0000 (16:45 +0100)]
cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (llama/11042)
* Refactor: Moves cuda graph executable update step to separate function.
* Refactor: Moves cuda graph update check to separate function.
* Refactor: Moves cuda graph maintenance (update or adjusting copy parameters) to separate function for improved readability.
* Fix: Adds missing reference to maintain_cuda_graph() definition.
* Refactor: Improves structure and abstractions by moving CUDA graph evaluation and capture to its own function.
* Refactor: Moves node graph checks and copy ops into individual function for improved readability.
* Refactor: Removes code permanently excluded from compilation to increase readability.
* Style: Adds missing newline
* Style: Consolidates several neighboring '#ifdef USE_CUDA_GRAPH' into a single one
* Refactor: Makes 'cuda_graph_update_required' a local variable
* remove double lines between functions
---------
Co-authored-by: slaren <redacted>
Radoslav Gerganov [Mon, 13 Jan 2025 11:31:41 +0000 (13:31 +0200)]
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (llama/11211)
Build fails when using HIP and GGML_BACKEND_DL:
```
/usr/bin/ld: ../ggml/src/libggml.so: undefined reference to `ggml_backend_cuda_reg'
collect2: error: ld returned 1 exit status
```
This patch fixes this.
0cc4m [Fri, 10 Jan 2025 05:39:33 +0000 (06:39 +0100)]
Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (llama/11161)
* Vulkan: Remove float16 use in shaders
* Fix validation error about subgroup_size_control extension
Molly Sophia [Fri, 10 Jan 2025 01:58:08 +0000 (09:58 +0800)]
llama: add support for QRWKV6 model architecture (llama/11001)
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <redacted>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <redacted>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <redacted>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <redacted>
* Fix some typos
Signed-off-by: Molly Sophia <redacted>
* code format changes
Signed-off-by: Molly Sophia <redacted>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <redacted>
* Fix cuda warning
Signed-off-by: Molly Sophia <redacted>
* Update README.md
Signed-off-by: Molly Sophia <redacted>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <redacted>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <redacted>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <redacted>
Co-authored-by: compilade <redacted>
---------
Signed-off-by: Molly Sophia <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: compilade <redacted>
Akarshan Biswas [Fri, 10 Jan 2025 00:13:03 +0000 (05:43 +0530)]
SYCL: Refactor ggml_sycl_compute_forward (llama/11121)
* SYCL: refactor ggml_sycl_compute_forward
* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy
* SYCL: add function name to noop debug
* SYCL: Some device info print refactoring and add details of XMX availability
hydai [Wed, 8 Jan 2025 20:03:28 +0000 (04:03 +0800)]
fix: add missing msg in static_assert (llama/11143)
Signed-off-by: hydai <redacted>
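For context, static_assert without a message is only valid from C++17 (and C23 for _Static_assert) onward, so older-standard builds need the message:
```
// before C++17 (and C23 for _Static_assert) the message is mandatory,
// so omitting it breaks builds on older language standards
static_assert(sizeof(void *) >= 4, "pointers must be at least 32-bit");

int main() { return 0; }
```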
amritahs-ibm [Wed, 8 Jan 2025 10:54:19 +0000 (16:24 +0530)]
llamafile : ppc64le MMA INT8 implementation (llama/10912)
This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the quantised int8 datatype.
It results in a 10%-70% improvement
in total speed (i.e. all tokens / total time) across
various batch sizes.
The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.
Signed-off-by: Amrita H S <redacted>
Mathieu Baudier [Wed, 8 Jan 2025 08:18:13 +0000 (09:18 +0100)]
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (llama/11117)
* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.
* Perform Vulkan extensions checks in a more sensible order
* Remove unnecessary #ifdef directive
ag2s20150909 [Wed, 8 Jan 2025 08:17:29 +0000 (16:17 +0800)]
fix: Vulkan shader gen binary path when Cross-compiling (llama/11096)
* fix: Vulkan shader gen binary path when cross compiling
Johannes Gäßler [Tue, 7 Jan 2025 17:01:58 +0000 (18:01 +0100)]
GGUF: C++ refactor, backend support, misc fixes (llama/11030)
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
Diego Devesa [Tue, 7 Jan 2025 15:11:57 +0000 (16:11 +0100)]
ggml-backend : only offload from host buffers (fix) (llama/11124)
Diego Devesa [Tue, 7 Jan 2025 11:38:05 +0000 (12:38 +0100)]
ggml-backend : only offload from host buffers (llama/11120)
Radoslav Gerganov [Tue, 7 Jan 2025 06:37:02 +0000 (08:37 +0200)]
rpc : code cleanup (llama/11107)
Remove duplicated macros, use GGML_LOG_ERROR for errors
Akarshan Biswas [Tue, 7 Jan 2025 06:26:07 +0000 (11:56 +0530)]
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (llama/11087)
* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6
* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"
This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52.
* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6