git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log

commit | commitdiff | tree

Sigbjørn Skjæret [Sun, 16 Mar 2025 17:46:36 +0000 (18:46 +0100)]

llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400)

commit | commitdiff | tree

Georgi Gerganov [Sun, 16 Mar 2025 17:29:36 +0000 (19:29 +0200)]

context : fix init of n_outputs (#12397)

ggml-ci

commit | commitdiff | tree

Daniel Bevenius [Sun, 16 Mar 2025 17:22:05 +0000 (18:22 +0100)]

ci : add --symlinks to xcframework zip command (#12409)

This commit adds the --symlinks option to the zip command used to create
the xcframework zip file. This is necessary to preserve symlinks in the
zip file. Without this option, the Versions symlink is stored as a
regular directory entry in the zip file rather than as a symlink, which
causes the following error in Xcode:
```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```

Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377
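To illustrate what storing an entry "as a symlink" means inside a zip archive, here is a minimal Python sketch. It assumes the Info-ZIP/Unix convention that `--symlinks` relies on: the entry's Unix mode, including the `S_IFLNK` file-type bits, sits in the upper 16 bits of `external_attr`, and the entry's data is the link target. The helpers `add_symlink` and `is_symlink` are hypothetical.

```python
import io
import stat
import zipfile


def add_symlink(zf: zipfile.ZipFile, name: str, target: str) -> None:
    """Store `name` as a symlink pointing at `target` (Unix convention)."""
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so extractors interpret the mode bits
    # Upper 16 bits of external_attr hold st_mode: symlink type + 0o777 perms.
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, target)  # the entry's data is the link target


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """True if the entry's stored Unix mode marks it as a symlink."""
    return stat.S_ISLNK(info.external_attr >> 16)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    add_symlink(zf, "llama.framework/Versions/Current", "A")
    zf.writestr("llama.framework/Versions/A/llama", b"binary")

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        print(info.filename, "symlink" if is_symlink(info) else "regular")
```

An archive built without this mode information gives tools like Xcode an ordinary entry where a link is expected, which matches the `readlink` failure above.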

commit | commitdiff | tree

marcoStocchi [Sat, 15 Mar 2025 16:23:11 +0000 (17:23 +0100)]

llama-tts : add '-o' option (#12398)

* added -o option to specify an output file name

* llama-tts returns ENOENT in case of file write error

note : PR #12042 is closed as superseded by this one.

commit | commitdiff | tree

aubreyli [Sat, 15 Mar 2025 14:49:03 +0000 (22:49 +0800)]

SYCL: Delete redundant plus sign and space (#12391)

commit | commitdiff | tree

fairydreaming [Sat, 15 Mar 2025 14:19:30 +0000 (15:19 +0100)]

SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399)

* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <redacted>

commit | commitdiff | tree

Chenguang Li [Sat, 15 Mar 2025 01:31:08 +0000 (09:31 +0800)]

[CANN]MUL_MAT optimization (#12382)

commit | commitdiff | tree

Eric Curtin [Fri, 14 Mar 2025 16:41:20 +0000 (16:41 +0000)]

Add CLI arg to llama-run to adjust the number of threads used (#12370)

We default to 4 threads; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <redacted>

commit | commitdiff | tree

Sigbjørn Skjæret [Fri, 14 Mar 2025 15:57:05 +0000 (16:57 +0100)]

main : add -sysf / --system-prompt-file (#12249) (#12250)

* add system_prompt_file

* add -sysf / --system-prompt-file

* remove system_prompt_file

commit | commitdiff | tree

fairydreaming [Fri, 14 Mar 2025 12:47:05 +0000 (13:47 +0100)]

Load all MoE experts during warmup (#11571)

* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup

* common : use new API to enable warmup mode during model warmup

---------

Co-authored-by: Stanisław Szymczyk <redacted>
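A toy Python sketch of the idea behind warmup mode, assuming a standard top-k router. Normally only the `k` highest-scoring experts run for a token, so a warmup pass touches only some expert weights. Forcing all experts during warmup makes every expert's weights get used once. `select_experts` and its `warmup` flag are hypothetical and only mirror what `llama_set_warmup()` is described as controlling.

```python
from typing import List


def select_experts(router_logits: List[float], k: int, warmup: bool = False) -> List[int]:
    """Return indices of the experts to evaluate for one token."""
    n_expert = len(router_logits)
    if warmup:
        # Warmup mode: use every expert so all expert weights are touched.
        return list(range(n_expert))
    # Normal mode: top-k experts by router score (highest first).
    order = sorted(range(n_expert), key=lambda i: router_logits[i], reverse=True)
    return order[:k]


logits = [0.1, 2.0, -1.0, 0.7]
print(select_experts(logits, k=2))               # [1, 3]
print(select_experts(logits, k=2, warmup=True))  # [0, 1, 2, 3]
```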

commit | commitdiff | tree

Victor [Fri, 14 Mar 2025 10:21:17 +0000 (11:21 +0100)]

server: fix "--grammar-file" parameter (#12285)

commit | commitdiff | tree

Georgi Gerganov [Fri, 14 Mar 2025 08:47:44 +0000 (10:47 +0200)]

graph : simplify attn input build for unified KV cache (#12381)

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Fri, 14 Mar 2025 07:03:24 +0000 (09:03 +0200)]

hparams : add SWA rope parameters (#12374)

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Thu, 13 Mar 2025 17:08:07 +0000 (19:08 +0200)]

llama : fix Gemma3 SWA KV cache shift (#12373)

* llama : fix Gemma3 SWA KV cache shift

ggml-ci

* hparams : add comment [no ci]

commit | commitdiff | tree

Xuan-Son Nguyen [Thu, 13 Mar 2025 11:34:54 +0000 (12:34 +0100)]

arg : no n_predict = -2 for examples except for main and infill (#12364)

commit | commitdiff | tree

Georgi Gerganov [Thu, 13 Mar 2025 10:35:44 +0000 (12:35 +0200)]

llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci

commit | commitdiff | tree

Ishaan Gandhi [Thu, 13 Mar 2025 10:10:05 +0000 (06:10 -0400)]

server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)

* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <redacted>

commit | commitdiff | tree

Oscar Barenys [Wed, 12 Mar 2025 19:06:58 +0000 (20:06 +0100)]

Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301)

commit | commitdiff | tree

Daniel Bevenius [Wed, 12 Mar 2025 12:45:32 +0000 (13:45 +0100)]

llama.swiftui : fix xcframework dir in README [no ci] (#12353)

This commit fixes the path to the xcframework in the README file which I
had forgotten to change after renaming the build directory.

commit | commitdiff | tree

Alberto Cabrera Pérez [Wed, 12 Mar 2025 09:57:32 +0000 (09:57 +0000)]

sycl : variable sg_size support for mmvq kernels (#12336)

commit | commitdiff | tree

uvos [Wed, 12 Mar 2025 09:14:11 +0000 (10:14 +0100)]

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)

When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.

commit | commitdiff | tree

Xuan-Son Nguyen [Wed, 12 Mar 2025 08:30:24 +0000 (09:30 +0100)]

llama : Add Gemma 3 support (+ experimental vision capability) (#12343)

* llama : Add Gemma 3 text-only support

* fix python coding style

* fix compile on ubuntu

* python: fix style

* fix ubuntu compile

* fix build on ubuntu (again)

* fix ubuntu build, finally

* clip : Experimental support for Gemma 3 vision (#12344)

* clip : Experimental support for Gemma 3 vision

* fix build

* PRId64

commit | commitdiff | tree

Jeff Bolz [Wed, 12 Mar 2025 05:59:19 +0000 (00:59 -0500)]

vulkan: fix bug in coopmat1 mul_mat_id (#12316)

* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id

commit | commitdiff | tree

uvos [Tue, 11 Mar 2025 19:16:03 +0000 (20:16 +0100)]

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <redacted>

commit | commitdiff | tree

jklincn [Tue, 11 Mar 2025 13:25:17 +0000 (21:25 +0800)]

ggml-backend : fix backend search path (#12330)

* Fix backend search path

* replace .native() with '/'

* reverted .native()

commit | commitdiff | tree

BB-fat [Tue, 11 Mar 2025 11:45:02 +0000 (19:45 +0800)]

metal : Cache the Metal library at the device context level (#12265)

commit | commitdiff | tree

Xuan-Son Nguyen [Tue, 11 Mar 2025 08:20:16 +0000 (09:20 +0100)]

clip : bring back GPU support (#12322)

* clip : bring back GPU support

* use n_gpu_layers param

* fix double free

* ggml_backend_init_by_type

* clean up

commit | commitdiff | tree

Eve [Mon, 10 Mar 2025 19:28:11 +0000 (19:28 +0000)]

mat vec double buffer (#12188)

commit | commitdiff | tree

R0CKSTAR [Mon, 10 Mar 2025 17:18:25 +0000 (01:18 +0800)]

musa: support new arch mp_31 and update doc (#12296)

Signed-off-by: Xiaodong Ye <redacted>

commit | commitdiff | tree

Henry Linjamäki [Mon, 10 Mar 2025 16:57:00 +0000 (18:57 +0200)]

opencl: use OpenCL C standard supported by the device (#12221)

This patch nudges llama.cpp a bit so that it can be supported on PoCL,
which doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
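A minimal Python sketch of the selection step. It assumes the version strings follow the `CL_DEVICE_OPENCL_C_VERSION` format (`OpenCL C <major>.<minor> <vendor info>`) and that the result is passed to the compiler through the standard `-cl-std=CL<major>.<minor>` build option. `pick_cl_std` is a hypothetical helper, not the backend's actual code.

```python
import re
from typing import Iterable, Optional

_VERSION_RE = re.compile(r"OpenCL C (\d+)\.(\d+)")


def pick_cl_std(version_strings: Iterable[str]) -> Optional[str]:
    """Return a -cl-std option for the highest OpenCL C version listed."""
    versions = []
    for s in version_strings:
        m = _VERSION_RE.search(s)
        if m:
            versions.append((int(m.group(1)), int(m.group(2))))
    if not versions:
        return None  # nothing recognizable; leave the compiler default alone
    major, minor = max(versions)  # tuples compare (major, minor) lexicographically
    return f"-cl-std=CL{major}.{minor}"


print(pick_cl_std(["OpenCL C 1.2", "OpenCL C 3.0"]))  # -cl-std=CL3.0
```

Picking the maximum instead of hard-coding `CL2.0` is what lets a device without 2.0 support still build the kernels.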

commit | commitdiff | tree

John Bean [Mon, 10 Mar 2025 14:13:09 +0000 (22:13 +0800)]

readme: added Sidekick to available UIs (#12311)

commit | commitdiff | tree

Georgi Gerganov [Mon, 10 Mar 2025 12:07:15 +0000 (14:07 +0200)]

tests : fix test-quantize-fns to init the CPU backend (#12306)

ggml-ci

commit | commitdiff | tree

marcoStocchi [Mon, 10 Mar 2025 11:34:13 +0000 (12:34 +0100)]

common : refactor '-o' option (#12278)

As discussed in PR 'llama-tts : add -o option' (#12042):

* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.

* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.

commit | commitdiff | tree

Olivier Chafik [Mon, 10 Mar 2025 10:59:03 +0000 (10:59 +0000)]

`server`: extract <think> tags from qwq outputs (#12297)

* extract <think> tags from qwq outputs

* const for all static regexes in chat.cpp
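A minimal Python sketch of separating reasoning from the final answer. It assumes the model emits a single leading `<think>...</think>` block and uses a regex compiled once at module level, in the spirit of the bullet above. `split_reasoning` is hypothetical and is not the server's parser.

```python
import re
from typing import Tuple

# Compiled once; DOTALL lets '.' span newlines inside the reasoning block.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, content); reasoning is '' if there is no <think> block."""
    m = THINK_RE.match(text)
    if not m:
        return "", text
    return m.group(1).strip(), text[m.end():]


reasoning, content = split_reasoning("<think>\n2 + 2 = 4\n</think>\nThe answer is 4.")
print(repr(reasoning))  # '2 + 2 = 4'
print(repr(content))    # 'The answer is 4.'
```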

commit | commitdiff | tree

Olivier Chafik [Mon, 10 Mar 2025 09:45:29 +0000 (09:45 +0000)]

`tool-call`: ensure there's always a non-empty tool call id (#12292)

commit | commitdiff | tree

Olivier Chafik [Mon, 10 Mar 2025 09:45:07 +0000 (09:45 +0000)]

allow missing content in message if tool_calls provided (#12293)

commit | commitdiff | tree

Olivier Chafik [Mon, 10 Mar 2025 09:44:42 +0000 (09:44 +0000)]

`sampler`: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291)

* Fix typo in lazy grammar handling (fixes trigger tokens)

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

tc-mb [Mon, 10 Mar 2025 08:33:24 +0000 (16:33 +0800)]

llava : fix bug in minicpm-v code (#11513)

* fix bug in minicpm-v code

* update readme of minicpm-v

commit | commitdiff | tree

Georgi Gerganov [Sun, 9 Mar 2025 17:08:20 +0000 (19:08 +0200)]

server : add speculative decoding presets for FIM (#12287)

commit | commitdiff | tree

Georgi Gerganov [Sat, 8 Mar 2025 16:26:00 +0000 (18:26 +0200)]

authors : update (#12271)

commit | commitdiff | tree

Jason C.H [Sat, 8 Mar 2025 16:02:39 +0000 (00:02 +0800)]

ggml-backend : make path_str compatible with C++20 (#12269)

commit | commitdiff | tree

Georgi Gerganov [Fri, 7 Mar 2025 18:54:30 +0000 (20:54 +0200)]

server : infill gen ends on new line (#12254)

commit | commitdiff | tree

Daniel Bevenius [Fri, 7 Mar 2025 13:15:27 +0000 (14:15 +0100)]

ggml : skip intermediate .air file when compiling .metallib (#12247)

This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.

commit | commitdiff | tree

Georgi Gerganov [Fri, 7 Mar 2025 12:00:27 +0000 (14:00 +0200)]

sync : ggml

ggml-ci

commit | commitdiff | tree

vmobilis [Fri, 7 Mar 2025 08:11:40 +0000 (11:11 +0300)]

ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)

* ggml_compute_forward_concat() for arbitrary tensor type

* Check that the tensors' types match

* ggml-cpu.c: check type of source tensors

* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()

* ggml.c: check concatenated tensor type

* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.

commit | commitdiff | tree

Rémy O [Fri, 7 Mar 2025 11:54:22 +0000 (12:54 +0100)]

ggml-cpu: faster AVX2 variant for IQ1_M (#12216)

commit | commitdiff | tree

Georgi Gerganov [Fri, 7 Mar 2025 10:19:31 +0000 (12:19 +0200)]

ci : fix save-load test invocations (#12245)

commit | commitdiff | tree

Sigbjørn Skjæret [Fri, 7 Mar 2025 10:15:33 +0000 (11:15 +0100)]

server : Log original chat template parsing error (#12233)

commit | commitdiff | tree

Olivier Chafik [Fri, 7 Mar 2025 09:33:37 +0000 (09:33 +0000)]

sync: minja - support QwQ-32B (#12235)

https://github.com/google/minja/commit/8a76f7815e8a3ae00bd233c2b5a8b7d4e86564ec

commit | commitdiff | tree

BB-fat [Fri, 7 Mar 2025 07:35:57 +0000 (15:35 +0800)]

metal : simplify kernel arguments using a struct (#3229) (#12194)

* metal : refactor im2col parameters into a struct

* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets

* metal : refactor sum_rows parameters into a struct

* metal : refactor soft_max parameters into a struct

* metal : refactor diag_mask_inf parameters into a struct

* metal : refactor ssm_conv parameters into a struct

* metal : refactor ssm_scan parameters into a struct

* metal : refactor get_rows parameters into a struct

* metal : refactor group_norm parameters into a struct

* metal : refactor conv_transpose_1d parameters into a struct

* metal : refactor upscale parameters into a struct

* metal : refactor pad parameters into a struct

* metal : refactor pad_reflect_1d parameters into a struct

* metal : refactor arange parameters into a struct

* metal : refactor timestep_embedding parameters into a struct

* metal : refactor argsort parameters into a struct

* metal : refactor leaky_relu parameters into a struct

* metal : refactor pool_2d parameters into a struct

* metal : fix trailing whitespace

---------

Co-authored-by: alexju <redacted>
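The reason for widening the im2col offsets can be shown with a small Python sketch that uses `ctypes` to mimic C integer storage. A byte offset past `2**31 - 1` wraps to a negative value in an `int32_t`, while a `uint64_t` holds it exactly. The 3 GiB offset is an illustrative value.

```python
import ctypes

offset = 3 * 1024**3  # a 3 GiB byte offset, larger than INT32_MAX (2**31 - 1)

as_int32 = ctypes.c_int32(offset).value    # truncated to 32 bits, read back as signed
as_uint64 = ctypes.c_uint64(offset).value  # fits without loss

print(as_int32)   # -1073741824
print(as_uint64)  # 3221225472
```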

commit | commitdiff | tree

David Huang [Fri, 7 Mar 2025 07:06:08 +0000 (15:06 +0800)]

HIP: fix rocWMMA build flags under Windows (#12230)

commit | commitdiff | tree

Daniel Bevenius [Fri, 7 Mar 2025 05:23:16 +0000 (06:23 +0100)]

metal : fix default.metallib build (#12224)

This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.

The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'.  Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress, but there was a follow-on
error about not being able to find the ggml-common.h file in
ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.

commit | commitdiff | tree

lhez [Fri, 7 Mar 2025 00:20:35 +0000 (16:20 -0800)]

opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (#12217)

* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`

commit | commitdiff | tree

xiaofei [Thu, 6 Mar 2025 22:58:25 +0000 (06:58 +0800)]

cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094)

Signed-off-by: Ray Lee <redacted>
Co-authored-by: Ray Lee <redacted>

commit | commitdiff | tree

Lucas Moura Belo [Thu, 6 Mar 2025 19:15:13 +0000 (16:15 -0300)]

readme : update bindings (#12229)

commit | commitdiff | tree

Johannes Gäßler [Thu, 6 Mar 2025 17:45:09 +0000 (18:45 +0100)]

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222)

commit | commitdiff | tree

David Huang [Thu, 6 Mar 2025 13:14:11 +0000 (21:14 +0800)]

HIP: rocWMMA documentation and enabling in workflow builds (#12179)

* Enable rocWMMA for Windows CI build

* Enable for Ubuntu

* GGML_HIP_ROCWMMA_FATTN documentation work

commit | commitdiff | tree

Olivier Chafik [Thu, 6 Mar 2025 09:03:31 +0000 (09:03 +0000)]

update function-calling.md w/ template override for functionary-small-v3.2 (#12214)

commit | commitdiff | tree

Aaron Teo [Thu, 6 Mar 2025 08:33:21 +0000 (16:33 +0800)]

llava: add big-endian conversion for image encoder (#12218)

Signed-off-by: Aaron Teo <redacted>

commit | commitdiff | tree

uvos [Thu, 6 Mar 2025 07:20:52 +0000 (08:20 +0100)]

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (#12209)

This avoids conflicts with the internal CUDA/HIP runtime's memory management behavior.

commit | commitdiff | tree

Han Yin [Thu, 6 Mar 2025 06:22:49 +0000 (22:22 -0800)]

android : fix KV cache log message condition (#12212)

commit | commitdiff | tree

Henry Linjamäki [Thu, 6 Mar 2025 01:33:40 +0000 (03:33 +0200)]

opencl : fix buffer alignment (#12197)

Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).

Also fix `ggml_backend_opencl_context::alignment` being set to
`CL_DEVICE_MEM_BASE_ADDR_ALIGN`, which was treated as bytes although the
value is reported in bits.
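A small Python sketch of the unit conversion and of the usual round-up rule for aligned sizes. `CL_DEVICE_MEM_BASE_ADDR_ALIGN` is reported in bits, so it has to be divided by 8 before it is used as a byte alignment. The helper names and the 1024-bit value are illustrative assumptions, not the backend's code.

```python
def base_addr_align_bytes(align_bits: int) -> int:
    """CL_DEVICE_MEM_BASE_ADDR_ALIGN is in bits; convert it to bytes."""
    assert align_bits % 8 == 0
    return align_bits // 8


def align_up(n: int, alignment: int) -> int:
    """Round n up to the next multiple of alignment (must be a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (n + alignment - 1) & ~(alignment - 1)


align = base_addr_align_bytes(1024)  # a device reporting 1024 bits
print(align)                  # 128 bytes, not 1024
print(align_up(1000, align))  # 1024
print(align_up(1024, align))  # 1024
```

Treating the bit count as bytes makes the alignment 8x too large, so every padded allocation grows and the buffer can run out of space as in the error above.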

commit | commitdiff | tree

Henry Linjamäki [Thu, 6 Mar 2025 01:31:14 +0000 (03:31 +0200)]

opencl : fix `ulong` kernel args being set from `int` variables (#12174)

... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running PoCL.
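A minimal Python sketch of the failure mode. It assumes a little-endian host and assumes that the bug amounted to 8 bytes being read starting at the address of a 4-byte `int`, so the upper half of the resulting `ulong` comes from whatever memory follows the `int`. `struct` stands in for raw memory; the value and the filler bytes are illustrative.

```python
import struct

value = 4096  # the int that should have been passed as a ulong

# Simulated memory: the 4-byte int followed by 4 unrelated bytes.
memory = struct.pack("<i", value) + b"\xef\xbe\xad\xde"

# Reading 8 bytes as a ulong, as a kernel expecting `ulong` would, picks up garbage.
buggy = struct.unpack("<Q", memory[:8])[0]

# Widening the value to 64 bits first gives the intended argument.
fixed = struct.unpack("<Q", struct.pack("<Q", value))[0]

print(hex(buggy))  # 0xdeadbeef00001000
print(fixed)       # 4096
```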

commit | commitdiff | tree

simon886212 [Thu, 6 Mar 2025 01:30:05 +0000 (09:30 +0800)]

opencl : fix profile-related errors (#12095)

Co-authored-by: ubuntu <redacted>

commit | commitdiff | tree

Rémy O [Thu, 6 Mar 2025 01:26:10 +0000 (02:26 +0100)]

ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154)

* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions

* cmake: Add GGML_BMI2 build option

* ggml: enable BMI2 on relevant CPU variants

* ggml-cpu: include BMI2 in backend score

* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features

* ggml-cpu: add __BMI2__ define when using MSVC

commit | commitdiff | tree

Akarshan Biswas [Wed, 5 Mar 2025 15:58:23 +0000 (21:28 +0530)]

SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201)

commit | commitdiff | tree

Plamen Minev [Wed, 5 Mar 2025 15:16:01 +0000 (17:16 +0200)]

ggml : fix GGMLMetalClass ODR (#12200)

This might happen if ggml is loaded from two separate libraries, since each of them will expose the class. This is more of a guard, since we only want to use Metal as an embedded library and don't care about the other case.

commit | commitdiff | tree

Daniel Bevenius [Wed, 5 Mar 2025 13:16:40 +0000 (14:16 +0100)]

ci : add fetch-depth to xcframework upload (#12195)

This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 meaning that it fetches the complete
history). The default value is 1 when not specified, which only fetches
the latest commit.

This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently, because the default is
being used, the name of the xcframework artifact is always
llama-b1-xcframework.

commit | commitdiff | tree

Olivier Chafik [Wed, 5 Mar 2025 13:05:13 +0000 (13:05 +0000)]

`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)

* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if it matches at the beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
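A minimal Python sketch of turning literal trigger words into a regex. It assumes each word is escaped and the words are joined into a single alternation. The sampler's actual pattern construction may differ, and `trigger_regex` and `first_trigger` are hypothetical.

```python
import re
from typing import List, Optional


def trigger_regex(words: List[str]) -> re.Pattern:
    """Compile literal trigger words into a single alternation pattern."""
    # re.escape makes characters such as '<' or '[' match literally.
    return re.compile("|".join(re.escape(w) for w in words))


def first_trigger(text: str, pattern: re.Pattern) -> Optional[int]:
    """Return the offset of the earliest trigger in text, or None."""
    m = pattern.search(text)
    return m.start() if m else None


pat = trigger_regex(["<tool_call>", "[TOOL_CALLS]"])
print(first_trigger('Sure. <tool_call>{"name": "f"}', pat))  # 6
```

Once triggers are patterns, words and more general trigger patterns can go through the same matching path.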

commit | commitdiff | tree

Daniel Bevenius [Wed, 5 Mar 2025 09:22:29 +0000 (10:22 +0100)]

ci : fix xcframework artifact tag (#12191)

This commit adds the name parameter to the upload-artifact action to
ensure that the artifact is uploaded with the correct name.

The motivation for this is that currently the uploaded xcframework
is named llama-b1-xcframework.zip. With this change, the name of this
artifact should contain the build number like the other artifacts.

commit | commitdiff | tree

Daniel Bevenius [Wed, 5 Mar 2025 07:34:02 +0000 (08:34 +0100)]

ci : remove xcframework upload (#12190)

* ci : remove xcframework upload

This commit removes the upload of the xcframework zip file as an
artifact.

The motivation for this change is that the xcframework zip file is
currently uploaded as part of a strategy matrix, so the upload is
attempted multiple times, which fails the build.

The upload should be moved elsewhere in the build to avoid
this.

* ci : add xcframework upload to macos-latest job

commit | commitdiff | tree

Clauszy [Wed, 5 Mar 2025 07:25:45 +0000 (15:25 +0800)]

server : fix cache reuse logic (#12161)

The first KV shift offsets the positions of all tokens after head_c.
When llama_kv_cache_seq_rm is called next with head_c, it removes valid tokens, because their positions have already been offset.
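The positional bookkeeping can be illustrated with a toy Python simulation, assuming the fix restricts each shift to the matched range `[head_c, head_c + n_match)` instead of shifting everything after `head_c`. The dict-based cache and the `seq_rm`/`seq_add` helpers are hypothetical stand-ins that only mimic the half-open range semantics of `llama_kv_cache_seq_rm` and `llama_kv_cache_seq_add`; they are not the llama.cpp API.

```python
from typing import Dict, List, Optional, Tuple

Cache = Dict[str, int]  # token id -> position in the KV cache


def in_range(pos: int, p0: int, p1: Optional[int]) -> bool:
    # Half-open range [p0, p1); p1 = None means "to the end".
    return pos >= p0 and (p1 is None or pos < p1)


def seq_rm(kv: Cache, p0: int, p1: Optional[int]) -> None:
    """Remove every cell whose position lies in [p0, p1)."""
    for tok in [t for t, p in kv.items() if in_range(p, p0, p1)]:
        del kv[tok]


def seq_add(kv: Cache, p0: int, p1: Optional[int], delta: int) -> None:
    """Add delta to every position in [p0, p1)."""
    for tok, p in kv.items():
        if in_range(p, p0, p1):
            kv[tok] = p + delta


def reuse(chunks: List[Tuple[int, int, int]], shift_whole_tail: bool) -> Cache:
    """Replay (head_p, head_c, n_match) steps on a cache holding t0..t9 at 0..9."""
    kv = {f"t{i}": i for i in range(10)}
    for head_p, head_c, n_match in chunks:
        kv_shift = head_p - head_c
        seq_rm(kv, head_p, head_c)  # drop cells between the prompt and cache heads
        end = None if shift_whole_tail else head_c + n_match
        seq_add(kv, head_c, end, kv_shift)  # move the matched chunk to head_p
    return kv


# Cache chunk [2, 4) is reused at prompt position 1, then chunk [6, 8) at position 3.
chunks = [(1, 2, 2), (3, 6, 2)]
buggy = reuse(chunks, shift_whole_tail=True)
fixed = reuse(chunks, shift_whole_tail=False)
print("t6" in buggy, "t6" in fixed)  # False True
```

In the buggy variant the first shift moves t6 down to position 5, so the second `seq_rm(3, 6)` deletes it even though it belongs to the matched chunk. In the fixed variant the unmatched tail (t8, t9) keeps its original positions; this sketch leaves it in place rather than cleaning it up.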

commit | commitdiff | tree

Daniel Bevenius [Wed, 5 Mar 2025 05:30:31 +0000 (06:30 +0100)]

llama : add xcframework build script (#11996)

* llama : add xcframework build script

This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS

commit | commitdiff | tree

mgroeber9110 [Tue, 4 Mar 2025 16:53:26 +0000 (17:53 +0100)]

ggml : portability fixes for VS 2017 (#12150)

* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <redacted>

commit | commitdiff | tree

Georgi Gerganov [Tue, 4 Mar 2025 16:42:44 +0000 (18:42 +0200)]

readme : fix roadmap link (#12185)

commit | commitdiff | tree

Sigbjørn Skjæret [Tue, 4 Mar 2025 16:19:39 +0000 (17:19 +0100)]

main: allow preloading conversation with -p and add -st / --single-turn (#12145)

* Add chat template formatting to -no-cnv

* only enable prompt formatting if explicitly enabled

* add -st / --single-turn

* add --single-turn and -p in conversation mode

* fix -sys + -p

* reword warning

* small readability change and fix (long) outdated example usage

* only activate single turn in conversation mode

commit | commitdiff | tree

Olivier Chafik [Tue, 4 Mar 2025 06:24:07 +0000 (06:24 +0000)]

`server`: fix deadly typo in response_format.json_schema.schema handling (#12168)

commit | commitdiff | tree

David Huang [Mon, 3 Mar 2025 21:10:54 +0000 (05:10 +0800)]

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (#12032)

Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16

---

Signed-off-by: Carl Klemm <redacted>
Co-authored-by: Johannes Gäßler <redacted>
Co-authored-by: Ben Jackson <redacted>

commit | commitdiff | tree

Georgi Gerganov [Mon, 3 Mar 2025 15:57:38 +0000 (17:57 +0200)]

sync : ggml

ggml-ci

commit | commitdiff | tree

cmdr2 [Mon, 3 Mar 2025 15:21:31 +0000 (20:51 +0530)]

cuda: unary ops as float + de-duplicate (ggml/1130)

commit | commitdiff | tree

Georgi Gerganov [Fri, 28 Feb 2025 10:37:35 +0000 (12:37 +0200)]

sync : ggml

ggml-ci

commit | commitdiff | tree

cmdr2 [Fri, 28 Feb 2025 10:36:46 +0000 (12:36 +0200)]

cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Fri, 28 Feb 2025 07:09:58 +0000 (09:09 +0200)]

sync : ggml

ggml-ci

commit | commitdiff | tree

cmdr2 [Fri, 28 Feb 2025 07:04:39 +0000 (12:34 +0530)]

cuda/cpu: Increase support for fp16 unary operations (ggml/1125)

* Support fp16 unary operations in the CUDA backend

* cpu: increase fp16 support for unary operators in the CPU backend

* cuda: increase fp16 support for unary operators in the CUDA backend

* Add test cases for fp16 unary operators

* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing

* metal: fix PR comments for unary op support after fp16 unary tests

commit | commitdiff | tree

Diego Devesa [Thu, 27 Feb 2025 12:35:07 +0000 (13:35 +0100)]

whisper : support GGML_BACKEND_DL (whisper/2843)

* whisper : support GGML_BACKEND_DL

* fix DTW crash

* whisper.objc : fix build - add ggml-cpp.h

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

midnight [Wed, 5 Feb 2025 12:41:10 +0000 (04:41 -0800)]

cmake : fix compile assumptions for power9/etc (whisper/2777)

* Add small comment re: VSX to readme

Co-authored-by: midnight <redacted>

commit | commitdiff | tree

petterreinholdtsen [Wed, 26 Feb 2025 20:44:00 +0000 (21:44 +0100)]

Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)

It is used by the Whisper talk-llama example.

Co-authored-by: Petter Reinholdtsen <redacted>

commit | commitdiff | tree

cmdr2 [Tue, 25 Feb 2025 12:36:34 +0000 (18:06 +0530)]

Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)

* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend

* Add fp16 support for add/sub/mul/div on the CPU backend

* Add test cases for fp16 add/sub/mul/div

commit | commitdiff | tree

Georgi Gerganov [Fri, 28 Feb 2025 07:09:38 +0000 (09:09 +0200)]

scripts : sync-ggml-am.sh fix

commit | commitdiff | tree

Daniel Bevenius [Mon, 3 Mar 2025 15:17:36 +0000 (16:17 +0100)]

ci : set GITHUB_ACTION env var for server tests (#12162)

This commit tries to address/improve an issue with the server tests
which are failing with a timeout. Looking at the logs it seems like
they are timing out after 12 seconds:
```
FAILED unit/test_chat_completion.py::test_completion_with_json_schema[False-json_schema0-6-"42"] - TimeoutError: Server did not start within 12 seconds
```

This is somewhat strange as in utils.py we have the following values:
```python
import os

DEFAULT_HTTP_TIMEOUT = 12

if "LLAMA_SANITIZE" in os.environ or "GITHUB_ACTION" in os.environ:
    DEFAULT_HTTP_TIMEOUT = 30

def start(self, timeout_seconds: int | None = DEFAULT_HTTP_TIMEOUT) -> None:
    ...  # method body elided
```
A test running in a GitHub Action should therefore have a timeout of
30 seconds, but this does not seem to be the case. Inspecting the logs
from the CI job, we can see the following environment
variables:
```console
Run cd examples/server/tests
  cd examples/server/tests
  ./tests.sh
  shell: /usr/bin/bash -e {0}
  env:
    LLAMA_LOG_COLORS: 1
    LLAMA_LOG_PREFIX: 1
    LLAMA_LOG_TIMESTAMPS: 1
    LLAMA_LOG_VERBOSITY: 10
    pythonLocation: /opt/hostedtoolcache/Python/3.11.11/x64
```

This probably does not address the underlying issue, which is that the
servers providing the models to be downloaded occasionally take longer
to respond, but it might improve the situation in some
cases.

commit | commitdiff | tree

dm4 [Mon, 3 Mar 2025 13:09:29 +0000 (21:09 +0800)]

tts: add speaker file support (#12048)

* tts: add speaker file support

Signed-off-by: dm4 <redacted>
* tts: handle outetts-0.3

* tts : add new line in error message

---------

Signed-off-by: dm4 <redacted>
Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Diego Devesa [Mon, 3 Mar 2025 13:00:46 +0000 (14:00 +0100)]

test-backend-ops : add option -p to filter by op params (#12155)

commit | commitdiff | tree

ag2s20150909 [Mon, 3 Mar 2025 12:54:08 +0000 (20:54 +0800)]

ggml : fix kleidiai build (#12159)

The libggml API has changed, but the KleidiAI code had not been updated accordingly.

commit | commitdiff | tree

Eric Curtin [Mon, 3 Mar 2025 12:44:56 +0000 (12:44 +0000)]

Adding UTF-8 support to llama.cpp (#12111)

For emojis, non-alpha characters, etc.

Signed-off-by: Eric Curtin <redacted>

commit | commitdiff | tree

Xuan-Son Nguyen [Mon, 3 Mar 2025 10:42:45 +0000 (11:42 +0100)]

webui : add ?m=... and ?q=... params (#12148)

* webui : add ?m=... and ?q=... params

* also clear prefilledMessage variable

* better approach

* fix comment

* test: bump timeout on GITHUB_ACTION

commit | commitdiff | tree

Akarshan Biswas [Mon, 3 Mar 2025 10:07:22 +0000 (15:37 +0530)]

SYCL: Move CPY kernels to a separate file and add a few missing kernels (#12133)

* SYCL: refactor and move cpy kernels to a separate file

* Add a few missing cpy kernels

* refactor and add debug logs

commit | commitdiff | tree

Diego Devesa [Sun, 2 Mar 2025 21:11:00 +0000 (22:11 +0100)]

ggml-backend : keep paths in native string type when possible (#12144)

commit | commitdiff | tree

Sigbjørn Skjæret [Sun, 2 Mar 2025 13:53:48 +0000 (14:53 +0100)]

main: use jinja chat template system prompt by default (#12118)

* Use jinja chat template system prompt by default

* faster conditional order

* remove nested ternary

---------

Co-authored-by: Xuan Son Nguyen <redacted>

commit | commitdiff | tree

Sigbjørn Skjæret [Sat, 1 Mar 2025 14:22:27 +0000 (15:22 +0100)]

main: update outdated system prompt message (followup to #12131) (#12132)

* Update outdated message

* wording

Co-authored-by: Xuan-Son Nguyen <redacted>
---------

Co-authored-by: Xuan-Son Nguyen <redacted>

commit | commitdiff | tree

Sigbjørn Skjæret [Sat, 1 Mar 2025 12:56:45 +0000 (13:56 +0100)]

common : add --system-prompt parameter, replace behavior of -p in conversation mode (#12131)

* Add --system-prompt parameter

* use user defined system prompt

* clarify

Co-authored-by: Xuan-Son Nguyen <redacted>
* add warning

* clarify

Co-authored-by: Xuan-Son Nguyen <redacted>
---------

Co-authored-by: Xuan-Son Nguyen <redacted>

Packaging of ggml-org/llama.cpp