git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Jiří Podivín [Thu, 15 Aug 2024 06:21:57 +0000 (08:21 +0200)]
server : init stop and error fields of the result struct (#9026)
Signed-off-by: Jiri Podivin <redacted>
0cc4m [Wed, 14 Aug 2024 16:32:53 +0000 (18:32 +0200)]
Vulkan Optimizations and Fixes (#8959)
* Optimize Vulkan REPEAT performance
* Use Vulkan GLSL fused multiply-add instruction where possible
* Add GGML_VULKAN_PERF option to output performance data per operator
* Rework and fix Vulkan descriptor set and descriptor pool handling
* Fix float32 concat f16 shader validation error
* Add Vulkan GROUP_NORM eps parameter
* Fix validation error with transfer queue memory barrier flags
* Remove trailing whitespaces
compilade [Wed, 14 Aug 2024 06:51:02 +0000 (02:51 -0400)]
server : fix segfault on long system prompt (#8987)
* server : fix segfault on long system prompt
* server : fix parallel generation with very small batch sizes
* server : fix typo in comment
Georgi Gerganov [Wed, 14 Aug 2024 06:14:49 +0000 (09:14 +0300)]
cmake : remove unused option GGML_CURL (#9011)
Daniel Bevenius [Tue, 13 Aug 2024 19:13:15 +0000 (21:13 +0200)]
ggml : move rope type enum to ggml.h (#8949)
* ggml : move rope type enum to ggml.h
This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.
The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.
Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.
* squash! ggml : move rope type enum to ggml.h
This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and adds them back to the llama_rope_type enum.
I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.
* squash! ggml : move rope type enum to ggml.h
This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX (a short sketch follows this entry).
Also the enum llama_rope_type has been updated to reflect this change.
* squash! ggml : move rope type enum to ggml.h
This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.
* squash! ggml : move rope type enum to ggml.h
This commit fixes the editorconfig-checker warnings.
* squash! ggml : move rope type enum to ggml.h
Update comment for ggml_rope function.
* Revert "squash! ggml : move rope type enum to ggml.h"
This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.
* squash! ggml : move rope type enum to ggml.h
Add GGML_ROPE_TYPE_NEOX to rope_common.comp.
* remove extra line
---------
Co-authored-by: slaren <redacted>
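The `mode` argument ends up being treated as a bit field, with the GPT-NeoX style selected by a mask against the new define. A minimal sketch of that check, assuming the `GGML_ROPE_TYPE_NEOX` define from ggml.h (the helper function itself is illustrative, not the exact ggml source):
```cpp
// Illustrative only: GGML_ROPE_TYPE_NEOX mirrors the define added to ggml.h;
// the helper is a hypothetical stand-in for the in-tree checks.
#define GGML_ROPE_TYPE_NEOX 2

static bool rope_mode_is_neox(int mode) {
    // mode acts as a bit field, so the GPT-NeoX style is tested with a mask
    // rather than an equality comparison against an enum value
    return (mode & GGML_ROPE_TYPE_NEOX) != 0;
}
```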
Xuan Son Nguyen [Tue, 13 Aug 2024 09:41:14 +0000 (11:41 +0200)]
export-lora : throw error if lora is quantized (#9002)
Diogo Teles Sant'Anna [Mon, 12 Aug 2024 16:28:23 +0000 (13:28 -0300)]
ci : fix github workflow vulnerable to script injection (#9008)
Signed-off-by: Diogo Teles Sant'Anna <redacted>
Radoslav Gerganov [Mon, 12 Aug 2024 16:17:03 +0000 (19:17 +0300)]
ci : enable RPC in all of the released builds (#9006)
ref: #8912
Nico Bosshard [Mon, 12 Aug 2024 15:13:59 +0000 (17:13 +0200)]
llama : model-based max number of graph nodes calculation (#8970)
* llama : model-based max number of graph nodes calculation
* Update src/llama.cpp
---------
Co-authored-by: slaren <redacted>
Frank Mai [Mon, 12 Aug 2024 12:45:50 +0000 (20:45 +0800)]
docs: introduce gpustack and gguf-parser (#8873)
* readme: introduce gpustack
GPUStack is an open-source GPU cluster manager for running large
language models, which uses llama.cpp as the backend.
Signed-off-by: thxCode <redacted>
* readme: introduce gguf-parser
GGUF Parser is a tool to review/check the GGUF file and estimate the
memory usage without downloading the whole model.
Signed-off-by: thxCode <redacted>
---------
Signed-off-by: thxCode <redacted>
DavidKorczynski [Mon, 12 Aug 2024 12:36:41 +0000 (13:36 +0100)]
grammar-parser : fix possible null-deref (#9004)
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70680
Signed-off-by: David Korczynski <redacted>
DavidKorczynski [Mon, 12 Aug 2024 12:21:41 +0000 (13:21 +0100)]
ggml: fix div-by-zero (#9003)
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5
Signed-off-by: David Korczynski <redacted>
Liu Jia [Mon, 12 Aug 2024 09:46:03 +0000 (17:46 +0800)]
Fix a spelling mistake (#9001)
Georgi Gerganov [Mon, 12 Aug 2024 08:02:01 +0000 (11:02 +0300)]
py : fix requirements check '==' -> '~=' (#8982)
* py : fix requirements check '==' -> '~='
* cont : fix the fix
* ci : run on all requirements.txt
Georgi Gerganov [Mon, 12 Aug 2024 07:21:50 +0000 (10:21 +0300)]
server : handle models with missing EOS token (#8997)
ggml-ci
compilade [Sun, 11 Aug 2024 18:45:41 +0000 (14:45 -0400)]
gguf-py : Numpy dequantization for most types (#8939)
* gguf-py : Numpy dequantization for most types
* gguf-py : Numpy dequantization for grid-based i-quants
Georgi Gerganov [Sun, 11 Aug 2024 13:58:58 +0000 (16:58 +0300)]
flake.lock: Update (#8979)
Neo Zhang [Sun, 11 Aug 2024 08:37:43 +0000 (16:37 +0800)]
update guide (#8909)
Co-authored-by: Neo Zhang <>
fairydreaming [Sun, 11 Aug 2024 08:35:26 +0000 (10:35 +0200)]
llama : check all graph nodes when searching for result_embd_pooled (#8956)
Co-authored-by: Stanisław Szymczyk <redacted>
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943)
* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.
- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader reads/writes and transfers seems to be sufficient, looking at the code, which either launches compute kernels or copies tensors (a sketch of such a targeted barrier follows this entry).
* Fix small typo
---------
Co-authored-by: 0cc4m <redacted>
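For reference, a targeted buffer barrier between two compute dispatches looks roughly like the following. This is a generic Vulkan C API sketch under the assumptions stated in the comments, not the ggml-vulkan code itself:
```cpp
#include <vulkan/vulkan.h>

// Sketch: synchronize a shader write with a subsequent shader read on a single
// buffer, instead of issuing a full pipeline sync. Assumes `cmd` is a command
// buffer in the recording state and `buf` is the buffer shared between the
// producing and consuming dispatches.
static void barrier_shader_write_to_read(VkCommandBuffer cmd, VkBuffer buf) {
    VkBufferMemoryBarrier barrier = {};
    barrier.sType               = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
    barrier.srcAccessMask       = VK_ACCESS_SHADER_WRITE_BIT;
    barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.buffer              = buf;
    barrier.offset              = 0;
    barrier.size                = VK_WHOLE_SIZE;

    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   // producer stage
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,   // consumer stage
        0,                                      // no dependency flags
        0, nullptr,                             // no global memory barriers
        1, &barrier,                            // one buffer barrier
        0, nullptr);                            // no image barriers
}
```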
slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (#8968)
Xuan Son Nguyen [Sat, 10 Aug 2024 11:04:40 +0000 (13:04 +0200)]
llama : default n_swa for phi-3 (#8931)
* default n_swa for phi-3
* fix
* double check swa
fairydreaming [Sat, 10 Aug 2024 09:43:26 +0000 (11:43 +0200)]
Add support for encoder-only T5 models (#8900)
* gguf-py : add T5ENCODER model architecture
* common : call llama_decode() during warmup only if the model has decoder
* convert-hf : add T5EncoderModel
* llama : add llama_model_has_decoder() API function
* llama : split build_t5() into build_t5_encoder() and build_t5_decoder()
* llama : add support for LLM_ARCH_T5ENCODER
* llama-embedding : add support for LLAMA_POOLING_TYPE_NONE
* llama-embedding : add support for encoder-only models
---------
Co-authored-by: Stanisław Szymczyk <redacted>
Matteo Mortari [Sat, 10 Aug 2024 05:58:49 +0000 (07:58 +0200)]
gguf-py : fix double call to add_architecture() (#8952)
Signed-off-by: tarilabs <redacted>
Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
Merge commit from fork
fairydreaming [Fri, 9 Aug 2024 16:53:09 +0000 (18:53 +0200)]
llama : add support for lora adapters in T5 model (#8938)
Co-authored-by: Stanisław Szymczyk <redacted>
Georgi Gerganov [Fri, 9 Aug 2024 15:24:30 +0000 (18:24 +0300)]
make : fix llava obj file race (#8946)
ggml-ci
Georgi Gerganov [Fri, 9 Aug 2024 15:23:52 +0000 (18:23 +0300)]
llama : better replace_all (cont) (#8926)
* llama : better replace_all (cont)
ggml-ci
* code : deduplicate replace_all
ggml-ci
tc-mb [Fri, 9 Aug 2024 10:33:53 +0000 (18:33 +0800)]
llava : support MiniCPM-V-2.5 (#7599)
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when the image needs to be resized smaller
* receive review comments and modify
* receive review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove the directory added at line 33 of the root CMakeLists.txt (not in examples, in the main dir)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* move load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
---------
Co-authored-by: Hongji Zhu <redacted>
Co-authored-by: harvestingmoon <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Fri, 9 Aug 2024 07:03:48 +0000 (10:03 +0300)]
sync : ggml
Matt Stephenson [Tue, 16 Jul 2024 07:21:09 +0000 (03:21 -0400)]
whisper : use vulkan as gpu backend when available (whisper/2302)
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <redacted>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <redacted>
---------
Signed-off-by: Matt Stephenson <redacted>
Daniel Bevenius [Fri, 9 Aug 2024 06:33:30 +0000 (08:33 +0200)]
embedding : add --pooling option to README.md [no ci] (#8934)
This commit adds the `--pooling` option to the README.md file in the
`examples/embedding` directory.
The motivation for adding this option is that currently, if the model
used does not specify a pooling type, the embedding example will fail
with the following error message:
```console
main: error: pooling type NONE not supported
```
This commit also updates the name of the executable in the examples
section.
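A hedged sketch of the kind of check behind that error, assuming the `llama_pooling_type()` accessor and `LLAMA_POOLING_TYPE_NONE` constant from llama.h; the exact embedding-example code may differ:
```cpp
#include <cstdio>
#include "llama.h"

// Illustrative guard only; the real example's handling may be shaped differently.
static bool check_pooling_supported(const llama_context * ctx) {
    if (llama_pooling_type(ctx) == LLAMA_POOLING_TYPE_NONE) {
        std::fprintf(stderr, "main: error: pooling type NONE not supported\n");
        return false;
    }
    return true;
}
```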
Daniel Bevenius [Fri, 9 Aug 2024 06:32:23 +0000 (08:32 +0200)]
llama : fix typo in llama_tensor_get_type comment [no ci] (#8937)
Mathieu Geli [Fri, 9 Aug 2024 06:32:02 +0000 (08:32 +0200)]
server : add one level list nesting for embeddings (#8936)
compilade [Fri, 9 Aug 2024 03:54:00 +0000 (23:54 -0400)]
llama : reduce useless copies when saving session (#8916)
* llama : avoid useless copies in dummy session writer
* llama : avoid double tensor copy when saving session to buffer
compilade [Thu, 8 Aug 2024 17:33:09 +0000 (13:33 -0400)]
gguf-py : simplify support for quant types (#8838)
* gguf-py : use classes for quants
* convert_hf : simplify internal quantization type selection
* gguf-py : fix flake8 lint
* gguf-py : fix BF16 numpy view type
* gguf-py : remove LlamaFileTypeMap
Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.
* gguf-py : add generic quantize and dequantize functions
The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.
Georgi Gerganov [Thu, 8 Aug 2024 11:56:52 +0000 (14:56 +0300)]
scripts : sync cann files (#0)
Georgi Gerganov [Thu, 8 Aug 2024 11:40:12 +0000 (14:40 +0300)]
scripts : fix sync filenames (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:19:47 +0000 (13:19 +0300)]
sync : ggml
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (ggml/906)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (ggml/912)
ggml-ci
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (ggml/905)
Pablo Duboue [Thu, 8 Aug 2024 08:44:51 +0000 (04:44 -0400)]
make : clean llamafile objects (#8923)
`ggml/src/llamafile/sgemm.o` was not deleted on `make clean`
slaren [Wed, 7 Aug 2024 16:24:05 +0000 (18:24 +0200)]
make : use C compiler to build metal embed object (#8899)
* make : use C compiler to build metal embed object
* use rm + rmdir to avoid -r flag in rm
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (#8897)
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
[SYCL] Updated SYCL device filtering (#8901)
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (#8896)
Zhenwei Jin [Wed, 7 Aug 2024 01:01:06 +0000 (09:01 +0800)]
llama-bench : add support for getting cpu info on Windows (#8824)
* Add support for getting cpu info on Windows for llama_bench
* refactor
---------
Co-authored-by: slaren <redacted>
Daniel Bevenius [Tue, 6 Aug 2024 23:43:00 +0000 (01:43 +0200)]
quantize : update usage comment in quantize.cpp (#8889)
This commit updates the usage comment in quantize.cpp to reflect the
new name of the executable, which is llama-quantize.
Nexes the Old [Tue, 6 Aug 2024 23:41:54 +0000 (01:41 +0200)]
typo correction (#8891)
Xuan Son Nguyen [Tue, 6 Aug 2024 15:33:39 +0000 (17:33 +0200)]
server : add lora hotswap endpoint (WIP) (#8857)
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (#8884)
Daniel Bevenius [Tue, 6 Aug 2024 14:44:35 +0000 (16:44 +0200)]
simple : update name of executable to llama-simple (#8885)
This commit updates the name of the executable in README.md from
`simple` to `llama-simple`.
Jaeden Amero [Tue, 6 Aug 2024 13:21:47 +0000 (17:21 +0400)]
cmake : Link vulkan-shaders-gen with pthreads (#8835)
When using CMake to build with Vulkan support, compiling
vulkan-shaders-gen fails due to a missing CMakeLists.txt specification
to link vulkan-shaders-gen with the threading library, resulting in the
following error.
[5/172] Linking CXX executable bin/vulkan-shaders-gen
FAILED: bin/vulkan-shaders-gen
: && /usr/bin/c++ ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o bin/vulkan-shaders-gen && :
ld: error: undefined symbol: pthread_create
>>> referenced by vulkan-shaders-gen.cpp
>>> ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o:(std::__1::__libcpp_thread_create[abi:se180100](pthread**,
>>> void* (*)(void*), void*))
c++: error: linker command failed with exit code 1 (use -v to see invocation)
[6/172] Generating build details from Git
-- Found Git: /usr/local/bin/git (found version "2.45.2")
ninja: build stopped: subcommand failed.
Add the CMakeLists.txt specification to link vulkan-shaders-gen with the
threading library and fix the above error.
Fixes #8834
MaggotHATE [Tue, 6 Aug 2024 11:32:03 +0000 (16:32 +0500)]
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `e31a4f6` (#8880)
* Fix compilation issue in `vulkan-shaders-gen`
https://github.com/ggerganov/llama.cpp/commit/e31a4f679779220312c165b0f5994c680a610e38 broke compilation on w64devkit. Including `algorithm` seems to fix that.
* Guard it under `#ifdef _WIN32`
Georgi Gerganov [Tue, 6 Aug 2024 08:48:01 +0000 (11:48 +0300)]
contributing : add note about write access
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (#8818)
Signed-off-by: Molly Sophia <redacted>
Douglas Hanley [Tue, 6 Aug 2024 07:20:54 +0000 (02:20 -0500)]
convert : add support for XLMRoberta embedding models (#8658)
* add conversion for bge-m3; small fix in unigram tokenizer
* clean up and simplify XLMRoberta conversion
Mengqing Cao [Tue, 6 Aug 2024 04:42:42 +0000 (12:42 +0800)]
[CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871)
* cann: fix ggml_backend_cann_buffer_get_tensor
1. fix data ptr offset
2. enable the acquisition of incomplete tensors
* fix backend cann set_tensor
Neo Zhang [Tue, 6 Aug 2024 01:09:12 +0000 (09:09 +0800)]
[SYCL] correct cmd name (#8877)
Liu Jia [Mon, 5 Aug 2024 16:14:10 +0000 (00:14 +0800)]
common : Changed tuple to struct (TODO fix) (#8823)
* common : Changed tuple to struct (TODO fix)
Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *> (a sketch follows this entry)
* delete llama_init_default_params()
* delete the extra whitespace
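A minimal sketch of the replacement described above; the field names are assumptions based on the commit text, not necessarily the exact definition in the repo:
```cpp
#include "llama.h"

// Hypothetical shape of the struct that replaces the tuple return value.
struct llama_init_result {
    struct llama_model   * model   = nullptr;
    struct llama_context * context = nullptr;
};

// Callers then read named fields instead of unpacking a std::tuple, e.g.
// (function name assumed for illustration):
//   llama_init_result r = llama_init_from_gpt_params(params);
//   if (r.model == nullptr || r.context == nullptr) { /* handle failure */ }
```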
wangshuai09 [Mon, 5 Aug 2024 13:10:37 +0000 (21:10 +0800)]
cann: fix buffer_num and slow runtime speed error (#8865)
Eric Curtin [Mon, 5 Aug 2024 12:45:01 +0000 (13:45 +0100)]
readme : add ramalama to the availables UI (#8811)
ramalama is a repo agnostic boring CLI tool that supports pulling from
ollama, huggingface and oci registries.
Signed-off-by: Eric Curtin <redacted>
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (#8866)
It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
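A minimal sketch of an ELU written in terms of `expm1`; this illustrates the technique, not the exact ggml kernel:
```cpp
#include <cmath>

// ELU(x) = x            for x > 0
//        = exp(x) - 1   for x <= 0
// std::expm1 computes exp(x) - 1 directly, avoiding the problems of forming
// expf(x) first and then subtracting 1.
static float elu_expm1(float x) {
    return x > 0.0f ? x : std::expm1(x);
}
```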
Brian [Mon, 5 Aug 2024 11:15:28 +0000 (21:15 +1000)]
py: Add more authorship metadata from model card (#8810)
* py: add more authorship metadata from model card
* fixup! py: add more authorship metadata from model card
fairydreaming [Mon, 5 Aug 2024 07:38:01 +0000 (09:38 +0200)]
Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858)
* gguf-py, llama : add constants and methods related to Llama-3.1 <|eom_id|> token
* llama : find Llama-3.1 <|eom_id|> token id during vocab loading
* llama-vocab : add Llama-3.1 <|eom_id|> token to the set of tokens stopping the generation
---------
Co-authored-by: Stanisław Szymczyk <redacted>
stduhpf [Mon, 5 Aug 2024 06:18:27 +0000 (08:18 +0200)]
cmake: fix paths for vulkan shaders compilation on Windows (#8573)
* Vulkan-shaders: attempt fix compilation on windows
* fix mismatched parenthesis
BarfingLemurs [Mon, 5 Aug 2024 05:54:10 +0000 (01:54 -0400)]
readme : update model list (#8851)
Georgi Gerganov [Mon, 5 Aug 2024 05:53:39 +0000 (08:53 +0300)]
llama : better replace_all (#8852)
0cc4m [Mon, 5 Aug 2024 05:52:55 +0000 (07:52 +0200)]
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855)
* Fix Vulkan mul mat vec invalid results when ncols < warp size
* Only run backend ops mul mat vec block size test if block size not already covered
Georgi Gerganov [Sun, 4 Aug 2024 16:13:25 +0000 (19:13 +0300)]
sync : ggml
ggml-ci
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (ggml/904)
* Fix Vulkan repeat op
* Implement Vulkan concat op
* Delete old Vulkan shader generator
* Implement Vulkan im2col op
* Implement Vulkan unary gelu_quick op
* Implement Vulkan group_norm op
* Implement Vulkan timestep_embedding op
* Implement Vulkan upscale op
* Fix Vulkan vk_context tensor extra index issue
* Fix Vulkan matmul shader parameter bug
* Properly fix Vulkan matmul shader parameter bug
* Add Vulkan ADD f16 + f32 -> f16 operator support
* Implement Vulkan tanh op
* Fix Vulkan group count too large Validation error on non-Nvidia GPUs
* Throw error when too much memory is requested
* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs
* Fix matmul MMQ condition
* Implement Vulkan pad op
* Fix Vulkan crash when tensor is used multiple times in a compute graph
* Add Vulkan CONCAT f16 + f16 -> f16 op
* Add Vulkan LEAKY_RELU op
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (ggml/901)
This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).
Signed-off-by: Daniel Bevenius <redacted>
wangshuai09 [Mon, 5 Aug 2024 04:22:30 +0000 (12:22 +0800)]
cann: support q4_0 model (#8822)
Brandon Squizzato [Sun, 4 Aug 2024 18:17:16 +0000 (14:17 -0400)]
Install curl in runtime layer (#8693)
ardfork [Sun, 4 Aug 2024 18:16:23 +0000 (18:16 +0000)]
Server: Don't ignore llama.cpp params (#8754)
* Don't ignore llama.cpp params
* Add fallback for max_tokens
Brian Cunnie [Sun, 4 Aug 2024 10:55:03 +0000 (03:55 -0700)]
batched-bench : handle empty `-npl` (#8839)
* [example] batched-bench "segmentation fault"
When `llama-batched-bench` is invoked _without_ setting `-npl`, "number
of parallel prompts", it segfaults.
The segfault is caused by invoking `max_element()` on a zero-length
vector, `n_pl`.
This commit addresses that by first checking whether the number of
parallel prompts is zero and, if so, setting the maximum sequence size to 1;
otherwise it is set to the original value, the result of `max_element()`
(a sketch of this guard follows the entry).
The crash, when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
69 llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
70
71 // ensure enough sequences are available
-> 72 ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```
* Update examples/batched-bench/batched-bench.cpp
Co-authored-by: compilade <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: compilade <redacted>
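A minimal sketch of the guard described above, using the variable names from the commit text (`n_pl`); the in-tree fix may be shaped differently:
```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper illustrating the guard: fall back to 1 when no
// parallel-prompt counts were given, otherwise use the largest one.
static int max_seq_from_npl(const std::vector<int> & n_pl) {
    if (n_pl.empty()) {
        return 1;  // avoid dereferencing max_element() of an empty vector
    }
    return *std::max_element(n_pl.begin(), n_pl.end());
}
```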
Daniel Bevenius [Sat, 3 Aug 2024 13:07:47 +0000 (15:07 +0200)]
baby-llama : remove duplicate vector include
Georgi Gerganov [Sun, 4 Aug 2024 02:53:20 +0000 (05:53 +0300)]
flake.lock: Update (#8847)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (#8709)
* ggml : reading the runtime sve config of the cpu
* change to one-time init to prevent a performance drop (see the sketch after this entry)
* prefix variable to avoid possible conflicts
* revert xxhash fix and add brackets
---------
Co-authored-by: domke <redacted>
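A hedged sketch of a one-time runtime query of the SVE vector length on Linux; `PR_SVE_GET_VL` and `PR_SVE_VL_LEN_MASK` are the kernel's prctl interface, while the function itself is an illustration rather than the ggml code:
```cpp
#include <sys/prctl.h>

// Query the SVE vector length (in bytes) once and cache it, so the prctl
// syscall is not repeated on every call (the "one-time init" from the commit).
static int sve_vector_length_bytes() {
    static const int vl = [] {
        int r = prctl(PR_SVE_GET_VL);              // returns flags | VL, or -1 if unsupported
        return r < 0 ? 0 : (r & PR_SVE_VL_LEN_MASK);
    }();
    return vl;
}
```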
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (#7843)
* add truncate_bf16
* truncate intermediate fp32 if converting bf16 to bf16
* fix masking in __compute_fp32_to_bf16
* np.int16 no longer used
* missing cast and additional numpy 2.x fix
* ggml-impl : do not flush bf16 subnormals to zero
* ggml : add reference fp32 to bf16 conversion
The fast version is no longer equivalent for all platforms
because of the handling of subnormal values (a reference-style sketch follows this entry).
* gguf-py : remove flush to zero for bf16 subnormals
* gguf-py : remove float32 truncation to bf16
Rounding achieves the same thing in the cases where this was used.
* missed prototype update in merge
* merge cleanup
---------
Co-authored-by: Francis Couture-Harpin <redacted>
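For context, a reference round-to-nearest-even fp32 to bf16 conversion that preserves subnormals (rather than flushing them to zero) looks roughly like this; a sketch, not necessarily byte-for-byte the ggml implementation:
```cpp
#include <cstdint>
#include <cstring>

// Reference-style fp32 -> bf16: keep NaNs quiet, round to nearest even,
// and leave subnormal inputs alone instead of flushing them to zero.
static uint16_t fp32_to_bf16_ref(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if ((u & 0x7fffffff) > 0x7f800000) {
        // NaN: truncate and force a quiet bit so the result stays a NaN
        return (uint16_t)((u >> 16) | 64);
    }
    // round to nearest even on the 16 mantissa bits that are dropped
    return (uint16_t)((u + (0x7fff + ((u >> 16) & 1))) >> 16);
}
```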
Mengqing Cao [Fri, 2 Aug 2024 08:50:53 +0000 (16:50 +0800)]
cann: Fix ggml_cann_im2col for 1D im2col (#8819)
* fix ggml_cann_im2col for 1D im2col
* fix build warning
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
[SYCL] Fixing wrong VDR iq4nl value (#8812)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (#8035)
* Adding support for unified memory
* adding again the documentation about unified memory
* refactoring: Moved the unified memory code in the correct location.
* Fixed compilation error when using hipblas
* cleaning up the documentation
* Updating the documentation
Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled
---------
Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (#8783)
* Only enable backtrace on GLIBC linux systems
* fix missing file from copy
* use glibc macro instead of defining a custom one
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800)
* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X
* update asserts
* only use dmmv for supported types
* add test
wangshuai09 [Thu, 1 Aug 2024 02:39:05 +0000 (10:39 +0800)]
cann: support q8_0 for Ascend backend (#8805)
Igor Okulist [Wed, 31 Jul 2024 23:59:09 +0000 (18:59 -0500)]
server : update llama-server embedding flag documentation (#8779)
Fixes #8763
Clint Herron [Wed, 31 Jul 2024 19:51:06 +0000 (15:51 -0400)]
Build: Fix potential race condition (#8781)
* Fix potential race condition as pointed out by @fairydreaming in #8776
* Reference the .o rather than rebuilding every time.
* Adding in CXXFLAGS and LDFLAGS
* Removing unnecessary linker flags.
pculliton [Wed, 31 Jul 2024 15:12:10 +0000 (11:12 -0400)]
Adding Gemma 2 2B configs (#8784)
* Adding Gemma 2 2B configs
Updates to Q scaling and Gemma 2 model sizes to match v2 2B model.
* Update src/llama.cpp
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
Borislav Stanimirov [Wed, 31 Jul 2024 13:40:08 +0000 (16:40 +0300)]
cmake : fix use of external ggml (#8787)
Someone [Tue, 30 Jul 2024 20:35:30 +0000 (23:35 +0300)]
nix: cuda: rely on propagatedBuildInputs (#8772)
Listing individual outputs is no longer necessary to reduce the runtime closure size after https://github.com/NixOS/nixpkgs/pull/323056.
Brian [Tue, 30 Jul 2024 14:57:03 +0000 (00:57 +1000)]
py: add_array() will not add to kv store if value is an empty array (#8774)
* gguf_writer.py: add_array() should not add to kv store if empty
* Apply suggestions from code review
I was wondering if there was a specific reason for `if val`, but good to hear we can safely use `len(val) == 0`.
Co-authored-by: compilade <redacted>
---------
Co-authored-by: compilade <redacted>
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (#8751)
* added android implementation of ggml_print_backtrace_symbols
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
Georgi Gerganov [Tue, 30 Jul 2024 12:58:57 +0000 (15:58 +0300)]
flake.lock: Update (#8729)
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (#8765)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
[SYCL] Add `TIMESTEP_EMBEDDING` OP (#8707)
Signed-off-by: zhentaoyu <redacted>
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: fix agnostic handling of inactive elements in RISC-V vector code (#8748)
In this code, we want masked-off elements (where mask[i] is false) to retain
the value they previously held, so we should use the undisturbed policy. With
the default agnostic policy of the RVV intrinsics, these values may either be
preserved or be overwritten with 1s.
Co-authored-by: carter.li <redacted>
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (#8746)
Signed-off-by: Xiaodong Ye <redacted>
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
[SYCL] add conv support (#8688)