git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
10 months agogguf-py : simplify support for quant types (#8838)
compilade [Thu, 8 Aug 2024 17:33:09 +0000 (13:33 -0400)]
gguf-py : simplify support for quant types (#8838)

* gguf-py : use classes for quants

* convert_hf : simplify internal quantization type selection

* gguf-py : fix flake8 lint

* gguf-py : fix BF16 numpy view type

* gguf-py : remove LlamaFileTypeMap

Too specific to 'llama.cpp', and would be a maintenance burden
to keep up to date.

* gguf-py : add generic quantize and dequantize functions

The quant classes no longer need to be known,
only the target or the source type,
for 'quantize' and 'dequantize', respectively.

10 months agoscripts : sync cann files (#0)
Georgi Gerganov [Thu, 8 Aug 2024 11:56:52 +0000 (14:56 +0300)]
scripts : sync cann files (#0)

10 months agoscripts : fix sync filenames (#0)
Georgi Gerganov [Thu, 8 Aug 2024 11:40:12 +0000 (14:40 +0300)]
scripts : fix sync filenames (#0)

10 months agosync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 10:19:47 +0000 (13:19 +0300)]
sync : ggml

10 months agoggml : ignore more msvc warnings (ggml/906)
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (ggml/906)

10 months agometal : fix struct name (ggml/912)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (ggml/912)

ggml-ci

10 months agometal : add abort callback (ggml/905)
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (ggml/905)

10 months agomake : clean llamafile objects (#8923)
Pablo Duboue [Thu, 8 Aug 2024 08:44:51 +0000 (04:44 -0400)]
make : clean llamafile objects (#8923)

`ggml/src/llamafile/sgemm.o` was not deleted on `make clean`

10 months agomake : use C compiler to build metal embed object (#8899)
slaren [Wed, 7 Aug 2024 16:24:05 +0000 (18:24 +0200)]
make : use C compiler to build metal embed object (#8899)

* make : use C compiler to build metal embed object

* use rm + rmdir to avoid -r flag in rm

10 months agoggml-backend : fix async copy from CPU (#8897)
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (#8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same

10 months ago[SYCL] Updated SYCL device filtering (#8901)
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
[SYCL] Updated SYCL device filtering (#8901)

* Updated device filter to depend on default_selector (fixes non-Intel device issues)
* Small related update to example/sycl Readme

10 months agoCUDA/HIP: fix tests/test-backend-ops (#8896)
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (#8896)

10 months agollama-bench : add support for getting cpu info on Windows (#8824)
Zhenwei Jin [Wed, 7 Aug 2024 01:01:06 +0000 (09:01 +0800)]
llama-bench : add support for getting cpu info on Windows (#8824)

* Add support for getting cpu info on Windows for llama_bench

* refactor

---------

Co-authored-by: slaren <redacted>
10 months agoquantize : update usage comment in quantize.cpp (#8889)
Daniel Bevenius [Tue, 6 Aug 2024 23:43:00 +0000 (01:43 +0200)]
quantize : update usage comment in quantize.cpp (#8889)

This commit updates the usage comment in quantize.cpp to reflect the
new name of the executable, which is llama-quantize.

10 months agotypo correction (#8891)
Nexes the Old [Tue, 6 Aug 2024 23:41:54 +0000 (01:41 +0200)]
typo correction (#8891)

10 months agoserver : add lora hotswap endpoint (WIP) (#8857)
Xuan Son Nguyen [Tue, 6 Aug 2024 15:33:39 +0000 (17:33 +0200)]
server : add lora hotswap endpoint (WIP) (#8857)

* server : add lora hotswap endpoint

* handle lora_no_apply

* fix build

* update docs

* clean up struct def

* fix build

* add LoRA test

* fix style

10 months agoCUDA: fix padding logic for FP16/FP32 (#8884)
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (#8884)

10 months agosimple : update name of executable to llama-simple (#8885)
Daniel Bevenius [Tue, 6 Aug 2024 14:44:35 +0000 (16:44 +0200)]
simple : update name of executable to llama-simple (#8885)

This commit updates the name of the executable in README.md from
`simple` to `llama-simple`.

10 months agocmake : Link vulkan-shaders-gen with pthreads (#8835)
Jaeden Amero [Tue, 6 Aug 2024 13:21:47 +0000 (17:21 +0400)]
cmake : Link vulkan-shaders-gen with pthreads (#8835)

When using CMake to build with Vulkan support, compiling
vulkan-shaders-gen fails because the CMakeLists.txt does not specify
linking vulkan-shaders-gen with the threading library, resulting in the
following error.

    [5/172] Linking CXX executable bin/vulkan-shaders-gen
    FAILED: bin/vulkan-shaders-gen
    : && /usr/bin/c++ ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o bin/vulkan-shaders-gen   && :
    ld: error: undefined symbol: pthread_create
    >>> referenced by vulkan-shaders-gen.cpp
    >>>               ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o:(std::__1::__libcpp_thread_create[abi:se180100](pthread**,
    >>>               void* (*)(void*), void*))
    c++: error: linker command failed with exit code 1 (use -v to see invocation)
    [6/172] Generating build details from Git
    -- Found Git: /usr/local/bin/git (found version "2.45.2")
    ninja: build stopped: subcommand failed.

Add the CMakeLists.txt specification to link vulkan-shaders-gen with the
threading library and fix the above error.

Fixes #8834
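The fix is the standard CMake threading idiom; a minimal sketch of what such a CMakeLists.txt addition looks like (the target name is taken from the error above, the rest is the usual pattern; `Threads::Threads` resolves to `-pthread` or the platform equivalent):

```cmake
find_package(Threads REQUIRED)
target_link_libraries(vulkan-shaders-gen PRIVATE Threads::Threads)
```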

10 months ago[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `e31a4f6` (#8880)
MaggotHATE [Tue, 6 Aug 2024 11:32:03 +0000 (16:32 +0500)]
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `e31a4f6` (#8880)

* Fix compilation issue in `vulkan-shaders-gen`

https://github.com/ggerganov/llama.cpp/commit/e31a4f679779220312c165b0f5994c680a610e38 broke compilation on w64devkit. Including `algorithm` seems to fix that.

* Guard it under `#ifdef _WIN32`

10 months agocontributing : add note about write access
Georgi Gerganov [Tue, 6 Aug 2024 08:48:01 +0000 (11:48 +0300)]
contributing : add note about write access

10 months agoggml : add epsilon as a parameter for group_norm (#8818)
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (#8818)

Signed-off-by: Molly Sophia <redacted>
10 months agoconvert : add support for XLMRoberta embedding models (#8658)
Douglas Hanley [Tue, 6 Aug 2024 07:20:54 +0000 (02:20 -0500)]
convert : add support for XLMRoberta embedding models (#8658)

* add conversion for bge-m3; small fix in unigram tokenizer

* clean up and simplify XLMRoberta conversion

10 months ago[CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871)
Mengqing Cao [Tue, 6 Aug 2024 04:42:42 +0000 (12:42 +0800)]
[CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871)

* cann: fix ggml_backend_cann_buffer_get_tensor

 1. fix data ptr offset
 2. enable the acquisition of incomplete tensors

* fix backend cann set_tensor

10 months ago[SYCL] correct cmd name (#8877)
Neo Zhang [Tue, 6 Aug 2024 01:09:12 +0000 (09:09 +0800)]
[SYCL] correct cmd name (#8877)

10 months agocommon : Changed tuple to struct (TODO fix) (#8823)
Liu Jia [Mon, 5 Aug 2024 16:14:10 +0000 (00:14 +0800)]
common : Changed tuple to struct (TODO fix) (#8823)

* common : Changed tuple to struct (TODO fix)

Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>

* delete llama_init_default_params()

* delete the extra whitespace

10 months agocann: fix buffer_num and slow runtime speed (#8865)
wangshuai09 [Mon, 5 Aug 2024 13:10:37 +0000 (21:10 +0800)]
cann: fix buffer_num and slow runtime speed (#8865)

10 months agoreadme : add ramalama to the list of available UIs (#8811)
Eric Curtin [Mon, 5 Aug 2024 12:45:01 +0000 (13:45 +0100)]
readme : add ramalama to the list of available UIs (#8811)

ramalama is a repo-agnostic, boring CLI tool that supports pulling models
from Ollama, Hugging Face, and OCI registries.

Signed-off-by: Eric Curtin <redacted>
10 months agoggml : fix overflows in elu function (#8866)
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (#8866)

It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.

10 months agopy: Add more authorship metadata from model card (#8810)
Brian [Mon, 5 Aug 2024 11:15:28 +0000 (21:15 +1000)]
py: Add more authorship metadata from model card (#8810)

* py: add more authorship metadata from model card

* fixup! py: add more authorship metadata from model card

10 months agoStop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858)
fairydreaming [Mon, 5 Aug 2024 07:38:01 +0000 (09:38 +0200)]
Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858)

* gguf-py, llama : add constants and methods related to Llama-3.1 <|eom_id|> token

* llama : find Llama-3.1 <|eom_id|> token id during vocab loading

* llama-vocab : add Llama-3.1 <|eom_id|> token to the set of tokens stopping the generation

---------

Co-authored-by: Stanisław Szymczyk <redacted>
10 months agocmake: fix paths for vulkan shaders compilation on Windows (#8573)
stduhpf [Mon, 5 Aug 2024 06:18:27 +0000 (08:18 +0200)]
cmake: fix paths for vulkan shaders compilation on Windows (#8573)

* Vulkan-shaders: attempt fix compilation on windows

* fix mismatched parentheses

10 months agoreadme : update model list (#8851)
BarfingLemurs [Mon, 5 Aug 2024 05:54:10 +0000 (01:54 -0400)]
readme : update model list (#8851)

10 months agollama : better replace_all (#8852)
Georgi Gerganov [Mon, 5 Aug 2024 05:53:39 +0000 (08:53 +0300)]
llama : better replace_all (#8852)

10 months agovulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855)
0cc4m [Mon, 5 Aug 2024 05:52:55 +0000 (07:52 +0200)]
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855)

* Fix Vulkan mul mat vec invalid results when ncols < warp size

* Only run backend ops mul mat vec block size test if block size not already covered

10 months agosync : ggml
Georgi Gerganov [Sun, 4 Aug 2024 16:13:25 +0000 (19:13 +0300)]
sync : ggml

ggml-ci

10 months agovulkan : implement Stable Diffusion operators (ggml/904)
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (ggml/904)

* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op

10 months agoggml : move c parameter comment to ggml_rope_ext (ggml/901)
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (ggml/901)

This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <redacted>
10 months agocann: support q4_0 model (#8822)
wangshuai09 [Mon, 5 Aug 2024 04:22:30 +0000 (12:22 +0800)]
cann: support q4_0 model (#8822)

10 months agoInstall curl in runtime layer (#8693)
Brandon Squizzato [Sun, 4 Aug 2024 18:17:16 +0000 (14:17 -0400)]
Install curl in runtime layer (#8693)

10 months agoServer: Don't ignore llama.cpp params (#8754)
ardfork [Sun, 4 Aug 2024 18:16:23 +0000 (18:16 +0000)]
Server: Don't ignore llama.cpp params (#8754)

* Don't ignore llama.cpp params

* Add fallback for max_tokens

10 months agobatched-bench : handle empty `-npl` (#8839)
Brian Cunnie [Sun, 4 Aug 2024 10:55:03 +0000 (03:55 -0700)]
batched-bench : handle empty `-npl` (#8839)

* [example] batched-bench "segmentation fault"

When `llama-batched-bench` is invoked _without_ setting `-npl`, "number
of parallel prompts", it segfaults.

The segfault is caused by invoking `max_element()` on a zero-length
vector, `n_pl`.

This commit addresses that by first checking to see if the number of
parallel prompts is zero, and if so sets the maximum sequence size to 1;
otherwise, sets it to the original, the result of `max_element()`.

Before the fix, running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf` crashed with:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
   69       llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
   70
   71       // ensure enough sequences are available
-> 72       ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```

* Update examples/batched-bench/batched-bench.cpp

Co-authored-by: compilade <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: compilade <redacted>
10 months agobaby-llama : remove duplicate vector include
Daniel Bevenius [Sat, 3 Aug 2024 13:07:47 +0000 (15:07 +0200)]
baby-llama : remove duplicate vector include

10 months agoflake.lock: Update (#8847)
Georgi Gerganov [Sun, 4 Aug 2024 02:53:20 +0000 (05:53 +0300)]
flake.lock: Update (#8847)

10 months agoggml : reading the runtime sve config of the cpu (#8709)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (#8709)

* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <redacted>
10 months agoFix conversion of unnormalized BF16->BF16 weights (#7843)
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (#7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
10 months agocann: Fix ggml_cann_im2col for 1D im2col (#8819)
Mengqing Cao [Fri, 2 Aug 2024 08:50:53 +0000 (16:50 +0800)]
cann: Fix ggml_cann_im2col for 1D im2col (#8819)

* fix ggml_cann_im2col for 1D im2col

* fix build warning

10 months ago[SYCL] Fixing wrong VDR iq4nl value (#8812)
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
[SYCL] Fixing wrong VDR iq4nl value (#8812)

10 months agoggml-cuda: Adding support for unified memory (#8035)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (#8035)

* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
10 months agoBuild: Only include execinfo.h on linux systems that support it (#8783)
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (#8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one

10 months agocuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800)
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800)

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test

10 months agocann: support q8_0 for Ascend backend (#8805)
wangshuai09 [Thu, 1 Aug 2024 02:39:05 +0000 (10:39 +0800)]
cann: support q8_0 for Ascend backend (#8805)

10 months agoserver : update llama-server embedding flag documentation (#8779)
Igor Okulist [Wed, 31 Jul 2024 23:59:09 +0000 (18:59 -0500)]
server : update llama-server embedding flag documentation (#8779)

Fixes #8763

10 months agoBuild: Fix potential race condition (#8781)
Clint Herron [Wed, 31 Jul 2024 19:51:06 +0000 (15:51 -0400)]
Build: Fix potential race condition (#8781)

* Fix potential race condition as pointed out by @fairydreaming in #8776

* Reference the .o rather than rebuilding every time.

* Adding in CXXFLAGS and LDFLAGS

* Removing unnecessary linker flags.

10 months agoAdding Gemma 2 2B configs (#8784)
pculliton [Wed, 31 Jul 2024 15:12:10 +0000 (11:12 -0400)]
Adding Gemma 2 2B configs (#8784)

* Adding Gemma 2 2B configs

Updates to Q scaling and Gemma 2 model sizes to match v2 2B model.

* Update src/llama.cpp

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
10 months agocmake : fix use of external ggml (#8787)
Borislav Stanimirov [Wed, 31 Jul 2024 13:40:08 +0000 (16:40 +0300)]
cmake : fix use of external ggml (#8787)

11 months agonix: cuda: rely on propagatedBuildInputs (#8772)
Someone [Tue, 30 Jul 2024 20:35:30 +0000 (23:35 +0300)]
nix: cuda: rely on propagatedBuildInputs (#8772)

Listing individual outputs is no longer necessary to reduce the runtime closure size after https://github.com/NixOS/nixpkgs/pull/323056.

11 months agopy: add_array() will not add to kv store if value is an empty array (#8774)
Brian [Tue, 30 Jul 2024 14:57:03 +0000 (00:57 +1000)]
py: add_array() will not add to kv store if value is an empty array (#8774)

* gguf_writer.py: add_array() should not add to kv store if empty

* Apply suggestions from code review

I was wondering if there was a specific reason for `if val` but good to hear we can safely use `len(val) == 0`

Co-authored-by: compilade <redacted>
---------

Co-authored-by: compilade <redacted>
11 months agoadded android implementation of ggml_print_backtrace_symbols (#8751)
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (#8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
11 months agoflake.lock: Update (#8729)
Georgi Gerganov [Tue, 30 Jul 2024 12:58:57 +0000 (15:58 +0300)]
flake.lock: Update (#8729)

11 months agocann: update cmake (#8765)
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (#8765)

11 months ago[SYCL] Add `TIMESTEP_EMBEDDING` OP (#8707)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
[SYCL] Add `TIMESTEP_EMBEDDING` OP (#8707)

Signed-off-by: zhentaoyu <redacted>
11 months agoggml: bugfix: use the undisturbed policy for inactive elements in RISC-V vector code (#8748)
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: use the undisturbed policy for inactive elements in RISC-V vector code (#8748)

In this code, we want elements to retain the values they previously held
when mask[i] is false, so we should use the undisturbed policy. With the
default agnostic policy of the RVV intrinsics, those values may either be
preserved or overwritten with all 1s.

Co-authored-by: carter.li <redacted>
11 months agocuda : organize vendor-specific headers into vendors directory (#8746)
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (#8746)

Signed-off-by: Xiaodong Ye <redacted>
11 months ago[SYCL] add conv support (#8688)
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
[SYCL] add conv support (#8688)

11 months agocmake: use 1 more thread for non-ggml in CI (#8740)
Johannes Gäßler [Sun, 28 Jul 2024 20:32:44 +0000 (22:32 +0200)]
cmake: use 1 more thread for non-ggml in CI (#8740)

11 months agochore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477)
Austin [Sun, 28 Jul 2024 07:52:42 +0000 (03:52 -0400)]
chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477)

* chore: Fix compiler warnings, add help text, improve CLI options

* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions

* chore : Add ignore rule for vulkan shader generator

Signed-off-by: teleprint-me <redacted>
* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp

Co-authored-by: 0cc4m <redacted>
* chore : Remove void and apply C++ style empty parameters

* chore : Remove void and apply C++ style empty parameters

---------

Signed-off-by: teleprint-me <redacted>
Co-authored-by: 0cc4m <redacted>
11 months agollama : refactor session file management (#8699)
compilade [Sun, 28 Jul 2024 04:42:05 +0000 (00:42 -0400)]
llama : refactor session file management (#8699)

* llama : refactor session file management

* llama : saving and restoring state checks for overflow

The size of the buffers should now be given to the functions working
with them, otherwise a truncated file could cause out of bound reads.

* llama : stream from session file instead of copying into a big buffer

Loading session files should no longer cause a memory usage spike.

* llama : llama_state_get_size returns the actual size instead of max

This is a breaking change, but makes that function *much* easier
to keep up to date, and it also makes it reflect the behavior
of llama_state_seq_get_size.

* llama : share code between whole and seq_id-specific state saving

Both session file types now use a more similar format.

* llama : no longer store all hparams in session files

Instead, the model arch name is stored.
The layer count and the embedding dimensions of the KV cache
are still verified when loading.
Storing all the hparams is not necessary.

* llama : fix uint64_t format type

* llama : various integer type cast and format string fixes

Some platforms use "%lu" and others "%llu" for uint64_t.
Not sure how to handle that, so casting to size_t when displaying errors.

* llama : remove _context suffix for llama_data_context

* llama : fix session file loading

llama_state_get_size cannot be used to get the max size anymore.

* llama : more graceful error handling of invalid session files

* llama : remove LLAMA_MAX_RNG_STATE

It's no longer necessary to limit the size of the RNG state,
because the max size of session files is not estimated anymore.

* llama : cast seq_id in comparison with unsigned n_seq_max

11 months agofeat: Support Moore Threads GPU (#8383)
R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU  (#8383)

* Update doc for MUSA

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA

Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
11 months agoscripts : sync vulkan-shaders (#0)
Georgi Gerganov [Sat, 27 Jul 2024 15:08:31 +0000 (18:08 +0300)]
scripts : sync vulkan-shaders (#0)

11 months agoscripts : sync ggml-aarch64 sources
Georgi Gerganov [Sat, 27 Jul 2024 14:19:35 +0000 (17:19 +0300)]
scripts : sync ggml-aarch64 sources

11 months agoggml : add missing semicolon (#0)
Georgi Gerganov [Sat, 27 Jul 2024 12:57:09 +0000 (15:57 +0300)]
ggml : add missing semicolon (#0)

ggml-ci

11 months agosync : ggml
Georgi Gerganov [Sat, 27 Jul 2024 12:53:48 +0000 (15:53 +0300)]
sync : ggml

ggml-ci

11 months agoggml : loop tiling optimizations for scalar path (ggml/898)
Mahesh Madhav [Thu, 25 Jul 2024 07:54:08 +0000 (00:54 -0700)]
ggml : loop tiling optimizations for scalar path (ggml/898)

Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.

11 months agoggml: add support for float16 input tensors in pooling operations (ggml/895)
Ivan Filipov [Mon, 22 Jul 2024 11:32:02 +0000 (14:32 +0300)]
ggml: add support for float16 input tensors in pooling operations (ggml/895)

* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <redacted>
11 months agovulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
Tony Wasserka [Sat, 20 Jul 2024 18:49:44 +0000 (20:49 +0200)]
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)

This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <redacted>
11 months agocmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885)
Borislav Stanimirov [Fri, 12 Jul 2024 14:24:20 +0000 (17:24 +0300)]
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885)

11 months agoggml : remove unnecessary UNUSED macro call (ggml/880)
Daniel Bevenius [Mon, 8 Jul 2024 10:03:42 +0000 (12:03 +0200)]
ggml : remove unnecessary UNUSED macro call (ggml/880)

This commit removes an UNUSED macro call that is not needed as the
variable n0 is used in the code and will not produce a warning.

Signed-off-by: Daniel Bevenius <redacted>
11 months agollama : add support for llama 3.1 rope scaling factors (#8676)
Jeffrey Morgan [Sat, 27 Jul 2024 12:03:45 +0000 (05:03 -0700)]
llama : add support for llama 3.1 rope scaling factors (#8676)

* Add llama 3.1 rope scaling factors to llama conversion and inference

This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <redacted>
* address comments

* address comments

* Update src/llama.cpp

Co-authored-by: compilade <redacted>
* Update convert_hf_to_gguf.py

Co-authored-by: compilade <redacted>
---------

Co-authored-by: compilade <redacted>
11 months agollama : add function for model-based max number of graph nodes (#8622)
Georgi Gerganov [Sat, 27 Jul 2024 11:59:29 +0000 (14:59 +0300)]
llama : add function for model-based max number of graph nodes (#8622)

* llama : model-based max number of graph nodes

ggml-ci

* llama : disable 405B max_nodes path due to lack of complaints

ggml-ci

11 months agocommon : add --no-warmup option for main/llama-cli (#8712)
Daniel Bevenius [Sat, 27 Jul 2024 10:45:02 +0000 (12:45 +0200)]
common : add --no-warmup option for main/llama-cli (#8712)

This commit adds a --no-warmup option for llama-cli.

The motivation for this is that it can be convenient to skip the
warmup llama_decode call when debugging.

Signed-off-by: Daniel Bevenius <redacted>
11 months agocann: Fix Multi-NPU execution error (#8710)
wangshuai09 [Sat, 27 Jul 2024 08:36:44 +0000 (16:36 +0800)]
cann: Fix Multi-NPU execution error (#8710)

* cann: fix multi-npu exec error

* cann: update comment  for ggml_backend_cann_supports_buft

11 months agoggml : reduce hash table reset cost (#8698)
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (#8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string

11 months agollama : fix order of parameters (#8706)
Judd [Fri, 26 Jul 2024 08:38:12 +0000 (16:38 +0800)]
llama : fix order of parameters (#8706)

usage of `aclrtGetMemInfo` is correct:

https://www.hiascend.com/doc_center/source/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_03_0103.html

Co-authored-by: Judd <redacted>
11 months agoserver : add Speech Recognition & Synthesis to UI (#8679)
Yaiko [Thu, 25 Jul 2024 22:10:16 +0000 (18:10 -0400)]
server : add Speech Recognition & Synthesis to UI (#8679)

* server : add Speech Recognition & Synthesis to UI

* server : add Speech Recognition & Synthesis to UI (fixes)

11 months agoexamples : export-lora : fix issue with quantized base models (#8687)
Xuan Son Nguyen [Thu, 25 Jul 2024 21:49:39 +0000 (23:49 +0200)]
examples : export-lora : fix issue with quantized base models (#8687)

11 months agoggml: handle ggml_init failure to fix NULL pointer deref (#8692)
DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (#8692)

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.

11 months agollama : fix build + fix fabs compile warnings (#8683)
Georgi Gerganov [Thu, 25 Jul 2024 16:57:31 +0000 (19:57 +0300)]
llama : fix build + fix fabs compile warnings (#8683)

ggml-ci

11 months agoggml : fix build on Windows with Snapdragon X (#8531)
Andreas (Andi) Kunar [Thu, 25 Jul 2024 16:01:00 +0000 (18:01 +0200)]
ggml : fix build on Windows with Snapdragon X (#8531)

* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5ea7c73d3494db3b91505599909227d.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* Windows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <redacted>
11 months agotests : fix printfs (#8068)
Georgi Gerganov [Thu, 25 Jul 2024 15:57:44 +0000 (18:57 +0300)]
tests : fix printfs (#8068)

11 months ago[SYCL] fix multi-gpu issue on sycl (#8554)
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
[SYCL] fix multi-gpu issue on sycl (#8554)

---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
11 months agoggml : add and use ggml_cpu_has_llamafile() (#8664)
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (#8664)

11 months agoexamples : remove `finetune` and `train-text-from-scratch` (#8669)
Xuan Son Nguyen [Thu, 25 Jul 2024 08:39:04 +0000 (10:39 +0200)]
examples : remove `finetune` and `train-text-from-scratch` (#8669)

* examples : remove finetune and train-text-from-scratch

* fix build

* update help message

* fix small typo for export-lora

11 months agodocs : Quantum -> Quantized (#8666)
Ujjawal Panchal [Thu, 25 Jul 2024 08:13:27 +0000 (13:43 +0530)]
docs : Quantum -> Quantized (#8666)

* docfix: imatrix readme, quantum models -> quantized models.

* docfix: server readme: quantum models -> quantized models.

11 months agollama: use sliding window for phi3 (#8627)
Fan Shupei [Thu, 25 Jul 2024 07:21:09 +0000 (15:21 +0800)]
llama: use sliding window for phi3 (#8627)

* use sliding window for phi3

* fix typo, "data_swa" -> "data"

* [convert_hf_to_gguf.py] add phi3 sliding window

11 months agoreadme : update games list (#8673)
MorganRO8 [Wed, 24 Jul 2024 16:48:00 +0000 (12:48 -0400)]
readme : update games list (#8673)

Added link to game I made that depends on llama

11 months agoBuild Llama SYCL Intel with static libs (#8668)
Joe Todd [Wed, 24 Jul 2024 13:36:00 +0000 (14:36 +0100)]
Build Llama SYCL Intel with static libs (#8668)

Ensure SYCL CI builds both static & dynamic libs for testing purposes

Signed-off-by: Joe Todd <redacted>
11 months agoreadme : update UI list [no ci] (#8505)
Thorsten Sommer [Wed, 24 Jul 2024 12:52:30 +0000 (14:52 +0200)]
readme : update UI list [no ci] (#8505)

11 months agollama : fix `llama_chat_format_single` for mistral (#8657)
Xuan Son Nguyen [Wed, 24 Jul 2024 11:48:46 +0000 (13:48 +0200)]
llama : fix `llama_chat_format_single` for mistral (#8657)

* fix `llama_chat_format_single` for mistral

* fix typo

* use printf

11 months agoRe-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667)
Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667)