git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Daniel Bevenius [Mon, 16 Sep 2024 11:07:13 +0000 (13:07 +0200)]
llama : rename n_embed to n_embd in rwkv6_time_mix (#9504)
This commit renames n_embed to n_embd in llm_build_rwkv6_time_mix.
The motivation for this change is consistency with the other rwkv6
functions like build_rwkv6 (and other parts of the code base).
Michael Podvitskiy [Mon, 16 Sep 2024 11:06:50 +0000 (13:06 +0200)]
ggml : link MATH_LIBRARY not by its full path (#9339)
compilade [Mon, 16 Sep 2024 07:30:22 +0000 (03:30 -0400)]
convert : identify missing model files (#9397)
Georgi Gerganov [Mon, 16 Sep 2024 07:27:50 +0000 (10:27 +0300)]
cmake : do not hide GGML options + rename option (#9465)
* cmake : do not hide GGML options
ggml-ci
* build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS
for consistency
ggml-ci
Eve [Mon, 16 Sep 2024 06:48:24 +0000 (06:48 +0000)]
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422)
* squashed
re-add my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049
have ggml_vec_dot_q4_0 do two blocks per loop for AVX (see the sketch below)
try out f16c ggml_vec_dot_iq4_nl, but it's not really faster; as per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue
* shuffle
* remove f16c iq4_nl as I can't make it faster than before
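A minimal sketch of the "two blocks per loop" idea from this change (illustrative only, not the actual ggml kernel): two independent accumulators let the CPU overlap the work of adjacent blocks.

```cpp
// Illustrative scalar version of processing two blocks per iteration;
// the real kernel does the same with AVX intrinsics on quantized blocks.
float dot_two_blocks_per_iter(int nb, int qk, const float * x, const float * y) {
    float acc0 = 0.0f, acc1 = 0.0f;
    int i = 0;
    for (; i + 1 < nb; i += 2) {            // two blocks per iteration
        for (int j = 0; j < qk; ++j) {
            acc0 += x[(i + 0)*qk + j] * y[(i + 0)*qk + j];
            acc1 += x[(i + 1)*qk + j] * y[(i + 1)*qk + j];
        }
    }
    for (; i < nb; ++i) {                   // tail for an odd block count
        for (int j = 0; j < qk; ++j) {
            acc0 += x[i*qk + j] * y[i*qk + j];
        }
    }
    return acc0 + acc1;
}
```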
Shane A [Mon, 16 Sep 2024 06:47:37 +0000 (23:47 -0700)]
llama : support OLMoE (#9462)
CarryFun [Mon, 16 Sep 2024 06:45:20 +0000 (14:45 +0800)]
llama : support MiniCPM3 (#9322)
Co-authored-by: 范睿凯 <redacted>
Vinesh Janarthanan [Mon, 16 Sep 2024 06:20:01 +0000 (01:20 -0500)]
main : option to disable context shift (#9484)
* added cli arg to disable context shift
* reverted precommit
* updated README.md for main
* white space
* allow disabling context shift in the server
* Update common/arg.cpp
no-context-shift only works for main example
Co-authored-by: Georgi Gerganov <redacted>
* added server example to --no-context-shift args
* removed server changes
* white space
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Mon, 16 Sep 2024 06:05:56 +0000 (09:05 +0300)]
metal : handle zero-sized allocs (#9466)
Georgi Gerganov [Mon, 16 Sep 2024 02:14:23 +0000 (05:14 +0300)]
flake.lock: Update (#9488)
Georgi Gerganov [Sun, 15 Sep 2024 17:46:12 +0000 (20:46 +0300)]
common : reimplement logging (#9418)
https://github.com/ggerganov/llama.cpp/pull/9418
slaren [Sun, 15 Sep 2024 17:02:27 +0000 (19:02 +0200)]
gguf-split : add basic checks (#9499)
* gguf-split : do not overwrite existing files when merging
* gguf-split : error when too many arguments are passed
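A rough sketch of what the two checks amount to (function name and argument layout are assumptions, not the actual gguf-split code):

```cpp
#include <cstdio>
#include <filesystem>
#include <string>

// Hypothetical helper illustrating the two checks above.
static bool check_merge_args(int argc, const std::string & output_path) {
    if (argc > 3) { // expected: program name + input + output (assumed layout)
        fprintf(stderr, "error: too many arguments\n");
        return false;
    }
    if (std::filesystem::exists(output_path)) {
        fprintf(stderr, "error: %s already exists, not overwriting\n", output_path.c_str());
        return false;
    }
    return true;
}
```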
Michael Podvitskiy [Sun, 15 Sep 2024 16:55:52 +0000 (18:55 +0200)]
cmake : correct order of sycl flags (#9497)
Csaba Kecskemeti [Sun, 15 Sep 2024 07:48:25 +0000 (00:48 -0700)]
py : add "LLaMAForCausalLM" conversion support (#9485)
Co-authored-by: Csaba Kecskemeti <redacted>
OSecret [Sun, 15 Sep 2024 07:36:53 +0000 (10:36 +0300)]
readme : update tools list (#9475)
* Added link to proprietary wrapper for Unity3d into README.md
Wrapper has prebuild library and was tested on iOS, Android, WebGL, PC, Mac platforms, has online demos like [this](https://d23myu0xfn2ttc.cloudfront.net/rich/index.html) and [that](https://d23myu0xfn2ttc.cloudfront.net/).
* Update README.md
Fixes upon review
Michael Podvitskiy [Sun, 15 Sep 2024 07:06:38 +0000 (09:06 +0200)]
cmake : try to fix sycl+intel build (#9487)
Yuri Khrustalev [Sat, 14 Sep 2024 09:54:37 +0000 (05:54 -0400)]
ggml : ggml_type_name return "NONE" for invalid values (#9458)
When running on Windows, the quantization utility attempts to print types that are not set, which leads to a crash.
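A minimal sketch of the fix (table contents illustrative, not the real ggml type table): out-of-range or unset values map to "NONE" instead of indexing past the table.

```cpp
const char * type_name_sketch(int type) {
    static const char * names[] = { "F32", "F16", "Q4_0", "Q4_1" /* ... */ };
    const int n = (int) (sizeof(names)/sizeof(names[0]));
    if (type < 0 || type >= n || names[type] == nullptr) {
        return "NONE"; // invalid value: never dereference outside the table
    }
    return names[type];
}
```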
VoidIsVoid [Sat, 14 Sep 2024 09:36:44 +0000 (17:36 +0800)]
server: add data: [DONE] to /chat/completions stream response (#9459)
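OpenAI-style clients stop reading the server-sent-event stream when they see a literal `[DONE]` sentinel after the last JSON chunk; a sketch of the termination this change adds (the helper name is hypothetical):

```cpp
#include <ostream>

void end_chat_completion_stream(std::ostream & out) {
    out << "data: [DONE]\n\n"; // SSE event terminating the stream
    out.flush();
}
```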
Georgi Gerganov [Sat, 14 Sep 2024 07:55:05 +0000 (10:55 +0300)]
cmake : use list(APPEND ...) instead of set() + dedup linker (#9463)
* cmake : use list(APPEND ...) instead of set() + dedup linker
ggml-ci
* cmake : try fix sycl
* cmake : try to fix sycl 2
* cmake : fix sycl build (#9469)
* try fix sycl build
* use CMAKE_CXX_FLAGS as a string variable
---------
Co-authored-by: Georgi Gerganov <redacted>
* one more CMAKE_CXX_FLAGS fix (#9471)
---------
Co-authored-by: Michael Podvitskiy <redacted>
Daniel Bevenius [Sat, 14 Sep 2024 07:50:12 +0000 (09:50 +0200)]
llama : make cell_id const in inp_s_mask block (#9470)
This commit makes the cell_id variable const in the inp_s_mask block.
The motivation for this change is consistency with the code in the
inp_s_copy block.
Xuan Son Nguyen [Fri, 13 Sep 2024 12:23:11 +0000 (14:23 +0200)]
server : add loading html page while model is loading (#9468)
* Adding loading page for '/' server requests
* set content when model is loading
* removed loading html file
* updated cmakelist
* updated makefile
* cleaned up whitespace
* cleanup for PR removed error
* updated server test to handle 503 HTML
* catch 503 before parsing json
* revert test
* account for both api and web browser requests (sketched below)
* precommit corrections
* eol fix
* revert changes to pre-commit
* removed print statement
* made loading message more descriptive
* also support .html files
---------
Co-authored-by: VJHack <redacted>
Co-authored-by: Vinesh Janarthanan <redacted>
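A sketch of the behavior this PR describes (types and names are hypothetical, not the actual server code): while the model loads, browsers get an HTML loading page and API clients get a JSON 503.

```cpp
#include <string>

struct response { int status; std::string content_type; std::string body; };

response respond_while_loading(const std::string & accept_header) {
    if (accept_header.find("text/html") != std::string::npos) {
        return {503, "text/html", "<html><body>Loading model...</body></html>"};
    }
    return {503, "application/json",
            R"({"error":{"code":503,"message":"Loading model"}})"};
}
```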
Georgi Gerganov [Fri, 13 Sep 2024 06:53:38 +0000 (09:53 +0300)]
llama : llama_perf + option to disable timings during decode (#9355)
* llama : llama_perf + option to disable timings during decode
ggml-ci
* common : add llama_arg
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
* perf : separate functions in the API
ggml-ci
* perf : safer pointer handling + naming update
ggml-ci
* minor : better local var name
* perf : abort on invalid sampler pointer
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <redacted>
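Hedged usage sketch: after this PR, context and sampler timings are printed through separate llama_perf_* calls (names as introduced here; check the llama.h of your version before relying on them).

```cpp
#include "llama.h"

void print_timings(llama_context * ctx, llama_sampler * smpl) {
    llama_perf_context_print(ctx);  // decode timings; can be disabled per context
    llama_perf_sampler_print(smpl); // timings for the sampler chain
}
```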
Gilad S. [Fri, 13 Sep 2024 01:54:49 +0000 (04:54 +0300)]
feat: remove a sampler from a chain (#9445)
* feat: remove a sampler from a chain
* fix: return removed sampler
* fix: safer casting
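Hedged sketch of the new chain operation (signature as added by this PR; verify against your llama.h): removing a sampler from a chain returns it and hands ownership back to the caller.

```cpp
#include "llama.h"

void drop_first_sampler(llama_sampler * chain) {
    llama_sampler * removed = llama_sampler_chain_remove(chain, 0);
    llama_sampler_free(removed); // the chain no longer owns it
}
```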
Mathijs Henquet [Thu, 12 Sep 2024 20:30:11 +0000 (22:30 +0200)]
server : Add option to return token pieces in /tokenize endpoint (#9108)
* server : added with_pieces functionality to /tokenize endpoint
* server : Add tokenize with pieces tests to server.feature
* Handle case if tokenizer splits along utf8 continuation bytes
* Add example of token splitting
* Remove trailing ws
* Fix trailing ws
* Maybe fix ci
* maybe this fixes windows ci?
---------
Co-authored-by: Xuan Son Nguyen <redacted>
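The UTF-8 case above matters because a single token's piece can end mid-character; a minimal validity check like the sketch below (it ignores overlong encodings and surrogates, so it is the decision rule only) decides whether a piece can be sent as a JSON string or must fall back to an array of byte values.

```cpp
#include <string>

bool is_valid_utf8(const std::string & s) {
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = (unsigned char) s[i];
        size_t len = c < 0x80 ? 1              // ASCII
                   : (c >> 5) == 0x6  ? 2      // 110xxxxx
                   : (c >> 4) == 0xE  ? 3      // 1110xxxx
                   : (c >> 3) == 0x1E ? 4 : 0; // 11110xxx, else invalid lead
        if (len == 0 || i + len > s.size()) return false;
        for (size_t j = 1; j < len; ++j) {
            if (((unsigned char) s[i + j] >> 6) != 0x2) return false; // 10xxxxxx
        }
        i += len;
    }
    return true;
}
```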
Dou Xinpeng [Thu, 12 Sep 2024 11:46:43 +0000 (19:46 +0800)]
cann: Add host buffer type for Ascend NPU (#9406)
* feat: Add host buffer type for Ascend NPU(CANN backend)
* fix some checking errors
* Add a few comments
fengerhu1 [Thu, 12 Sep 2024 11:34:22 +0000 (19:34 +0800)]
llava : fix the script error in MobileVLM README (#9054)
Signed-off-by: Erhu Feng <redacted>
Xuan Son Nguyen [Thu, 12 Sep 2024 11:33:57 +0000 (13:33 +0200)]
lora : raise error if lm_head is ignored (#9103)
* lora : raise error if lm_head is ignored
* fix style
* clarify comment
Michael Podvitskiy [Thu, 12 Sep 2024 11:30:01 +0000 (13:30 +0200)]
cmake : fix for builds without `GGML_CDEF_PUBLIC` (#9338)
* `GGML_TARGET_DEFINES-NOTFOUND` fix for builds without `GGML_CDEF_PUBLIC`
* Update CMakeLists.txt, spaces fix
Huang Qi [Thu, 12 Sep 2024 11:28:43 +0000 (19:28 +0800)]
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329)
daminho [Thu, 12 Sep 2024 11:28:20 +0000 (20:28 +0900)]
py : add Phi-1.5/Phi-2 tokenizer (#9361)
* add phi2 tokenizer
* add phi name to convert_hf_to_gguf_update.py
* make tokenizer_pre consistent; make llama.cpp work
Trivikram Kamat [Thu, 12 Sep 2024 11:27:45 +0000 (04:27 -0700)]
ci : bump actions/checkout to v4 (#9377)
Michael Podvitskiy [Thu, 12 Sep 2024 11:27:14 +0000 (13:27 +0200)]
cmake : fixed the order of linking libraries for llama-quantize (#9450)
Molly Sophia [Thu, 12 Sep 2024 11:25:16 +0000 (19:25 +0800)]
py : add special tokens in hf_converter for RWKV v6 (#9428)
Signed-off-by: Molly Sophia <redacted>
Ahmad Tameem [Thu, 12 Sep 2024 11:24:31 +0000 (16:24 +0500)]
riscv : modify Makefile and add a RISCV_VECT to print log info (#9442)
- Added ggml_cpu_has_riscv_v() in GGML to print system info in the log
- Modified Makefile to only use the flag when cross-compiling for RISC-V
Georgi Gerganov [Thu, 12 Sep 2024 11:23:49 +0000 (14:23 +0300)]
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
* ggml : hide ggml_object, ggml_cgraph, ggml_hash_set
ggml-ci
* ggml : add ggml-impl.h to backends
* ggml : fix compiler warnings
ggml-ci
* ggml : add assert upon adding nodes
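Hedged sketch of the consequence for callers: with ggml_cgraph now opaque, code goes through accessor functions instead of struct fields (accessor names as introduced around this change; verify against your ggml.h).

```cpp
#include <cstdio>
#include "ggml.h"

void dump_graph_nodes(struct ggml_cgraph * gf) {
    for (int i = 0; i < ggml_graph_n_nodes(gf); ++i) {
        struct ggml_tensor * node = ggml_graph_node(gf, i);
        printf("node %3d: %s\n", i, ggml_get_name(node));
    }
}
```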
Neo Zhang Jianyu [Thu, 12 Sep 2024 09:44:17 +0000 (17:44 +0800)]
enhance run script to make it easy to change the parameters (#9448)
Co-authored-by: arthw <redacted>
Xinpeng Dou [Thu, 12 Sep 2024 01:02:35 +0000 (09:02 +0800)]
cann: Fix error when running a non-existent op (#9424)
Faisal Zaghloul [Thu, 12 Sep 2024 00:29:53 +0000 (20:29 -0400)]
Add Jais to list of supported models (#9439)
Co-authored-by: fmz <redacted>
slaren [Wed, 11 Sep 2024 15:52:13 +0000 (17:52 +0200)]
llama : skip token bounds check when evaluating embeddings (#9437)
Pavel Zloi [Wed, 11 Sep 2024 12:29:51 +0000 (15:29 +0300)]
py : support converting local models (#7547)
* Support for converting local models added to convert-hf-to-gguf-update.py
* Description fixed
* shutil added to imports
Xuan Son Nguyen [Wed, 11 Sep 2024 10:59:13 +0000 (12:59 +0200)]
llava : correct args for minicpmv-cli (#9429)
Xuan Son Nguyen [Wed, 11 Sep 2024 10:02:09 +0000 (12:02 +0200)]
files : remove accidentally added `lora_test` submodule (#9430)
Farbod Bijary [Wed, 11 Sep 2024 09:22:37 +0000 (12:52 +0330)]
feat: Implements retrying logic for downloading models using --model-url flag (#9255)
* feat: Implements retrying logic for downloading models using --model-url flag
* Update common/common.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
* Update common/common.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
* apply comments
* implements a retry function to avoid duplication (see the sketch below)
* fix editorconfig
* change function name
---------
Co-authored-by: farbod <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Co-authored-by: slaren <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
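Hedged sketch of the shared retry helper described above (name, attempt count, and backoff are illustrative; the real common.cpp logic differs): wrap the download attempt and back off between tries.

```cpp
#include <chrono>
#include <functional>
#include <thread>

bool with_retries(const std::function<bool()> & try_download, int max_attempts = 3) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_download()) return true;
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(std::chrono::seconds(2 * attempt));
        }
    }
    return false;
}
```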
Johannes Gäßler [Wed, 11 Sep 2024 08:22:40 +0000 (10:22 +0200)]
CUDA: fix --split-mode row race condition (#9413)
Georgi Gerganov [Wed, 11 Sep 2024 07:03:54 +0000 (10:03 +0300)]
batched-bench : remove unused code (#9305)
R0CKSTAR [Wed, 11 Sep 2024 01:46:55 +0000 (09:46 +0800)]
musa: remove Clang builtins mapping (#9421)
Signed-off-by: Xiaodong Ye <redacted>
Alberto Cabrera Pérez [Wed, 11 Sep 2024 00:53:42 +0000 (01:53 +0100)]
sycl : update support conditions (#9394)
* sycl : update support condition to im2col
Signed-off-by: Alberto Cabrera <redacted>
* Added TODO to remind supporting FP32 im2col
---------
Signed-off-by: Alberto Cabrera <redacted>
Georgi Gerganov [Tue, 10 Sep 2024 22:46:59 +0000 (01:46 +0300)]
flake.lock: Update (#9360)
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
→ 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01)
• Updated input 'flake-parts/nixpkgs-lib':
'https://github.com/NixOS/nixpkgs/archive/a5d394176e64ab29c852d03346c1fc9b0b7d33eb.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01)
→ 'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)
→ 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06)
Co-authored-by: github-actions[bot] <redacted>
Xuan Son Nguyen [Tue, 10 Sep 2024 20:41:29 +0000 (22:41 +0200)]
arg : bring back missing ifdef (#9411)
* arg : bring back missing ifdef
* replace with llama_supports_gpu_offload
matteo [Tue, 10 Sep 2024 20:40:59 +0000 (22:40 +0200)]
enable --special arg for llama-server (#9419)
Co-authored-by: matteo serva <redacted>
slaren [Tue, 10 Sep 2024 16:04:25 +0000 (18:04 +0200)]
llama : move random seed generation to the samplers (#9398)
* llama_sampler_penalties : clamp penalty_last_n to zero
Georgi Gerganov [Tue, 10 Sep 2024 07:17:03 +0000 (10:17 +0300)]
metal : fix compile warning with GGML_METAL_NDEBUG (#0)
Daniel Bevenius [Tue, 10 Sep 2024 07:03:21 +0000 (09:03 +0200)]
llama : update llm_build_copy_mask_state comment [no ci] (#9385)
This commit updates a comment in the copy_mask_state function that seems to contain a typo or to be outdated, changing the variable name n_rs to n_kv.
I believe this change is correct: what the comment wants to convey is that we copy the states that are not going to be used in the upcoming processing, i.e. the token states from n_seqs up to the number of possible token states, n_kv.
Molly Sophia [Tue, 10 Sep 2024 07:02:30 +0000 (15:02 +0800)]
RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387)
Signed-off-by: Molly Sophia <redacted>
slaren [Tue, 10 Sep 2024 06:23:33 +0000 (08:23 +0200)]
make : do not run llama-gen-docs when building (#9399)
Xuan Son Nguyen [Mon, 9 Sep 2024 21:36:09 +0000 (23:36 +0200)]
common : move arg parser code to `arg.cpp` (#9388)
* common : move arg parser to arg.cpp
* better categorize args
* add cmake
* missing climits
* missing cstdarg
* common : more explicit includes
* fix build
* refactor gpt_params_parse
* update server readme
* fix test
---------
Co-authored-by: Georgi Gerganov <redacted>
Radoslav Gerganov [Mon, 9 Sep 2024 15:40:10 +0000 (18:40 +0300)]
rpc : fix segfault with nkvo (#9389)
* rpc : fix nkvo
* rpc : buf_size must not be static
ref: #9337
---------
Co-authored-by: slaren <redacted>
Prashant Vithule [Mon, 9 Sep 2024 15:37:18 +0000 (21:07 +0530)]
ggml : vector length agnostic SVE support (#9290)
* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths
* Removed WhiteSpaces
* ggml : style changes + fix 512-bit nb loop check
- fix local scope in switch cases
- consistent predicate names
- empty lines when necessary
- opening braces, spaces
- const-correctness
- add asserts
* Update ggml/src/ggml-quants.c
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
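Hedged sketch of the vector-length dispatch described above (kernel bodies elided; not the actual ggml-quants code): query the runtime SVE vector width once and branch to a loop specialized for it.

```cpp
#include <arm_sve.h> // requires an SVE-enabled AArch64 toolchain

void vec_dot_sve_dispatch(void) {
    switch (svcntb() * 8) {   // svcntb(): SVE vector length in bytes
        case 512: /* 512-bit loop: 4x fewer iterations than NEON */ break;
        case 256: /* 256-bit loop */                                break;
        default:  /* 128-bit loop, the SVE minimum */               break;
    }
}
```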
slaren [Mon, 9 Sep 2024 15:10:46 +0000 (17:10 +0200)]
llama : minor sampling refactor (2) (#9386)
Georgi Gerganov [Mon, 9 Sep 2024 12:51:37 +0000 (15:51 +0300)]
readme : update hot topics
Johannes Gäßler [Mon, 9 Sep 2024 12:22:53 +0000 (14:22 +0200)]
CUDA: fix variable name conflict for Windows build (#9382)
Antonis Makropoulos [Mon, 9 Sep 2024 11:21:38 +0000 (14:21 +0300)]
readme : add LLMUnity to UI projects (#9381)
* add LLMUnity to UI projects
* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
Radoslav Gerganov [Mon, 9 Sep 2024 08:04:39 +0000 (11:04 +0300)]
rpc : update README [no ci] (#9320)
Update README with instructions on how to offload model layers to both local and remote devices
Dan Johansson [Mon, 9 Sep 2024 07:02:45 +0000 (09:02 +0200)]
Arm AArch64: Documentation updates (#9321)
* Arm AArch64: Documentation updates
* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels
* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats
* Add newline to the end of docs/build.md
Markus Tavenrath [Sun, 8 Sep 2024 19:43:48 +0000 (21:43 +0200)]
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118)
* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.
* fix compile issues
* Fix issues where the last submit wasn't executed or handled properly.
* remove trailing whitespace
* Repair GGML_VULKAN_CHECK_RESULTS
* Increase submit counter only if actual work has been submitted and increase submit count to 100.
* Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.
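Hedged sketch of the overlap strategy above (helper names are hypothetical, not the actual Vulkan backend): instead of recording the whole graph into one command buffer and submitting at the end, flush every N recorded ops so the GPU starts executing while the CPU keeps recording. Per the commit, the counter only advances when real work was submitted, and the threshold is 100.

```cpp
const int SUBMIT_THRESHOLD = 100;

void record_graph(int n_nodes) {
    int recorded = 0;
    for (int i = 0; i < n_nodes; ++i) {
        // record_node(cmd, i);                // encode work for node i
        if (++recorded >= SUBMIT_THRESHOLD) {
            // submit(cmd); cmd = begin_new(); // early submit overlaps GPU/CPU
            recorded = 0;                      // reset only after a real submit
        }
    }
    // submit(cmd);                            // final submit covers the tail
}
```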
Georgi Gerganov [Sun, 8 Sep 2024 19:01:02 +0000 (22:01 +0300)]
cuda : fix FA Q src index (1 -> 0) (#9374)
Xuan Son Nguyen [Sun, 8 Sep 2024 16:08:55 +0000 (18:08 +0200)]
common : bring back missing args, add env var duplication check (#9375)
* common : bring back missing args
* move duplication check to test-arg-parser
* add check for duplicated env var
* correct default values
slaren [Sun, 8 Sep 2024 14:44:42 +0000 (16:44 +0200)]
common : restore --n-gpu-layers (#9371)
slaren [Sun, 8 Sep 2024 13:52:07 +0000 (15:52 +0200)]
llama : refactor samplers internal implementation (#9370)
Neo Zhang Jianyu [Sun, 8 Sep 2024 11:05:29 +0000 (19:05 +0800)]
[SYCL] add check malloc result on device (#9346)
* add check malloc result on device
* update for review comments, check all malloc_device() result
---------
Co-authored-by: arthw <redacted>
slaren [Sun, 8 Sep 2024 10:41:51 +0000 (12:41 +0200)]
llama : sanitize tokens in the upper bound (#9359)
Xuan Son Nguyen [Sun, 8 Sep 2024 10:12:17 +0000 (12:12 +0200)]
imatrix : fix arg parser for imatrix (#9366)
* imatrix : fix arg parser
* beautify printing first arg
Georgi Gerganov [Sun, 8 Sep 2024 06:57:57 +0000 (09:57 +0300)]
metal : update support condition for im2col + fix warning (#0)
Georgi Gerganov [Sun, 8 Sep 2024 06:38:56 +0000 (09:38 +0300)]
sync : ggml
Georgi Gerganov [Sun, 8 Sep 2024 06:38:42 +0000 (09:38 +0300)]
scripts : option to increase git patch context
Salvatore Mesoraca [Fri, 6 Sep 2024 12:34:25 +0000 (14:34 +0200)]
vulkan: add dryrun support to sin and cos ops (ggml/947)
sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.
This commit prevents that segfault.
Signed-off-by: Salvatore Mesoraca <redacted>
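Hedged sketch of the guard (types and names hypothetical, not the actual backend code): on a dry run there is no usable context yet, so the op does its bookkeeping and returns before dereferencing the pointer that used to segfault.

```cpp
struct vk_ctx; // opaque; null during dry runs per the commit message

void op_sin_sketch(vk_ctx * ctx, bool dryrun) {
    if (dryrun) {
        // request_pipeline(...);  // size/pipeline bookkeeping only
        return;                    // ctx may be null here: don't touch it
    }
    // ... actual dispatch through ctx ...
    (void) ctx;
}
```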
Salvatore Mesoraca [Fri, 6 Sep 2024 12:34:07 +0000 (14:34 +0200)]
vulkan: correctly report support for OP_CONT (ggml/946)
test-backend-ops fails because ggml_cont aborts when invoked with an unsupported type.
This commit makes the ggml_cont tests pass.
Signed-off-by: Salvatore Mesoraca <redacted>
Johannes Gäßler [Tue, 3 Sep 2024 15:21:46 +0000 (17:21 +0200)]
tests: add gradient tests for all backends (ggml/932)
* tests: add gradient checking to test-backend-ops
* remove old comment
* reorder includes
* adjust SIN/COS parameters
* add documentation, use supports_op if possible
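A sketch of the finite-difference side of gradient checking (the general technique; test-backend-ops' actual tolerances and batching differ): compare the backend's analytic gradient against a central difference of the forward pass.

```cpp
#include <cmath>

double numeric_grad(double (*f)(double), double x, double eps = 1e-4) {
    return (f(x + eps) - f(x - eps)) / (2.0 * eps); // central difference
}

// The check passes when the relative error stays under a per-op tolerance.
bool grad_ok(double analytic, double numeric, double tol = 1e-3) {
    return std::fabs(analytic - numeric) <= tol * std::fmax(1.0, std::fabs(numeric));
}
```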
Johannes Gäßler [Sat, 31 Aug 2024 12:35:42 +0000 (14:35 +0200)]
ggml: fix ggml_graph_cpy undefined behavior (ggml/943)
Georgi Gerganov [Wed, 28 Aug 2024 15:45:01 +0000 (18:45 +0300)]
cann : fix doxy (ggml/0)
Mengqing Cao [Fri, 9 Aug 2024 12:21:56 +0000 (20:21 +0800)]
cann : add Ascend NPU support (whisper/2336)
* enable Ascend NPU in src/whisper.cpp
* sync test-backend-ops with llama.cpp
Georgi Gerganov [Wed, 28 Aug 2024 14:08:03 +0000 (17:08 +0300)]
cuda : mark BF16 CONT as unsupported
Salvatore Mesoraca [Wed, 28 Aug 2024 08:23:02 +0000 (10:23 +0200)]
ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
* ggml_cont: fix issue with transposed tensors when one dimension is 1
When using multiple threads, it is not enough to check that the tensors are contiguous for ggml_compute_forward_dup_same_cont to work correctly; the tensors' strides also need to match.
Signed-off-by: Salvatore Mesoraca <redacted>
* Add ggml_cont tests
Signed-off-by: Salvatore Mesoraca <redacted>
* Remove dead code
it isn't possible to reach this code because
all these functions are invoked by ggml_compute_forward_dup
if and only if src0->type != dst->type
Signed-off-by: Salvatore Mesoraca <redacted>
* Make ggml_compute_forward_dup_same_cont work with contiguous tensors
Co-authored-by: Georgi Gerganov <redacted>
Signed-off-by: Salvatore Mesoraca <redacted>
---------
Signed-off-by: Salvatore Mesoraca <redacted>
Co-authored-by: Georgi Gerganov <redacted>
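Hedged sketch of the stronger precondition described above (not the actual ggml check): the "same contiguous layout" fast path needs the shapes and byte strides of src and dst to match, not just each tensor to be contiguous on its own.

```cpp
#include "ggml.h"

static bool same_layout(const struct ggml_tensor * src, const struct ggml_tensor * dst) {
    for (int i = 0; i < GGML_MAX_DIMS; ++i) {
        if (src->ne[i] != dst->ne[i] || src->nb[i] != dst->nb[i]) {
            return false; // shapes and byte strides must both match
        }
    }
    return true;
}
```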
Kevin Gibbons [Sun, 8 Sep 2024 05:51:00 +0000 (22:51 -0700)]
llama : set attrs of mislabelled EOT/EOM tokens (#9348)
Georgi Gerganov [Sat, 7 Sep 2024 21:33:50 +0000 (00:33 +0300)]
llama.android : fix build (#9350)
Georgi Gerganov [Sat, 7 Sep 2024 21:33:33 +0000 (00:33 +0300)]
llama : fix empty ring buffer push (#9358)
Georgi Gerganov [Sat, 7 Sep 2024 21:33:13 +0000 (00:33 +0300)]
llama : sanitize invalid tokens (#9357)
* common : do not add null tokens during warmup
ggml-ci
* llama : check that the input tokens are valid
ggml-ci
* tests : fix batch size of bert model
ggml-ci
Eve [Sat, 7 Sep 2024 19:02:26 +0000 (19:02 +0000)]
llamafile : disable sgemm for batch-size 1 (#9330)
Xuan Son Nguyen [Sat, 7 Sep 2024 18:43:51 +0000 (20:43 +0200)]
common : refactor arg parser (#9308)
* (wip) argparser v3
* migrated
* add test
* handle env
* fix linux build
* add export-docs example
* fix build (2)
* skip build test-arg-parser on windows
* update server docs
* bring back missing --alias
* bring back --n-predict
* clarify test-arg-parser
* small correction
* add comments
* fix args with 2 values
* refine example-specific args
* no more lambda capture
Co-authored-by: slaren@users.noreply.github.com
* params.sparams
* optimize more
* export-docs --> gen-docs
slaren [Sat, 7 Sep 2024 18:23:07 +0000 (20:23 +0200)]
ggml : always check bounds on get_rows operations (#9354)
Georgi Gerganov [Sat, 7 Sep 2024 12:16:19 +0000 (15:16 +0300)]
llama : refactor sampling v2 (#9294)
- Add `struct llama_sampler` and `struct llama_sampler_i`
- Add `llama_sampler_` API
- Add `llama_sampler_chain_` API for chaining multiple samplers
- Remove `LLAMA_API_INTERNAL`
- Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
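Hedged usage sketch of the chain API (constructor names as introduced by this PR; verify against your llama.h): individual samplers are created, added to a chain, and the chain is then used as a single llama_sampler.

```cpp
#include "llama.h"

llama_sampler * make_chain() {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
    return chain; // release with llama_sampler_free
}
```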
Xuan Son Nguyen [Sat, 7 Sep 2024 10:01:34 +0000 (12:01 +0200)]
ggml : fix missing `cpu_set_t` on emscripten (#9336)
* ggml : fix missing cpu_set_t on emscripten
* better version
* bring back android part
slaren [Sat, 7 Sep 2024 07:48:54 +0000 (09:48 +0200)]
ci : disable rocm image creation (#9340)
Xuan Son Nguyen [Fri, 6 Sep 2024 21:21:29 +0000 (23:21 +0200)]
server : simplify state machine for slot (#9283)
* server : simplify state machine for slot
* add SLOT_STATE_DONE_PROMPT
* pop_deferred_task
* add missing notify_one
* fix passkey test
* metrics : add n_busy_slots_per_decode
* fix test step
* add test
* maybe fix AddressSanitizer?
* fix deque ?
* missing lock
* pop_deferred_task: also notify
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Aarni Koskela [Fri, 6 Sep 2024 21:03:01 +0000 (00:03 +0300)]
llama-bench : log benchmark progress (#9287)
* llama-bench : add optional progress messages
Aarni Koskela [Fri, 6 Sep 2024 15:59:58 +0000 (18:59 +0300)]
batched-bench : add `--output-format jsonl` option (#9293)
`--output-format` is modeled after `llama-bench`'s options
Changyeon Kim [Fri, 6 Sep 2024 12:54:50 +0000 (21:54 +0900)]
ggml : fix build break for the vulkan-debug (#9265)
- windows build : Ok.
- linux build : Ok.
Signed-off-by: Changyeon Kim <redacted>
Xuan Son Nguyen [Fri, 6 Sep 2024 12:06:04 +0000 (14:06 +0200)]
server : fix missing lock (#9334)
Markus Tavenrath [Fri, 6 Sep 2024 06:56:17 +0000 (08:56 +0200)]
Improve Vulkan shader build system (#9239)
* Improve Vulkan shader build system
- Add dependency to vulkan-shaders-gen to rebuild shaders when changing the shader compilation utility.
- Add option to generate debug info for Vulkan shaders to provide shader source to Vulkan shader profiling tools
* remove not required self dependency
compilade [Fri, 6 Sep 2024 01:48:47 +0000 (21:48 -0400)]
ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)
* ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b
* ggml-quants : faster 1.625 bpw AVX2 vec_dot
Not using a lookup table anymore makes it match q4_0 speed.
* gguf-py : fix formatting
* llama : remove spaces on empty line
* ggml-quants : subtract 1 when back in epi8
This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.
* ggml-quants : Q2_2 now faster than Q4_K with AVX2
* ggml-quants : cleanup Q1_3 code formatting
* ggml-quants : ARM NEON vec_dot for q2_2 and q1_3
* ggml-quants : use ceiling division when quantizing q1_3
* convert-hf : simplify BitNet pre-quantization
This still results in the exact same tensor weights and scales,
but it reveals some weirdness in the current algorithm.
* convert-hf : allow converting the weird BitNet 1.3B
Its FFN size is 5460 which is not convenient.
The offending tensors are kept in F16,
which makes the final model 5.01 bpw.
* bitnet : replace 1.58b with b1.58, as in the paper
* ggml-quants : fix build failure on Windows
* ggml-quants : attempt to fix Arm 32-bit support
* ggml : add some informative comments in q1_3 vec_dot
* ggml : add TQ1_0 and TQ2_0 ternary quantization types
* ggml : even faster TQ2_0
* ggml : also faster TQ1_0
Same optimization as for TQ2_0 by offsetting the sum instead of the weights.
This makes TQ1_0 almost as fast as Q8_0 on AVX2.
* ggml : fix build issues in certain environments
* ggml : add NEON vec_dot implementation for TQ1_0 and TQ2_0
* ggml : avoid directly using vmlal_high_s8, for 32-bit ARM compat
The compiler seems smart enough to use the same instruction
even when using vget_high_s8 instead.
* ggml : remove q1_3 and q2_2
No more 1.625 bpw and 2.000 bpw,
now instead using 1.6875 bpw and 2.0625 bpw
with TQ1_0 and TQ2_0, respectively.
* llama : remove the separate scale tensors of BitNet b1.58
They won't be needed, since the remaining ternary quant types have
built-in scales.
* ggml-quants : rename fields of TQ1_0 and TQ2_0 structs for consistency
* ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot
Not yet tested on hardware which supports it,
might not work or might not even compile. But also it might.
It should make the performance better on recent ARM CPUs.
* ggml-quants : remove comment about possible format change of TQ2_0
Making it slightly more convenient for AVX512
but less convenient for everything else is not worth the trouble.
* gguf-py : Numpy (de)quantization for TQ1_0 and TQ2_0
* ggml-quants : use roundf instead of nearest_int for TQ1_0 and TQ2_0
This does not change anything for ternary models,
since their values should never end up being in halfway cases anyway.
* convert : allow direct conversion to TQ1_0 and TQ2_0
The token embeddings and output tensors are kept in F16
to allow quantizing them to Q4_K and Q6_K with llama-quantize.
* llama : handle fallback for TQ1_0 and TQ2_0 with Q4_0
Q4_0 is not completely symmetric (so not lossless for ternary models),
but it should be good enough.
* ggml-quants : allow using ARM dot product instructions for TQ1_0
* ggml-quants : deduplicate TQ1_0 and TQ2_0 __ARM_FEATURE_DOTPROD support
* ggml : remove unused ggml_mul special case
It would otherwise conflict with the more general
optimization coming with Mamba-2.
* ggml : handle TQ1_0 and TQ2_0 in dequantization-based operators
* test-backend-ops : add TQ1_0 and TQ2_0 comments for later
Not adding them uncommented yet, because some backends like SYCL and Metal
do not properly handle unknown types in supports_op for GGML_OP_MUL_MAT
(and Metal also doesn't handle it with GGML_OP_GET_ROWS).
Support for TQ1_0 and TQ2_0 for other backends than CPU
will be added in follow-up pull requests.
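A back-of-envelope check of the sizes quoted above (a 256-weight block with one f16 scale is an assumption taken from the usual ggml quant layout): base-3 packing fits 5 ternary digits per byte because 3^5 = 243 <= 256.

```cpp
constexpr int block     = 256;
constexpr int tq1_bytes = (block + 4) / 5 + 2; // 52 packed bytes + 2 scale = 54
constexpr int tq2_bytes = block / 4 + 2;       // 64 bytes at 2 bits/weight + 2 scale = 66
static_assert(tq1_bytes * 8 * 16 == block * 27, "TQ1_0: 1.6875 bpw = 27/16");
static_assert(tq2_bytes * 8 * 16 == block * 33, "TQ2_0: 2.0625 bpw = 33/16");
```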