git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Diego Devesa [Sun, 3 Nov 2024 18:34:08 +0000 (19:34 +0100)]
ggml : move CPU backend to a separate file (#10144)
Georgi Gerganov [Sun, 3 Nov 2024 13:18:40 +0000 (15:18 +0200)]
metal : minor fixup in FA kernel (#10143)
* metal : minor fixup in FA kernel
ggml-ci
* metal : use the unrolled loop variable
* metal : remove unused var
Georgi Gerganov [Sun, 3 Nov 2024 13:14:15 +0000 (15:14 +0200)]
flake.lock: Update (#10146)
Christian Köhnenkamp [Sat, 2 Nov 2024 22:35:31 +0000 (23:35 +0100)]
Add apple arm to presets (#10134)
* Add apple arm to presets
* Add final new line
sasha0552 [Sat, 2 Nov 2024 16:34:56 +0000 (16:34 +0000)]
server : fix slot selection by lru (#10126)
* server : fix slot selection by lru, migrate lcs to `size_t`
* minor debug log fix
Georgi Gerganov [Sat, 2 Nov 2024 16:34:00 +0000 (18:34 +0200)]
server : fix endpoint checks (#10135)
ggml-ci
Georgi Gerganov [Sat, 2 Nov 2024 13:18:56 +0000 (15:18 +0200)]
llama : adjust default context size + print warnings (#10136)
* llama : adjust default context size + print warnings
ggml-ci
* ggml-ci : add missing gpu-layers + adjust context sizes
Diego Devesa [Sat, 2 Nov 2024 12:08:53 +0000 (13:08 +0100)]
simple-chat : only add bos on first prompt (#10129)
Xuan Son Nguyen [Sat, 2 Nov 2024 11:53:17 +0000 (12:53 +0100)]
convert-lora : make `--base` optional (#10110)
* convert-lora : make `--base` optional
* lint
* handle case where base_model_name_or_path is invalid
* do not include metadata from base model
* clarify unspecified --base
* add small comment [no ci]
* trigger ci
Diego Devesa [Fri, 1 Nov 2024 22:50:59 +0000 (23:50 +0100)]
llama : add simple-chat example (#10124)
* llama : add simple-chat example
---------
Co-authored-by: Xuan Son Nguyen <redacted>
Diego Devesa [Fri, 1 Nov 2024 22:48:26 +0000 (23:48 +0100)]
llama : use smart pointers for ggml resources (#10117)
Shupei Fan [Fri, 1 Nov 2024 18:33:14 +0000 (02:33 +0800)]
vulkan : improve ggml_vk_create_buffer error handling (#9898)
Georgi Gerganov [Fri, 1 Nov 2024 15:31:51 +0000 (17:31 +0200)]
readme : update hot topics
sasha0552 [Fri, 1 Nov 2024 13:33:14 +0000 (13:33 +0000)]
server : fix smart selection of available slot (#10120)
* Fix smart selection of available slot
* minor fix
* replace vectors of tokens with shorthands
Georgi Gerganov [Fri, 1 Nov 2024 10:58:45 +0000 (12:58 +0200)]
ggml : remove ggml_scratch (#10121)
ggml-ci
Georgi Gerganov [Fri, 1 Nov 2024 08:28:24 +0000 (10:28 +0200)]
sync : ggml
Georgi Gerganov [Fri, 1 Nov 2024 08:23:05 +0000 (10:23 +0200)]
ggml : alloc ggml_contexts on the heap (whisper/2525)
Zhenwei Jin [Fri, 1 Nov 2024 03:09:59 +0000 (11:09 +0800)]
build: fix build error in Windows env with OneAPI setup (#10107)
Diego Devesa [Thu, 31 Oct 2024 23:49:53 +0000 (00:49 +0100)]
llama : improve output buffer type selection (#10098)
Diego Devesa [Thu, 31 Oct 2024 23:45:34 +0000 (00:45 +0100)]
quantize : fix --keep-split (#10114)
Diego Devesa [Thu, 31 Oct 2024 21:54:23 +0000 (22:54 +0100)]
llama : fix buffer checks for mamba and rwkv (#10111)
* llama : fix buffer checks for mamba and rwkv
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE
Zhenwei Jin [Thu, 31 Oct 2024 18:50:39 +0000 (02:50 +0800)]
loader: refactor tensor weights storage (#9935)
* loader: refactor tensor weights storage
* use sorted map, sort weights by layer
---------
Co-authored-by: slaren <redacted>
Kevin Gibbons [Thu, 31 Oct 2024 13:02:35 +0000 (06:02 -0700)]
server : include scheme when printing URL (#10106)
Diego Devesa [Thu, 31 Oct 2024 10:40:59 +0000 (11:40 +0100)]
ggml : check tensor name lengths in gguf files (#10100)
Sergio López [Thu, 31 Oct 2024 09:09:52 +0000 (10:09 +0100)]
kompute: add mul_mat_q4_k shader (#10097)
This is a more or less direct translation from the Metal implementation
to GLSL.
Signed-off-by: Sergio Lopez <redacted>
Sergio López [Wed, 30 Oct 2024 16:01:52 +0000 (17:01 +0100)]
kompute: add backend registry / device interfaces (#10045)
Get in line with the other backends by supporting the newer
backend/device registry interfaces.
Signed-off-by: Sergio Lopez <redacted>
Diego Devesa [Wed, 30 Oct 2024 13:51:21 +0000 (14:51 +0100)]
ggml : fix memory leaks when loading invalid gguf files (#10094)
* ggml : fix gguf string leak when reading kv pairs fails
* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
* ggml : avoid crashing on failed memory allocations when loading a gguf file
Rich Dougherty [Wed, 30 Oct 2024 12:22:39 +0000 (01:22 +1300)]
readme : more lora detail in main example readme (#10064)
Rich Dougherty [Wed, 30 Oct 2024 12:22:21 +0000 (01:22 +1300)]
convert : more detailed convert lora usage docs (#10065)
xctan [Wed, 30 Oct 2024 07:00:40 +0000 (15:00 +0800)]
ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
* ggml : RISC-V vector gemv for q4_0_8x8
* ggml : Added WIP rvv q4_0_8x8 gemm
* ggml : Added initial implementation of rvv gemm
* ggml : optimize gemm to avoid register spillover
* ggml : Fix GCC rvv load alignment issue
* ggml : Format gemm rvv code
* ggml : Fix a typo in RVV q4_0_8_8 GEMM
Diego Devesa [Wed, 30 Oct 2024 01:01:23 +0000 (02:01 +0100)]
llama : refactor model loader with backend registry (#10026)
Changyeon Kim [Tue, 29 Oct 2024 08:52:56 +0000 (17:52 +0900)]
ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)
* ggml: Add POOL2D op for GPU acceleration to the Vulkan backend.
- The MobileVLM model now supports GPU-accelerated inference via the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
Signed-off-by: Changyeon Kim <redacted>
* [fix] Correct the incorrect order of the parameters.
fix casting to int.
Signed-off-by: Changyeon Kim <redacted>
---------
Signed-off-by: Changyeon Kim <redacted>
Georgi Gerganov [Tue, 29 Oct 2024 08:42:05 +0000 (10:42 +0200)]
llama : remove Tail-Free sampling (#10071)
ggml-ci
arch-btw [Mon, 28 Oct 2024 17:45:33 +0000 (10:45 -0700)]
llama : Add IBM granite template (#10013)
* Add granite template to llama.cpp
* Add granite template to test-chat-template.cpp
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
* Update tests/test-chat-template.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
* Added proper template and expected output
* Small change to \n
* Add code space &
Co-authored-by: Xuan Son Nguyen <redacted>
* Fix spacing
* Apply suggestions from code review
* Update src/llama.cpp
---------
Co-authored-by: Xuan Son Nguyen <redacted>
Georgi Gerganov [Mon, 28 Oct 2024 15:41:24 +0000 (17:41 +0200)]
flake.lock: Update (#10063)
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c%2BcHUJwA%3D' (2024-10-18)
→ 'github:NixOS/nixpkgs/2768c7d042a37de65bb1b5b3268fc987e534c49d?narHash=sha256-AlcmCXJZPIlO5dmFzV3V2XF6x/OpNWUV8Y/FMPGd8Z4%3D' (2024-10-23)
Co-authored-by: github-actions[bot] <redacted>
R0CKSTAR [Mon, 28 Oct 2024 09:02:48 +0000 (17:02 +0800)]
musa: workaround for Guilty Lockup in cleaning src0 (#10042)
Signed-off-by: Xiaodong Ye <redacted>
Georgi Gerganov [Mon, 28 Oct 2024 06:49:32 +0000 (08:49 +0200)]
server : don't overfill the batch during infill (#10018)
ggml-ci
Georgi Gerganov [Sun, 27 Oct 2024 18:59:58 +0000 (20:59 +0200)]
llama : switch KQ multiplication to F32 precision by default (#10015)
ggml-ci
Georgi Gerganov [Sat, 26 Oct 2024 07:34:08 +0000 (10:34 +0300)]
sync : ggml
bssrdf [Wed, 23 Oct 2024 18:34:00 +0000 (14:34 -0400)]
increase cuda_cpy block size (ggml/996)
Co-authored-by: bssrdf <redacted>
Georgi Gerganov [Sat, 26 Oct 2024 07:33:31 +0000 (10:33 +0300)]
scripts : fix amx sync [no ci]
Georgi Gerganov [Fri, 25 Oct 2024 19:26:15 +0000 (22:26 +0300)]
metal : support permuted matrix multiplications (#10033)
* metal : support permuted matrix multiplications
ggml-ci
* cont : use nb01 directly for row steps
ggml-ci
* cont : add comments [no ci]
* metal : minor refactor
* metal : minor
wwoodsTM [Fri, 25 Oct 2024 16:07:34 +0000 (10:07 -0600)]
llama : add DRY sampler (#9702)
* sampling : add DRY sampler (post-refactor)
* DRY: Trying to fix coauthors, removed unneeded line
* DRY: Fixed redundant code
* DRY: Fixed crash issue due to DRY being in chain but uninitialized
---------
Co-authored-by: l3utterfly <redacted>
Co-authored-by: pi6am <redacted>
Michael Podvitskiy [Fri, 25 Oct 2024 15:57:54 +0000 (17:57 +0200)]
llama: string_split fix (#10022)
* llama: Refactor string_split to use template specialization, fixes parsing strings with spaces
* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string
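The two bullets above describe the general approach. Below is a minimal sketch of what such a specialization can look like; the names, signatures, and parsing details are illustrative assumptions, not the actual llama.cpp implementation.
```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Illustrative sketch only: a generic splitter that parses each token with
// operator>>, plus a std::string specialization that keeps tokens verbatim so
// values containing spaces are not re-tokenized.
template <typename T>
std::vector<T> string_split(const std::string & input, char separator) {
    // Guard against accidentally instantiating the generic parser for strings;
    // the dedicated specialization below must be picked instead.
    static_assert(!std::is_same<T, std::string>::value,
                  "use the std::string specialization of string_split");
    std::vector<T> parts;
    std::istringstream stream(input);
    std::string token;
    while (std::getline(stream, token, separator)) {
        std::istringstream value(token);
        T parsed;
        value >> parsed;
        parts.push_back(parsed);
    }
    return parts;
}

template <>
inline std::vector<std::string> string_split<std::string>(const std::string & input, char separator) {
    std::vector<std::string> parts;
    std::istringstream stream(input);
    std::string token;
    while (std::getline(stream, token, separator)) {
        parts.push_back(token); // no stream re-parsing, so embedded spaces survive
    }
    return parts;
}
```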
Srihari-mcw [Fri, 25 Oct 2024 07:27:41 +0000 (12:57 +0530)]
llamafile : extend sgemm.cpp support for Q5_0 models (#10010)
Georgi Gerganov [Fri, 25 Oct 2024 07:13:46 +0000 (10:13 +0300)]
server : check that the prompt fits in the slot's context (#10030)
ggml-ci
Xuan Son Nguyen [Thu, 24 Oct 2024 19:51:22 +0000 (21:51 +0200)]
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
Georgi Gerganov [Thu, 24 Oct 2024 18:23:33 +0000 (21:23 +0300)]
ci : fix cmake flags for SYCL
Johannes Gäßler [Thu, 24 Oct 2024 12:40:23 +0000 (14:40 +0200)]
CUDA: fix insufficient buffer clearing for MMQ (#10032)
Johannes Gäßler [Thu, 24 Oct 2024 09:09:36 +0000 (11:09 +0200)]
CUDA: fix MMQ for non-contiguous src0, add tests (#10021)
* CUDA: fix MMQ for non-contiguous src0, add tests
* revise test code
wwoodsTM [Wed, 23 Oct 2024 19:27:51 +0000 (13:27 -0600)]
server : samplers accept the prompt correctly (#10019)
Georgi Gerganov [Wed, 23 Oct 2024 14:23:55 +0000 (17:23 +0300)]
sync : ggml
Georgi Gerganov [Wed, 23 Oct 2024 14:16:56 +0000 (17:16 +0300)]
llama.vim : bump generation time limit to 3s [no ci]
Johannes Gäßler [Fri, 18 Oct 2024 07:24:44 +0000 (09:24 +0200)]
CUDA: fix 1D im2col, add tests (ggml/993)
Daniel Bevenius [Wed, 16 Oct 2024 18:10:01 +0000 (20:10 +0200)]
ggml : remove redundant set of contexts used field (ggml/978)
This commit removes the setting of the `used` field of the contexts in
the global state (g_state) in `ggml_init`.
The motivation for this change is that I believe that this additional
initialization might not be required after the changes in Commit
45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from
whisper.cpp"), which changed the initialization of the contexts field
from `{ 0 }` to `{ { 0 } }`:
```console
g_state = (struct ggml_state) {
- /*.contexts =*/ { 0 },
+ /*.contexts =*/ { { 0 } },
};
```
My understanding is that the `{0}` initialization might not have
zero-initialized all the nested fields in every array element because of
compiler differences, and might have been the reason for having the
explicit setting of the `used` fields to false.
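For reference, a small self-contained example of the aggregate-initialization behaviour discussed above, using hypothetical stand-in types rather than the real ggml structures: initializers that cover only part of an aggregate leave the remaining elements, including nested fields such as `used`, zero-initialized.
```cpp
#include <cassert>
#include <cstddef>

// Stand-in types for illustration only; the real ggml_state/ggml_context
// structs have more fields.
struct demo_context {
    void * mem_buffer;
    bool   used;
};

struct demo_state {
    demo_context contexts[64];
};

int main() {
    // Only the first array element is spelled out; the remaining 63 elements
    // (and all of their fields, including `used`) are zero-initialized.
    demo_state g_state = { { { nullptr, false } } };
    for (size_t i = 0; i < 64; ++i) {
        assert(g_state.contexts[i].used == false);
    }
    return 0;
}
```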
Michael Coppola [Wed, 23 Oct 2024 11:09:26 +0000 (07:09 -0400)]
llama.vim : add classic vim support (#9995)
* added classic vim support
* fixed ring update, removed blank line
* minor
* minor
* minor doc update
* removed unneeded var
* minor
* minor
* fixed job_start creating new scratch buffers
* fixed job_start creating new scratch buffers
* fixed ghost text indenting when expandtab is on
* removed unused code
* minor
* unified fim_on_exit
* minor
* vim ghost text rendering now uses pos_x and pos_y parameters
* renamed *_hlgroup to hlgroup_*
* renamed *_ghost_text to ghost_text_*, moved nvim/vim detection to llama#init()
* minor
---------
Co-authored-by: Michael Coppola <redacted>
Jun Hee Yoo [Wed, 23 Oct 2024 10:33:45 +0000 (19:33 +0900)]
metal : add POOL2D and fix IM2COL (#9943)
* add pool_2d
Signed-off-by: Junhee Yoo <redacted>
* fix im2col and add unittest for N>=1024
Signed-off-by: Junhee Yoo <redacted>
* add tests for N % 1024 != 0
Signed-off-by: Junhee Yoo <redacted>
* remove trailing whitespaces
Signed-off-by: Junhee Yoo <redacted>
* apply suggestions
Signed-off-by: Junhee Yoo <redacted>
* apply more optimization
- original IM2COL kernel + _ext with MIN()
Signed-off-by: Junhee Yoo <redacted>
* apply review: change kernel name of pool_2d
Signed-off-by: Junhee Yoo <redacted>
* apply review
Signed-off-by: Junhee Yoo <redacted>
* fix more formatting and enhance readability
Signed-off-by: Junhee Yoo <redacted>
---------
Signed-off-by: Junhee Yoo <redacted>
github-actions[bot] [Sun, 20 Oct 2024 00:22:59 +0000 (00:22 +0000)]
flake.lock: Update
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/5633bcff0c6162b9e4b5f1264264611e950c8ec7?narHash=sha256-9UTxR8eukdg%2BXZeHgxW5hQA9fIKHsKCdOIUycTryeVw%3D' (2024-10-09)
→ 'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c%2BcHUJwA%3D' (2024-10-18)
Xuan Son Nguyen [Tue, 22 Oct 2024 14:59:02 +0000 (16:59 +0200)]
llama : fix empty batch causing llama_batch_allocr to crash (#9966)
* llama : fix empty batch causing llama_batch_allocr to crash
* move batch_allocr inside decode/encode_internal
* fix build
* add GGML_ASSERT
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Daniel Bevenius [Tue, 22 Oct 2024 13:31:06 +0000 (15:31 +0200)]
llama : rename batch to ubatch (#9950)
This commit renames the member field batch in llm_build_context to
ubatch, and also renames the batch parameter of llama_build_graph and
llama_set_inputs to ubatch.
The motivation for this change is to make the code more readable
(considering there are the structs llama_batch and llama_sbatch), and
consistent with other parts of the code base where parameters/fields of
type llama_ubatch are named ubatch.
Molly Sophia [Tue, 22 Oct 2024 13:22:26 +0000 (21:22 +0800)]
RWKV chat template fix (#10001)
* llama: remove useless template matching for rwkv-world
Signed-off-by: Molly Sophia <redacted>
* converter: Add comment about the hack for rwkv models
Signed-off-by: Molly Sophia <redacted>
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
---------
Signed-off-by: Molly Sophia <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Xuan Son Nguyen [Tue, 22 Oct 2024 11:08:41 +0000 (13:08 +0200)]
lora : warn user if new token is added in the adapter (#9948)
Molly Sophia [Tue, 22 Oct 2024 10:33:37 +0000 (18:33 +0800)]
llama : add chat template for RWKV-World + fix EOT (#9968)
* Add chat template for RWKV-World
Signed-off-by: Molly Sophia <redacted>
* RWKV: Fix the chat template not being used
Signed-off-by: Molly Sophia <redacted>
* RWKV v6: Set EOT token to ``\n\n``
Signed-off-by: Molly Sophia <redacted>
* readme: add rwkv into supported model list
Signed-off-by: Molly Sophia <redacted>
---------
Signed-off-by: Molly Sophia <redacted>
leo-pony [Tue, 22 Oct 2024 08:16:01 +0000 (16:16 +0800)]
[CANN] Adapt to dynamically loadable backends mechanism (#9970)
* [CANN] Adapt to dynamically loadable backends mechanism
* Fix bug: inference output is garbled when running in debug mode for LM models whose type is Q4_0
* Handle the review comments of this pull request
Daniel Bevenius [Tue, 22 Oct 2024 07:40:02 +0000 (09:40 +0200)]
arg : fix typo in embeddings argument help [no ci] (#9994)
This commit fixes two typos in the help text for the `--embd-normalize`
and `--embd-separator` arguments. It also updates common.h, which contains
the same typo in two comments.
Georgi Gerganov [Mon, 21 Oct 2024 21:35:25 +0000 (00:35 +0300)]
llama.vim : fix info text display [no ci] (#9787)
Georgi Gerganov [Mon, 21 Oct 2024 19:52:22 +0000 (22:52 +0300)]
llama.vim : move info to the right of screen [no ci] (#9787)
'eol' messes up the rendering with nvim v0.10.2 for some reason
Asghar Ghorbani [Mon, 21 Oct 2024 18:20:59 +0000 (20:20 +0200)]
readme : update UI list (#9972)
add PocketPal AI app
Daniel Bevenius [Mon, 21 Oct 2024 18:12:52 +0000 (20:12 +0200)]
arg : fix attention non-causal arg value hint (#9985)
This commit updates the argument value hint for the `--attention`
argument to `non-causal`.
The motivation for this change is that the only values for this argument
are `causal` and `non-causal`.
Georgi Gerganov [Mon, 21 Oct 2024 17:25:02 +0000 (20:25 +0300)]
llama.vim : plugin for Neovim (#9787)
Georgi Gerganov [Mon, 21 Oct 2024 13:20:46 +0000 (16:20 +0300)]
ggml : add asserts for type conversion in fattn kernels (#9971)
ggml-ci
Radoslav Gerganov [Mon, 21 Oct 2024 10:35:40 +0000 (13:35 +0300)]
rpc : pack only RPC structs (#9959)
Georgi Gerganov [Mon, 21 Oct 2024 06:46:40 +0000 (09:46 +0300)]
llama : default sampling changes + greedy update (#9897)
* llama : deprecate softmax sampler + fix dist sampler
ggml-ci
* tests : replace macros with functions
ggml-ci
* sampling : change temperature sampler logic
For t <= 0.0f, keep the max logit intact and set the rest to -inf
* cont : no need for special "greedy" logic
top-k == 1 is the same
* tests : init prob correctly
* llama : handle temp <= 0.0 in the temp_ext sampler too
ggml-ci
* cont : avoid extra loop in temperature sampler for sub-zero temp
ggml-ci
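A hedged sketch of the temperature behaviour described in the bullets above; this is an illustration of the idea (keep the max logit, set the rest to -inf for temp <= 0, which is equivalent to greedy sampling / top-k == 1), not the actual llama.cpp sampler code.
```cpp
#include <algorithm>
#include <limits>
#include <vector>

// For temp <= 0.0f keep the maximum logit intact and push every other logit
// to -inf, which makes the subsequent sampling deterministic. For temp > 0
// apply the usual temperature scaling.
static void apply_temperature(std::vector<float> & logits, float temp) {
    if (logits.empty()) {
        return;
    }
    if (temp <= 0.0f) {
        const size_t i_max = static_cast<size_t>(
            std::max_element(logits.begin(), logits.end()) - logits.begin());
        for (size_t i = 0; i < logits.size(); ++i) {
            if (i != i_max) {
                logits[i] = -std::numeric_limits<float>::infinity();
            }
        }
        return;
    }
    for (float & l : logits) {
        l /= temp;
    }
}
```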
Georgi Gerganov [Mon, 21 Oct 2024 06:37:12 +0000 (09:37 +0300)]
speculative : fix handling of some input params (#9963)
* speculative : fix batch sizes at initialization
ggml-ci
* speculative : handle params.n_predict == -1
* speculative : limit batch size to llama_n_batch
Neo Zhang Jianyu [Mon, 21 Oct 2024 06:26:09 +0000 (14:26 +0800)]
fix mul_mat_vec_q and *_vec_q error (#9939)
Co-authored-by: arthw <redacted>
Loïc Carrère [Sun, 20 Oct 2024 16:25:41 +0000 (18:25 +0200)]
readme : update bindings list (#9951)
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
icppWorld [Sun, 20 Oct 2024 16:01:34 +0000 (12:01 -0400)]
readme : update infra list (#9942)
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
Xuan Son Nguyen [Fri, 18 Oct 2024 21:18:01 +0000 (23:18 +0200)]
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
* refactor llama_batch_get_one
* adapt all examples
* fix simple.cpp
* fix llama_bench
* fix
* fix context shifting
* free batch before return
* use common_batch_add, reuse llama_batch in loop
* null terminated seq_id list
* fix save-load-state example
* fix perplexity
* correct token pos in llama_batch_allocr
Radoslav Gerganov [Fri, 18 Oct 2024 11:33:58 +0000 (14:33 +0300)]
rpc : backend refactoring (#9912)
* rpc : refactor backend
Use structs for RPC request/response messages
* rpc : refactor server
Ouadie EL FAROUKI [Fri, 18 Oct 2024 05:46:16 +0000 (06:46 +0100)]
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
* implemented missing SYCL event APIs
* sycl : Added device and backend reg interfaces
* Restructured ggml-sycl.cpp
Ma Mingfei [Fri, 18 Oct 2024 05:34:36 +0000 (13:34 +0800)]
add amx kernel for gemm (#8998)
add intel amx isa detection
add vnni kernel for gemv cases
add vnni and amx kernel support for block_q8_0
code cleanup
fix packing B issue
enable openmp
fine tune amx kernel
switch to aten parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeList
update README
fix some compilation warning
fix compiler warning when amx is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add amx as an ggml-backend
update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for amx backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend
Georgi Gerganov [Fri, 18 Oct 2024 04:32:19 +0000 (07:32 +0300)]
server : add n_indent parameter for line indentation requirement (#9929)
ggml-ci
Daniel Bevenius [Thu, 17 Oct 2024 23:41:51 +0000 (01:41 +0200)]
llama : rename batch_all to batch (#8881)
This commit addresses the TODO in the code to rename the `batch_all`
parameter to `batch` in `llama_decode_internal`.
Georgi Gerganov [Thu, 17 Oct 2024 20:43:05 +0000 (23:43 +0300)]
readme : remove --memory-f32 references (#9925)
Georgi Gerganov [Thu, 17 Oct 2024 20:26:32 +0000 (23:26 +0300)]
llama : change warning to debug log
Georgi Gerganov [Thu, 17 Oct 2024 19:32:47 +0000 (22:32 +0300)]
llama : infill sampling handle very long tokens (#9924)
* llama : infill sampling handle very long tokens
ggml-ci
* cont : better indices
ggml-ci
Tim Wang [Thu, 17 Oct 2024 06:57:14 +0000 (17:57 +1100)]
readme : update bindings list (#9918)
Co-authored-by: Tim Wang <redacted>
Diego Devesa [Thu, 17 Oct 2024 00:46:58 +0000 (02:46 +0200)]
vulkan : add backend registry / device interfaces (#9721)
* vulkan : add backend registry / device interfaces
* llama : print devices used on model load
Gilad S. [Wed, 16 Oct 2024 23:34:22 +0000 (02:34 +0300)]
fix: allocating CPU buffer with size `0` (#9917)
Gilad S. [Wed, 16 Oct 2024 22:36:51 +0000 (01:36 +0300)]
fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875)
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS
* fix: switch to `posix_memalign` to keep existing `free()` usages work
* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS
* style: formatting
* fix: move const outside of `#ifndef`
* style: formatting
* fix: unused var
* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`
* fix: unused var
* fix: page align to `GGUF_DEFAULT_ALIGNMENT`
* fix: page align to `TENSOR_ALIGNMENT`
* fix: convert `TENSOR_ALIGNMENT` to a macro
* fix: increase page size to `32` on iOS
* fix: iOS page size
* fix: `hbw_posix_memalign` alignment
Daniel Bevenius [Wed, 16 Oct 2024 17:34:28 +0000 (19:34 +0200)]
llama : suppress conversion from 'size_t' to 'int' (#9046)
* llama : suppress conversion from 'size_t' to 'int'
This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:
```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
conversion from 'size_t' to 'int', possible loss of data
```
This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe as it seems unlikely that
symbols, which stores an entry for each UTF8 character, would become
larger than INT_MAX.
The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.
* squash! llama : suppress conversion from 'size_t' to 'int'
Move cast into for loop.
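A minimal sketch of the kind of cast described above, with hypothetical stand-in names (the real change is in llm_tokenizer_spm.tokenize in src/llama-vocab.cpp): the size_t returned by size() is narrowed explicitly inside the loop condition so MSVC no longer emits C4267.
```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the symbol entries; illustration only.
struct llm_symbol_demo {
    std::string text;
};

static size_t total_length(const std::vector<llm_symbol_demo> & symbols) {
    size_t n = 0;
    // The explicit cast in the loop condition is what silences MSVC C4267
    // (conversion from 'size_t' to 'int'); it is safe under the assumption
    // that the number of symbols (one per UTF-8 character) stays well below
    // INT_MAX.
    for (int i = 0; i < (int) symbols.size(); ++i) {
        n += symbols[i].text.size();
    }
    return n;
}
```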
Daniel Bevenius [Wed, 16 Oct 2024 17:24:05 +0000 (19:24 +0200)]
llava : fix typo in error message [no ci] (#9884)
Joe Eli McIlvain [Wed, 16 Oct 2024 16:03:24 +0000 (09:03 -0700)]
grammar : fix JSON Schema for string regex with top-level alt. (#9903)
Prior to this commit, using a JSON Schema containing a string
with `pattern` regular expression that uses top-level alternation
(e.g. `"pattern": "^A|B|C|D$"`) would result in invalid JSON
output from the constrained sampling grammar, because it
ended up creating a grammar rule like this for the string:
```
thing ::= "\"" "A" | "B" | "C" | "D" "\"" space
```
```
Note that this rule will only match a starting quote for the "A" case,
and will only match an ending quote for the "D" case,
so this rule will always produce invalid JSON when used for sampling
(that is, the JSON will always be lacking the starting quote,
the ending quote, or both).
This was fixed in a simple way by adding parentheses to the
generated rule (for all string pattern rules, to keep it simple),
such that the new generated rule looks like this (correct):
```
thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
```
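The fix amounts to parenthesizing the translated pattern body before it is placed between the quote terminals. A tiny standalone sketch of that idea follows; the names and string handling are assumptions for illustration, not the real json-schema-to-grammar converter.
```cpp
#include <iostream>
#include <string>

// Build a GBNF string rule from an already-translated pattern body.
// Wrapping the body in parentheses keeps top-level alternation inside the
// surrounding quote terminals.
static std::string build_string_rule(const std::string & name, const std::string & pattern_body) {
    return name + " ::= \"\\\"\" (" + pattern_body + ") \"\\\"\" space";
}

int main() {
    // For the pattern ^A|B|C|D$ the translated body is the alternation below.
    std::cout << build_string_rule("thing", "\"A\" | \"B\" | \"C\" | \"D\"") << "\n";
    // prints: thing ::= "\"" ("A" | "B" | "C" | "D") "\"" space
    return 0;
}
```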
Molly Sophia [Wed, 16 Oct 2024 10:10:21 +0000 (18:10 +0800)]
llama : add tensor name for "result_norm" (#9907)
Signed-off-by: Molly Sophia <redacted>
Alexey Parfenov [Wed, 16 Oct 2024 08:35:53 +0000 (08:35 +0000)]
server : fix the disappearance of the end of the text (#9867)
* server: fix the disappearance of the end of the text when streaming with stop strings
* simplify "send text" checks
Georgi Gerganov [Wed, 16 Oct 2024 08:28:14 +0000 (11:28 +0300)]
sync : ggml
Daniel Bevenius [Wed, 9 Oct 2024 14:40:35 +0000 (16:40 +0200)]
ggml-alloc : remove buffer_id from leaf_alloc (ggml/987)
This commit removes the buffer_id field from the leaf_alloc struct.
The motivation for this is that the field is only written to and never
read/used as far as I can tell. Each tensor_alloc has a buffer_id field
and this is what caused me to look into this more closely, to
understand what the buffer_id in leaf_alloc was used for.
leo-pony [Wed, 16 Oct 2024 00:51:46 +0000 (08:51 +0800)]
[CANN] Fix cann compilation error (#9891)
Fix CANN compilation error after merging llama.cpp's support for dynamically loadable backends.
Georgi Gerganov [Tue, 15 Oct 2024 13:35:33 +0000 (16:35 +0300)]
llama : add infill sampler (#9896)
ggml-ci
Georgi Gerganov [Tue, 15 Oct 2024 13:28:55 +0000 (16:28 +0300)]
server : improve infill context reuse (#9894)
ggml-ci