]>
git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Xuan Son Nguyen [Fri, 5 Jul 2024 16:08:32 +0000 (18:08 +0200)]
Reorganize documentation pages (#8325)
* re-organize docs
* add link among docs
* add link to build docs
* fix style
* de-duplicate sections
Georgi Gerganov [Fri, 5 Jul 2024 14:32:09 +0000 (17:32 +0300)]
llama : fix compile warning (#8304)
Natsu [Fri, 5 Jul 2024 14:29:35 +0000 (22:29 +0800)]
cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281)
Ouadie EL FAROUKI [Fri, 5 Jul 2024 12:23:25 +0000 (13:23 +0100)]
Enabled more data types for oneMKL gemm_batch (#8236)
Georgi Gerganov [Fri, 5 Jul 2024 07:15:36 +0000 (10:15 +0300)]
convert : remove AWQ remnants (#8320)
Georgi Gerganov [Fri, 5 Jul 2024 07:15:24 +0000 (10:15 +0300)]
llama : minor indentation during tensor loading (#8304)
* llama : minor indentation during tensor loading
ggml-ci
* llama : use int for layer iterators [no ci]
Johannes Gäßler [Fri, 5 Jul 2024 07:06:31 +0000 (09:06 +0200)]
CUDA: MMQ support for iq4_nl, iq4_xs (#8278)
Daniele [Fri, 5 Jul 2024 07:06:09 +0000 (07:06 +0000)]
CUDA: revert part of the RDNA1 optimizations (#8309)
The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s
Douglas Hanley [Fri, 5 Jul 2024 07:05:56 +0000 (02:05 -0500)]
llama : streamline embeddings from "non-embedding" models (#8087)
Johannes Gäßler [Fri, 5 Jul 2024 07:05:34 +0000 (09:05 +0200)]
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311)
Pieter Ouwerkerk [Fri, 5 Jul 2024 06:58:41 +0000 (02:58 -0400)]
readme : fix minor typos [no ci] (#8314)
Daniel Bevenius [Fri, 5 Jul 2024 06:14:24 +0000 (08:14 +0200)]
passkey : add short intro to README.md [no-ci] (#8317)
* passkey : add short intro to README.md [no-ci]
This commit adds a short introduction to the README.md file in the
examples/passkey directory.
Signed-off-by: Daniel Bevenius <redacted>
* Update examples/passkey/README.md
---------
Signed-off-by: Daniel Bevenius <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Fri, 5 Jul 2024 06:10:03 +0000 (09:10 +0300)]
llama : prefer n_ over num_ prefix (#8308)
Georgi Gerganov [Fri, 5 Jul 2024 06:09:47 +0000 (09:09 +0300)]
contributing : update guidelines (#8316)
luoyu-intel [Fri, 5 Jul 2024 05:06:13 +0000 (05:06 +0000)]
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
Georgi Gerganov [Fri, 5 Jul 2024 04:53:33 +0000 (07:53 +0300)]
py : switch to snake_case (#8305)
* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <redacted>
Neo Zhang Jianyu [Fri, 5 Jul 2024 02:32:29 +0000 (10:32 +0800)]
rm get_work_group_size() by local cache for performance (#8286)
Co-authored-by: arthw <redacted>
Xuan Son Nguyen [Thu, 4 Jul 2024 18:55:03 +0000 (20:55 +0200)]
cli: add EOT when user hit Ctrl+C (#8296)
* main: add need_insert_eot
* do not format system prompt if it is empty
Icecream95 [Thu, 4 Jul 2024 17:14:21 +0000 (05:14 +1200)]
llama : add OpenELM support (#7359)
* Initial OpenELM support (270M only so far)
* Fill out missing entries in llama_model_type_name
* fixup! Initial OpenELM support (270M only so far)
Fix formatting
* llama : support all OpenELM models
* llama : add variable GQA and variable FFN sizes
Some metadata keys can now also be arrays to support setting
their value per-layer for models like OpenELM.
* llama : minor spacing changes
Co-authored-by: Georgi Gerganov <redacted>
* llama : use std::array for per-layer hparams
* llama : fix save/load state
* llama : do not print hparams for vocab-only models
* llama : handle n_head == 0
* llama : use const ref for print_f and fix division by zero
* llama : fix t5 uses of n_head and n_ff
* llama : minor comment
---------
Co-authored-by: Francis Couture-Harpin <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Daniel Bevenius [Thu, 4 Jul 2024 16:38:58 +0000 (18:38 +0200)]
tokenize : add --show-count (token) option (#8299)
This commit adds a new option to the tokenize example, --show-count.
When this is set the total number of tokens are printed to stdout.
This was added as an option as I was concerned that there might be
scripts that use the output from this program and it might be better to
not print this information by default.
The motivation for this is that can be useful to find out how many
tokens a file contains, for example when trying to determine prompt
input file sizes for testing.
Signed-off-by: Daniel Bevenius <redacted>
ditsuke [Thu, 4 Jul 2024 15:24:35 +0000 (20:54 +0530)]
build: Export hf-to-gguf as snakecase
ditsuke [Tue, 2 Jul 2024 19:32:56 +0000 (01:02 +0530)]
doc: Add context for why we add an explicit pytorch source
ditsuke [Tue, 2 Jul 2024 10:18:13 +0000 (15:48 +0530)]
chore: Remove rebase artifacts
ditsuke [Tue, 2 Jul 2024 10:05:43 +0000 (15:35 +0530)]
chore: Fixup requirements and build
ditsuke [Tue, 2 Jul 2024 09:48:13 +0000 (15:18 +0530)]
chore: ignore all __pychache__
ditsuke [Sun, 10 Mar 2024 17:51:46 +0000 (23:21 +0530)]
fix: Update script paths in CI scripts
ditsuke [Wed, 28 Feb 2024 20:17:15 +0000 (01:47 +0530)]
fix: Actually include scripts in build
Not namespaced though :(
ditsuke [Tue, 27 Feb 2024 06:31:02 +0000 (12:01 +0530)]
build(python): Package scripts with pip-0517 compliance
fairydreaming [Thu, 4 Jul 2024 13:46:11 +0000 (15:46 +0200)]
Inference support for T5 and FLAN-T5 model families (#5763)
* llama : add inference support and model types for T5 and FLAN-T5 model families
* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()
* common, llama-cli, llama-batched : add support for encoder-decoder models
* convert-hf : handle shared token embeddings tensors in T5Model
* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)
* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model
* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5
---------
Co-authored-by: Stanisław Szymczyk <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Daniel Bevenius [Thu, 4 Jul 2024 10:53:42 +0000 (12:53 +0200)]
tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231)
This commit adds the compile definition `_CRT_SECURE_NO_WARNINGS`
to the root cmake subproject.
The motivation for this is that currently the following warnings are
displayed when compiling the tests and common cmake subprojects:
```console
test-llama-grammar.cpp
C:\llama.cpp\src\.\llama.cpp(1406,77): warning C4996: 'strerror':
This function or variable may be unsafe. Consider using strerror_s
instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See
online help for details.
[C:\llama.cpp\build\tests\test-llama-grammar.vcxproj]
...
```
This compile definition is currently set for the `src` subproject
and this change moves into the root cmake project so that it is applied
to all cmake subprojects.
Daniel Bevenius [Thu, 4 Jul 2024 10:50:57 +0000 (12:50 +0200)]
llama : suppress unref var in Windows MSVC (#8150)
* llama : suppress unref var in Windows MSVC
This commit suppresses two warnings that are currently generated for
src/llama.cpp when building on Windows MSVC
```console
C:\llama.cpp\src\llama.cpp(14349,45): warning C4101: 'ex':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
C:\llama.cpp\src\llama.cpp(19285,44): warning C4101: 'e':
unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj]
```
* Update src/llama.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Thu, 4 Jul 2024 07:41:03 +0000 (10:41 +0300)]
convert : fix gemma v1 tokenizer convert (#8248)
ggml-ci
AidanBeltonS [Thu, 4 Jul 2024 01:07:19 +0000 (02:07 +0100)]
[SYCL] Remove unneeded semicolons (#8280)
Daniele [Wed, 3 Jul 2024 23:02:58 +0000 (23:02 +0000)]
Define and optimize RDNA1 (#8085)
slaren [Wed, 3 Jul 2024 17:33:31 +0000 (19:33 +0200)]
ppl : fix n_seq_max for perplexity (#8277)
* ppl : fix n_seq_max for perplexity
* use 1 seq for kl_divergence
Xuan Son Nguyen [Wed, 3 Jul 2024 14:01:54 +0000 (16:01 +0200)]
fix phi 3 conversion (#8262)
Judd [Wed, 3 Jul 2024 12:40:16 +0000 (20:40 +0800)]
fix typo (#8267)
Co-authored-by: Judd <redacted>
AidanBeltonS [Wed, 3 Jul 2024 01:55:34 +0000 (02:55 +0100)]
Dequant improvements rebase (#8255)
* Single load for half2
* Store scales in local mem
* Vec load quantized values
MistApproach [Tue, 2 Jul 2024 20:56:46 +0000 (22:56 +0200)]
fix: add missing short command line argument -mli for multiline-input (#8261)
Clint Herron [Tue, 2 Jul 2024 17:19:56 +0000 (13:19 -0400)]
Adding step to `clean` target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257)
Clint Herron [Tue, 2 Jul 2024 16:18:10 +0000 (12:18 -0400)]
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)
Faisal Zaghloul [Tue, 2 Jul 2024 14:36:00 +0000 (10:36 -0400)]
Add `JAIS` model(s) (#8118)
* Add `JAIS` model(s)
* cleanup
* address review comments
* remove hack
* un-hardcode max-alibi-bias
* minor tweaks
---------
Co-authored-by: fmz <redacted>
Daniel Bevenius [Tue, 2 Jul 2024 06:40:49 +0000 (08:40 +0200)]
convert-hf : print output file name when completed (#8181)
* convert-hf : print output file name when completed
This commit adds the output file name to the log message when the
conversion is completed.
The motivation for this change is that when `--outfile` option is not
specified it migth not be obvious where the output file is written.
With this change the output of running the script will be something like
the following:
```console
INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf.
```
Signed-off-by: Daniel Bevenius <redacted>
* squash! convert-hf : print output file name when completed
Updates the output of to support printing the directory if the output is
split into multiple files. Also the output file name is now retrieved
from the model_instance object.
Signed-off-by: Daniel Bevenius <redacted>
* squash! convert-hf : print output file name when completed
Use parent attribute of Path object and string interpolation.
Signed-off-by: Daniel Bevenius <redacted>
* squash! convert-hf : print output file name when completed
Use os.sep instead of hardcoding the path separator.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
slaren [Tue, 2 Jul 2024 06:39:38 +0000 (08:39 +0200)]
cuda : update supports_op for matrix multiplication (#8245)
luoyu-intel [Tue, 2 Jul 2024 04:50:07 +0000 (04:50 +0000)]
[SYCL] Fix win build conflict of math library (#8230)
* fix win build conflict of math library
* fix the condition: !(win32 & SYCL)
* revert warp_size=16
luoyu-intel [Tue, 2 Jul 2024 02:16:00 +0000 (02:16 +0000)]
[SYCL] Fix the sub group size of Intel (#8106)
* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size
Xuan Son Nguyen [Mon, 1 Jul 2024 23:07:23 +0000 (01:07 +0200)]
Fix gemma2 tokenizer convert (#8244)
* fix gemma2 tokenizer convert
* remove scores
* improve code, fix new line issue
Johannes Gäßler [Mon, 1 Jul 2024 18:39:06 +0000 (20:39 +0200)]
CUDA: refactor and optimize IQ MMVQ (#8215)
* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
Mateusz Charytoniuk [Mon, 1 Jul 2024 17:13:22 +0000 (19:13 +0200)]
readme: add Paddler to the list of projects (#8239)
Xuan Son Nguyen [Mon, 1 Jul 2024 16:48:34 +0000 (18:48 +0200)]
gemma2: add sliding window mask (#8227)
* gemma2: add sliding window mask
* fix data_swa uninitialized
* better naming
* add co-author
Co-authored-by: Arlo Phoenix <redacted>
* replace list with single tensor
* update
* llama : minor styling
* convert : add sanity check for query_pre_attn_scalar
* fix small typo in README
---------
Co-authored-by: Arlo Phoenix <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Roni [Mon, 1 Jul 2024 12:48:16 +0000 (14:48 +0200)]
readme : update tool list (#8209)
* Added gppm to Tool list in README
* Update README.md
---------
Co-authored-by: Georgi Gerganov <redacted>
Michael Francis [Mon, 1 Jul 2024 11:47:04 +0000 (07:47 -0400)]
nix : enable curl (#8043)
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Mon, 1 Jul 2024 11:46:18 +0000 (14:46 +0300)]
nix : remove OpenCL remnants (#8235)
* nix : remove OpenCL remnants
* minor : remove parentheses
iacore [Mon, 1 Jul 2024 11:40:58 +0000 (11:40 +0000)]
Document BERT support. (#8205)
* Update README.md
document BERT support
* Update README.md
zhentaoyu [Mon, 1 Jul 2024 11:39:06 +0000 (19:39 +0800)]
[SYCL] Update SYCL-Rope op and Refactor (#8157)
* align with rope.cu and move sycl-op to a single file
Georgi Gerganov [Sun, 30 Jun 2024 23:09:34 +0000 (02:09 +0300)]
flake.lock: Update (#8218)
Xuan Son Nguyen [Sun, 30 Jun 2024 18:27:13 +0000 (20:27 +0200)]
Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
* preserve new line llama_chat_format_single
* disable chat template if in-prefix/suffix is set
* remove redundant change
Andrei [Sun, 30 Jun 2024 03:44:08 +0000 (20:44 -0700)]
llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197)
* Add attention and final logit softcapping.
* fix
* Add custom add_ functions
* Disable flash attention for Gemma2
* Update src/llama.cpp
Co-authored-by: slaren <redacted>
* Add default value for attention and final logit softcap value
* Add custom kq scaling from Gemma2Attention
* Remove custom pre attention scaling and use computed value instead.
---------
Co-authored-by: slaren <redacted>
Xuan Son Nguyen [Fri, 28 Jun 2024 22:14:20 +0000 (00:14 +0200)]
fix code typo in llama-cli (#8198)
Olivier Chafik [Fri, 28 Jun 2024 17:02:05 +0000 (18:02 +0100)]
json: attempt to skip slow tests when running under emulator (#8189)
Xuan Son Nguyen [Fri, 28 Jun 2024 13:11:44 +0000 (15:11 +0200)]
Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` (#8172)
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
Sigbjørn Skjæret [Fri, 28 Jun 2024 10:53:43 +0000 (12:53 +0200)]
Add SPM infill support (#8016)
* add --spm-infill option
* support --spm-infill
* support --spm-infill
slaren [Fri, 28 Jun 2024 10:37:45 +0000 (12:37 +0200)]
cmake : allow user to override default options (#8178)
Olivier Chafik [Fri, 28 Jun 2024 08:26:45 +0000 (09:26 +0100)]
`json`: restore default additionalProperties to false, fix some pattern escapes (#8180)
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
pculliton [Fri, 28 Jun 2024 04:00:43 +0000 (00:00 -0400)]
llama: Add support for Gemma2ForCausalLM (#8156)
* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
Co-authored-by: slaren <redacted>
* Add model type names
* Add control vector
* Fix model type identification
---------
Co-authored-by: Andrei Betlen <redacted>
Co-authored-by: slaren <redacted>
Xuan Son Nguyen [Fri, 28 Jun 2024 00:19:11 +0000 (02:19 +0200)]
Add missing items in makefile (#8177)
Olivier Chafik [Thu, 27 Jun 2024 21:08:42 +0000 (22:08 +0100)]
`json`: update grammars/README w/ examples & note about additionalProperties (#8132)
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
loonerin [Thu, 27 Jun 2024 19:01:23 +0000 (15:01 -0400)]
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)
PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.
* CI: fix release build (Mac)
---------
Co-authored-by: loonerin <redacted>
slaren [Thu, 27 Jun 2024 18:04:39 +0000 (20:04 +0200)]
cmake : fix deprecated option names not working (#8171)
* cmake : fix deprecated option names not working
* remove LlAMA_OPENMP
Xuan Son Nguyen [Thu, 27 Jun 2024 16:14:19 +0000 (18:14 +0200)]
Add chatml fallback for cpp `llama_chat_apply_template` (#8160)
* add chatml fallback for cpp `llama_chat_apply_template`
* remove redundant code
Georgi Gerganov [Thu, 27 Jun 2024 15:37:29 +0000 (18:37 +0300)]
flake.lock: Update (#8071)
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/
e9ee548d90ff586a6471b4ae80ae9cfcbceb3420 ?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
→ 'github:NixOS/nixpkgs/
d603719ec6e294f034936c0d0dc06f689d91b6c3 ?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)
Co-authored-by: github-actions[bot] <redacted>
Co-authored-by: Philip Taron <redacted>
jukofyork [Thu, 27 Jun 2024 14:48:07 +0000 (15:48 +0100)]
Control vector loading fixes (#8137)
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow
* refactored `llama_control_vector_load_one()`
* allow multiple directions for same layer in same file
* llama_control_vector_load_one() and llama_control_vector_load() now break on error
* removed unnecessary ggml_free() call
Raj Hammeer Singh Hada [Thu, 27 Jun 2024 14:39:29 +0000 (20:09 +0530)]
Delete examples/llama.android/llama/CMakeLists.txt (#8165)
* Delete examples/llama.android/llama/CMakeLists.txt
https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-
2194534244
This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead.
* Update CMakeLists.txt
Pick local llama.cpp files instead of fetching content from git
Sigbjørn Skjæret [Thu, 27 Jun 2024 14:27:41 +0000 (16:27 +0200)]
Add Qwen2MoE 57B-A14B model identifier (#8158)
* Add Qwen2MoE 57B-A14B
* Add Qwen2MoE 57B-A14B
Johannes Gäßler [Thu, 27 Jun 2024 14:26:05 +0000 (16:26 +0200)]
CUDA: fix MMQ stream-k for --split-mode row (#8167)
kustaaya [Thu, 27 Jun 2024 08:58:54 +0000 (11:58 +0300)]
Added support for Viking pre-tokenizer (#8135)
Co-authored-by: kustaaya <redacted>
Sigbjørn Skjæret [Thu, 27 Jun 2024 07:46:41 +0000 (09:46 +0200)]
llama : fix CodeLlama FIM token checks (#8144)
* account for space prefix character
* use find instead
Raj Hammeer Singh Hada [Thu, 27 Jun 2024 01:57:57 +0000 (07:27 +0530)]
Fix llama-android.cpp for error - "common/common.h not found" (#8145)
- Path seems to be wrong for the common.h header file in llama-android.cpp file. Fixing the path so the Android Build doesn't fail with the error "There is no file common/common.h"
Daniel Bevenius [Wed, 26 Jun 2024 23:50:09 +0000 (01:50 +0200)]
clip : suppress unused variable warnings (#8105)
* clip : suppress unused variable warnings
This commit suppresses unused variable warnings for the variables e in
the catch blocks.
The motivation for this change is to suppress the warnings that are
generated on Windows when using the MSVC compiler. The warnings are
not displayed when using GCC because GCC will mark all catch parameters
as used.
Signed-off-by: Daniel Bevenius <redacted>
* squash! clip : suppress unused variable warnings
Remove e (/*e*/) instead instead of using GGML_UNUSED.
---------
Signed-off-by: Daniel Bevenius <redacted>
Georgi Gerganov [Wed, 26 Jun 2024 20:25:22 +0000 (23:25 +0300)]
scripts : fix filename sync
slaren [Wed, 26 Jun 2024 19:59:28 +0000 (21:59 +0200)]
ci : publish new docker images only when the files change (#8142)
slaren [Wed, 26 Jun 2024 19:34:14 +0000 (21:34 +0200)]
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)
slaren [Wed, 26 Jun 2024 18:20:22 +0000 (20:20 +0200)]
make : fix missing -O3 (#8143)
Georgi Gerganov [Wed, 26 Jun 2024 16:39:19 +0000 (19:39 +0300)]
sync : ggml
Georgi Gerganov [Wed, 26 Jun 2024 16:36:44 +0000 (19:36 +0300)]
authors : regen
Georgi Gerganov [Wed, 26 Jun 2024 16:32:07 +0000 (19:32 +0300)]
devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139)
ggml-ci
Georgi Gerganov [Wed, 26 Jun 2024 16:26:13 +0000 (19:26 +0300)]
readme : update API notes
Georgi Gerganov [Wed, 26 Jun 2024 15:33:02 +0000 (18:33 +0300)]
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <redacted>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <redacted>
Isaac McFadyen [Wed, 26 Jun 2024 06:29:28 +0000 (02:29 -0400)]
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag (#8115)
* Add message about int8 support
* Add suggestions from review
Co-authored-by: Johannes Gäßler <redacted>
---------
Co-authored-by: Johannes Gäßler <redacted>
Johannes Gäßler [Wed, 26 Jun 2024 06:28:02 +0000 (08:28 +0200)]
CUDA: fix misaligned shared memory read (#8123)
Eddie-Wang [Wed, 26 Jun 2024 06:27:46 +0000 (14:27 +0800)]
llama : extend llm_build_ffn() to support _scale tensors (#8103)
Olivier Chafik [Wed, 26 Jun 2024 00:46:35 +0000 (01:46 +0100)]
`json`: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863)
* json: better suport for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`)
* json: add test for type: [array, null] fix
* update tests
Olivier Chafik [Wed, 26 Jun 2024 00:45:58 +0000 (01:45 +0100)]
`json`: fix additionalProperties, allow space after enum/const (#7840)
* json: default additionalProperty to true
* json: don't force additional props after normal properties!
* json: allow space after enum/const
* json: update pydantic example to set additionalProperties: false
* json: prevent additional props to redefine a typed prop
* port not_strings to python, add trailing space
* fix not_strings & port to js+py
* Update json-schema-to-grammar.cpp
* fix _not_strings for substring overlaps
* json: fix additionalProperties default, uncomment tests
* json: add integ. test case for additionalProperties
* json: nit: simplify condition
* reformat grammar integ tests w/ R"""()""" strings where there's escapes
* update # tokens in server test: consts can now have trailing space
jukofyork [Tue, 25 Jun 2024 20:47:40 +0000 (21:47 +0100)]
fixes #7999 (adds control vectors to all `build_XXX()` functions in `llama.cpp` [needs testing] (#8060)
* fixes #7999
The `build_command_r` forgot to add the control vector.
* Fixes qwen2 too
* Fixed all models' control vectors
* Removed double calls to `cb(cur, "l_out", il)`
* Moved control vector logic to llama_control_vector:apply_to()
fairydreaming [Tue, 25 Jun 2024 19:14:35 +0000 (21:14 +0200)]
llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model families (#5763)
* llama : add T5 model architecture, tensors and model header parameters
* llama : add implementation of Unigram tokenizer with SentencePiece-like text normalization using precompiled charsmap
---------
Co-authored-by: Stanisław Szymczyk <redacted>
Daniel Bevenius [Tue, 25 Jun 2024 19:07:28 +0000 (21:07 +0200)]
llama : return nullptr from llama_grammar_init (#8093)
* llama : return nullptr from llama_grammar_init
This commit updates llama_grammar_init to return nullptr instead of
throwing an exception.
The motivation for this is that this function is declared inside an
extern "C" block and is intended/may be used from C code which will not
be able to handle exceptions thrown, and results in undefined behavior.
On Windows and using MSVC the following warning is currently generated:
```console
C:\llama.cpp\llama.cpp(13998,1): warning C4297: 'llama_grammar_init':
function assumed not to throw an exception but does
C:\llama.cpp\llama.cpp(13998,1): message :
__declspec(nothrow), throw(), noexcept(true), or noexcept was specified
on the function
```
Signed-off-by: Daniel Bevenius <redacted>
* squash! llama : return nullptr from llama_grammar_init
Add checks for nullptr when calling llama_grammar_init.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
Co-authored-by: Clint Herron <redacted>
Olivier Chafik [Tue, 25 Jun 2024 19:06:20 +0000 (20:06 +0100)]
`json`: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum (#7797)
* json: support minimum for positive integer values
* json: fix min 0
* json: min + max integer constraints
* json: handle negative min / max integer bounds
* json: fix missing paren min/max bug
* json: proper paren fix
* json: integration test for schemas
* json: fix bounds tests
* Update json-schema-to-grammar.cpp
* json: fix negative max
* json: fix negative min (w/ more than 1 digit)
* Update test-grammar-integration.cpp
* json: nit: move string rules together
* json: port min/max integer support to Python & JS
* nit: move + rename _build_min_max_int
* fix min in [1, 9]
* Update test-grammar-integration.cpp
* add C++11-compatible replacement for std::string_view
* add min/max constrained int field to pydantic json schema example
* fix merge
* json: add integration tests for min/max bounds
* reshuffle/merge min/max integ test cases
* nits / cleanups
* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
slaren [Tue, 25 Jun 2024 17:20:06 +0000 (19:20 +0200)]
disable docker CI on pull requests (#8110)
joecryptotoo [Tue, 25 Jun 2024 15:13:27 +0000 (08:13 -0700)]
Add healthchecks to llama-server containers (#8081)
* added healthcheck
* added healthcheck
* added healthcheck
* added healthcheck
* added healthcheck
* moved curl to base
* moved curl to base
Brian [Tue, 25 Jun 2024 12:03:25 +0000 (22:03 +1000)]
Gguf dump start data offset via --data-offset and some extra refactor (#8054)
* gguf-dump: add --data-offset
* gguf-dump: add tensor data offset table
* gguf-dump: refactor GGUFReader for clarity
* gguf-dump: add --data-alignment
* gguf-dump.py: Rename variables and adjust comments
start_data_offset --> data_offset
_build_tensors_info_fields --> _build_tensor_info