git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
sharpHL [Sun, 28 Jan 2024 08:00:30 +0000 (16:00 +0800)]
llama : add support for Orion-14B (#5118)
* add support for Orion-14B (https://huggingface.co/OrionStarAI/Orion-14B-Chat)
* flake8 support
* Update llama.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp
Co-authored-by: slaren <redacted>
* Update llama.cpp
* Update llama.cpp
---------
Co-authored-by: lixiaopu <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: slaren <redacted>
Kyle Mistele [Sun, 28 Jan 2024 07:55:31 +0000 (01:55 -0600)]
docker : add server-first container images (#5157)
* feat: add Dockerfiles for each platform that use ./server instead of ./main
* feat: update .github/workflows/docker.yml to build server-first docker containers
* doc: add information about running the server with Docker to README.md
* doc: add information about running with docker to the server README
* doc: update n-gpu-layers to show correct GPU usage
* fix(doc): update container tag from `server` to `server-cuda` for README example on running server container with CUDA
John [Sat, 27 Jan 2024 15:09:18 +0000 (16:09 +0100)]
llava : support for Yi-VL and fix for mobileVLM (#5093)
* Support for Yi-VL, templating fix for mobileVLM
* ws
* Update examples/llava/clip.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llava-cli.cpp
* Update clip.cpp
bugfix for new conversions
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Sat, 27 Jan 2024 14:59:20 +0000 (16:59 +0200)]
sync : ggml
Judd [Fri, 26 Jan 2024 13:04:01 +0000 (21:04 +0800)]
ggml : check ggml_add src1 type (ggml/708)
Co-authored-by: Judd <redacted>
Michael Klimenko [Sat, 27 Jan 2024 14:25:55 +0000 (15:25 +0100)]
Remove unused data and add fixes (#5154)
* Remove unused data and add fixes
* Add missing file
* Address review comments
* Replace the scope of vq allocation
Maximilian Winter [Sat, 27 Jan 2024 13:38:05 +0000 (14:38 +0100)]
server : add self-extend support (#5104)
* Ported self extension to server example
* Update server.cpp
* Fixed prompt caching without self extend
* Update server.cpp
* Added description to server readme.
* Update server.cpp
* Update server.cpp
* Update server.cpp
* Update server.cpp
* Update README.md
* Changed descriptions
* server : formatting
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update server.cpp
* Update server.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
0cc4m [Fri, 26 Jan 2024 22:07:32 +0000 (23:07 +0100)]
Add OpenCL add kernel (#5151)
* Add OpenCL add kernel
* Put add kernel into different string to stay within MSVC string length limit, disable float16 support due to bad results
Jared Van Bortel [Fri, 26 Jan 2024 20:34:06 +0000 (15:34 -0500)]
cmake : pass CPU architecture flags to nvcc (#5146)
slaren [Fri, 26 Jan 2024 17:59:43 +0000 (18:59 +0100)]
cuda : fix tensor size calculation for non-split buffer (#5145)
slaren [Fri, 26 Jan 2024 17:18:26 +0000 (18:18 +0100)]
ggml-alloc : add 10% margin to the buffer sizes (#5149)
snadampal [Fri, 26 Jan 2024 17:17:59 +0000 (11:17 -0600)]
ggml : update softmax n_task calculation (#5126)
Updated the n_task calculation to use the maximum number of
threads possible. This improved prompt eval performance by
around 5% for DOT kernels and by around 10% for MMLA kernels
on AWS Graviton3.
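The commit above parallelizes the soft-max op more aggressively by raising its task count. A hedged illustration of the general idea (not the actual ggml code; the function and sizes below are made up for the example): split the rows of a soft-max across as many tasks as there are threads.
```
// Hedged sketch: row-parallel softmax. Using as many tasks as there are
// hardware threads (capped by the number of rows) is what speeds up prompt
// evaluation here. Illustrative only, not ggml's implementation.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

static void softmax_rows(float * data, int n_rows, int n_cols, int ith, int n_tasks) {
    // each task handles an interleaved subset of rows
    for (int r = ith; r < n_rows; r += n_tasks) {
        float * row = data + (size_t) r * n_cols;
        float max_v = row[0];
        for (int c = 1; c < n_cols; ++c) max_v = std::max(max_v, row[c]);
        float sum = 0.0f;
        for (int c = 0; c < n_cols; ++c) { row[c] = std::exp(row[c] - max_v); sum += row[c]; }
        for (int c = 0; c < n_cols; ++c) row[c] /= sum;
    }
}

int main() {
    const int n_rows = 256, n_cols = 4096;
    std::vector<float> logits((size_t) n_rows * n_cols, 1.0f);

    // use the maximum number of threads possible, but never more tasks than rows
    const unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    const int n_tasks = (int) std::min<unsigned>(hw, (unsigned) n_rows);

    std::vector<std::thread> workers;
    for (int ith = 0; ith < n_tasks; ++ith) {
        workers.emplace_back(softmax_rows, logits.data(), n_rows, n_cols, ith, n_tasks);
    }
    for (auto & w : workers) w.join();

    std::printf("n_tasks = %d, first prob = %f\n", n_tasks, logits[0]);
    return 0;
}
```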
Georgi Gerganov [Fri, 26 Jan 2024 15:09:44 +0000 (17:09 +0200)]
scripts : move run-with-preset.py from root to scripts folder
Georgi Gerganov [Fri, 26 Jan 2024 12:48:15 +0000 (14:48 +0200)]
tests : gitignore test-c.o
Xuan Son Nguyen [Fri, 26 Jan 2024 12:42:20 +0000 (13:42 +0100)]
server : refactored the task processing logic (#5065)
* server: add llama_server_queue struct
* server: add llama_server_response_event
* server: add comments
* server: move all mutexes away from server.cpp
* server: correct multitask response
* server: only add back deferred tasks when one slot is available
* server: fix a race condition caused by "request_completion"
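The refactor above moves the server's synchronization into dedicated llama_server_queue / llama_server_response_event structures instead of ad-hoc mutexes in server.cpp. A minimal, hedged sketch of such a thread-safe task queue (the types below are illustrative stand-ins, not the actual llama.cpp server code):
```
// Hedged sketch of a thread-safe task queue; illustrative only,
// not the llama.cpp server implementation.
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

struct server_task {
    int id;
    std::function<void()> run;
};

struct server_queue {
    std::deque<server_task>  tasks;
    std::mutex               mtx;
    std::condition_variable  cv;

    // called by HTTP handlers: enqueue a task and wake one worker
    void post(server_task t) {
        std::lock_guard<std::mutex> lock(mtx);
        tasks.push_back(std::move(t));
        cv.notify_one();
    }

    // called by worker threads: block until a task is available
    server_task wait_and_pop() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return !tasks.empty(); });
        server_task t = std::move(tasks.front());
        tasks.pop_front();
        return t;
    }
};

int main() {
    server_queue q;
    std::thread worker([&q] { q.wait_and_pop().run(); });
    q.post({0, [] { std::printf("task 0 executed\n"); }});
    worker.join();
    return 0;
}
```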
crasm [Fri, 26 Jan 2024 12:18:00 +0000 (07:18 -0500)]
ci : add model tests + script wrapper (#4586)
* scripts : add lib.sh and lib_test.sh
* scripts : stub out new ci-run.sh script
* scripts : switch to PascalCase for functions
This looks a little odd at first, but I find it very useful as a
convention for knowing whether a command is part of our code or a builtin.
* scripts : add some fancy conversion from snake_case to PascalCase
* Add venv to ci/run.sh
* Revert scripts work
* scripts : add wrapper script for local use of ci/run.sh
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* Update test-model-load-cancel
* ci : add ctest_with_model for debug and release
ggml-ci
* Fix gg_get_model function
ggml-ci
* got stuck on CMake
* Add get_model.cpp to tests/CMakeLists.txt
ggml-ci
* Fix README.md output for ctest_with_model
ggml-ci
* workflows : use `-L main` for all ctest
ggml-ci
* Fixes
* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show warning rather than failing if model file variable is not
set
* scripts : update usage text for ci-run.sh
Paul Tsochantaris [Fri, 26 Jan 2024 12:16:07 +0000 (12:16 +0000)]
metal : remove unused `n_buffers` and `buffers` (#5129)
Riceball LEE [Fri, 26 Jan 2024 09:10:28 +0000 (17:10 +0800)]
gguf : fix "general.alignment" type in gguf_reader.py (#5136)
Georgi Gerganov [Fri, 26 Jan 2024 08:52:33 +0000 (10:52 +0200)]
readme : update hot topics
Kawrakow [Fri, 26 Jan 2024 07:14:39 +0000 (09:14 +0200)]
Another bucket sort (#5109)
* Initial bucket sort
* Bucket sort: slightly better version
* Bucket sort: another minor improvement
---------
Co-authored-by: Iwan Kawrakow <redacted>
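The bucket sort above targets the hot path where a large array of token logits has to be (partially) ordered. A hedged sketch of the general idea, binning values by range so only small buckets need a comparison sort (illustrative only, not the llama.cpp implementation):
```
// Hedged sketch: bucket sort of logits into descending order.
// Values are binned by where they fall in [min, max]; each bucket is then
// sorted. Most buckets stay small, so this beats sorting the whole array.
// Illustrative only, not the llama.cpp implementation.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

std::vector<float> bucket_sort_desc(const std::vector<float> & v, int n_buckets = 128) {
    if (v.empty()) return {};
    const auto [mn_it, mx_it] = std::minmax_element(v.begin(), v.end());
    const float mn = *mn_it, mx = *mx_it;
    const float scale = (mx > mn) ? (n_buckets - 1) / (mx - mn) : 0.0f;

    std::vector<std::vector<float>> buckets(n_buckets);
    for (float x : v) {
        const int b = std::min(n_buckets - 1, (int) ((x - mn) * scale)); // bin by value
        buckets[b].push_back(x);
    }

    std::vector<float> out;
    out.reserve(v.size());
    for (int b = n_buckets - 1; b >= 0; --b) {                           // highest bucket first
        std::sort(buckets[b].begin(), buckets[b].end(), std::greater<float>());
        out.insert(out.end(), buckets[b].begin(), buckets[b].end());
    }
    return out;
}

int main() {
    const std::vector<float> logits = {0.1f, 2.5f, -1.0f, 3.3f, 0.7f};
    for (float x : bucket_sort_desc(logits)) std::printf("%.2f ", x);
    std::printf("\n");
    return 0;
}
```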
XiaotaoChen [Thu, 25 Jan 2024 20:14:32 +0000 (04:14 +0800)]
readme : add MobileVLM 1.7B/3B to the supported models list (#5107)
Co-authored-by: Chenxiaotao03 <redacted>
l3utterfly [Thu, 25 Jan 2024 20:06:22 +0000 (05:06 +0900)]
llama : dynamic temperature sampling (#4972)
* implemented dynamic temperature sampling from koboldcpp
* removed trailing whitespace
* removed unused temp parameter in llama_sample_entropy
* exposed exponent_val in dynamic temp sampler
* added debug check for printf statements
* use nullptr in llama_sample_softmax call during llama_sample_entropy
this avoids counting the time taken stats twice
Co-authored-by: Georgi Gerganov <redacted>
* return earlier if there is only 1 candidate (i.e. max_entropy == 0)
* reformat 't' case in llama_sample_queue
Co-authored-by: Jared Van Bortel <redacted>
* check for one or zero candidates case in llama_sample_entropy
---------
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Jared Van Bortel <redacted>
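Dynamic temperature sampling, as ported above, scales the sampling temperature by the normalized entropy of the candidate distribution, so a confident (low-entropy) distribution is sampled with a lower effective temperature. A hedged sketch of that idea follows; `exponent_val` mirrors the commit text, while `temp_min`/`temp_max` and the function itself are illustrative, not the llama_sample_entropy API.
```
// Hedged sketch of entropy-scaled dynamic temperature.
// The effective temperature interpolates between temp_min and temp_max based
// on the normalized entropy of the softmaxed logits; exponent_val shapes the
// mapping. Illustrative only, not the llama.cpp implementation.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

float dynamic_temperature(const std::vector<float> & logits,
                          float temp_min, float temp_max, float exponent_val) {
    if (logits.size() <= 1) return temp_min;   // one or zero candidates: nothing to scale

    // softmax
    float max_l = logits[0];
    for (float l : logits) max_l = std::max(max_l, l);
    std::vector<float> p(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) { p[i] = std::exp(logits[i] - max_l); sum += p[i]; }
    for (float & x : p) x /= sum;

    // entropy, normalized by the maximum possible entropy log(N)
    float entropy = 0.0f;
    for (float x : p) if (x > 0.0f) entropy -= x * std::log(x);
    const float max_entropy = std::log((float) p.size());

    const float norm = std::pow(entropy / max_entropy, exponent_val);
    return temp_min + (temp_max - temp_min) * norm;
}

int main() {
    const std::vector<float> peaked = {10.0f, 0.0f, 0.0f};
    const std::vector<float> flat   = { 1.0f, 1.0f, 1.0f};
    std::printf("peaked: %.3f  flat: %.3f\n",
                dynamic_temperature(peaked, 0.2f, 1.5f, 1.0f),
                dynamic_temperature(flat,   0.2f, 1.5f, 1.0f));
    return 0;
}
```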
Jared Van Bortel [Thu, 25 Jan 2024 19:51:24 +0000 (14:51 -0500)]
examples : make pydantic scripts pass mypy and support py3.8 (#5099)
Valentin Konovalov [Thu, 25 Jan 2024 17:05:51 +0000 (12:05 -0500)]
android : use release cmake build type by default (#5123)
Kawrakow [Thu, 25 Jan 2024 15:58:53 +0000 (17:58 +0200)]
Fix Q3_K_XS for MoE models (#5113)
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Thu, 25 Jan 2024 09:26:17 +0000 (11:26 +0200)]
metal : show compile log messages
Engininja2 [Wed, 24 Jan 2024 22:18:15 +0000 (16:18 -0600)]
cuda : fix 2-bit quants on amd hip (#5105)
* cuda : fix 2-bit quants on amd hip
* use __low2float intrinsic function for new quants
Michael Hueschen [Mon, 22 Jan 2024 23:44:10 +0000 (16:44 -0700)]
nix-shell: use addToSearchPath
thx to @SomeoneSerge for the suggestion!
Michael Hueschen [Mon, 22 Jan 2024 10:17:05 +0000 (03:17 -0700)]
nix: add cc to devShell LD_LIBRARY_PATH
this fixes the error I encountered when trying to run the convert.py
script in a venv:
```
$ nix develop
[...]$ source .venv/bin/activate
(.venv)
[...]$ pip3 install -r requirements.txt
<... clipped ...>
[...]$ python3 ./convert.py
Traceback (most recent call last):
File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module>
from sentencepiece import SentencePieceProcessor
File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module>
from . import _sentencepiece
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory
```
however, I am not sure this is the cleanest way to address this linker
issue...
slaren [Wed, 24 Jan 2024 11:48:14 +0000 (12:48 +0100)]
llama : pre-allocate input tensors in a separate buffer (#5100)
Georgi Gerganov [Tue, 23 Jan 2024 13:50:56 +0000 (15:50 +0200)]
metal : disable support for MUL_MAT F32 x F16
Kawrakow [Tue, 23 Jan 2024 13:17:20 +0000 (15:17 +0200)]
Additional KL-divergence statistics (#5081)
* perplexity: add top-token probability
* perplexity: add additional KL-divergence statistics
* perplexity: a better organized KL-divergence statistics output
---------
Co-authored-by: Iwan Kawrakow <redacted>
Johannes Gäßler [Tue, 23 Jan 2024 12:31:56 +0000 (13:31 +0100)]
CUDA: more info when no device code (#5088)
Georgi Gerganov [Tue, 23 Jan 2024 12:12:57 +0000 (14:12 +0200)]
minor : clean-up some warnings and style (#5094)
* minor : clean-up some warnings and style
ggml-ci
* ggml : add comment
Xuan Son Nguyen [Tue, 23 Jan 2024 07:11:39 +0000 (08:11 +0100)]
devops : add intel oneapi dockerfile (#5068)
Co-authored-by: Xuan Son Nguyen <redacted>
Michael Coppola [Tue, 23 Jan 2024 06:51:27 +0000 (01:51 -0500)]
llama.vim : added api key support (#5090)
Co-authored-by: Michael Coppola <redacted>
slaren [Mon, 22 Jan 2024 22:42:41 +0000 (23:42 +0100)]
llama : fix not enough space in buffer with Qwen (#5086)
Kawrakow [Mon, 22 Jan 2024 14:10:14 +0000 (16:10 +0200)]
KL-divergence (#5076)
* kl-divergence: be able to save all logits to a file
* Add ability to compute KL-divergence
---------
Co-authored-by: Iwan Kawrakow <redacted>
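For reference, the statistic accumulated here is the per-position Kullback-Leibler divergence between the token distribution p saved from the full-precision run and the distribution q produced by the model under test, summed over the vocabulary of size V:
```
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_{i=1}^{V} p_i \log \frac{p_i}{q_i}
```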
Reinforce-II [Mon, 22 Jan 2024 13:15:08 +0000 (21:15 +0800)]
ggml : parallelize FP32 conversion when using BLAS (#5045)
* allow the GGML_TASK_INIT phase to run multithreaded
* multithreaded dequantize in mul_mat when using a BLAS library
* minor fixes
* update outdated comment
* fix coding style
* simplify code
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
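The change above lets the dequantize-to-FP32 step that precedes a BLAS matrix multiply run on all threads rather than a single one. A hedged sketch of the idea, splitting rows across worker threads before the float buffer is handed to a BLAS gemm (block_q8 and dequantize_row below are toy stand-ins, not ggml's types or kernels):
```
// Hedged sketch: parallel row-wise dequantization before a BLAS gemm call.
// block_q8 and dequantize_row are illustrative stand-ins for ggml's
// quantized block types and dequantization kernels.
#include <cstdint>
#include <thread>
#include <vector>

struct block_q8 {            // toy 32-element quantized block
    float  scale;
    int8_t qs[32];
};

static void dequantize_row(const block_q8 * src, float * dst, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i / 32].scale * src[i / 32].qs[i % 32];
    }
}

void dequantize_parallel(const std::vector<block_q8> & src, std::vector<float> & dst,
                         int n_rows, int n_cols, int n_threads) {
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int r = t; r < n_rows; r += n_threads) {   // interleave rows over threads
                dequantize_row(src.data() + (size_t) r * (n_cols / 32),
                               dst.data() + (size_t) r * n_cols, n_cols);
            }
        });
    }
    for (auto & w : workers) w.join();
    // dst now holds FP32 data ready to be passed to a BLAS gemm routine
}

int main() {
    const int n_rows = 8, n_cols = 64;
    std::vector<block_q8> q(n_rows * (n_cols / 32));
    std::vector<float>    f((size_t) n_rows * n_cols);
    for (auto & b : q) { b.scale = 0.5f; for (auto & v : b.qs) v = 2; }
    dequantize_parallel(q, f, n_rows, n_cols, 4);
    return f[0] == 1.0f ? 0 : 1;
}
```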
XiaotaoChen [Mon, 22 Jan 2024 13:09:35 +0000 (21:09 +0800)]
llava : MobileVLM support (#4954)
* MobileVLM native implementation
* remove the depthwise_conv_2d and permute_cpy code, replacing both with existing functions; optimize the ldp definition and support the LLAMA_PERF option for CMake
* move android script to example/llava directory
* Fix the editor config checks
---------
Co-authored-by: Chenxiaotao03 <redacted>
Someone Serge [Sun, 21 Jan 2024 03:41:37 +0000 (03:41 +0000)]
flake.nix: add a comment about flakes vs nix
Someone Serge [Sun, 21 Jan 2024 03:29:38 +0000 (03:29 +0000)]
nix: add a comment on the many nixpkgs-with-cuda instances
Someone Serge [Sun, 21 Jan 2024 03:15:13 +0000 (03:15 +0000)]
nix: add a comment about makeScope
Someone Serge [Sat, 13 Jan 2024 17:45:01 +0000 (17:45 +0000)]
nix: refactor the cleanSource rules
Someone Serge [Sat, 13 Jan 2024 17:38:32 +0000 (17:38 +0000)]
workflows: nix-ci: drop the redundant "paths" filter
Someone Serge [Sat, 13 Jan 2024 17:16:54 +0000 (17:16 +0000)]
workflows: nix-build-aarch64: rate limit
Someone Serge [Sat, 13 Jan 2024 17:10:19 +0000 (17:10 +0000)]
workflows: nix-ci: rebuild on flake.lock updates
Kawrakow [Mon, 22 Jan 2024 12:18:43 +0000 (14:18 +0200)]
imatrix : keep intermediate imatrix results (#5077)
Co-authored-by: Iwan Kawrakow <redacted>
compilade [Mon, 22 Jan 2024 11:21:52 +0000 (06:21 -0500)]
llama : support StableLM 2 1.6B (#5052)
* llama : support StableLM 2 1.6B
* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]
* convert : refactor Qwen's set_vocab to use it for StableLM 2 too
* nix : add tiktoken to llama-python-extra
* convert : use presence of tokenizer.json to determine StableLM tokenizer loader
It's a less arbitrary heuristic than the vocab size.
Daniel Bevenius [Mon, 22 Jan 2024 11:11:01 +0000 (12:11 +0100)]
finetune : print sample-start/include-sample-start (#5072)
This commit adds `--sample-start` and `--include-sample-start` to the
output from the main function in finetune.cpp.
The motivation for this is that even though these are set explicitly by
the user via the command line, if one forgets to set them then it is
useful to have their values printed out. Otherwise it is possible to go
through the whole training process before realizing that the values are
not what one expected.
Signed-off-by: Daniel Bevenius <redacted>
Kawrakow [Mon, 22 Jan 2024 10:43:33 +0000 (12:43 +0200)]
llama : add Q3_K_XS (#5060)
* Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S
* Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K
Together with an importance matrix, this brings perplexity
for LLaMA-v2-70B below the perplexity of the former Q2_K
with an 800 MB smaller quantized model size.
---------
Co-authored-by: Iwan Kawrakow <redacted>
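The mixing described above keeps a small, sensitive slice of the model at a higher-precision quant. A hedged sketch of the layer-selection rule (the 1/8 fraction, ffn_down, Q3_K and Q4_K come from the commit text; the helper itself is illustrative, not the llama.cpp quantizer):
```
// Hedged sketch: choose a higher-precision quant type for the first 1/8 of
// the ffn_down layers, and the lower-precision type elsewhere.
// Illustrative only, not the llama.cpp quantization code.
#include <cstdio>

enum quant_type { QUANT_Q3_K, QUANT_Q4_K };

quant_type choose_ffn_down_quant(int i_layer, int n_layer) {
    // upgrade the earliest layers, which tend to matter most for perplexity
    return (i_layer < (n_layer + 7) / 8) ? QUANT_Q4_K : QUANT_Q3_K;
}

int main() {
    const int n_layer = 80;   // e.g. LLaMA-v2-70B
    for (int i = 0; i < n_layer; ++i) {
        if (choose_ffn_down_quant(i, n_layer) == QUANT_Q4_K) {
            std::printf("ffn_down layer %d -> Q4_K\n", i);
        }
    }
    return 0;
}
```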
bobqianic [Mon, 22 Jan 2024 08:55:05 +0000 (08:55 +0000)]
ci : fix Windows CI by updating Intel SDE version (#5053)
Shijie [Mon, 22 Jan 2024 07:33:19 +0000 (15:33 +0800)]
llama : add more qwen2 models (#5071)
iSma [Sun, 21 Jan 2024 21:37:13 +0000 (22:37 +0100)]
Revert LLAMA_NATIVE to OFF in flake.nix (#5066)
kuronekosaiko [Sun, 21 Jan 2024 16:28:14 +0000 (00:28 +0800)]
add safetensors support to convert-lora-to-ggml.py (#5062)
* add safetensors support to convert-lora-to-ggml.py
* Update convert-lora-to-ggml.py
Remove white space in line 69.
bobqianic [Sun, 21 Jan 2024 15:17:35 +0000 (15:17 +0000)]
add `#include <string>` to unicode.h (#5051)
Co-authored-by: Jared Van Bortel <redacted>
Kawrakow [Sun, 21 Jan 2024 12:42:44 +0000 (14:42 +0200)]
Add ability to evaluate multiple choice tasks (#5047)
* TruthfulQA: 1st attempt, does not look like it is working
The same implementation can be used for HellaSwag as well,
so I converted a HellaSwag validation dataset to the binary
format used here and tested with that. The score is only
around 50, so something is not quite right.
* TruthfulQA: works but the result is bad
I know it works because if I convert the HellaSwag validation
data to the binary format used in the truthful_qa_score() function
I get the exact same result as from the hellaswag_score() function.
But I guess the questions are tricky and the way I have done
the combination of question + answer is very likely not the best.
The TruthfulQA validation dataset contains 817 questions, with
random chance result around 19%. With this version I get
29.1% for Mistral-7B and 55.2% for Mistral-7B-Instruct-v0.2.
The HF leader board results for these two models are
42.2% and 68.3%, respectively.
* TruthfulQA: fix random sample
* TruthfulQA: prepare tasks in parallel for large test datasets
* Rename truthful_qa to multiple_choice
* Make MSVC happy
I had forgotten that MSVC does not make constexpr's available
inside a lambda.
---------
Co-authored-by: Iwan Kawrakow <redacted>
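The multiple-choice evaluation described above appends each candidate answer to the question and picks the candidate whose tokens receive the highest average log-likelihood from the model. A hedged sketch of that scoring step over precomputed per-token log-probabilities (the types and names are illustrative, not the perplexity example's code):
```
// Hedged sketch: pick the answer whose continuation tokens get the highest
// average log-probability given the question. answer_log_probs[a][i] is
// assumed to hold log P(token i of answer a | question, previous tokens);
// this is a stand-in for the real evaluation code.
#include <cstdio>
#include <vector>

struct choice_result {
    int    best;        // index of best-scoring answer
    double best_score;  // its average log-likelihood
};

choice_result score_multiple_choice(const std::vector<std::vector<double>> & answer_log_probs) {
    choice_result res{-1, -1e30};
    for (size_t a = 0; a < answer_log_probs.size(); ++a) {
        double sum = 0.0;
        for (double lp : answer_log_probs[a]) sum += lp;
        const double avg = answer_log_probs[a].empty() ? -1e30
                         : sum / (double) answer_log_probs[a].size();
        if (avg > res.best_score) { res.best_score = avg; res.best = (int) a; }
    }
    return res;
}

int main() {
    // two candidate answers, log-probs of their tokens given the question
    const std::vector<std::vector<double>> lp = {{-2.1, -0.4}, {-1.0, -0.6}};
    std::printf("best answer: %d\n", score_multiple_choice(lp).best);   // expected: 1
    return 0;
}
```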
Kawrakow [Sun, 21 Jan 2024 06:01:20 +0000 (08:01 +0200)]
Slightly faster imatrix (#5050)
* imatrix: speedup by avoiding unnecessary allocations and copies
* imatrix: add --no-ppl option to skip PPL calculations altogether
---------
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Sun, 21 Jan 2024 03:17:27 +0000 (05:17 +0200)]
flake.lock: Update (#5054)
Flake lock file updates:
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9b19f5e77dd906cb52dade0b7bd280339d2a1f3d' (2024-01-13)
  → 'github:NixOS/nixpkgs/bbe7d8f876fbbe7c959c90ba2ae2852220573261' (2024-01-19)
Co-authored-by: github-actions[bot] <redacted>
Jared Van Bortel [Sat, 20 Jan 2024 23:14:18 +0000 (18:14 -0500)]
convert : partially revert PR #4818 (#5041)
Jared Van Bortel [Sat, 20 Jan 2024 15:08:08 +0000 (10:08 -0500)]
perplexity : fix MSVC build after #5020 (#5043)
* perplexity : fix MSVC build after #5020
* try a different fix
slaren [Sat, 20 Jan 2024 15:05:49 +0000 (16:05 +0100)]
llama : run all KQV ops on the CPU with no KV offload (#5049)
ggml-ci
Herman Semenov [Sat, 20 Jan 2024 08:11:31 +0000 (08:11 +0000)]
cmake : add support for ccache (#5002)
* Added ccache support to speed up recompilation
* cmake : option to disable ccache
---------
Co-authored-by: Georgi Gerganov <redacted>
adel boussaken [Sat, 20 Jan 2024 08:05:43 +0000 (09:05 +0100)]
Add a dart/flutter binding to README.md (#4882)
Kylin [Sat, 20 Jan 2024 07:01:46 +0000 (15:01 +0800)]
cuda : fix compile error in jetson platform (#4975)
* cuda: fix compile error in jetson platform
* cuda: update comment in ggml-cuda.cu
* cuda: update ggml-cuda.cu comment
Uzo Nweke [Fri, 19 Jan 2024 18:20:50 +0000 (13:20 -0500)]
finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033)
* Fix issue with alloc causing max_compute_size to be calculated
* remove ggml_allocr_free as suggested in issue #4791
Georgi Gerganov [Fri, 19 Jan 2024 13:24:47 +0000 (15:24 +0200)]
imatrix : add README.md
Shijie [Fri, 19 Jan 2024 11:53:13 +0000 (19:53 +0800)]
llama : support upcoming Qwen2 (#5037)
Georgi Gerganov [Fri, 19 Jan 2024 11:52:22 +0000 (13:52 +0200)]
py : fix flake8 lint
Kawrakow [Fri, 19 Jan 2024 09:39:11 +0000 (11:39 +0200)]
winogrande: evaluate log-probs in parallel (#5036)
This is a relatively minor performance tweak resulting in
~10% speedup on my system.
Co-authored-by: Iwan Kawrakow <redacted>
chiranko [Fri, 19 Jan 2024 09:07:27 +0000 (17:07 +0800)]
llama : add CodeShell support (#5016)
* llama: add codeshell support
* llama.cpp: fix codeshell with NeoX rope
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Kawrakow [Fri, 19 Jan 2024 09:02:39 +0000 (11:02 +0200)]
perplexity: avoid unnecessary allocations and logit copies (#5035)
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Fri, 19 Jan 2024 08:45:06 +0000 (10:45 +0200)]
perplexity : faster Winogrande via batching (#5024)
* perplexity : faster Winogrande via batching
ggml-ci
* perplexity : remove unused function
* perplexity : only tokenize selected tasks for Winogrande
John [Thu, 18 Jan 2024 22:12:15 +0000 (23:12 +0100)]
llama : fix falcon arch for tied output embeddings (#4978)
* falcon arch fix for tied output embeddings
* Update llama.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp
* Update llama.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Thu, 18 Jan 2024 21:36:07 +0000 (23:36 +0200)]
cmake : add ggml public headers (#5011)
Xuan Son Nguyen [Thu, 18 Jan 2024 20:33:05 +0000 (21:33 +0100)]
server : defer tasks when "slot unavailable" (#5018)
* server: defer task when no slot is available
* remove unnecessary log
---------
Co-authored-by: Xuan Son Nguyen <redacted>
slaren [Thu, 18 Jan 2024 20:12:15 +0000 (21:12 +0100)]
llama : fix mlock with no-mmap with Metal (#5025)
Georgi Gerganov [Thu, 18 Jan 2024 19:45:51 +0000 (21:45 +0200)]
imatrix : fix assert for src0 non-cont check
Georgi Gerganov [Thu, 18 Jan 2024 18:49:00 +0000 (20:49 +0200)]
perplexity : fix winogrande N tasks option
Georgi Gerganov [Thu, 18 Jan 2024 18:45:39 +0000 (20:45 +0200)]
scripts : add get-winogrande.sh
David Sommers [Thu, 18 Jan 2024 17:20:59 +0000 (12:20 -0500)]
convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019)
PR #4818 (merged last week) reintroduced a config check for vocab_size that was addressed in PR #4258 (merged 2023-11-30).
Without the fix, llama2 models can't be converted. The error is:
`ValueError: The model's vocab size is set to -1 in params.json. Please update it manually. Maybe 32000?`
Kawrakow [Thu, 18 Jan 2024 17:18:21 +0000 (19:18 +0200)]
HellaSwag: speed up by parallelizing log-prob evaluation (#5020)
For Mistral-7B and fp16, time on my system goes down from 536 seconds
to 423 seconds for the full evaluation dataset (10042 tasks).
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Thu, 18 Jan 2024 13:33:01 +0000 (15:33 +0200)]
perplexity : faster HellaSwag via batching (#5017)
* perplexity : faster HellaSwag
ggml-ci
* perplexity : clean-up
ggml-ci
* perplexity : no need for decode_helper
ggml-ci
* perplexity : add comments
* perplexity : option to specify max batched tasks via `n_parallel`
* perplexity : remove HellaSwag restriction for n_batch
Kawrakow [Thu, 18 Jan 2024 11:46:27 +0000 (13:46 +0200)]
Add Winogrande evaluation (#5015)
* winogrande: simple implementation
It doesn't look like it is working - why?
For Mistral-7B it is barely better than
random chance (score ~60% for 1267 tasks), while I see
Mistral-7B scoring 78.4% on the HF leader board.
1-sigma statistical uncertainty for 1267 tasks is ~1.4,
so no way the difference is due to statistics.
* winogrande: somewhat better
Score for Mistral-7B is now 68.9 on the validation set of
winogrande_debiased. Still far from the reported 78.4, but
better than what I had before.
* winogrande: improving
Mistral-7B score is now 73.56.
Still not quite 78.4 but getting there.
We are also getting a lower score on HellaSwag
compared to HF leader board, so I'm not expecting
we will get up to 78.4 anyway.
It looks like it is better to skip the choice word(s)
when evaluating the average log-likelihood. This kind of
makes sense because a more common word (in Winogrande this is
often a name) will have a higher probability without knowing
about the follow-up context, and this will skew the log-likelihood
towards the more common word. We can only do this if the
choice words are not last in the sentence.
It also looks like it is better to skip the punctuation at the
end of the sentence, provided the choice words are not last.
* winogrande: add dataset instructions
---------
Co-authored-by: Iwan Kawrakow <redacted>
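Written out, the per-candidate score described above is an average log-likelihood over the tokens of the sentence ending, with the choice word itself (and, where possible, the final punctuation) excluded. With t_1, ..., t_n the ending tokens and S the indices kept after skipping:
```
\mathrm{score} \;=\; \frac{1}{|S|} \sum_{i \in S} \log P\left(t_i \mid \text{context},\, t_1, \dots, t_{i-1}\right)
```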
Georgi Gerganov [Thu, 18 Jan 2024 09:44:49 +0000 (11:44 +0200)]
scripts : add helper script to get hellaswag data in txt format
Paul Tsochantaris [Thu, 18 Jan 2024 08:47:24 +0000 (08:47 +0000)]
metal : fix memory leak, dangling pointer and unused autorel (#5007)
* Metal memory: Small memory leak on init, dangling pointer, and unused autorelease pool in graph compute
* SPM header potential fix
* Reverting symlinks
Georgi Gerganov [Wed, 17 Jan 2024 18:54:50 +0000 (20:54 +0200)]
sync : ggml
Georgi Gerganov [Wed, 17 Jan 2024 16:54:56 +0000 (18:54 +0200)]
ggml : add IQ2 to test-backend-ops + refactoring (#4990)
* ggml : add IQ2 to test-backend-ops + refactoring
ggml-ci
* cuda : update supports_op for IQ2
ggml-ci
* ci : enable LLAMA_CUBLAS=1 for CUDA nodes
ggml-ci
* cuda : fix out-of-bounds-access in `mul_mat_vec_q`
ggml-ci
* tests : avoid creating RNGs for each Q tensor
ggml-ci
* tests : avoid creating RNGs for each tensor
ggml-ci
Georgi Gerganov [Wed, 17 Jan 2024 16:46:30 +0000 (18:46 +0200)]
imatrix : offload to GPU support (#4957)
* backend : add eval callback
ggml-ci
* backend : group nodes in a single compute when the user doesn't need them
* backend : clean-up the implementation
ggml-ci
* simple : do not perform tensor data copy if not needed
* simple : fix
* imatrix : offload to GPU support
* imatrix : fix ggml_mul_mat_id handling
ggml-ci
* ci : add imatrix test
ggml-ci
* ci : rearrange output
ggml-ci
Georgi Gerganov [Wed, 17 Jan 2024 16:39:41 +0000 (18:39 +0200)]
backend : add eval callback (#4935)
* backend : add eval callback
ggml-ci
* backend : group nodes in a single compute when the user doesn't need them
* backend : clean-up the implementation
ggml-ci
* simple : do not perform tensor data copy if not needed
* simple : fix
* simple : no need for ggml_is_contiguous + fix bool parse
* llama : fix callback placement in llama_context_params
* backend : avoid double-ask callback calls
* simple : restore examples, imatrix will serve as a demo
Georgi Gerganov [Wed, 17 Jan 2024 16:38:39 +0000 (18:38 +0200)]
metal : create autorelease pool during library build (#4970)
* metal : create autorelease pool during library build
ggml-ci
* test : simplify
ggml-ci
Georgi Gerganov [Wed, 17 Jan 2024 16:37:36 +0000 (18:37 +0200)]
py : fix whitespace
Georgi Gerganov [Wed, 17 Jan 2024 13:45:03 +0000 (15:45 +0200)]
py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971)
* py : fix missing added_tokens_dict for SPM vocab
* py : pad with unknown tokens when data is missing
ggml-ci
* py : fix BPE vocab conversion
ggml-ci
* py : fix padded dummy tokens (I hope)
Kawrakow [Wed, 17 Jan 2024 10:36:37 +0000 (12:36 +0200)]
llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996)
Co-authored-by: Iwan Kawrakow <redacted>
Paul Tsochantaris [Wed, 17 Jan 2024 08:07:24 +0000 (08:07 +0000)]
metal : remove unnecessary nil check (#4986)
David Renshaw [Wed, 17 Jan 2024 07:17:50 +0000 (02:17 -0500)]
llama : fix copy/paste error in llama_sampling_params comment (#4994)
Georgi Gerganov [Tue, 16 Jan 2024 18:59:31 +0000 (20:59 +0200)]
py : remove unnecessary hasattr (#4903)
Philip Taron [Tue, 16 Jan 2024 17:56:21 +0000 (09:56 -0800)]
nix: remove nixConfig from flake.nix (#4984)
Daniel Bevenius [Tue, 16 Jan 2024 17:54:24 +0000 (18:54 +0100)]
finetune : add training data file to log message (#4979)
This commit adds the name of the training data file to the log message
printed when the training data is tokenized.
The motivation for this change is that it can be useful to show which
file is being tokenized when running the finetune example.
Signed-off-by: Daniel Bevenius <redacted>
Kawrakow [Tue, 16 Jan 2024 17:51:26 +0000 (19:51 +0200)]
ggml : importance matrix support for legacy quants (#4969)
* imatrix: adding support for legacy quants
* imatrix: guard Q4_0/Q5_0 against ffn_down craziness
---------
Co-authored-by: Iwan Kawrakow <redacted>