git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log

Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]

sycl : Add support for non-release DPC++ & oneMKL (#8644)

* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>

Georgi Gerganov [Tue, 23 Jul 2024 10:10:17 +0000 (13:10 +0300)]

llama : move vocab, grammar and sampling into separate files (#8508)

* llama : move sampling code into llama-sampling

ggml-ci

* llama : move grammar code into llama-grammar

ggml-ci

* cont

ggml-ci

* cont : pre-fetch rules

* cont

ggml-ci

* llama : deprecate llama_sample_grammar

* llama : move tokenizers into llama-vocab

ggml-ci

* make : update llama.cpp deps [no ci]

* llama : redirect external API to internal APIs

ggml-ci

* llama : suffix the internal APIs with "_impl"

ggml-ci

* llama : clean-up

0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]

Vulkan IQ4_NL Support (#8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]

Allow all RDNA2 archs to use sdot4 intrinsic (#8629)

The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, so let's use it.

Georgi Gerganov [Tue, 23 Jul 2024 08:28:38 +0000 (11:28 +0300)]

contrib : clarify PR squashing + module names (#8630)

* contrib : clarify PR squashing

* contrib : fix typo + add list of modules

luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]

[SYCL] fix scratch size of softmax (#8642)

Keke Han [Mon, 22 Jul 2024 16:43:43 +0000 (00:43 +0800)]

llama : fix codeshell support (#8599)

* llama : fix codeshell support

* llama : move codeshell after smollm below to respect the enum order

Jason Stillerman [Mon, 22 Jul 2024 14:43:01 +0000 (10:43 -0400)]

llama : add support for SmolLm pre-tokenizer (#8609)

* Adding SmolLM Pre Tokenizer

* Update convert_hf_to_gguf_update.py

Co-authored-by: compilade <redacted>

* Update src/llama.cpp

Co-authored-by: compilade <redacted>

* handle regex

* removed .inp and .out ggufs

---------

Co-authored-by: compilade <redacted>

Jiří Podivín [Mon, 22 Jul 2024 13:44:53 +0000 (15:44 +0200)]

*.py: Stylistic adjustments for python (#8233)

* Superfluous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var

Signed-off-by: Jiri Podivin <redacted>

Georgi Gerganov [Mon, 22 Jul 2024 10:33:22 +0000 (13:33 +0300)]

llama : allow overrides for tokenizer flags (#8614)

ggml-ci

Georgi Gerganov [Mon, 22 Jul 2024 10:32:49 +0000 (13:32 +0300)]

tests : re-enable tokenizer tests (#8611)

* models : remove duplicated gpt-2 vocab

* models : remove old stablelm vocab

* tests : re-enable MPT tokenizer tests

* tests : re-enable DeepSeek tokenizer tests

* cmake : sort

ggml-ci

Douglas Hanley [Mon, 22 Jul 2024 08:06:17 +0000 (03:06 -0500)]

llama : add Mistral Nemo inference support (#8604)

Jan Boon [Mon, 22 Jul 2024 08:02:09 +0000 (16:02 +0800)]

server : update doc to clarify n_keep when there is bos token (#8619)

Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]

ggml: fix compile error for RISC-V (#8623)

devojony [Mon, 22 Jul 2024 06:54:42 +0000 (14:54 +0800)]

examples: fix android example cannot be generated continuously (#8621)

When generation ends `completion_loop()` should return a NULL, not the empty string

Georgi Gerganov [Sun, 21 Jul 2024 13:45:10 +0000 (16:45 +0300)]

flake.lock: Update (#8610)

M-A [Sun, 21 Jul 2024 02:09:17 +0000 (22:09 -0400)]

examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)

Changes:

- Move each example into its own function. This makes the code much
  easier to read and understand.
- Make the program easy to only run one test by commenting out function
  calls in main().
- Make the output easy to parse by indenting the output for each example.
- Add shebang and +x bit to make it clear it's an executable.
- Make the host configurable via --host with a default 127.0.0.1:8080.
- Make the code look in the tools list to call the registered tool,
  instead of hardcoding the returned values. This makes the code more
  copy-pastable.
- Add error checking, so that the program exits 1 if the LLM didn't
  return the expected values. It's super useful to check for correctness.

Testing:

- Tested with Mistral-7B-Instruct-v0.3 in F16 and Q5_K_M and
  Meta-Llama-3-8B-Instruct in F16 and Q5_K_M.
  - I did not observe a failure even once in Mistral-7B-Instruct-v0.3.
  - Llama-3 failed about a third of the time in example_concurrent: it
    only returned one call instead of 3. Even for F16.

Potential follow ups:

- Do not fix the prompt encoding yet. Surprisingly it mostly works even
  if the prompt encoding is not model optimized.
- Add chained answer and response.

Test only change.

compilade [Sun, 21 Jul 2024 01:58:49 +0000 (21:58 -0400)]

gguf-py : fix some metadata name extraction edge cases (#8591)

* gguf-py : fix some metadata name extraction edge cases

* convert_lora : use the lora dir for the model card path

* gguf-py : more metadata edge cases fixes

Multiple finetune versions are now joined together,
and the removal of the basename annotation on trailing versions
is more robust.

* gguf-py : add more name metadata extraction tests

* convert_lora : fix default filename

The default filename was previously hardcoded.

* convert_hf : Model.fname_out can no longer be None

* gguf-py : do not use title case for naming convention

Some models use acronyms in lowercase,
which can't be title-cased like other words,
so it's best to simply use the same case
as in the original model name.

Note that the size label still has an uppercased suffix
to make it distinguishable from the context size of a finetune.

compilade [Sun, 21 Jul 2024 01:53:01 +0000 (21:53 -0400)]

convert_hf : fix Gemma v1 conversion (#8597)

* convert_hf : fix Gemma v1 conversion

* convert_hf : allow renaming tokens, but with a warning

* convert_hf : fix Gemma v1 not setting BOS and EOS tokens

Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]

CUDA: MMQ code deduplication + iquant support (#8495)

* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build

Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]

gguf : handle null name during init (#8587)

Michael Coppola [Sat, 20 Jul 2024 13:43:51 +0000 (09:43 -0400)]

llama : add support for Tekken pre-tokenizer (#8579)

* llama : Added support for Tekken pre-tokenizer (#8577)

Removed unneeded `vocab.tokenizer_clean_spaces` assignment

* llama : fix order of pre-tokenizers

* Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
* Updated chkhsh for Tekken tokenizer

---------

Co-authored-by: Georgi Gerganov <redacted>

Huifeng Ou [Sat, 20 Jul 2024 13:09:37 +0000 (09:09 -0400)]

llama.swiftui: fix end of generation bug (#8268)

* fix continuing generating blank lines after getting EOT token or EOS token from LLM

* change variable name to is_done (variable name suggested by ggerganov)

* minor : fix trailing whitespace

* minor : add space

---------

Co-authored-by: Georgi Gerganov <redacted>

Brian [Sat, 20 Jul 2024 07:35:25 +0000 (17:35 +1000)]

gguf_dump.py: fix markdown kv array print (#8588)

* gguf_dump.py: fix markdown kv array print

* Update gguf-py/scripts/gguf_dump.py

Co-authored-by: compilade <redacted>

* gguf_dump.py: refactor kv array string handling

* gguf_dump.py: escape backticks inside of strings

* gguf_dump.py: inline code markdown escape handler added

>>> escape_markdown_inline_code("hello world")
'`hello world`'
>>> escape_markdown_inline_code("hello ` world")
'``hello ` world``'

* gguf_dump.py: handle edge case about backticks on start or end of a string
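
The doctest lines above match the usual Markdown rule for inline code: wrap the value in a backtick run one longer than the longest run it contains, and pad with a space when the value itself starts or ends with a backtick. The following is a minimal sketch of that rule, assuming this is the intended behavior. The function name is taken from the doctest, but the body is an illustration and is not copied from gguf_dump.py.

```python
def escape_markdown_inline_code(value: str) -> str:
    # Illustrative sketch only; the real helper in gguf_dump.py may differ.
    # Find the longest run of consecutive backticks inside the value.
    longest = 0
    run = 0
    for ch in value:
        if ch == "`":
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    # The delimiter must be longer than any backtick run in the content.
    fence = "`" * (longest + 1)
    # Pad with a space when the content touches the delimiter with a backtick.
    pad = " " if value.startswith("`") or value.endswith("`") else ""
    return f"{fence}{pad}{value}{pad}{fence}"
```

With this sketch, a value that begins with a backtick, such as `` `x ``, gets a two-backtick delimiter and one space of padding on each side.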

---------

Co-authored-by: compilade <redacted>

slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]

ggml : fix quant dot product with odd number of blocks (#8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (#8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition
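
One common way such a bug arises is a kernel that is unrolled to consume two blocks per iteration, which silently skips the last block when the block count is odd. The sketch below shows the pattern of a paired main loop plus a tail step in plain Python. It is an assumption about the general shape of the fix, not the ggml code, and the `(scale, quants)` block layout and function names are hypothetical.

```python
def block_dot(bx, by):
    # Hypothetical block layout: (scale, list of integer quants).
    dx, qx = bx
    dy, qy = by
    return dx * dy * sum(a * b for a, b in zip(qx, qy))


def dot_blocks(xs, ys):
    # Dot product over quantized blocks, two blocks per main-loop iteration.
    assert len(xs) == len(ys)
    nb = len(xs)
    total = 0.0
    ib = 0
    # Main loop: consume blocks in pairs, as an unrolled SIMD kernel would.
    while ib + 1 < nb:
        total += block_dot(xs[ib], ys[ib])
        total += block_dot(xs[ib + 1], ys[ib + 1])
        ib += 2
    # Tail: handle the leftover block when the block count is odd.
    if ib < nb:
        total += block_dot(xs[ib], ys[ib])
    return total
```

Without the tail step, a three-block input would only accumulate the first two blocks.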

---------

Co-authored-by: slaren <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

Brian [Fri, 19 Jul 2024 14:04:38 +0000 (00:04 +1000)]

convert-*.py: remove add_name from ChatGLMModel class (#8590)

Georgi Gerganov [Fri, 19 Jul 2024 13:50:47 +0000 (16:50 +0300)]

llama : bump max layers from 256 to 512 (#8530)

* llama : bump max layers from 256 to 512

* llama : replace asserts with exceptions

Georgi Gerganov [Fri, 19 Jul 2024 11:34:55 +0000 (14:34 +0300)]

readme : fix server badge

Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]

ggml : add friendlier error message to fopen errors (#8575)

* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.

Frank Mai [Fri, 19 Jul 2024 09:44:41 +0000 (17:44 +0800)]

fix: typo of chatglm4 chat tmpl (#8586)

Signed-off-by: thxCode <redacted>

Brian [Fri, 19 Jul 2024 07:51:51 +0000 (17:51 +1000)]

convert-*.py: add general.name kv override (#8571)

Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]

CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)

65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]

cmake : install all ggml public headers (#8480)

Co-authored-by: 65a <redacted>

Eric Zhang [Thu, 18 Jul 2024 10:43:49 +0000 (18:43 +0800)]

server: use relative routes for static files in new UI (#8552)

* server: public: fix api_url on non-index pages

* server: public: use relative routes for static files in new UI

Brian [Thu, 18 Jul 2024 10:40:15 +0000 (20:40 +1000)]

convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)

Main thing is that the default output filename will take this form

{name}{parameters}{finetune}{version}{encoding}{kind}

In addition, this adds and removes some entries in the KV store and adds a metadata class that can automatically derive some values from the model card content using heuristics.
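
As a rough illustration of the filename form above, the sketch below joins whichever components are present. The helper name `build_gguf_name` and the `-` separator are assumptions made for this example; the template in the commit message does not show separators, and the actual gguf-py implementation may differ.

```python
def build_gguf_name(name, parameters=None, finetune=None,
                    version=None, encoding=None, kind=None):
    # Hypothetical helper: order follows
    # {name}{parameters}{finetune}{version}{encoding}{kind}.
    parts = [name, parameters, finetune, version, encoding, kind]
    # Assumption: components are joined with "-" and missing ones are skipped.
    return "-".join(p for p in parts if p)
```

For example, `build_gguf_name("Mixtral", "8x7B", "Instruct", "v0.1", "Q4_K_M")` would yield `Mixtral-8x7B-Instruct-v0.1-Q4_K_M` under these assumptions.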

* No Change:
  - Internal GGUF Spec
    - `general.architecture`
    - `general.quantization_version`
    - `general.alignment`
    - `general.file_type`
  - General Model Details
    - `general.name`
    - `general.author`
    - `general.version`
    - `general.description`
  - Licensing details
    - `general.license`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.url`
  - Model Source during conversion
    - `general.source.url`

* Removed:
  - Model Source during conversion
    - `general.source.huggingface.repository`

* Added:
  - General Model Details
    - `general.organization`
    - `general.finetune`
    - `general.basename`
    - `general.quantized_by`
    - `general.size_label`
  - Licensing details
    - `general.license.name`
    - `general.license.link`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.doi`
    - `general.uuid`
    - `general.repo_url`
  - Model Source during conversion
    - `general.source.doi`
    - `general.source.uuid`
    - `general.source.repo_url`
  - Base Model Source
    - `general.base_model.count`
    - `general.base_model.{id}.name`
    - `general.base_model.{id}.author`
    - `general.base_model.{id}.version`
    - `general.base_model.{id}.organization`
    - `general.base_model.{id}.url` (Model Website/Paper)
    - `general.base_model.{id}.doi`
    - `general.base_model.{id}.uuid`
    - `general.base_model.{id}.repo_url` (Model Source Repository (git/svn/etc...))
  - Array based KV stores
    - `general.tags`
    - `general.languages`
    - `general.datasets`

---------

Co-authored-by: compilade <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>

RunningLeon [Thu, 18 Jul 2024 08:06:22 +0000 (16:06 +0800)]

server : respect `--special` cli arg (#8553)

Johannes Gäßler [Wed, 17 Jul 2024 21:35:44 +0000 (23:35 +0200)]

lookup: fibonacci hashing, fix crashes (#8548)
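
For readers unfamiliar with the term, Fibonacci hashing multiplies the key by 2^64 divided by the golden ratio and keeps the top bits of the 64-bit product. The sketch below shows the general technique only. It is not taken from the lookup example, and the function and constant names are chosen for illustration.

```python
# 2**64 / golden ratio, truncated to an integer (the value is odd).
FIB_MULT_64 = 0x9E3779B97F4A7C15
MASK_64 = (1 << 64) - 1


def fibonacci_hash(key: int, bits: int) -> int:
    # Map an integer key to a table index in [0, 2**bits), for 1 <= bits <= 64.
    # Multiply modulo 2**64, then keep the top `bits` bits of the product.
    return ((key * FIB_MULT_64) & MASK_64) >> (64 - bits)
```

Taking the high bits rather than the low bits is what spreads consecutive keys across the table.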

Al Mochkin [Wed, 17 Jul 2024 18:21:55 +0000 (20:21 +0200)]

build : Fix docker build warnings (#8535) (#8537)

Brian [Wed, 17 Jul 2024 14:57:06 +0000 (00:57 +1000)]

CONTRIBUTING.md : remove mention of noci (#8541)

hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]

[CANN] Add Ascend NPU backend (#6035)

* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <redacted>

* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <redacted>

Masaya, Kato [Wed, 17 Jul 2024 07:34:28 +0000 (16:34 +0900)]

batched: fix n_predict parameter (#8527)

Georgi Gerganov [Wed, 17 Jul 2024 07:32:59 +0000 (10:32 +0300)]

llama : disable context-shift for DeepSeek v2 (#8501)

Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]

make/cmake: add missing force MMQ/cuBLAS for HIP (#8515)

Brian [Tue, 16 Jul 2024 07:14:16 +0000 (17:14 +1000)]

gguf-hash : update clib.json to point to original xxhash repo (#8491)

* Update clib.json to point to Cyan4973 original xxhash

Cyan4973 agreed to add clib.json directly to his repo, so the clib package can now point straight at it. Previously it pointed to my fork, which carried the clib.json package metadata.

https://github.com/Cyan4973/xxHash/pull/954

* gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]

Steve Bonds [Tue, 16 Jul 2024 07:04:45 +0000 (00:04 -0700)]

export-lora : handle help argument (#8497)

The --help option on export-lora isn't accepted as valid. The help still gets displayed by default, but the script exits with an error message and nonzero status.

Georgi Gerganov [Tue, 16 Jul 2024 07:00:30 +0000 (10:00 +0300)]

llama : valign + remove unused ftype (#8502)

compilade [Tue, 16 Jul 2024 03:13:10 +0000 (23:13 -0400)]

convert_hf : faster lazy safetensors (#8482)

* convert_hf : faster lazy safetensors

This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

The '_lazy' queue was sometimes self-referential,
which caused reference cycles of objects old enough
to avoid garbage collection until potential memory exhaustion.
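
The mechanism described here can be reproduced in a few lines. In CPython, an object that refers to itself never reaches a reference count of zero, so it is only freed when the cyclic garbage collector runs, and long-lived objects are examined less often. The sketch below is a generic illustration. The `LazyNode` class and its `queue` attribute are hypothetical stand-ins, not the converter's actual types.

```python
import gc
import weakref


class LazyNode:
    # Hypothetical stand-in for an object that holds a queue of pending work.
    def __init__(self):
        self.queue = []


gc.disable()                   # keep the demo deterministic
node = LazyNode()
node.queue.append(node)        # self-referential queue creates a cycle
probe = weakref.ref(node)      # lets us observe whether the object is alive

del node                       # refcount does not reach zero due to the cycle
alive_before = probe() is not None

gc.collect()                   # only the cycle collector can reclaim it
alive_after = probe() is not None
gc.enable()
```

Avoiding the self-reference means plain reference counting can free the object as soon as the last external reference goes away.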

Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]

Refactor lora adapter support (#8332)

* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <redacted>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <redacted>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>

Xuan Son Nguyen [Mon, 15 Jul 2024 17:23:10 +0000 (19:23 +0200)]

fix ci (#8494)

Daniel Bevenius [Mon, 15 Jul 2024 12:48:17 +0000 (14:48 +0200)]

ggml : suppress unknown pragma 'GCC' on windows (#8460)

This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```

M-A [Mon, 15 Jul 2024 12:04:56 +0000 (08:04 -0400)]

server: update README.md with llama-server --help output [no ci] (#8472)

The README.md had stale information. In particular, the --ctx-size
"defaults to 512" confused me and I had to check the code to confirm
this was false. Since the server is evolving rapidly, it's probably
better to keep the source of truth in a single place (in the source) and
generate the README.md based on that.

Did:

    make llama-server
    ./llama-server --help > t.txt
    vimdiff t.txt examples/server/README.md

I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow up could be to
automate this process with a script.

No functional change.

Georgi Gerganov [Mon, 15 Jul 2024 11:54:58 +0000 (14:54 +0300)]

common : add --no-cont-batching arg (#6358)

NikolaiLyssogor [Mon, 15 Jul 2024 11:46:39 +0000 (04:46 -0700)]

docs: fix links in development docs [no ci] (#8481)

Fixes a few links within the repo that were broken by the reorganization of the
documentation in #8325.

Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]

[SYCL] add concat through dim 1/2 (#8483)

* add concat through dim 1/2

Georgi Gerganov [Mon, 15 Jul 2024 11:10:39 +0000 (14:10 +0300)]

llama : de-duplicate deepseek2 norm

0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]

Vulkan MMQ Fix (#8479)

* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error

compilade [Sun, 14 Jul 2024 23:51:21 +0000 (19:51 -0400)]

pydantic : replace uses of __annotations__ with get_type_hints (#8474)

* pydantic : replace uses of __annotations__ with get_type_hints

* pydantic : fix Python 3.9 and 3.10 support
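
The difference between the two approaches shows up when annotations are stored as strings, which happens with explicit string annotations and with `from __future__ import annotations`. The sketch below uses a made-up `Example` class to illustrate it and is not taken from the pydantic example script.

```python
from typing import get_type_hints


class Example:
    # String annotations, as produced by postponed evaluation of annotations.
    count: "int"
    ratio: "float"


# Raw access returns the unevaluated strings.
raw = Example.__annotations__

# get_type_hints() evaluates the strings into real type objects.
resolved = get_type_hints(Example)
```

Code that inspects field types, for instance to generate a grammar, generally needs the resolved objects rather than the strings.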

Georgi Gerganov [Sun, 14 Jul 2024 15:54:02 +0000 (18:54 +0300)]

flake.lock: Update (#8475)

Flake lock file updates:

• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03)
→ 'github:NixOS/nixpkgs/7e7c39ea35c5cdd002cd4588b03a3fb9ece6fad9?narHash=sha256-EYekUHJE2gxeo2pM/zM9Wlqw1Uw2XTJXOSAO79ksc4Y%3D' (2024-07-12)

Co-authored-by: github-actions[bot] <redacted>

Georgi Gerganov [Sun, 14 Jul 2024 11:05:09 +0000 (14:05 +0300)]

llama : fix Gemma-2 Query scaling factors (#8473)

* 9B - query_pre_attn_scalar = 256 not 224

See https://github.com/google/gemma_pytorch/commit/03e657582d17cb5a8617ebf333c1c16f3694670e

Gemma 9b should use 256 and not 224 (self.config.hidden_size // self.config.num_attention_heads)

* llama : fix Gemma-2 Query scaling factor

ggml-ci
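
The arithmetic behind the 224 versus 256 discrepancy can be checked directly. The sketch below assumes the published Gemma-2 9B configuration values (`hidden_size = 3584`, `num_attention_heads = 16`, `head_dim = 256`) and assumes the query is scaled by the inverse square root of the scalar. Both are assumptions stated for illustration, not values quoted in this log.

```python
import math

# Assumed Gemma-2 9B configuration values (not stated in the commit message).
hidden_size = 3584
num_attention_heads = 16
head_dim = 256

# The formula quoted in the commit message gives the old value.
old_scalar = hidden_size // num_attention_heads   # 224
# The fix uses 256, which matches the assumed head dimension.
new_scalar = head_dim                             # 256

# Assumed scaling rule: queries are multiplied by scalar ** -0.5.
old_scale = 1.0 / math.sqrt(old_scalar)
new_scale = 1.0 / math.sqrt(new_scalar)           # 0.0625
```

Under these assumptions the old value scaled queries by a factor of about 7% too large.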

---------

Co-authored-by: Daniel Han <redacted>

Brian [Sun, 14 Jul 2024 06:47:14 +0000 (16:47 +1000)]

gguf_hash.py: Add sha256 (#8470)

* gguf_hash.py: Add sha256

* gguf_hash.py: rename string UUIDv5 --> uuid

* Apply suggestions from code review

Co-authored-by: compilade <redacted>
---------

Co-authored-by: compilade <redacted>

compilade [Sun, 14 Jul 2024 03:35:10 +0000 (23:35 -0400)]

llama : fix pre-tokenization of non-special added tokens (#8228)

* llama : fix mpt and olmo pre-tokenizer

* llama : pre-tokenize non-special user-defined tokens first

* llama : fix detection of control-like user-defined tokens

* convert_hf : identify which user-defined tokens are control tokens

Only used in _set_vocab_gpt2() for now.

* convert_hf : identify more added control tokens for SPM tokenizers

This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly,
including HTML tags and consecutive spaces,
but it unfortunately requires model re-conversion.

There seems to be a weird behavior of the HF tokenizer for Gemma,
which prefers to use the 16-space token over more lengthy space tokens,
while using the SentencePiece tokenizer does not do this.
(the implementation in llama.cpp has the same behavior as SentencePiece)

* llama : fix wrong pre-tokenization of byte tokens

* llama : fix Viking pre-tokenizer regex

The order was previously wrong, which caused errors in some tests.

* llama : fix command-r detokenization

* convert_hf : reduce usages of the UNKNOWN token type

* llama : add UNKNOWN tokens in the special tokens cache

* convert_hf : reduce usages of UNKNOWN for InternLM2

This makes the changes from #8321 more consistent
with the other changes made here.

* test-tokenizer-random : reduce potential conflicts with #8379

* test-tokenizer-random : add a failing edge case for falcon

bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]

vulkan : cmake integration (#8119)

* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg

Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]

metal : template-ify some of the kernels (#8447)

ggml-ci

Georgi Gerganov [Fri, 12 Jul 2024 11:48:15 +0000 (14:48 +0300)]

server : handle content array in chat API (#8449)

* server : handle content array in chat API

* Update examples/server/utils.hpp

Co-authored-by: Xuan Son Nguyen <redacted>
---------

Co-authored-by: Xuan Son Nguyen <redacted>
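
In OpenAI-style chat requests, a message's `content` may be either a plain string or an array of typed parts such as `{"type": "text", "text": "..."}`. The sketch below shows one way to normalize both shapes to a single string. The function name, the newline separator, and the decision to ignore non-text parts are assumptions for illustration and do not describe the server's actual C++ code in utils.hpp.

```python
def normalize_content(content):
    # Hypothetical helper: accept a string or a list of typed content parts.
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        texts = []
        for part in content:
            # Assumption: only parts of type "text" contribute to the prompt.
            if isinstance(part, dict) and part.get("type") == "text":
                texts.append(part.get("text", ""))
        # Assumption: text parts are joined with a newline.
        return "\n".join(texts)
    raise TypeError("content must be a string or a list of parts")
```

A request with two text parts `"a"` and `"b"` would be flattened to `"a\nb"` under these assumptions.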

Georgi Gerganov [Fri, 12 Jul 2024 11:48:04 +0000 (14:48 +0300)]

main : print error on empty input (#8456)

Daniel Bevenius [Fri, 12 Jul 2024 09:05:21 +0000 (11:05 +0200)]

llama : suppress unary minus operator warning (#8448)

This commit updates the _try_copy lambda and moves the unary minus
operator to after the cast to int32_t.

The motivation for this is that the following warning is currently
generated on windows:

```console
llama.cpp\src\llama.cpp(21147,30): warning C4146: unary minus operator
applied to unsigned type, result still unsigned
```

Douglas Hanley [Fri, 12 Jul 2024 08:14:12 +0000 (03:14 -0500)]

server : ensure batches are either all embed or all completion (#8420)

* make sure batches are all embed or all non-embed

* non-embedding batch for sampled tokens; fix unused params warning

Armen Kaleshian [Fri, 12 Jul 2024 08:08:19 +0000 (04:08 -0400)]

docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441)

Commit b0a4699 changed the name of this script from convert-hf-to-gguf.py to
convert_hf_to_gguf.py breaking how convert is called from within a Docker
container.

Jiří Podivín [Fri, 12 Jul 2024 08:06:33 +0000 (10:06 +0200)]

convert : remove fsep token from GPTRefactForCausalLM (#8237)

The <filename> token used by Refact doesn't serve
the same purpose as the <file_separator> from CodeGemma.

Signed-off-by: Jiri Podivin <redacted>

Georgi Gerganov [Fri, 12 Jul 2024 07:46:14 +0000 (10:46 +0300)]

examples : sprintf -> snprintf (#8434)

* examples : sprintf -> snprintf

ggml-ci

* examples : use sizeof() instead of hardcoded constants

Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]

ggml : minor naming changes (#8433)

* ggml : minor naming changes

ggml-ci

* ggml : use PRId64 [no ci]

* ggml : revert FA K/Q names

Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]

[SYCL] fix the mul_mat_id ut issues (#8427)

* fix part of mul_mat_id

* skip the bfloat 16 sycl ut

Signed-off-by: Chen Xi <redacted>
---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>

Nicholai Tukanov [Thu, 11 Jul 2024 16:49:15 +0000 (11:49 -0500)]

ggml : add NVPL BLAS support (#8329) (#8425)

* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <redacted>

Daniel Bevenius [Thu, 11 Jul 2024 15:53:42 +0000 (17:53 +0200)]

cuda : suppress 'noreturn' warn in no_device_code (#8414)

* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
346 | }
| ^
```

The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <redacted>

* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <redacted>
---------

Signed-off-by: Daniel Bevenius <redacted>

Johannes Gäßler [Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)]

CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

Georgi Gerganov [Thu, 11 Jul 2024 08:20:40 +0000 (11:20 +0300)]

gitignore : deprecated binaries

compilade [Thu, 11 Jul 2024 07:41:48 +0000 (03:41 -0400)]

tokenize : add --no-parse-special option (#8423)

This should allow more easily explaining
how parse_special affects tokenization.

Georgi Gerganov [Thu, 11 Jul 2024 07:21:30 +0000 (10:21 +0300)]

llama : use F32 precision in Qwen2 attention and no FA (#8412)

Clint Herron [Thu, 11 Jul 2024 00:08:17 +0000 (20:08 -0400)]

Initialize default slot sampling parameters from the global context. (#8418)

Clint Herron [Wed, 10 Jul 2024 16:35:18 +0000 (12:35 -0400)]

Name Migration: Build the deprecation-warning 'main' binary every time (#8404)

* Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This helps users who are following tutorials and other instruction sets know what to do when the 'main' binary is missing.

* Adjusting 'server' name-deprecation binary to build all the time, similar to the 'main' legacy name binary.

AidanBeltonS [Wed, 10 Jul 2024 15:10:49 +0000 (16:10 +0100)]

[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)

Georgi Gerganov [Wed, 10 Jul 2024 12:23:29 +0000 (15:23 +0300)]

ggml : move sgemm sources to llamafile subfolder (#8394)

ggml-ci

Dibakar Gope [Wed, 10 Jul 2024 12:14:51 +0000 (07:14 -0500)]

ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type

commit | commitdiff | tree

M. Yusuf Sarıgöz [Wed, 10 Jul 2024 12:12:35 +0000 (15:12 +0300)]

gguf-py rel pipeline (#8410)

* Upd gguf-py/readme

* Bump patch version for release

commit | commitdiff | tree

Borislav Stanimirov [Wed, 10 Jul 2024 11:45:44 +0000 (14:45 +0300)]

llama : C++20 compatibility for u8 strings (#8408)

commit | commitdiff | tree

Borislav Stanimirov [Wed, 10 Jul 2024 11:40:53 +0000 (14:40 +0300)]

msvc : silence codecvt c++17 deprecation warnings (#8395)

commit | commitdiff | tree

fairydreaming [Wed, 10 Jul 2024 11:38:58 +0000 (13:38 +0200)]

llama : add assert about missing llama_encode() call (#8400)

Co-authored-by: Stanisław Szymczyk <redacted>

commit | commitdiff | tree

RunningLeon [Wed, 10 Jul 2024 11:26:40 +0000 (19:26 +0800)]

py : fix converter for internlm2 (#8321)

* update internlm2

* remove unused file

* fix lint

commit | commitdiff | tree

laik [Wed, 10 Jul 2024 11:19:10 +0000 (19:19 +0800)]

py : fix extra space in convert_hf_to_gguf.py (#8407)

commit | commitdiff | tree

Clint Herron [Tue, 9 Jul 2024 22:26:40 +0000 (18:26 -0400)]

Server: Enable setting default sampling parameters via command-line (#8402)

* Load server sampling parameters from the server context by default.

* Wordsmithing comment

commit | commitdiff | tree

Andy Salerno [Tue, 9 Jul 2024 18:58:44 +0000 (11:58 -0700)]

Update README.md to fix broken link to docs (#8399)

Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'

commit | commitdiff | tree

Clint Herron [Tue, 9 Jul 2024 15:54:43 +0000 (11:54 -0400)]

Deprecation warning to assist with migration to new binary names (#8283)

* Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from #7809 and migrate to the new filenames.

* Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.

commit | commitdiff | tree

Johannes Gäßler [Tue, 9 Jul 2024 15:11:07 +0000 (17:11 +0200)]

make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)

commit | commitdiff | tree

Alberto Cabrera Pérez [Tue, 9 Jul 2024 14:03:15 +0000 (15:03 +0100)]

sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment

commit | commitdiff | tree

Borislav Stanimirov [Tue, 9 Jul 2024 08:38:00 +0000 (11:38 +0300)]

cmake : allow external ggml (#8370)

commit | commitdiff | tree

daghanerdonmez [Tue, 9 Jul 2024 06:16:00 +0000 (09:16 +0300)]

readme : fix typo [no ci] (#8389)

Bakus-Naur --> Backus-Naur

commit | commitdiff | tree

compilade [Tue, 9 Jul 2024 05:04:49 +0000 (01:04 -0400)]

gguf-py : do not use internal numpy types (#7472)

commit | commitdiff | tree

Georgi Gerganov [Mon, 8 Jul 2024 22:36:38 +0000 (01:36 +0300)]

flake.lock: Update (#8342)

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01)
  → 'github:hercules-ci/flake-parts/9227223f6d922fee3c7b190b2cc238a99527bbb7?narHash=sha256-pQMhCCHyQGRzdfAkdJ4cIWiw%2BJNuWsTX7f0ZYSyz0VY%3D' (2024-07-03)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01)
  → 'https://github.com/NixOS/nixpkgs/archive/5daf0514482af3f97abaefc78a6606365c9108e2.tar.gz?narHash=sha256-Fm2rDDs86sHy0/1jxTOKB1118Q0O3Uc7EC0iXvXKpbI%3D' (2024-07-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/b2852eb9365c6de48ffb0dc2c9562591f652242a?narHash=sha256-C8e9S7RzshSdHB7L%2Bv9I51af1gDM5unhJ2xO1ywxNH8%3D' (2024-06-27)
  → 'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03)

Co-authored-by: github-actions[bot] <redacted>

commit | commitdiff | tree

Alberto Cabrera Pérez [Mon, 8 Jul 2024 20:35:17 +0000 (21:35 +0100)]

labeler : updated sycl to match docs and code refactor (#8373)

commit | commitdiff | tree

b4b4o [Mon, 8 Jul 2024 14:19:24 +0000 (22:19 +0800)]

readme : fix web link error [no ci] (#8347)
