git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log

commit | commitdiff | tree

BarfingLemurs [Tue, 17 Oct 2023 18:13:21 +0000 (14:13 -0400)]

readme : update hot-topics & models, detail windows release in usage (#3615)

* Update README.md

* Update README.md

* Update README.md

* move "Running on Windows" section below "Prepare data and run"

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

shibe2 [Wed, 11 Oct 2023 17:30:06 +0000 (21:30 +0400)]

CLBlast: Fix temporary buffer size for f16 conversion (wsize)

Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
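The sizing rule described in this message can be sketched as follows. This is only an illustration, not the CLBlast backend code: `f16_slice_bytes` and `wsize_is_sufficient` are hypothetical helpers, and the slice is assumed to hold `ne0 x ne1` half-precision values of 2 bytes each.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to hold one 2D slice (ne0 x ne1) of f16 values.
// Hypothetical helper; f16 is assumed to occupy 2 bytes per element.
inline std::size_t f16_slice_bytes(std::size_t ne0, std::size_t ne1) {
    return ne0 * ne1 * sizeof(std::uint16_t);
}

// Returns true when a temporary buffer of `wsize` bytes can hold the slice.
// A caller would assert on this before converting into the buffer.
inline bool wsize_is_sufficient(std::size_t wsize, std::size_t ne0, std::size_t ne1) {
    return wsize >= f16_slice_bytes(ne0, ne1);
}
```

Sizing the buffer for one slice rather than the whole tensor keeps the allocation small, and the explicit check turns a silent overflow into an immediate failure.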

commit | commitdiff | tree

slaren [Tue, 17 Oct 2023 17:00:58 +0000 (19:00 +0200)]

train-text-from-scratch : fix assert failure in ggml-alloc (#3618)

commit | commitdiff | tree

Georgi Gerganov [Tue, 17 Oct 2023 16:52:53 +0000 (19:52 +0300)]

editorconfig : remove trailing spaces

commit | commitdiff | tree

coezbek [Tue, 17 Oct 2023 16:51:02 +0000 (18:51 +0200)]

server : documentation of JSON return value of /completion endpoint (#3632)

* Added documentation of JSON return value of /completion endpoint

* Update examples/server/README.md

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Georgi Gerganov [Tue, 17 Oct 2023 16:12:46 +0000 (19:12 +0300)]

save-load-state : fix example + add ci test (#3655)

* save-load-state : fix example (close #3606)

* ci : add test for save-load-state example

ggml-ci

commit | commitdiff | tree

ldwang [Tue, 17 Oct 2023 15:52:33 +0000 (23:52 +0800)]

readme : add Aquila2 links (#3610)

Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>

commit | commitdiff | tree

staviq [Tue, 17 Oct 2023 15:11:01 +0000 (17:11 +0200)]

tokenizer : special token handling (#3538)

* Rewrite special token handling from #1931

* shorten param name, add st verification by type

* use offsets instead of copy by substr

* formatting, remove copying iterator on delete

* llama : normalize code-style

* swift fix

* print pfx/sfx if verb, main: split pfx input sfx

* don't add space when using special tokens

* minor : comment + spacing

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Georgi Gerganov [Tue, 17 Oct 2023 06:19:28 +0000 (09:19 +0300)]

k-quants : fix quantization ranges (#3646)

commit | commitdiff | tree

Georgi Gerganov [Mon, 16 Oct 2023 20:58:00 +0000 (23:58 +0300)]

llava : fix tokenization to not add bos between image embeddings and user prompt (#3645)

* llava : fix tokenization to not add bos after system prompt

* set seed

---------

Co-authored-by: M. Yusuf Sarıgöz <redacted>

commit | commitdiff | tree

cebtenzzre [Sun, 15 Oct 2023 06:32:06 +0000 (02:32 -0400)]

MPT : support GQA for replit-code-v1.5 (#3627)

commit | commitdiff | tree

M. Yusuf Sarıgöz [Sat, 14 Oct 2023 10:52:44 +0000 (13:52 +0300)]

Honor -ngl option for Cuda offloading in llava (#3621)

commit | commitdiff | tree

Daniel Bevenius [Fri, 13 Oct 2023 10:33:16 +0000 (12:33 +0200)]

llama : remove n_threads from llama_decode_internal (#3614)

This commit removes `n_threads` from the `llama_decode_internal`
function's doc comment, as the parameter no longer exists.

It looks like this parameter was removed in
Commit 16bc66d9479edd5ee12ec734973554d4493c5dfa ("llama.cpp : split
llama_context_params into model and context params").

Signed-off-by: Daniel Bevenius <redacted>

commit | commitdiff | tree

slaren [Fri, 13 Oct 2023 10:23:10 +0000 (12:23 +0200)]

ggml : add context enumeration functions (#3605)

finetune : fix assert failure in ggml-alloc

commit | commitdiff | tree

shibe2 [Thu, 12 Oct 2023 19:59:47 +0000 (23:59 +0400)]

CLBlast: Fix matrix-vector multiplication (#3544)

commit | commitdiff | tree

M. Yusuf Sarıgöz [Thu, 12 Oct 2023 15:23:18 +0000 (18:23 +0300)]

examples: support LLaVA v1.5 (multimodal model) (#3436)

* WIP: start implementing LLaVA

* rm scratch buf for now, will revert after cleanup

* LLaVA image encoder is working. will combine with llama

* Add llava inference code, but it's buggy. debugging

* LLaVA is working e2e, needs to optimize memory allocation + cleanup

* Use ggml_allocr + rm unnecessary code

* fix: crlf -> lf

* fix: new line at EoF

* fix: trailing whitespace

* Add readme

* Update readme

* Some cleanup

* Are you happy editorconfig?

* rm unused batch image preprocessing

* rm unused import

* fix: rm designated initializers

* introduce pad-to-square mode for non-square images

* are you happy editorconfig?

* gitignore /llava

* Handle cases where image file does not exist

* add llava target to Makefile

* add support for 13b model variant

* Maybe seed is unlucky?

* Check if apples are compared to apples

* are you happy editorconfig?

* Use temperature = 0.1 by default

* command line: use gpt_params_parse()

* minor

* handle default n_predict

* fix typo

* llava : code formatting, rename files, fix compile warnings

* do not use Wno-cast-qual for MSVC

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

uint256_t [Thu, 12 Oct 2023 13:36:16 +0000 (22:36 +0900)]

docs : fix typo GOMP_CPU_AFFINITY (#3597)

commit | commitdiff | tree

Georgi Gerganov [Thu, 12 Oct 2023 11:31:05 +0000 (14:31 +0300)]

cmake : fix add_compile_options on macOS

commit | commitdiff | tree

Ian Scrivener [Thu, 12 Oct 2023 11:10:50 +0000 (22:10 +1100)]

typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592)

fixed a typo in the MacOS Metal run doco

commit | commitdiff | tree

Georgi Gerganov [Thu, 12 Oct 2023 10:44:56 +0000 (13:44 +0300)]

ci : check if there is enough VRAM (#3596)

ggml-ci

commit | commitdiff | tree

Aarni Koskela [Thu, 12 Oct 2023 06:51:53 +0000 (15:51 +0900)]

server : add completion mode (no chat) (#3582)

commit | commitdiff | tree

Georgi Gerganov [Thu, 12 Oct 2023 06:35:19 +0000 (09:35 +0300)]

prompts : add mnemonics.txt

commit | commitdiff | tree

Georgi Gerganov [Thu, 12 Oct 2023 06:29:04 +0000 (09:29 +0300)]

server : fix kv cache management (#3588)

commit | commitdiff | tree

Georgi Gerganov [Wed, 11 Oct 2023 20:55:08 +0000 (23:55 +0300)]

main : fix session loading bug (#3400)

commit | commitdiff | tree

Michael Coppola [Wed, 11 Oct 2023 19:42:22 +0000 (15:42 -0400)]

server : add parameter -tb N, --threads-batch N (#3584)

Co-authored-by: Michael Coppola <redacted>

commit | commitdiff | tree

Kerfuffle [Wed, 11 Oct 2023 19:35:46 +0000 (13:35 -0600)]

common : fix mirostat state when using multiple sequences (#3543)

* Fix mirostat state when using multiple sequences

* Fix mirostat by completely refactoring sampling!

* Try to fix zig build.

* Export function to fetch/create default sampler states

Code formatting cleanups and add some comments

Silence a warning about id not being used when logging is disabled

* Apply some renaming suggestions.

Fix comments that were out of sync with the pull.

* Use more consistent naming convention for sampling contexts

commit | commitdiff | tree

Georgi Gerganov [Wed, 11 Oct 2023 18:25:33 +0000 (21:25 +0300)]

batched : add bench tool (#3545)

* batched : add bench tool

* batched : minor fix table

* batched-bench : add readme + n_kv_max is now configurable

* batched-bench : init warm-up batch

* batched-bench : pass custom set of PP, TG and PL

* batched-bench : add mmq CLI arg

commit | commitdiff | tree

Zane Shannon [Wed, 11 Oct 2023 11:14:05 +0000 (04:14 -0700)]

examples : add batched.swift + improve CI for swift (#3562)

commit | commitdiff | tree

Galunid [Tue, 10 Oct 2023 23:02:49 +0000 (01:02 +0200)]

Add MPT model to supported models in README.md (#3574)

commit | commitdiff | tree

goerch [Tue, 10 Oct 2023 16:59:52 +0000 (18:59 +0200)]

Minor improvements in GPT2 tokenizer (#3567)

* Fixing minor bugs in bpe_gpt2_preprocess

* Don't add bos token in test

commit | commitdiff | tree

Xingchen Song(宋星辰) [Tue, 10 Oct 2023 16:28:50 +0000 (00:28 +0800)]

readme : add bloom (#3570)

commit | commitdiff | tree

Xingchen Song(宋星辰) [Tue, 10 Oct 2023 14:48:21 +0000 (22:48 +0800)]

llm : add bloom models (#3553)

* feat: Support bloom models

* fix(bloom): fix model size

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Jhen-Jie Hong [Tue, 10 Oct 2023 11:31:13 +0000 (06:31 -0500)]

swift : improvements and fixes (#3564)

* swift : use macOS 12 as minimum requirement

* swift : add missing ggml-backend.c source

* swift : add -O3 -DNDEBUG unsafe flags

commit | commitdiff | tree

Jan Ploski [Tue, 10 Oct 2023 07:50:23 +0000 (09:50 +0200)]

llm : add MPT support (#3417)

* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)

* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt

* mpt : protect against "clip_qkv": null in mpt-7b

* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models

* mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out from metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)

* mpt : standardized all tensor names to follow GGUF spec

* mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GET_KEY macro instead of duplicate code

* mpt : fixed comment s/gptneox/mpt/

* mpt : remove tabs, trailing whitespace

* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt

* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in pr:3252

* comment out n_past instead of marking it unused

* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]

* mpt : remove unused tokenizer_json in convert script

* ggml : remove obsolete n_past assert in ggml_alibi

* llama : print clamp_kqv and max_alibi_bias hparams

---------

Co-authored-by: Cebtenzzre <redacted>
Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

vvhg1 [Tue, 10 Oct 2023 07:31:21 +0000 (09:31 +0200)]

infill : fix tokenization (#3508)

* infill tokens correction

* server infill tokens correction

* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape

* removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape

* only rm when params.escape, rm space if possible which is added back or rm added space token

* only rm when params.escape, rm space if possible which is added back or rm added space token

* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"

This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.

* fix interactive prompt escaping and fix server infill leading space handling

* rm unnecessary bool check

commit | commitdiff | tree

slaren [Mon, 9 Oct 2023 12:44:58 +0000 (14:44 +0200)]

ggml-alloc : fix assert in debug builds (#3555)

commit | commitdiff | tree

Georgi Gerganov [Mon, 9 Oct 2023 11:32:17 +0000 (14:32 +0300)]

refact : fix convert script + zero out KV cache to avoid nans (#3523)

* refact : fix convert script + zero out KV cache to avoid nans

* ggml : silu(-inf) should never happen

* metal : assert various kernel requirements

commit | commitdiff | tree

Georgi Gerganov [Mon, 9 Oct 2023 11:28:27 +0000 (14:28 +0300)]

metal : do not use mul_mm kernels when ne00 < 64 (#3542)

commit | commitdiff | tree

Georgi Gerganov [Sun, 8 Oct 2023 17:19:14 +0000 (20:19 +0300)]

sync : ggml (ggml-backend) (#3548)

* sync : ggml (ggml-backend)

ggml-ci

* zig : add ggml-backend to the build

commit | commitdiff | tree

Matheus C. França [Sun, 8 Oct 2023 13:59:20 +0000 (10:59 -0300)]

ci : add Zig CI/CD and fix build (#2996)

* zig CI/CD and fix build

Signed-off-by: Matheus Catarino França <redacted>

* fix build_compiler

* ci : remove trailing whitespace

---------

Signed-off-by: Matheus Catarino França <redacted>
Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Ryder Wishart [Sun, 8 Oct 2023 10:55:58 +0000 (03:55 -0700)]

api_like_OAI.py : compat with Microsoft Guidance (#2746)

Check for None in addition to empty string check in all request params

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

arcrank [Sun, 8 Oct 2023 10:52:57 +0000 (06:52 -0400)]

api_like_OAI.py : simplify function (#2796)

Simplify function

commit | commitdiff | tree

Johannes Rudolph [Sun, 8 Oct 2023 10:21:19 +0000 (12:21 +0200)]

k-quants : fix comments about block sizing (#3499)

commit | commitdiff | tree

Georgi Gerganov [Sun, 8 Oct 2023 08:24:50 +0000 (11:24 +0300)]

ci : enable on obj-c changes + fix metal build (#3540)

commit | commitdiff | tree

Luo Tian [Sun, 8 Oct 2023 08:24:01 +0000 (16:24 +0800)]

zig : fix build by introducing train.cpp (#3539)

commit | commitdiff | tree

Georgi Gerganov [Sun, 8 Oct 2023 07:01:53 +0000 (10:01 +0300)]

metal : support MTLGPUFamily < Apple7, formatting, style (#3524)

* metal : improve decoding speed for batches of 2-16

* metal : rename kernels mul_mat_ to mul_mv_

* metal : indentations

* minor

* metal : print more GPU info + disable mul_mm for MTLGPUFamily < Apple7

commit | commitdiff | tree

Kerfuffle [Sun, 8 Oct 2023 05:22:17 +0000 (23:22 -0600)]

llama : fix missing break in Persimmon arch case statements (#3535)

commit | commitdiff | tree

Kerfuffle [Sat, 7 Oct 2023 21:31:41 +0000 (15:31 -0600)]

Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)

commit | commitdiff | tree

M. Yusuf Sarıgöz [Sat, 7 Oct 2023 19:14:10 +0000 (22:14 +0300)]

gguf.py : fix CI for publishing GGUF package (#3532)

* Fix CI for publishing GGUF package

* Bump version

* fix

* bump version

* bump version

* bump version

commit | commitdiff | tree

Tom C [Sat, 7 Oct 2023 09:56:15 +0000 (02:56 -0700)]

py : change version of numpy requirement to 1.24.4 (#3515)

Co-authored-by: Lyjia <redacted>

commit | commitdiff | tree

cebtenzzre [Sat, 7 Oct 2023 08:41:52 +0000 (04:41 -0400)]

quantize : fail fast on write errors (#3521)

commit | commitdiff | tree

Jhen-Jie Hong [Sat, 7 Oct 2023 08:40:27 +0000 (03:40 -0500)]

metal : support default.metallib load & reuse code for swift package (#3522)

* metal : support load default.metallib & reuse code for swift package

* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT

commit | commitdiff | tree

Phillip Kravtsov [Sat, 7 Oct 2023 07:12:43 +0000 (00:12 -0700)]

llm : support Adept Persimmon 8B (#3410)

* Produces garbage output

* wip: correct tensors up to RoPE

* correct tensors thru RoPE

* Correct outputs through masked & softmax'd KQ

* fp32 works

* Rename adept->persimmon

* Produces correct outputs

* clean up convert scripts

* remove printing logic from ggml.c

* remove prints from llama.cpp & fix merge

* trivial cleanups

* Add offload funcs

* update conversion script to directly take adept artifacts rather than .safetensors file

* Fix norm eps bug

* Support sqr and concat on metal, persimmon-8b-q4 runs correctly

* Small changes from review

* Formatting changes

* Minor changes to conversion script

* Remove old script

* Fix editorconfig formatting

* Fix build

* add overlooked offload code ggml-ci

commit | commitdiff | tree

goerch [Sat, 7 Oct 2023 04:57:01 +0000 (06:57 +0200)]

Fix for #3454 (#3455)

Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion

commit | commitdiff | tree

BarfingLemurs [Fri, 6 Oct 2023 19:13:36 +0000 (15:13 -0400)]

readme : update models, cuda + ppl instructions (#3510)

commit | commitdiff | tree

Mihai [Fri, 6 Oct 2023 18:39:33 +0000 (21:39 +0300)]

server : docs fix default values and add n_probs (#3506)

commit | commitdiff | tree

Kerfuffle [Fri, 6 Oct 2023 16:10:13 +0000 (10:10 -0600)]

kv cache slot search improvements (#3493)

* kv cache slot search improvements

* Use n_ctx in kv find slot for consistency

* Ensure kv cache head points to a valid slot in llama_decode internal

* Add some comments to prevent dumb people (like me) from getting confused.

commit | commitdiff | tree

Georgi Gerganov [Fri, 6 Oct 2023 13:35:55 +0000 (16:35 +0300)]

prompts : fix editorconfig checks after #3416

commit | commitdiff | tree

pudepiedj [Fri, 6 Oct 2023 13:16:38 +0000 (14:16 +0100)]

parallel : add option to load external prompt file (#3416)

* Enable external file and add datestamp

* Add name of external file at end

* Upload ToK2024

* Delete ToK2024.txt

* Experiments with jeopardy

* Move ParallelQuestions to /prompts and rename

* Interim commit

* Interim commit

* Final revision

* Remove trailing whitespace

* remove cmake_all.sh

* Remove cmake_all.sh

* Changed .gitignore

* Improved reporting and new question files.

* Corrected typo

* More LLM questions

* Update LLM-questions.txt

* Yet more LLM-questions

* Remove jeopardy results file

* Reinstate original jeopardy.sh

* Update examples/parallel/parallel.cpp

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Jhen-Jie Hong [Fri, 6 Oct 2023 12:44:24 +0000 (07:44 -0500)]

server : reuse llama_sample_token common util (#3494)

* server : reuse llama_sample_token common function

* common : use n_probs for temperature sampling

commit | commitdiff | tree

l3utterfly [Fri, 6 Oct 2023 10:47:59 +0000 (18:47 +0800)]

llama : correct hparams comparison (#3446)

* fixed floating point comparison issues

* updated implementation for hparam comparison to handle inf and NaN

* fixed code review comments

* minor simplification

* rename is_float_eq -> is_float_close

---------

Co-authored-by: Cebtenzzre <redacted>
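A tolerance-based float comparison that also copes with infinities and NaN might look like the sketch below. The name `is_float_close` comes from the commit message above, but the signature and body are assumptions for illustration and may differ from the code in the repository.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Compare two floats with an absolute tolerance.
// Assumed behavior (illustrative only):
//  - identical values, including equal infinities, compare as close
//  - an infinity never compares as close to a different value
//  - NaN never compares as close to anything, including itself
inline bool is_float_close(float a, float b, float abs_tol) {
    if (abs_tol < 0.0f) {
        throw std::invalid_argument("tolerance must be non-negative");
    }
    if (a == b) {
        return true;   // exact match, also covers +inf == +inf
    }
    if (std::isinf(a) || std::isinf(b)) {
        return false;  // avoid inf - inf producing NaN below
    }
    return std::fabs(b - a) <= abs_tol;  // false when either operand is NaN
}
```

Checking equality first matters: subtracting two equal infinities yields NaN, so a plain `fabs(a - b) <= tol` test would report them as different.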

commit | commitdiff | tree

Jhen-Jie Hong [Fri, 6 Oct 2023 10:36:43 +0000 (05:36 -0500)]

ci : fix xcodebuild destinations (#3491)

* ci : fix xcodebuild destinations

* ci : add .swift to paths

commit | commitdiff | tree

cebtenzzre [Thu, 5 Oct 2023 19:00:34 +0000 (15:00 -0400)]

convert : update Falcon script for new HF config (#3448)

Also adds Falcon-180B support.
Closes #3049

Co-authored-by: jb <redacted>

commit | commitdiff | tree

Kenvix ⭐ [Thu, 5 Oct 2023 17:16:39 +0000 (01:16 +0800)]

build : use std::make_tuple() for compatibility with older GCC versions (#3488)
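As an illustration of the portability issue behind this kind of change: with some older standard library versions, returning a braced list where a `std::tuple` is expected fails to compile because the tuple constructor is explicit, while `std::make_tuple()` works everywhere. The function `parse_result` below is hypothetical and is not taken from the repository.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical function returning two values as a tuple.
inline std::tuple<int, std::string> parse_result() {
    // return {42, "ok"};  // brace form: may be rejected by older toolchains
    return std::make_tuple(42, std::string("ok"));  // portable form
}
```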

commit | commitdiff | tree

staviq [Thu, 5 Oct 2023 16:17:29 +0000 (18:17 +0200)]

common : process escape sequences in reverse prompts (#3461)
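Processing escape sequences means turning a two-character sequence typed on the command line, such as a backslash followed by `n`, into the single character it stands for. A minimal sketch follows; `unescape` is a hypothetical helper, and the set of sequences it handles is an assumption rather than the exact set the project supports.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace common backslash escapes with the characters they denote.
// Unknown escapes and a trailing lone backslash are copied through unchanged.
inline std::string unescape(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            switch (in[++i]) {
                case 'n':  out += '\n'; break;
                case 't':  out += '\t'; break;
                case 'r':  out += '\r'; break;
                case '\\': out += '\\'; break;
                case '"':  out += '"';  break;
                case '\'': out += '\''; break;
                default:   out += '\\'; out += in[i]; break;  // keep unknown escape as-is
            }
        } else {
            out += in[i];
        }
    }
    return out;
}
```

With this in place, a reverse prompt passed as `User:\n` would be matched against text ending in an actual newline rather than a literal backslash and `n`.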

commit | commitdiff | tree

shibe2 [Thu, 5 Oct 2023 11:57:03 +0000 (15:57 +0400)]

CLBlast: Fix handling of on-device tensor data

Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.

commit | commitdiff | tree

Jhen-Jie Hong [Thu, 5 Oct 2023 14:02:55 +0000 (09:02 -0500)]

server : fix incorrect num_tokens_predicted (#3480)

commit | commitdiff | tree

Jhen-Jie Hong [Thu, 5 Oct 2023 14:00:07 +0000 (09:00 -0500)]

swift : disable ACCELERATE_NEW_LAPACK (#3481)

commit | commitdiff | tree

Jhen-Jie Hong [Thu, 5 Oct 2023 13:56:21 +0000 (08:56 -0500)]

ci : add swift build via xcodebuild (#3482)

commit | commitdiff | tree

Kerfuffle [Wed, 4 Oct 2023 14:20:28 +0000 (08:20 -0600)]

convert : fix Baichuan2 models by using vocab size in config.json (#3299)

Use local GGUF package when possible in Baichuan converter

commit | commitdiff | tree

Georgi Gerganov [Wed, 4 Oct 2023 13:50:44 +0000 (16:50 +0300)]

readme : add project status link

commit | commitdiff | tree

Georgi Gerganov [Wed, 4 Oct 2023 13:25:41 +0000 (16:25 +0300)]

ggml : fix build after #3329

commit | commitdiff | tree

ds5t5 [Wed, 4 Oct 2023 13:23:39 +0000 (06:23 -0700)]

llm : add Refact model (#3329)

* add refact model

* resolve comments

* rebase to the latest

* solve alibi cpu error

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Georgi Gerganov [Wed, 4 Oct 2023 12:29:58 +0000 (15:29 +0300)]

sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)

* sync : ggml (conv 1d + 2d updates)

ggml-ci

* ggml : fix UB in q5_0 and q5_1 quantize code

ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior

ggml-ci

* tests : fix UB in test-quantize-perf
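The sanitizer reports quoted above describe a well-known pitfall: in C, the language of `ggml.c`, the literal `1` has type `int`, so `1 << 31` shifts into the sign bit and cannot be represented. A common remedy is to shift an unsigned 32-bit value instead. The sketch below uses a hypothetical helper `set_bit` and is written in C++ for consistency with the other examples here; it does not claim to reproduce the change made in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Set bit `pos` (0..31) in a 32-bit mask.
// The shifted operand is an unsigned 32-bit value, so shifting by 31 is
// well-defined, unlike `1 << 31` on a signed int in C.
inline std::uint32_t set_bit(std::uint32_t mask, unsigned pos) {
    return mask | (std::uint32_t{1} << pos);
}
```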

commit | commitdiff | tree

Merrick Christensen [Wed, 4 Oct 2023 06:33:13 +0000 (00:33 -0600)]

finetune : readme fix typo (#3465)

Fix small typo

commit | commitdiff | tree

Tameem [Tue, 3 Oct 2023 18:38:19 +0000 (23:38 +0500)]

ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)

* Added RVV intrinsics support for Q8 quantize row and also improved the existing dot product function for risc-v.

The RVV intrinsics are added for the following quantize row functions
   quantize_row_q8_0
   quantize_row_q8_1

The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
   ggml_vec_dot_q4_0_q8_0
   ggml_vec_dot_q4_1_q8_1
   ggml_vec_dot_q5_0_q8_0
   ggml_vec_dot_q5_1_q8_1

And vector initialization in Q5 by temporary array is also replaced by the vid intrinsics

Signed-off-by: Ahmad Tameem <redacted>

* Added RVV intrinsics support for k_quants

This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
   ggml_vec_dot_q2_K_q8_K
   ggml_vec_dot_q3_K_q8_K
   ggml_vec_dot_q4_K_q8_K
   ggml_vec_dot_q5_K_q8_K
   ggml_vec_dot_q6_K_q8_K

Signed-off-by: Ahmad Tameem <redacted>

---------

Signed-off-by: Ahmad Tameem <redacted>

commit | commitdiff | tree

h-h-h-h [Tue, 3 Oct 2023 18:16:15 +0000 (20:16 +0200)]

main : consistent prefix/suffix coloring (#3425)

* Typo

* No `--in-prefix` coloring

The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.

commit | commitdiff | tree

Georgi Gerganov [Tue, 3 Oct 2023 18:04:01 +0000 (21:04 +0300)]

llama : fix session saving/loading (#3400)

* llama : fix session saving/loading

* llama : temp fix for clearing "future" tokens from the KV cache

* llama : fix handling of "future" tokens when loading sessions

* llama : fix comments for llama_kv_cache API

commit | commitdiff | tree

Alex Klinkhamer [Tue, 3 Oct 2023 17:09:28 +0000 (10:09 -0700)]

llama : expose model's rope_freq_scale in the API (#3418)

so it can be scaled further before creating a context.

commit | commitdiff | tree

Jiahao Li [Tue, 3 Oct 2023 16:55:21 +0000 (00:55 +0800)]

metal : alibi for arbitrary number of heads (#3426)

commit | commitdiff | tree

Eve [Tue, 3 Oct 2023 16:53:15 +0000 (16:53 +0000)]

cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)

* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <redacted>

commit | commitdiff | tree

goerch [Tue, 3 Oct 2023 07:16:26 +0000 (09:16 +0200)]

Work on the BPE tokenizer (#3252)

* Work on the BPE tokenizer

Tokenizer tests work for Falcon-7B

* Try to fix build problem

* Fix debug assertion failure

* Fix MSVC Unicode BOM problem

* Cleanup and an improvement

* Fix compiler warning

* Cleanup

* Test doesn't work over the full range of Unicodes

* Update .gitignore and Makefile

* Another Makefile rule

* Testing Aquila

* Moving byte decoding back to `token_to_piece` ...

... because everyone is using it.

* Guarding some unusable code paths

* Streamlining code and adding some more assertions

Important change: I'm classifying added tokens as control tokens now for BPE.

* Adding a comment

* Adding another assertion

* Fixed vocabulary guarding assertions

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fix PR for recent change

* Fix PR for recent change

* Fix PR for recent change

* Fix for compiler warning

* Fixes for more compiler warnings

* Remove unused code

* Fix initialization of static maps

* Add scores and token types back, adapt gptneox

* Update llama.cpp

Co-authored-by: Georgi Gerganov <redacted>

* Update unicode.h

Co-authored-by: Georgi Gerganov <redacted>

* Update unicode.h

Co-authored-by: Georgi Gerganov <redacted>

* Ported Starcoder and added some assertions

* Fix coding style

* Apply @jploski 's fix for missing tokens

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

cebtenzzre [Mon, 2 Oct 2023 22:07:24 +0000 (18:07 -0400)]

convert : fix vocab size when not defined in hparams (#3421)

commit | commitdiff | tree

cebtenzzre [Mon, 2 Oct 2023 19:38:43 +0000 (15:38 -0400)]

cmake : increase minimum version for add_link_options (#3444)

commit | commitdiff | tree

shibe2 [Mon, 2 Oct 2023 19:26:15 +0000 (23:26 +0400)]

CLBlast: Add broadcast support for matrix multiplication (#3402)

Broadcast src0 into src1 across dimensions 2 and 3 when needed.
This is required for models that use GQA.
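With grouped-query attention, several query heads share one key/value head, so the two operands of the matrix multiplication have different sizes in the batch dimensions. A minimal sketch of the index mapping such a broadcast implies is shown below. It assumes ggml-style names, where `ne02`/`ne03` are the sizes of src0 in dimensions 2 and 3 and `ne12`/`ne13` are those of src1, and it assumes the src1 sizes are whole multiples of the src0 sizes. `src0_batch_index` is a hypothetical helper, not a function from the CLBlast backend.

```cpp
#include <cassert>
#include <cstdint>

struct BatchIndex {
    std::int64_t i2;  // index into dimension 2 of src0
    std::int64_t i3;  // index into dimension 3 of src0
};

// Map a src1 batch position (i12, i13) to the src0 slice it multiplies with.
// Each src0 slice is reused for r2 (resp. r3) consecutive src1 slices.
inline BatchIndex src0_batch_index(std::int64_t i12, std::int64_t i13,
                                   std::int64_t ne02, std::int64_t ne03,
                                   std::int64_t ne12, std::int64_t ne13) {
    const std::int64_t r2 = ne12 / ne02;  // broadcast ratio in dimension 2
    const std::int64_t r3 = ne13 / ne03;  // broadcast ratio in dimension 3
    return { i12 / r2, i13 / r3 };
}
```

For example, with 8 key/value heads and 32 query heads the ratio in dimension 2 is 4, so query heads 4 through 7 all read key/value head 1.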

commit | commitdiff | tree

cebtenzzre [Mon, 2 Oct 2023 19:20:28 +0000 (15:20 -0400)]

gguf : add BERT, MPT, and GPT-J arch info (#3408)

commit | commitdiff | tree

cebtenzzre [Mon, 2 Oct 2023 18:58:46 +0000 (14:58 -0400)]

gguf : general usability improvements (#3409)

commit | commitdiff | tree

cebtenzzre [Mon, 2 Oct 2023 13:16:50 +0000 (09:16 -0400)]

cmake : make CUDA flags more similar to the Makefile (#3420)

* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build

commit | commitdiff | tree

xaedes [Mon, 2 Oct 2023 13:15:45 +0000 (15:15 +0200)]

finetune : fix #3404 (#3437)

The shapes for the init model of GQA models were wrong.

commit | commitdiff | tree

Adrian [Mon, 2 Oct 2023 10:49:59 +0000 (03:49 -0700)]

metal : set log callback before initializing (#3427)

commit | commitdiff | tree

bandoti [Mon, 2 Oct 2023 09:51:49 +0000 (06:51 -0300)]

cmake : fix transient definitions in find pkg (#3411)

commit | commitdiff | tree

Kevin Ji [Mon, 2 Oct 2023 08:53:53 +0000 (04:53 -0400)]

docker : ignore Git files (#3314)

commit | commitdiff | tree

vvhg1 [Mon, 2 Oct 2023 07:42:02 +0000 (09:42 +0200)]

infill : add new example + extend server API (#3296)

* vvhg-code-infill (#1)

* infill in separate example (#2)

* reverted changes to main and added infill example

* cleanup

* naming improvement

* make : add missing blank line

* fix missing semicolon

* brought infill up to current main code

* cleanup

---------

Co-authored-by: Cebtenzzre <redacted>

commit | commitdiff | tree

slaren [Sat, 30 Sep 2023 16:12:57 +0000 (18:12 +0200)]

ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU

commit | commitdiff | tree

slaren [Fri, 29 Sep 2023 16:42:32 +0000 (18:42 +0200)]

llama.cpp : add documentation about rope_freq_base and scale values (#3401)

* llama.cpp : add documentation about rope_freq_base and scale values

* add notice to hot topics

commit | commitdiff | tree

Georgi Gerganov [Fri, 29 Sep 2023 16:05:18 +0000 (19:05 +0300)]

train : fix KQ_pos allocation (#3392)

* train : fix KQ_pos allocation

* make sure KQ_pos is not reallocated in finetune

---------

Co-authored-by: xaedes <redacted>

commit | commitdiff | tree

Cebtenzzre [Fri, 29 Sep 2023 13:48:45 +0000 (09:48 -0400)]

llama : quantize up to 31% faster on Linux and Windows with mmap (#3206)

* llama : enable mmap in quantize on Linux -> 31% faster

* also enable mmap on Windows

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

BarfingLemurs [Fri, 29 Sep 2023 12:50:35 +0000 (08:50 -0400)]

readme : update hot topics + model links (#3399)

commit | commitdiff | tree

Andrew Duffy [Fri, 29 Sep 2023 11:15:57 +0000 (07:15 -0400)]

readme : add link to grammars app (#3388)

* Add link to grammars app per @ggerganov suggestion

Adding a sentence in the Grammars section of README to point to grammar app, per https://github.com/ggerganov/llama.cpp/discussions/2494#discussioncomment-7138211

* Update README.md

commit | commitdiff | tree

Jhen-Jie Hong [Fri, 29 Sep 2023 05:25:13 +0000 (13:25 +0800)]

swift : fix build on xcode 15 (#3387)

Packaging of ggml-org/llama.cpp