git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Ian Scrivener [Sun, 22 Oct 2023 18:16:43 +0000 (05:16 +1100)]
readme : remove unsupported node.js library (#3703)
- https://github.com/Atome-FE/llama-node is quite out of date
- doesn't support recent/current llama.cpp functionality
Kerfuffle [Sun, 22 Oct 2023 18:14:56 +0000 (12:14 -0600)]
llama : validate special token ids are in range when loading GGUF model (#3635)
* Add validation for special token ids to llama.cpp
Small optimization for llama_byte_to_token SPM mode
* Fix BPE newline check, only I could break something so simple
* Killll meeeeee
* Account for GGUF_GET_KEY only setting the value when the key exists
* Minor code cleanups.
* Fix convert.py error msg when added tokens are out of range
* Make gguf SpecialVocab vocab size-aware
Update conversion scripts accordingly
* Avoid a string copy
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
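For illustration, the idea behind the range check can be sketched like this (the helper name and the -1 "unset" convention are assumptions, not the exact llama.cpp code):

```cpp
#include <cstdint>

// Minimal sketch: special token ids read from GGUF metadata must either be
// "unset" (commonly -1) or index a real entry in the vocabulary.
static bool special_token_id_is_valid(int32_t id, int32_t n_vocab) {
    return id == -1 || (id >= 0 && id < n_vocab);
}
```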
vvhg1 [Sun, 22 Oct 2023 18:09:51 +0000 (20:09 +0200)]
main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623)
* infill tokens correction
* serverinfill tokens correction
* removing any leading whitespace from infill suffix, and removing the leading space token from the suffix when params.escape
* only rm when params.escape; rm space if possible, which is added back, or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
* process escapes for neg prompt and interactive consec prompts
* removed unnecessary static string escape
Georgi Gerganov [Sun, 22 Oct 2023 05:37:20 +0000 (08:37 +0300)]
batched : add len CLI argument
shibe2 [Thu, 12 Oct 2023 12:01:23 +0000 (16:01 +0400)]
CLBlast: Add outer loops over src0 for broadcasting in mulmat
Reduce repeated dequantization of the same data.
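Conceptually, the change moves the broadcast dimensions into outer loops so each dequantized src0 slice is reused for every src1 slice that maps onto it. A minimal sketch, with all helper and bound names assumed:

```cpp
#include <cstdint>

// Sketch of the loop ordering (names hypothetical): dequantize each src0
// slice once, then run all broadcasted matrix multiplications against it.
void mul_mat_broadcast(int64_t ne02, int64_t ne03, int64_t bcast,
                       void (*dequantize_slice)(int64_t, int64_t),
                       void (*mul_mat_slice)(int64_t, int64_t, int64_t)) {
    for (int64_t i03 = 0; i03 < ne03; ++i03) {
        for (int64_t i02 = 0; i02 < ne02; ++i02) {
            dequantize_slice(i02, i03);        // once per src0 slice
            for (int64_t r = 0; r < bcast; ++r) {
                mul_mat_slice(i02, i03, r);    // reuse the dequantized data
            }
        }
    }
}
```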
Georgi Gerganov [Fri, 20 Oct 2023 18:07:23 +0000 (21:07 +0300)]
sampling : refactor init to use llama_sampling_params (#3696)
* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call (see the sketch after this entry)
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661
ggml-ci
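A hedged sketch of what folding the three penalties into one pass over the logits might look like (function and parameter names are illustrative, not the actual llama.cpp API):

```cpp
#include <unordered_map>
#include <vector>

// One pass applies all three penalties per previously seen token:
// the classic repetition penalty plus OpenAI-style frequency/presence terms.
void apply_penalties(std::vector<float> & logits,
                     const std::unordered_map<int, int> & token_counts,
                     float penalty_repeat, float penalty_freq, float penalty_present) {
    for (const auto & [tok, count] : token_counts) {
        float & l = logits[tok];
        // repetition penalty: push the logit toward "less likely"
        l = l > 0.0f ? l / penalty_repeat : l * penalty_repeat;
        // frequency penalty scales with the count; presence penalty is flat
        l -= (float) count * penalty_freq + penalty_present;
    }
}
```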
Qin Yue Chen [Fri, 20 Oct 2023 11:19:40 +0000 (06:19 -0500)]
gguf : support big endian platform (#3552)
* check whether the platform is s390x; if so, do not import immintrin.h
* support s390x big endian
* support --bigendian option for s390x
1. verified with baichuan7b-chat with float 16 on s390x
2. verified with baichuan7b-chat
3. verified with chinese-alpaca-2-13b-f16
* update format based on editor-config checker result
* Update convert-baichuan-hf-to-gguf.py
* 1. check in ggml.c whether the endianness matches
2. update GGUF version
3. change get_pack_prefix to a property
4. update information log
* always use "GGUF" as the beginning of a GGUF file
* Compare "GGUF" with file header char by char
1. Set GGUF_MAGIC to "GGUF" string instead of int value
2. Compare "GGUF" char by char to ensure its byte order
3. Move bytes swap code from convert.py to gguf.py write_tensor_data
---------
Co-authored-by: Georgi Gerganov <redacted>
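A minimal sketch of why the char-by-char comparison is endian-safe (helper name assumed):

```cpp
#include <cstdint>

// Comparing the four bytes 'G','G','U','F' one by one gives the same result
// on little- and big-endian hosts, unlike comparing a packed 32-bit constant
// whose byte order depends on the platform.
static bool gguf_magic_ok(const uint8_t buf[4]) {
    const char magic[4] = {'G', 'G', 'U', 'F'};
    for (int i = 0; i < 4; ++i) {
        if (buf[i] != (uint8_t) magic[i]) {
            return false;
        }
    }
    return true;
}
```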
Georgi Gerganov [Fri, 20 Oct 2023 10:06:10 +0000 (13:06 +0300)]
server : fix uninitialized sampling context (close #3685)
Herman Semenov [Fri, 20 Oct 2023 10:02:12 +0000 (10:02 +0000)]
ggml : fix rope + llama minor optimizations (#3560)
* Minor fixes and fixed memleak
* Use const auto references in range-based loops (C++17)
cebtenzzre [Fri, 20 Oct 2023 05:32:08 +0000 (01:32 -0400)]
convert : restore compat with old Falcon models (#3680)
M. Yusuf Sarıgöz [Thu, 19 Oct 2023 16:40:41 +0000 (19:40 +0300)]
multimodal : add BakLLaVA conversion support (#3682)
M. Yusuf Sarıgöz [Thu, 19 Oct 2023 13:59:11 +0000 (16:59 +0300)]
llava : avoid segfault in case of non-existent mmproj file (#3674)
Georgi Gerganov [Wed, 18 Oct 2023 18:44:43 +0000 (21:44 +0300)]
readme : update hot topics
Georgi Gerganov [Wed, 18 Oct 2023 15:49:40 +0000 (18:49 +0300)]
speculative : bug fixes
Georgi Gerganov [Wed, 18 Oct 2023 13:21:57 +0000 (16:21 +0300)]
speculative : add tree-based sampling example (#3624)
* sampling : one sequence per sampling context
ggml-ci
* speculative : add tree-based sampling support
ggml-ci
* speculative : reuse the n_parallel CLI param
* speculative : refactor sampling
* examples : fix build after sampling refactoring
ggml-ci
* batched : fix n_seq_id
* sampling : fix malloc
ggml-ci
* swift : fix build
ggml-ci
* swift : try to fix build
ggml-ci
* prompts : add assistant.txt
* common : add llama_batch_add() and llama_batch_clear() helpers (usage sketched after this entry)
* speculative : minor refactor
ggml-ci
* minor : comments + rename
ggml-ci
* speculative : fix off-by-one for n_drafted
* speculative : fix the n_drafted fix + p constants
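A hedged usage sketch of the new batch helpers (the signatures match llama.cpp's common code of this period, but treat the details as assumptions):

```cpp
#include <vector>
#include "common.h"   // assumed home of llama_batch_add()/llama_batch_clear()

// Clear the batch, then append each token at its position in sequence 0,
// requesting logits only for the last token.
void fill_batch(llama_batch & batch, const std::vector<llama_token> & tokens) {
    llama_batch_clear(batch);
    for (size_t i = 0; i < tokens.size(); ++i) {
        // token id, position, sequence ids, whether to compute logits
        llama_batch_add(batch, tokens[i], (llama_pos) i, { 0 }, i == tokens.size() - 1);
    }
}
```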
Jhen-Jie Hong [Wed, 18 Oct 2023 12:21:48 +0000 (07:21 -0500)]
metal : implement q5_0 and q5_1 kernels (#3648)
* metal : implement dequantize_q5_0
* metal : block_q_n_dot_y for block_q5_0 (broken)
* metal : revert unnecessary change
* metal : implement dequantize_q5_1
* metal : block_q_n_dot_y for q5_1 (broken)
* metal : fix block_q_n_dot_y
* minor : spaces / formatting
---------
Co-authored-by: Georgi Gerganov <redacted>
shibe2 [Wed, 18 Oct 2023 12:09:22 +0000 (16:09 +0400)]
opencl : fix element-wise multiplication (#3656)
slaren [Tue, 17 Oct 2023 20:24:50 +0000 (22:24 +0200)]
fix embeddings when using CUDA (#3657)
Georgi Gerganov [Tue, 17 Oct 2023 19:34:26 +0000 (22:34 +0300)]
llama : avoid fprintf in favor of LLAMA_LOG (#3538)
BarfingLemurs [Tue, 17 Oct 2023 18:13:21 +0000 (14:13 -0400)]
readme : update hot-topics & models, detail windows release in usage (#3615)
* Update README.md
* Update README.md
* Update README.md
* move "Running on Windows" section below "Prepare data and run"
---------
Co-authored-by: Georgi Gerganov <redacted>
shibe2 [Wed, 11 Oct 2023 17:30:06 +0000 (21:30 +0400)]
CLBlast: Fix temporary buffer size for f16 conversion (wsize)
Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
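An illustrative sketch of the sizing logic, with variable and helper names assumed:

```cpp
#include "ggml.h"

// The temporary f16 buffer only needs to hold one ne0 x ne1 slice at a time,
// and the size is asserted before use so overflows fail loudly.
static size_t f16_scratch_size(int64_t ne0, int64_t ne1, size_t wsize_max) {
    const size_t wsize = sizeof(ggml_fp16_t) * ne0 * ne1; // one 2D slice, not the whole tensor
    GGML_ASSERT(wsize <= wsize_max);
    return wsize;
}
```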
slaren [Tue, 17 Oct 2023 17:00:58 +0000 (19:00 +0200)]
train-text-from-scratch : fix assert failure in ggml-alloc (#3618)
Georgi Gerganov [Tue, 17 Oct 2023 16:52:53 +0000 (19:52 +0300)]
editorconfig : remove trailing spaces
coezbek [Tue, 17 Oct 2023 16:51:02 +0000 (18:51 +0200)]
server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint
* Update examples/server/README.md
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Tue, 17 Oct 2023 16:12:46 +0000 (19:12 +0300)]
save-load-state : fix example + add ci test (#3655)
* save-load-state : fix example (close #3606)
* ci : add test for save-load-state example
ggml-ci
ldwang [Tue, 17 Oct 2023 15:52:33 +0000 (23:52 +0800)]
readme : add Aquila2 links (#3610)
Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>
staviq [Tue, 17 Oct 2023 15:11:01 +0000 (17:11 +0200)]
tokenizer : special token handling (#3538)
* Rewrite special token handling from #1931
* shorten param name, add st verification by type
* use offsets instead of copy by substr
* formatting, remove copying iterator on delete
* llama : normalize code-style
* swift fix
* print prefix/suffix if verbose; main: split prefix, input, suffix
* don't add space when using special tokens
* minor : comment + spacing
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Tue, 17 Oct 2023 06:19:28 +0000 (09:19 +0300)]
k-quants : fix quantization ranges (#3646)
Georgi Gerganov [Mon, 16 Oct 2023 20:58:00 +0000 (23:58 +0300)]
llava : fix tokenization to not add bos between image embeddings and user prompt (#3645)
* llava : fix tokenization to not add bos after system prompt
* set seed
---------
Co-authored-by: M. Yusuf Sarıgöz <redacted>
cebtenzzre [Sun, 15 Oct 2023 06:32:06 +0000 (02:32 -0400)]
MPT : support GQA for replit-code-v1.5 (#3627)
M. Yusuf Sarıgöz [Sat, 14 Oct 2023 10:52:44 +0000 (13:52 +0300)]
Honor -ngl option for CUDA offloading in llava (#3621)
Daniel Bevenius [Fri, 13 Oct 2023 10:33:16 +0000 (12:33 +0200)]
llama : remove n_threads from llama_decode_internal (#3614)
This commit removes `n_threads` from the `llama_decode_internal` function's doc comment, as the parameter no longer exists.
It looks like this parameter was removed in commit 16bc66d9479edd5ee12ec734973554d4493c5dfa ("llama.cpp : split llama_context_params into model and context params").
Signed-off-by: Daniel Bevenius <redacted>
slaren [Fri, 13 Oct 2023 10:23:10 +0000 (12:23 +0200)]
ggml : add context enumeration functions (#3605)
finetune : fix assert failure in ggml-alloc
shibe2 [Thu, 12 Oct 2023 19:59:47 +0000 (23:59 +0400)]
CLBlast: Fix matrix-vector multiplication (#3544)
M. Yusuf Sarıgöz [Thu, 12 Oct 2023 15:23:18 +0000 (18:23 +0300)]
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC
---------
Co-authored-by: Georgi Gerganov <redacted>
uint256_t [Thu, 12 Oct 2023 13:36:16 +0000 (22:36 +0900)]
docs : fix typo GOMP_CPU_AFFINITY (#3597)
Georgi Gerganov [Thu, 12 Oct 2023 11:31:05 +0000 (14:31 +0300)]
cmake : fix add_compile_options on macOS
Ian Scrivener [Thu, 12 Oct 2023 11:10:50 +0000 (22:10 +1100)]
typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592)
Fixed a typo in the macOS Metal run documentation.
Georgi Gerganov [Thu, 12 Oct 2023 10:44:56 +0000 (13:44 +0300)]
ci : check if there is enough VRAM (#3596)
ggml-ci
Aarni Koskela [Thu, 12 Oct 2023 06:51:53 +0000 (15:51 +0900)]
server : add completion mode (no chat) (#3582)
Georgi Gerganov [Thu, 12 Oct 2023 06:35:19 +0000 (09:35 +0300)]
prompts : add mnemonics.txt
Georgi Gerganov [Thu, 12 Oct 2023 06:29:04 +0000 (09:29 +0300)]
server : fix kv cache management (#3588)
Georgi Gerganov [Wed, 11 Oct 2023 20:55:08 +0000 (23:55 +0300)]
main : fix session loading bug (#3400)
Michael Coppola [Wed, 11 Oct 2023 19:42:22 +0000 (15:42 -0400)]
server : add parameter -tb N, --threads-batch N (#3584)
Co-authored-by: Michael Coppola <redacted>
Kerfuffle [Wed, 11 Oct 2023 19:35:46 +0000 (13:35 -0600)]
common : fix mirostat state when using multiple sequences (#3543)
* Fix mirostat state when using multiple sequences
* Fix mirostat by completely refactoring sampling!
* Try to fix zig build.
* Export function to fetch/create default sampler states
Code formatting cleanups and add some comments
Silence a warning about id not being used when logging is disabled
* Apply some renaming suggestions.
Fix comments that were out of sync with the pull.
* Use a more consistent naming convention for sampling contexts
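The gist of the refactor, sketched with assumed types: mirostat's adaptive "mu" is stateful, so each sequence needs its own sampling state instead of one value shared across all sequences:

```cpp
#include <map>
#include <vector>

// Per-sequence sampling state (illustrative, not the exact llama.cpp struct).
struct sampling_state {
    float mirostat_mu = 0.0f;       // adapted per sequence across sampling steps
    std::vector<int> prev_tokens;   // recent tokens for repetition penalties
};

std::map<int, sampling_state> g_seq_states; // keyed by sequence id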
Georgi Gerganov [Wed, 11 Oct 2023 18:25:33 +0000 (21:25 +0300)]
batched : add bench tool (#3545)
* batched : add bench tool
* batched : minor fix table
* batched-bench : add readme + n_kv_max is now configurable
* batched-bench : init warm-up batch
* batched-bench : pass custom set of PP, TG and PL
* batched-bench : add mmq CLI arg
Zane Shannon [Wed, 11 Oct 2023 11:14:05 +0000 (04:14 -0700)]
examples : add batched.swift + improve CI for swift (#3562)
Galunid [Tue, 10 Oct 2023 23:02:49 +0000 (01:02 +0200)]
Add MPT model to supported models in README.md (#3574)
goerch [Tue, 10 Oct 2023 16:59:52 +0000 (18:59 +0200)]
Minor improvements in GPT2 tokenizer (#3567)
* Fixing minor bugs in bpe_gpt2_preprocess
* Don't add bos token in test
Xingchen Song(宋星辰) [Tue, 10 Oct 2023 16:28:50 +0000 (00:28 +0800)]
readme : add bloom (#3570)
Xingchen Song(宋星辰) [Tue, 10 Oct 2023 14:48:21 +0000 (22:48 +0800)]
llm : add bloom models (#3553)
* feat: Support bloom models
* fix(bloom): fix model size
---------
Co-authored-by: Georgi Gerganov <redacted>
Jhen-Jie Hong [Tue, 10 Oct 2023 11:31:13 +0000 (06:31 -0500)]
swift : improvements and fixes (#3564)
* swift : use macOS 12 as minimum requirement
* swift : add missing ggml-backend.c source
* swift : add -O3 -DNDEBUG unsafe flags
Jan Ploski [Tue, 10 Oct 2023 07:50:23 +0000 (09:50 +0200)]
llm : add MPT support (#3417)
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt
* mpt : protect against "clip_qkv": null in mpt-7b
* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
* mpt : addendum to changeset 84e30e8 - leave the clamp_kqv parameter out of the metadata rather than using 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
* mpt : standardized all tensor names to follow GGUF spec
* mpt : addendum to changeset 1be89c40 - use the "req" parameter of the GGUF_GET_KEY macro instead of duplicating code
* mpt : fixed comment s/gptneox/mpt/
* mpt : remove tabs, trailing whitespace
* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in PR #3252
* comment out n_past instead of marking it unused
* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
* mpt : remove unused tokenizer_json in convert script
* ggml : remove obsolete n_past assert in ggml_alibi
* llama : print clamp_kqv and max_alibi_bias hparams
---------
Co-authored-by: Cebtenzzre <redacted>
Co-authored-by: Georgi Gerganov <redacted>
vvhg1 [Tue, 10 Oct 2023 07:31:21 +0000 (09:31 +0200)]
infill : fix tokenization (#3508)
* infill tokens correction
* serverinfill tokens correction
* removing any leading whitespace from infill suffix, and removing the leading space token from the suffix when params.escape
* only rm when params.escape; rm space if possible, which is added back, or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
slaren [Mon, 9 Oct 2023 12:44:58 +0000 (14:44 +0200)]
ggml-alloc : fix assert in debug builds (#3555)
Georgi Gerganov [Mon, 9 Oct 2023 11:32:17 +0000 (14:32 +0300)]
refact : fix convert script + zero out KV cache to avoid nans (#3523)
* refact : fix convert script + zero out KV cache to avoid nans
* ggml : silu(-inf) should never happen
* metal : assert various kernel requirements
Georgi Gerganov [Mon, 9 Oct 2023 11:28:27 +0000 (14:28 +0300)]
metal : do not use mul_mm kernels when ne00 < 64 (#3542)
Georgi Gerganov [Sun, 8 Oct 2023 17:19:14 +0000 (20:19 +0300)]
sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)
ggml-ci
* zig : add ggml-backend to the build
Matheus C. França [Sun, 8 Oct 2023 13:59:20 +0000 (10:59 -0300)]
ci : add Zig CI/CD and fix build (#2996)
* zig CI/CD and fix build
Signed-off-by: Matheus Catarino França <redacted>
* fix build_compiler
* ci : remove trailing whitespace
---------
Signed-off-by: Matheus Catarino França <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Ryder Wishart [Sun, 8 Oct 2023 10:55:58 +0000 (03:55 -0700)]
api_like_OAI.py : compat with Microsoft Guidance (#2746)
Check for None in addition to empty string check in all request params
Co-authored-by: Georgi Gerganov <redacted>
arcrank [Sun, 8 Oct 2023 10:52:57 +0000 (06:52 -0400)]
api_like_OAI.py : simplify function (#2796)
Simplify function
Johannes Rudolph [Sun, 8 Oct 2023 10:21:19 +0000 (12:21 +0200)]
k-quants : fix comments about block sizing (#3499)
Georgi Gerganov [Sun, 8 Oct 2023 08:24:50 +0000 (11:24 +0300)]
ci : enable on obj-c changes + fix metal build (#3540)
Luo Tian [Sun, 8 Oct 2023 08:24:01 +0000 (16:24 +0800)]
zig : fix build by introducing train.cpp (#3539)
Georgi Gerganov [Sun, 8 Oct 2023 07:01:53 +0000 (10:01 +0300)]
metal : support MTLGPUFamily < Apple7, formatting, style (#3524)
* metal : improve decoding speed for batches of 2-16
* metal : rename kernels mul_mat_ to mul_mv_
* metal : indentations
* minor
* metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7
Kerfuffle [Sun, 8 Oct 2023 05:22:17 +0000 (23:22 -0600)]
llama : fix missing break in Persimmon arch case statements (#3535)
Kerfuffle [Sat, 7 Oct 2023 21:31:41 +0000 (15:31 -0600)]
Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)
M. Yusuf Sarıgöz [Sat, 7 Oct 2023 19:14:10 +0000 (22:14 +0300)]
gguf.py : fix CI for publishing GGUF package (#3532)
* Fix CI for publishing GGUF package
* Bump version
* fix
* bump version
* bump version
* bump version
Tom C [Sat, 7 Oct 2023 09:56:15 +0000 (02:56 -0700)]
py : change version of numpy requirement to 1.24.4 (#3515)
Co-authored-by: Lyjia <redacted>
cebtenzzre [Sat, 7 Oct 2023 08:41:52 +0000 (04:41 -0400)]
quantize : fail fast on write errors (#3521)
Jhen-Jie Hong [Sat, 7 Oct 2023 08:40:27 +0000 (03:40 -0500)]
metal : support default.metallib load & reuse code for swift package (#3522)
* metal : support load default.metallib & reuse code for swift package
* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
Phillip Kravtsov [Sat, 7 Oct 2023 07:12:43 +0000 (00:12 -0700)]
llm : support Adept Persimmon 8B (#3410)
* Produces garbage output
* wip: correct tensors up to RoPE
* correct tensors thru RoPE
* Correct outputs through masked & softmax'd KQ
* fp32 works
* Rename adept->persimmon
* Produces correct outputs
* clean up convert scripts
* remove printing logic from ggml.c
* remove prints from llama.cpp & fix merge
* trivial cleanups
* Add offload funcs
* update conversion script to directly take adept artifacts rather than a .safetensors file
* Fix norm eps bug
* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
* Small changes from review
* Formatting changes
* Minor changes to conversion script
* Remove old script
* Fix editorconfig formatting
* Fix build
* add overlooked offload code
ggml-ci
goerch [Sat, 7 Oct 2023 04:57:01 +0000 (06:57 +0200)]
Fix for #3454 (#3455)
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
BarfingLemurs [Fri, 6 Oct 2023 19:13:36 +0000 (15:13 -0400)]
readme : update models, cuda + ppl instructions (#3510)
Mihai [Fri, 6 Oct 2023 18:39:33 +0000 (21:39 +0300)]
server : docs fix default values and add n_probs (#3506)
Kerfuffle [Fri, 6 Oct 2023 16:10:13 +0000 (10:10 -0600)]
kv cache slot search improvements (#3493)
* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments to prevent dumb people (like me) from getting confused.
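A simplified, hedged sketch of the slot search described above (the real cache logic tracks more state per cell):

```cpp
#include <cstdint>
#include <vector>

// Scan for a contiguous run of n_tokens free cells, wrapping the head back
// to 0 when the run would fall off the end of the cache. Returns -1 if no
// slot of the required size exists.
static int32_t find_slot(const std::vector<bool> & used, int32_t head, int32_t n_tokens) {
    const int32_t n_ctx = (int32_t) used.size();
    int32_t tested = 0;
    while (tested < n_ctx) {
        if (head + n_tokens > n_ctx) { // run would wrap past the end
            tested += n_ctx - head;
            head = 0;
            continue;
        }
        bool found = true;
        for (int32_t i = 0; i < n_tokens; ++i) {
            if (used[head + i]) {      // occupied cell: restart search after it
                head   += i + 1;
                tested += i + 1;
                found = false;
                break;
            }
        }
        if (found) {
            return head;
        }
    }
    return -1;
}
```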
Georgi Gerganov [Fri, 6 Oct 2023 13:35:55 +0000 (16:35 +0300)]
prompts : fix editorconfig checks after #3416
pudepiedj [Fri, 6 Oct 2023 13:16:38 +0000 (14:16 +0100)]
parallel : add option to load external prompt file (#3416)
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Jhen-Jie Hong [Fri, 6 Oct 2023 12:44:24 +0000 (07:44 -0500)]
server : reuse llama_sample_token common util (#3494)
* server : reuse llama_sample_token common function
* common : use n_probs for temperature sampling
l3utterfly [Fri, 6 Oct 2023 10:47:59 +0000 (18:47 +0800)]
llama : correct hparams comparison (#3446)
* fixed floating point comparison issues
* updated implementation for hparam comparison to handle inf and NaN
* fixed code review comments
* minor simplification
* rename is_float_eq -> is_float_close
---------
Co-authored-by: Cebtenzzre <redacted>
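A sketch of an inf/NaN-aware closeness check in the spirit of is_float_close (the tolerance handling is illustrative):

```cpp
#include <cmath>

// NaN never compares equal to anything; infinities must match exactly;
// otherwise fall back to an absolute-tolerance comparison.
static bool is_float_close(float a, float b, float abs_tol) {
    if (std::isnan(a) || std::isnan(b)) return false;
    if (std::isinf(a) || std::isinf(b)) return a == b;
    return std::fabs(a - b) <= abs_tol;
}
```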
Jhen-Jie Hong [Fri, 6 Oct 2023 10:36:43 +0000 (05:36 -0500)]
ci : fix xcodebuild destinations (#3491)
* ci : fix xcodebuild destinations
* ci : add .swift to paths
cebtenzzre [Thu, 5 Oct 2023 19:00:34 +0000 (15:00 -0400)]
convert : update Falcon script for new HF config (#3448)
Also adds Falcon-180B support.
Closes #3049
Co-authored-by: jb <redacted>
Kenvix ⭐ [Thu, 5 Oct 2023 17:16:39 +0000 (01:16 +0800)]
build : use std::make_tuple() for compatibility with older GCC versions (#3488)
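An illustrative example of the compatibility fix (types chosen arbitrarily):

```cpp
#include <tuple>

// Spelling out std::make_tuple() avoids the brace-init-to-tuple conversion
// that some older GCC releases reject.
std::tuple<int, float> get_pair() {
    return std::make_tuple(1, 2.0f);   // instead of: return {1, 2.0f};
}
```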
staviq [Thu, 5 Oct 2023 16:17:29 +0000 (18:17 +0200)]
common : process escape sequences in reverse prompts (#3461)
shibe2 [Thu, 5 Oct 2023 11:57:03 +0000 (15:57 +0400)]
CLBlast: Fix handling of on-device tensor data
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
Jhen-Jie Hong [Thu, 5 Oct 2023 14:02:55 +0000 (09:02 -0500)]
server : fix incorrect num_tokens_predicted (#3480)
Jhen-Jie Hong [Thu, 5 Oct 2023 14:00:07 +0000 (09:00 -0500)]
swift : disable ACCELERATE_NEW_LAPACK (#3481)
Jhen-Jie Hong [Thu, 5 Oct 2023 13:56:21 +0000 (08:56 -0500)]
ci : add swift build via xcodebuild (#3482)
Kerfuffle [Wed, 4 Oct 2023 14:20:28 +0000 (08:20 -0600)]
convert : fix Baichuan2 models by using vocab size in config.json (#3299)
Use local GGUF package when possible in Baichuan converter
Georgi Gerganov [Wed, 4 Oct 2023 13:50:44 +0000 (16:50 +0300)]
readme : add project status link
Georgi Gerganov [Wed, 4 Oct 2023 13:25:41 +0000 (16:25 +0300)]
ggml : fix build after #3329
ds5t5 [Wed, 4 Oct 2023 13:23:39 +0000 (06:23 -0700)]
llm : add Refact model (#3329)
* add refact model
* resolve comments
* rebase to the latest
* solve alibi cpu error
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Wed, 4 Oct 2023 12:29:58 +0000 (15:29 +0300)]
sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)
* sync : ggml (conv 1d + 2d updates)
ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code (see the sketch after this entry)
ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml-ci
* tests : fix UB in test-quantize-perf
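A minimal sketch of the usual fix for this class of UB: shift an unsigned operand instead of a signed int literal:

```cpp
#include <cstdint>

// Plain "1 << 31" shifts a signed int past its range (undefined behavior);
// shifting an unsigned literal is well-defined.
static inline uint32_t high_bit() {
    return 1u << 31;
}
```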
Merrick Christensen [Wed, 4 Oct 2023 06:33:13 +0000 (00:33 -0600)]
finetune : readme fix typo (#3465)
Fix small typo
Tameem [Tue, 3 Oct 2023 18:38:19 +0000 (23:38 +0500)]
ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)
* Added RVV intrinsics support for the Q8 quantize-row functions and also improved the existing dot product functions for RISC-V.
RVV intrinsics are added for the following quantize-row functions
quantize_row_q8_0
quantize_row_q8_1
The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
Vector initialization in Q5 via a temporary array is also replaced by the vid intrinsics
Signed-off-by: Ahmad Tameem <redacted>
* Added RVV intrinsics support for k_quants
This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q5_K_q8_K
ggml_vec_dot_q6_K_q8_K
Signed-off-by: Ahmad Tameem <redacted>
---------
Signed-off-by: Ahmad Tameem <redacted>
h-h-h-h [Tue, 3 Oct 2023 18:16:15 +0000 (20:16 +0200)]
main : consistent prefix/suffix coloring (#3425)
* Typo
* No `--in-prefix` coloring
The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
Georgi Gerganov [Tue, 3 Oct 2023 18:04:01 +0000 (21:04 +0300)]
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading
* llama : temp fix for clearing "future" tokens from the KV cache
* llama : fix handling of "future" tokens when loading sessions
* llama : fix comments for llama_kv_cache API
Alex Klinkhamer [Tue, 3 Oct 2023 17:09:28 +0000 (10:09 -0700)]
llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
Jiahao Li [Tue, 3 Oct 2023 16:55:21 +0000 (00:55 +0800)]
metal : alibi for arbitrary number of heads (#3426)
Eve [Tue, 3 Oct 2023 16:53:15 +0000 (16:53 +0000)]
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
* revert 8283237 and only allow LLAMA_NATIVE on x86, like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <redacted>