git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Ian Scrivener [Sun, 22 Oct 2023 18:16:43 +0000 (05:16 +1100)]
readme : remove unsupported node.js library (#3703)
- https://github.com/Atome-FE/llama-node is quite out of date
- doesn't support recent/current llama.cpp functionality
Kerfuffle [Sun, 22 Oct 2023 18:14:56 +0000 (12:14 -0600)]
llama : validate special token ids are in range when loading GGUF model (#3635)
* Add validation for special token ids to llama.cpp
Small optimization for llama_byte_to_token SPM mode
* Fix BPE newline check, only I could break something so simple
* Killll meeeeee
* Account for GGUF_GET_KEY only setting the value when the key exists
* Minor code cleanups.
* Fix convert.py error msg when added tokens are out of range
* Make gguf SpecialVocab vocab size-aware
Update conversion scripts accordingly
* Avoid a string copy
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
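For illustration, the idea behind the range check can be sketched like this (the helper name and the -1 "unset" convention are assumptions, not the exact llama.cpp code):

```cpp
#include <cstdint>

// Minimal sketch: special token ids read from GGUF metadata must either be
// "unset" (commonly -1) or index a real entry in the vocabulary.
static bool special_token_id_is_valid(int32_t id, int32_t n_vocab) {
    return id == -1 || (id >= 0 && id < n_vocab);
}
```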
vvhg1 [Sun, 22 Oct 2023 18:09:51 +0000 (20:09 +0200)]
main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623)
* infill tokens correction
* serverinfill tokens correction
* removing any leading whitespace from infill suffix, and removing the leading space token from the suffix when params.escape
* only rm when params.escape; rm space if possible, which is added back, or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
* process escapes for neg prompt and interactive consec prompts
* removed unnecessary static string escape
Georgi Gerganov [Sun, 22 Oct 2023 05:37:20 +0000 (08:37 +0300)]
batched : add len CLI argument
shibe2 [Thu, 12 Oct 2023 12:01:23 +0000 (16:01 +0400)]
CLBlast: Add outer loops over src0 for broadcasting in mulmat
Reduce repeated dequantization of the same data.
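Conceptually, the change moves the broadcast dimensions into outer loops so each dequantized src0 slice is reused for every src1 slice that maps onto it. A minimal sketch, with all helper and bound names assumed:

```cpp
#include <cstdint>

// Sketch of the loop ordering (names hypothetical): dequantize each src0
// slice once, then run all broadcasted matrix multiplications against it.
void mul_mat_broadcast(int64_t ne02, int64_t ne03, int64_t bcast,
                       void (*dequantize_slice)(int64_t, int64_t),
                       void (*mul_mat_slice)(int64_t, int64_t, int64_t)) {
    for (int64_t i03 = 0; i03 < ne03; ++i03) {
        for (int64_t i02 = 0; i02 < ne02; ++i02) {
            dequantize_slice(i02, i03);        // once per src0 slice
            for (int64_t r = 0; r < bcast; ++r) {
                mul_mat_slice(i02, i03, r);    // reuse the dequantized data
            }
        }
    }
}
```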
Georgi Gerganov [Fri, 20 Oct 2023 18:07:23 +0000 (21:07 +0300)]
sampling : refactor init to use llama_sampling_params (#3696)
* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call (see the sketch after this entry)
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661
ggml-ci
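A hedged sketch of what folding the three penalties into one pass over the logits might look like (function and parameter names are illustrative, not the actual llama.cpp API):

```cpp
#include <unordered_map>
#include <vector>

// One pass applies all three penalties per previously seen token:
// the classic repetition penalty plus OpenAI-style frequency/presence terms.
void apply_penalties(std::vector<float> & logits,
                     const std::unordered_map<int, int> & token_counts,
                     float penalty_repeat, float penalty_freq, float penalty_present) {
    for (const auto & [tok, count] : token_counts) {
        float & l = logits[tok];
        // repetition penalty: push the logit toward "less likely"
        l = l > 0.0f ? l / penalty_repeat : l * penalty_repeat;
        // frequency penalty scales with the count; presence penalty is flat
        l -= (float) count * penalty_freq + penalty_present;
    }
}
```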
Qin Yue Chen [Fri, 20 Oct 2023 11:19:40 +0000 (06:19 -0500)]
gguf : support big endian platform (#3552)
* check whether the platform is s390x; if so, do not import immintrin.h
* support s390x big endian
* support --bigendian option for s390x
1. verified with baichuan7b-chat with float 16 on s390x
2. verified with baichuan7b-chat
3. verified with chinese-alpaca-2-13b-f16
* update format based on editor-config checker result
* Update convert-baichuan-hf-to-gguf.py
* 1. check in ggml.c whether the endianness matches
2. update GGUF version
3. change get_pack_prefix to a property
4. update information log
* always use "GGUF" as the beginning of a GGUF file
* Compare "GGUF" with file header char by char
1. Set GGUF_MAGIC to "GGUF" string instead of int value
2. Compare "GGUF" char by char to ensure its byte order
3. Move bytes swap code from convert.py to gguf.py write_tensor_data
---------
Co-authored-by: Georgi Gerganov <redacted>
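A minimal sketch of why the char-by-char comparison is endian-safe (helper name assumed):

```cpp
#include <cstdint>

// Comparing the four bytes 'G','G','U','F' one by one gives the same result
// on little- and big-endian hosts, unlike comparing a packed 32-bit constant
// whose byte order depends on the platform.
static bool gguf_magic_ok(const uint8_t buf[4]) {
    const char magic[4] = {'G', 'G', 'U', 'F'};
    for (int i = 0; i < 4; ++i) {
        if (buf[i] != (uint8_t) magic[i]) {
            return false;
        }
    }
    return true;
}
```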
Georgi Gerganov [Fri, 20 Oct 2023 10:06:10 +0000 (13:06 +0300)]
server : fix uninitialized sampling context (close #3685)
Herman Semenov [Fri, 20 Oct 2023 10:02:12 +0000 (10:02 +0000)]
ggml : fix rope + llama minor optimizations (#3560)
* Minor fixes and fixed memleak
* Use const auto references in range-based loops (C++17)
cebtenzzre [Fri, 20 Oct 2023 05:32:08 +0000 (01:32 -0400)]
convert : restore compat with old Falcon models (#3680)
M. Yusuf Sarıgöz [Thu, 19 Oct 2023 16:40:41 +0000 (19:40 +0300)]
multimodal : add BakLLaVA conversion support (#3682)
M. Yusuf Sarıgöz [Thu, 19 Oct 2023 13:59:11 +0000 (16:59 +0300)]
llava : avoid segfault in case of non-existent mmproj file (#3674)
Georgi Gerganov [Wed, 18 Oct 2023 18:44:43 +0000 (21:44 +0300)]
readme : update hot topics
Georgi Gerganov [Wed, 18 Oct 2023 15:49:40 +0000 (18:49 +0300)]
speculative : bug fixes
Georgi Gerganov [Wed, 18 Oct 2023 13:21:57 +0000 (16:21 +0300)]
speculative : add tree-based sampling example (#3624)
* sampling : one sequence per sampling context
ggml-ci
* speculative : add tree-based sampling support
ggml-ci
* speculative : reuse the n_parallel CLI param
* speculative : refactor sampling
* examples : fix build after sampling refactoring
ggml-ci
* batched : fix n_seq_id
* sampling : fix malloc
ggml-ci
* swift : fix build
ggml-ci
* swift : try to fix build
ggml-ci
* prompts : add assistant.txt
* common : add llama_batch_add() and llama_batch_clear() helpers (usage sketched after this entry)
* speculative : minor refactor
ggml-ci
* minor : comments + rename
ggml-ci
* speculative : fix off-by-one for n_drafted
* speculative : fix the n_drafted fix + p constants
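A hedged usage sketch of the new batch helpers (the signatures match llama.cpp's common code of this period, but treat the details as assumptions):

```cpp
#include <vector>
#include "common.h"   // assumed home of llama_batch_add()/llama_batch_clear()

// Clear the batch, then append each token at its position in sequence 0,
// requesting logits only for the last token.
void fill_batch(llama_batch & batch, const std::vector<llama_token> & tokens) {
    llama_batch_clear(batch);
    for (size_t i = 0; i < tokens.size(); ++i) {
        // token id, position, sequence ids, whether to compute logits
        llama_batch_add(batch, tokens[i], (llama_pos) i, { 0 }, i == tokens.size() - 1);
    }
}
```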
Jhen-Jie Hong [Wed, 18 Oct 2023 12:21:48 +0000 (07:21 -0500)]
metal : implement q5_0 and q5_1 kernels (#3648)
* metal : implement dequantize_q5_0
* metal : block_q_n_dot_y for block_q5_0 (broken)
* metal : revert unnecessary change
* metal : implement dequantize_q5_1
* metal : block_q_n_dot_y for q5_1 (broken)
* metal : fix block_q_n_dot_y
* minor : spaces / formatting
---------
Co-authored-by: Georgi Gerganov <redacted>
shibe2 [Wed, 18 Oct 2023 12:09:22 +0000 (16:09 +0400)]
opencl : fix element-wise multiplication (#3656)
slaren [Tue, 17 Oct 2023 20:24:50 +0000 (22:24 +0200)]
fix embeddings when using CUDA (#3657)
Georgi Gerganov [Tue, 17 Oct 2023 19:34:26 +0000 (22:34 +0300)]
llama : avoid fprintf in favor of LLAMA_LOG (#3538)
BarfingLemurs [Tue, 17 Oct 2023 18:13:21 +0000 (14:13 -0400)]
readme : update hot-topics & models, detail windows release in usage (#3615)
* Update README.md
* Update README.md
* Update README.md
* move "Running on Windows" section below "Prepare data and run"
---------
Co-authored-by: Georgi Gerganov <redacted>
shibe2 [Wed, 11 Oct 2023 17:30:06 +0000 (21:30 +0400)]
CLBlast: Fix temporary buffer size for f16 conversion (wsize)
Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
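An illustrative sketch of the sizing logic, with variable and helper names assumed:

```cpp
#include "ggml.h"

// The temporary f16 buffer only needs to hold one ne0 x ne1 slice at a time,
// and the size is asserted before use so overflows fail loudly.
static size_t f16_scratch_size(int64_t ne0, int64_t ne1, size_t wsize_max) {
    const size_t wsize = sizeof(ggml_fp16_t) * ne0 * ne1; // one 2D slice, not the whole tensor
    GGML_ASSERT(wsize <= wsize_max);
    return wsize;
}
```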
slaren [Tue, 17 Oct 2023 17:00:58 +0000 (19:00 +0200)]
train-text-from-scratch : fix assert failure in ggml-alloc (#3618)
Georgi Gerganov [Tue, 17 Oct 2023 16:52:53 +0000 (19:52 +0300)]
editorconfig : remove trailing spaces
coezbek [Tue, 17 Oct 2023 16:51:02 +0000 (18:51 +0200)]
server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint
* Update examples/server/README.md
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Tue, 17 Oct 2023 16:12:46 +0000 (19:12 +0300)]
save-load-state : fix example + add ci test (#3655)
* save-load-state : fix example (close #3606)
* ci : add test for save-load-state example
ggml-ci
ldwang [Tue, 17 Oct 2023 15:52:33 +0000 (23:52 +0800)]
readme : add Aquila2 links (#3610)
Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>
staviq [Tue, 17 Oct 2023 15:11:01 +0000 (17:11 +0200)]
tokenizer : special token handling (#3538)
* Rewrite special token handling from #1931
* shorten param name, add st verification by type
* use offsets instead of copy by substr
* formatting, remove copying iterator on delete
* llama : normalize code-style
* swift fix
* print prefix/suffix if verbose; main: split prefix, input, suffix
* don't add space when using special tokens
* minor : comment + spacing
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Tue, 17 Oct 2023 06:19:28 +0000 (09:19 +0300)]
k-quants : fix quantization ranges (#3646)
Georgi Gerganov [Mon, 16 Oct 2023 20:58:00 +0000 (23:58 +0300)]
llava : fix tokenization to not add bos between image embeddings and user prompt (#3645)
* llava : fix tokenization to not add bos after system prompt
* set seed
---------
Co-authored-by: M. Yusuf Sarıgöz <redacted>
cebtenzzre [Sun, 15 Oct 2023 06:32:06 +0000 (02:32 -0400)]
MPT : support GQA for replit-code-v1.5 (#3627)
M. Yusuf Sarıgöz [Sat, 14 Oct 2023 10:52:44 +0000 (13:52 +0300)]
Honor -ngl option for CUDA offloading in llava (#3621)
Daniel Bevenius [Fri, 13 Oct 2023 10:33:16 +0000 (12:33 +0200)]
llama : remove n_threads from llama_decode_internal (#3614)
This commit removes `n_threads` from the `llama_decode_internal` function's doc comment, as the parameter no longer exists.
It looks like this parameter was removed in commit 16bc66d9479edd5ee12ec734973554d4493c5dfa ("llama.cpp : split llama_context_params into model and context params").
Signed-off-by: Daniel Bevenius <redacted>
slaren [Fri, 13 Oct 2023 10:23:10 +0000 (12:23 +0200)]
ggml : add context enumeration functions (#3605)
finetune : fix assert failure in ggml-alloc
shibe2 [Thu, 12 Oct 2023 19:59:47 +0000 (23:59 +0400)]
CLBlast: Fix matrix-vector multiplication (#3544)
M. Yusuf Sarıgöz [Thu, 12 Oct 2023 15:23:18 +0000 (18:23 +0300)]
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA
* rm scratch buf for now, will revert after cleanup
* LLaVA image encoder is working. will combine with llama
* Add llava inference code, but it's buggy. debugging
* LLaVA is working e2e, needs to optimize memory allocation + cleanup
* Use ggml_allocr + rm unnecessary code
* fix: crlf -> lf
* fix: new line at EoF
* fix: trailing whitespace
* Add readme
* Update readme
* Some cleanup
* Are you happy editorconfig?
* rm unused batch image preprocessing
* rm unused import
* fix: rm designated initializers
* introduce pad-to-square mode for non-square images
* are you happy editorconfig?
* gitignore /llava
* Handle cases where image file does not exist
* add llava target to Makefile
* add support for 13b model variant
* Maybe seed is unlucky?
* Check if apples are compared to apples
* are you happy editorconfig?
* Use temperature = 0.1 by default
* command line: use gpt_params_parse()
* minor
* handle default n_predict
* fix typo
* llava : code formatting, rename files, fix compile warnings
* do not use Wno-cast-qual for MSVC
---------
Co-authored-by: Georgi Gerganov <redacted>
uint256_t [Thu, 12 Oct 2023 13:36:16 +0000 (22:36 +0900)]
docs : fix typo GOMP_CPU_AFFINITY (#3597)
Georgi Gerganov [Thu, 12 Oct 2023 11:31:05 +0000 (14:31 +0300)]
cmake : fix add_compile_options on macOS
Ian Scrivener [Thu, 12 Oct 2023 11:10:50 +0000 (22:10 +1100)]
typo : it is `--n-gpu-layers` not `--gpu-layers` (#3592)
Fixed a typo in the macOS Metal run documentation.
Georgi Gerganov [Thu, 12 Oct 2023 10:44:56 +0000 (13:44 +0300)]
ci : check if there is enough VRAM (#3596)
ggml-ci
Aarni Koskela [Thu, 12 Oct 2023 06:51:53 +0000 (15:51 +0900)]
server : add completion mode (no chat) (#3582)
Georgi Gerganov [Thu, 12 Oct 2023 06:35:19 +0000 (09:35 +0300)]
prompts : add mnemonics.txt
Georgi Gerganov [Thu, 12 Oct 2023 06:29:04 +0000 (09:29 +0300)]
server : fix kv cache management (#3588)
Georgi Gerganov [Wed, 11 Oct 2023 20:55:08 +0000 (23:55 +0300)]
main : fix session loading bug (#3400)
Michael Coppola [Wed, 11 Oct 2023 19:42:22 +0000 (15:42 -0400)]
server : add parameter -tb N, --threads-batch N (#3584)
Co-authored-by: Michael Coppola <redacted>
Kerfuffle [Wed, 11 Oct 2023 19:35:46 +0000 (13:35 -0600)]
common : fix mirostat state when using multiple sequences (#3543)
* Fix mirostat state when using multiple sequences
* Fix mirostat by completely refactoring sampling!
* Try to fix zig build.
* Export function to fetch/create default sampler states
Code formatting cleanups and add some comments
Silence a warning about id not being used when logging is disabled
* Apply some renaming suggestions.
Fix comments that were out of sync with the pull.
* Use a more consistent naming convention for sampling contexts
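The gist of the refactor, sketched with assumed types: mirostat's adaptive "mu" is stateful, so each sequence needs its own sampling state instead of one value shared across all sequences:

```cpp
#include <map>
#include <vector>

// Per-sequence sampling state (illustrative, not the exact llama.cpp struct).
struct sampling_state {
    float mirostat_mu = 0.0f;       // adapted per sequence across sampling steps
    std::vector<int> prev_tokens;   // recent tokens for repetition penalties
};

std::map<int, sampling_state> g_seq_states; // keyed by sequence id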
Georgi Gerganov [Wed, 11 Oct 2023 18:25:33 +0000 (21:25 +0300)]
batched : add bench tool (#3545)
* batched : add bench tool
* batched : minor fix table
* batched-bench : add readme + n_kv_max is now configurable
* batched-bench : init warm-up batch
* batched-bench : pass custom set of PP, TG and PL
* batched-bench : add mmq CLI arg
Zane Shannon [Wed, 11 Oct 2023 11:14:05 +0000 (04:14 -0700)]
examples : add batched.swift + improve CI for swift (#3562)
Galunid [Tue, 10 Oct 2023 23:02:49 +0000 (01:02 +0200)]
Add MPT model to supported models in README.md (#3574)
goerch [Tue, 10 Oct 2023 16:59:52 +0000 (18:59 +0200)]
Minor improvements in GPT2 tokenizer (#3567)
* Fixing minor bugs in bpe_gpt2_preprocess
* Don't add bos token in test
Xingchen Song(宋星辰) [Tue, 10 Oct 2023 16:28:50 +0000 (00:28 +0800)]
readme : add bloom (#3570)
Xingchen Song(宋星辰) [Tue, 10 Oct 2023 14:48:21 +0000 (22:48 +0800)]
llm : add bloom models (#3553)
* feat: Support bloom models
* fix(bloom): fix model size
---------
Co-authored-by: Georgi Gerganov <redacted>
Jhen-Jie Hong [Tue, 10 Oct 2023 11:31:13 +0000 (06:31 -0500)]
swift : improvements and fixes (#3564)
* swift : use macOS 12 as minimum requirement
* swift : add missing ggml-backend.c source
* swift : add -O3 -DNDEBUG unsafe flags
Jan Ploski [Tue, 10 Oct 2023 07:50:23 +0000 (09:50 +0200)]
llm : add MPT support (#3417)
* CUDA: added support for ggml_clamp (see also: https://github.com/ggerganov/ggml/issues/545)
* mpt : added an implementation based (mostly) on falcon integration, modified with deltas from ggml/examples/mpt
* mpt : protect against "clip_qkv": null in mpt-7b
* mpt : quick fix to avoid "Strange model" warning when quantizing MPT models
* mpt : addendum to changeset 84e30e8 - leave the clamp_kqv parameter out of the metadata rather than using 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
* mpt : standardized all tensor names to follow GGUF spec
* mpt : addendum to changeset 1be89c40 - use the "req" parameter of the GGUF_GET_KEY macro instead of duplicating code
* mpt : fixed comment s/gptneox/mpt/
* mpt : remove tabs, trailing whitespace
* mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) and rope_shift from build_mpt
* mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to convert-gptneox-hf-to-gguf.py in PR #3252
* comment out n_past instead of marking it unused
* mpt : removed hardcoded +178 from convert script in favor of utilizing hparams["vocab_size"]
* mpt : remove unused tokenizer_json in convert script
* ggml : remove obsolete n_past assert in ggml_alibi
* llama : print clamp_kqv and max_alibi_bias hparams
---------
Co-authored-by: Cebtenzzre <redacted>
Co-authored-by: Georgi Gerganov <redacted>
vvhg1 [Tue, 10 Oct 2023 07:31:21 +0000 (09:31 +0200)]
infill : fix tokenization (#3508)
* infill tokens correction
* serverinfill tokens correction
* removing any leading whitespace from infill suffix, and removing the leading space token from the suffix when params.escape
* only rm when params.escape; rm space if possible, which is added back, or rm added space token
* Revert "only rm when params.escape, rm space if possible which is added back or rm added space token"
This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738.
* fix interactive prompt escaping and fix server infill leading space handling
* rm unnecessary bool check
slaren [Mon, 9 Oct 2023 12:44:58 +0000 (14:44 +0200)]
ggml-alloc : fix assert in debug builds (#3555)
Georgi Gerganov [Mon, 9 Oct 2023 11:32:17 +0000 (14:32 +0300)]
refact : fix convert script + zero out KV cache to avoid nans (#3523)
* refact : fix convert script + zero out KV cache to avoid nans
* ggml : silu(-inf) should never happen
* metal : assert various kernel requirements
Georgi Gerganov [Mon, 9 Oct 2023 11:28:27 +0000 (14:28 +0300)]
metal : do not use mul_mm kernels when ne00 < 64 (#3542)
Georgi Gerganov [Sun, 8 Oct 2023 17:19:14 +0000 (20:19 +0300)]
sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)
ggml-ci
* zig : add ggml-backend to the build
Matheus C. França [Sun, 8 Oct 2023 13:59:20 +0000 (10:59 -0300)]
ci : add Zig CI/CD and fix build (#2996)
* zig CI/CD and fix build
Signed-off-by: Matheus Catarino França <redacted>
* fix build_compiler
* ci : remove trailing whitespace
---------
Signed-off-by: Matheus Catarino França <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Ryder Wishart [Sun, 8 Oct 2023 10:55:58 +0000 (03:55 -0700)]
api_like_OAI.py : compat with Microsoft Guidance (#2746)
Check for None in addition to empty string check in all request params
Co-authored-by: Georgi Gerganov <redacted>
arcrank [Sun, 8 Oct 2023 10:52:57 +0000 (06:52 -0400)]
api_like_OAI.py : simplify function (#2796)
Simplify function
Johannes Rudolph [Sun, 8 Oct 2023 10:21:19 +0000 (12:21 +0200)]
k-quants : fix comments about block sizing (#3499)
Georgi Gerganov [Sun, 8 Oct 2023 08:24:50 +0000 (11:24 +0300)]
ci : enable on obj-c changes + fix metal build (#3540)
Luo Tian [Sun, 8 Oct 2023 08:24:01 +0000 (16:24 +0800)]
zig : fix build by introducing train.cpp (#3539)
Georgi Gerganov [Sun, 8 Oct 2023 07:01:53 +0000 (10:01 +0300)]
metal : support MTLGPUFamily < Apple7, formatting, style (#3524)
* metal : improve decoding speed for batches of 2-16
* metal : rename kernels mul_mat_ to mul_mv_
* metal : indentations
* minor
* metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7
Kerfuffle [Sun, 8 Oct 2023 05:22:17 +0000 (23:22 -0600)]
llama : fix missing break in Persimmon arch case statements (#3535)
Kerfuffle [Sat, 7 Oct 2023 21:31:41 +0000 (15:31 -0600)]
Fix trying to strip newline from empty prompt and cfg prompt file content (#3534)
M. Yusuf Sarıgöz [Sat, 7 Oct 2023 19:14:10 +0000 (22:14 +0300)]
gguf.py : fix CI for publishing GGUF package (#3532)
* Fix CI for publishing GGUF package
* Bump version
* fix
* bump version
* bump version
* bump version
Tom C [Sat, 7 Oct 2023 09:56:15 +0000 (02:56 -0700)]
py : change version of numpy requirement to 1.24.4 (#3515)
Co-authored-by: Lyjia <redacted>
cebtenzzre [Sat, 7 Oct 2023 08:41:52 +0000 (04:41 -0400)]
quantize : fail fast on write errors (#3521)
Jhen-Jie Hong [Sat, 7 Oct 2023 08:40:27 +0000 (03:40 -0500)]
metal : support default.metallib load & reuse code for swift package (#3522)
* metal : support load default.metallib & reuse code for swift package
* metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT
Phillip Kravtsov [Sat, 7 Oct 2023 07:12:43 +0000 (00:12 -0700)]
llm : support Adept Persimmon 8B (#3410)
* Produces garbage output
* wip: correct tensors up to RoPE
* correct tensors thru RoPE
* Correct outputs through masked & softmax'd KQ
* fp32 works
* Rename adept->persimmon
* Produces correct outputs
* clean up convert scripts
* remove printing logic from ggml.c
* remove prints from llama.cpp & fix merge
* trivial cleanups
* Add offload funcs
* update conversion script to directly take adept artifacts rather than a .safetensors file
* Fix norm eps bug
* Support sqr and concat on metal, persimmon-8b-q4 runs correctly
* Small changes from review
* Formatting changes
* Minor changes to conversion script
* Remove old script
* Fix editorconfig formatting
* Fix build
* add overlooked offload code
ggml-ci
goerch [Sat, 7 Oct 2023 04:57:01 +0000 (06:57 +0200)]
Fix for #3454 (#3455)
Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion
BarfingLemurs [Fri, 6 Oct 2023 19:13:36 +0000 (15:13 -0400)]
readme : update models, cuda + ppl instructions (#3510)
Mihai [Fri, 6 Oct 2023 18:39:33 +0000 (21:39 +0300)]
server : docs fix default values and add n_probs (#3506)
Kerfuffle [Fri, 6 Oct 2023 16:10:13 +0000 (10:10 -0600)]
kv cache slot search improvements (#3493)
* kv cache slot search improvements
* Use n_ctx in kv find slot for consistency
* Ensure kv cache head points to a valid slot in llama_decode internal
* Add some comments to prevent dumb people (like me) from getting confused.
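A simplified, hedged sketch of the slot search described above (the real cache logic tracks more state per cell):

```cpp
#include <cstdint>
#include <vector>

// Scan for a contiguous run of n_tokens free cells, wrapping the head back
// to 0 when the run would fall off the end of the cache. Returns -1 if no
// slot of the required size exists.
static int32_t find_slot(const std::vector<bool> & used, int32_t head, int32_t n_tokens) {
    const int32_t n_ctx = (int32_t) used.size();
    int32_t tested = 0;
    while (tested < n_ctx) {
        if (head + n_tokens > n_ctx) { // run would wrap past the end
            tested += n_ctx - head;
            head = 0;
            continue;
        }
        bool found = true;
        for (int32_t i = 0; i < n_tokens; ++i) {
            if (used[head + i]) {      // occupied cell: restart search after it
                head   += i + 1;
                tested += i + 1;
                found = false;
                break;
            }
        }
        if (found) {
            return head;
        }
    }
    return -1;
}
```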
Georgi Gerganov [Fri, 6 Oct 2023 13:35:55 +0000 (16:35 +0300)]
prompts : fix editorconfig checks after #3416
pudepiedj [Fri, 6 Oct 2023 13:16:38 +0000 (14:16 +0100)]
parallel : add option to load external prompt file (#3416)
* Enable external file and add datestamp
* Add name of external file at end
* Upload ToK2024
* Delete ToK2024.txt
* Experiments with jeopardy
* Move ParallelQuestions to /prompts and rename
* Interim commit
* Interim commit
* Final revision
* Remove trailing whitespace
* remove cmake_all.sh
* Remove cmake_all.sh
* Changed .gitignore
* Improved reporting and new question files.
* Corrected typo
* More LLM questions
* Update LLM-questions.txt
* Yet more LLM-questions
* Remove jeopardy results file
* Reinstate original jeopardy.sh
* Update examples/parallel/parallel.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Jhen-Jie Hong [Fri, 6 Oct 2023 12:44:24 +0000 (07:44 -0500)]
server : reuse llama_sample_token common util (#3494)
* server : reuse llama_sample_token common function
* common : use n_probs for temperature sampling
l3utterfly [Fri, 6 Oct 2023 10:47:59 +0000 (18:47 +0800)]
llama : correct hparams comparison (#3446)
* fixed floating point comparison issues
* updated implementation for hparam comparison to handle inf and NaN
* fixed code review comments
* minor simplification
* rename is_float_eq -> is_float_close
---------
Co-authored-by: Cebtenzzre <redacted>
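A sketch of an inf/NaN-aware closeness check in the spirit of is_float_close (the tolerance handling is illustrative):

```cpp
#include <cmath>

// NaN never compares equal to anything; infinities must match exactly;
// otherwise fall back to an absolute-tolerance comparison.
static bool is_float_close(float a, float b, float abs_tol) {
    if (std::isnan(a) || std::isnan(b)) return false;
    if (std::isinf(a) || std::isinf(b)) return a == b;
    return std::fabs(a - b) <= abs_tol;
}
```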
Jhen-Jie Hong [Fri, 6 Oct 2023 10:36:43 +0000 (05:36 -0500)]
ci : fix xcodebuild destinations (#3491)
* ci : fix xcodebuild destinations
* ci : add .swift to paths
cebtenzzre [Thu, 5 Oct 2023 19:00:34 +0000 (15:00 -0400)]
convert : update Falcon script for new HF config (#3448)
Also adds Falcon-180B support.
Closes #3049
Co-authored-by: jb <redacted>
Kenvix ⭐ [Thu, 5 Oct 2023 17:16:39 +0000 (01:16 +0800)]
build : use std::make_tuple() for compatibility with older GCC versions (#3488)
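An illustrative example of the compatibility fix (types chosen arbitrarily):

```cpp
#include <tuple>

// Spelling out std::make_tuple() avoids the brace-init-to-tuple conversion
// that some older GCC releases reject.
std::tuple<int, float> get_pair() {
    return std::make_tuple(1, 2.0f);   // instead of: return {1, 2.0f};
}
```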
staviq [Thu, 5 Oct 2023 16:17:29 +0000 (18:17 +0200)]
common : process escape sequences in reverse prompts (#3461)
shibe2 [Thu, 5 Oct 2023 11:57:03 +0000 (15:57 +0400)]
CLBlast: Fix handling of on-device tensor data
Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
Jhen-Jie Hong [Thu, 5 Oct 2023 14:02:55 +0000 (09:02 -0500)]
server : fix incorrect num_tokens_predicted (#3480)
Jhen-Jie Hong [Thu, 5 Oct 2023 14:00:07 +0000 (09:00 -0500)]
swift : disable ACCELERATE_NEW_LAPACK (#3481)
Jhen-Jie Hong [Thu, 5 Oct 2023 13:56:21 +0000 (08:56 -0500)]
ci : add swift build via xcodebuild (#3482)
Kerfuffle [Wed, 4 Oct 2023 14:20:28 +0000 (08:20 -0600)]
convert : fix Baichuan2 models by using vocab size in config.json (#3299)
Use local GGUF package when possible in Baichuan converter
Georgi Gerganov [Wed, 4 Oct 2023 13:50:44 +0000 (16:50 +0300)]
readme : add project status link
Georgi Gerganov [Wed, 4 Oct 2023 13:25:41 +0000 (16:25 +0300)]
ggml : fix build after #3329
ds5t5 [Wed, 4 Oct 2023 13:23:39 +0000 (06:23 -0700)]
llm : add Refact model (#3329)
* add refact model
* resolve comments
* rebase to the latest
* solve alibi cpu error
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Wed, 4 Oct 2023 12:29:58 +0000 (15:29 +0300)]
sync : ggml (conv 1d + 2d updates, UB fixes) (#3468)
* sync : ggml (conv 1d + 2d updates)
ggml-ci
* ggml : fix UB in q5_0 and q5_1 quantize code (see the sketch after this entry)
ggml.c:1033:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml.c:1081:39: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
ggml-ci
* tests : fix UB in test-quantize-perf
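A minimal sketch of the usual fix for this class of UB: shift an unsigned operand instead of a signed int literal:

```cpp
#include <cstdint>

// Plain "1 << 31" shifts a signed int past its range (undefined behavior);
// shifting an unsigned literal is well-defined.
static inline uint32_t high_bit() {
    return 1u << 31;
}
```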
Merrick Christensen [Wed, 4 Oct 2023 06:33:13 +0000 (00:33 -0600)]
finetune : readme fix typo (#3465)
Fix small typo
Tameem [Tue, 3 Oct 2023 18:38:19 +0000 (23:38 +0500)]
ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (#3453)
* Added RVV intrinsics support for the Q8 quantize-row functions and also improved the existing dot product functions for RISC-V.
RVV intrinsics are added for the following quantize-row functions
quantize_row_q8_0
quantize_row_q8_1
The following dot product functions have also been optimized by using LMUL = 1/2 instead of LMUL = 1
ggml_vec_dot_q4_0_q8_0
ggml_vec_dot_q4_1_q8_1
ggml_vec_dot_q5_0_q8_0
ggml_vec_dot_q5_1_q8_1
Vector initialization in Q5 via a temporary array is also replaced by the vid intrinsics
Signed-off-by: Ahmad Tameem <redacted>
* Added RVV intrinsics support for k_quants
This adds RISC-V Vector intrinsics support for the following K_quants functions for both QKK = 256 and QKK = 64
ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q5_K_q8_K
ggml_vec_dot_q6_K_q8_K
Signed-off-by: Ahmad Tameem <redacted>
---------
Signed-off-by: Ahmad Tameem <redacted>
h-h-h-h [Tue, 3 Oct 2023 18:16:15 +0000 (20:16 +0200)]
main : consistent prefix/suffix coloring (#3425)
* Typo
* No `--in-prefix` coloring
The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.
Georgi Gerganov [Tue, 3 Oct 2023 18:04:01 +0000 (21:04 +0300)]
llama : fix session saving/loading (#3400)
* llama : fix session saving/loading
* llama : temp fix for clearing "future" tokens from the KV cache
* llama : fix handling of "future" tokens when loading sessions
* llama : fix comments for llama_kv_cache API
Alex Klinkhamer [Tue, 3 Oct 2023 17:09:28 +0000 (10:09 -0700)]
llama : expose model's rope_freq_scale in the API (#3418)
so it can be scaled further before creating a context.
Jiahao Li [Tue, 3 Oct 2023 16:55:21 +0000 (00:55 +0800)]
metal : alibi for arbitrary number of heads (#3426)
Eve [Tue, 3 Oct 2023 16:53:15 +0000 (16:53 +0000)]
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
* revert 8283237 and only allow LLAMA_NATIVE on x86, like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <redacted>