git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Jared Van Bortel [Sat, 23 Mar 2024 22:48:02 +0000 (18:48 -0400)]
use _wfopen instead of fopen on Windows (#6248)
also fix missing #defines before windows.h, and BPE LF token on MSVC
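A minimal sketch of the idea, with an illustrative helper name and assuming UTF-8 input paths:

```cpp
#include <cstdio>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN // the missing #defines mentioned above
#ifndef NOMINMAX
#define NOMINMAX
#endif
#include <windows.h>
#endif

// Open a file given a UTF-8 path; on Windows convert to UTF-16 and use
// _wfopen so non-ASCII paths work regardless of the active code page.
static FILE * fopen_utf8(const char * path, const char * mode) {
#ifdef _WIN32
    wchar_t wpath[MAX_PATH];
    wchar_t wmode[16];
    MultiByteToWideChar(CP_UTF8, 0, path, -1, wpath, MAX_PATH);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16);
    return _wfopen(wpath, wmode);
#else
    return fopen(path, mode);
#endif
}
```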
Georgi Gerganov [Sat, 23 Mar 2024 19:35:23 +0000 (21:35 +0200)]
gitignore : gguf-split
Pierrick Hymbert [Sat, 23 Mar 2024 17:07:00 +0000 (18:07 +0100)]
common: llama_load_model_from_url split support (#6192)
* llama: fix llama_split_prefix: strncpy does not include the string terminator
common: llama_load_model_from_url:
- fix case-sensitive header name comparison
- support downloading additional splits in parallel
- hide password in url
* common: EOL EOF
* common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition
* common: change max url length
* common: minor comment
* server: support HF URL options
* llama: llama_model_loader fix log
* common: use a constant for max url length
* common: clean up curl if file cannot be loaded in gguf
* server: tests: add split tests, and HF options params
* common: move llama_download_hide_password_in_url inside llama_download_file as a lambda
* server: tests: enable back Release test on PR
* spacing
Co-authored-by: Georgi Gerganov <redacted>
* spacing
Co-authored-by: Georgi Gerganov <redacted>
* spacing
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
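A minimal sketch of the password-hiding step mentioned above (standalone here for illustration; the commit moves it inside llama_download_file as a lambda):

```cpp
#include <regex>
#include <string>

// Mask the password in URLs of the form scheme://user:password@host/...
// so credentials never show up in logs.
static std::string hide_password_in_url(const std::string & url) {
    static const std::regex re("^(https?://[^:/]+):([^@]+)@(.*)$");
    return std::regex_replace(url, re, "$1:********@$3");
}
```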
Pierrick Hymbert [Sat, 23 Mar 2024 17:00:38 +0000 (18:00 +0100)]
server: docs: `--threads`, `--ubatch-size`, `--log-disable` (#6254)
Julius Arkenberg [Sat, 23 Mar 2024 16:41:53 +0000 (17:41 +0100)]
llama : add grok-1 support (#6204)
* Add support for Grok model architecture
* Revert convert-hf-to-gguf to default options
* Fixed f_norm_rms_eps bug
* Fix whitespaces
* llama : fix grok rope type
* llama : minor
---------
Co-authored-by: Georgi Gerganov <redacted>
Pierrick Hymbert [Sat, 23 Mar 2024 16:18:13 +0000 (17:18 +0100)]
split: add gguf-split in the make build target (#6262)
Pierrick Hymbert [Sat, 23 Mar 2024 12:18:45 +0000 (13:18 +0100)]
server: flush stdout after logging in both text and json layout (#6253)
Johannes Gäßler [Sat, 23 Mar 2024 00:24:36 +0000 (01:24 +0100)]
lookup: complement data from context with general text statistics (#5479)
* lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
Georgi Gerganov [Fri, 22 Mar 2024 19:10:39 +0000 (21:10 +0200)]
common : default --hf-file to --model (#6234)
fraxy-v [Fri, 22 Mar 2024 18:49:06 +0000 (20:49 +0200)]
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
* convert-llama2c-to-ggml: enable conversion of multi-query models, #5608
* add test in build action
* Update build.yml
* Update build.yml
* Update build.yml
* gg patch
Kawrakow [Fri, 22 Mar 2024 18:47:14 +0000 (19:47 +0100)]
quantize: options for output and token embedding tensors qtype (#6239)
* quantize: be able to specify the output tensor type
* quantize: be able to specify the token embedding tensor type
---------
Co-authored-by: Iwan Kawrakow <redacted>
Pierrick Hymbert [Fri, 22 Mar 2024 18:00:01 +0000 (19:00 +0100)]
llama_model_loader: support multiple split/shard GGUFs (#6187)
* split: support in llama_model_loader
* avoid copying the entire vector
Co-authored-by: slaren <redacted>
* split: move llama_tensor_offset to llama_model_loader
* llama_model_loader: PR feedbacks:
- use only one gguf_context for metadata only
- store all ggml_context in a vector as the files and mappings
- store all weights in a vector along with the source tensor
- rename ctx_gguf to meta
- rename ctx_meta to contexts
* avoid copying the entire vector
* simplify by making these optional; switch some layer-creation tensors to optional
Co-authored-by: Georgi Gerganov <redacted>
* Handle optional tensors
Co-authored-by: Georgi Gerganov <redacted>
* llama_model_loader: fail if backend cannot allocate buffer
* fix mmap buffer management
* llama_model_loader: only map the file to the backend buffer if the allocation succeeds
* llama_model_loader: only map tensors included in the context
* llama_model_loader: minor, use the same variable name for consistency, fix spacing in type casts
* llama_model_loader: fail if any backend buffer cannot be allocated
* spacing
Co-authored-by: slaren <redacted>
* fix loop over pointer
Co-authored-by: slaren <redacted>
* llama_model_loader: if the declared n_tensors does not match the number of tensors loaded from the splits, throw an exception instead of asserting
* llama_model_loader: ensure mappings vector has the expected size
* llama_model_loader: use `at` instead of `operator[]` where the lookup must never insert into the map
* llama_model_loader: immediately add the backend buffer to the model buffers so it can be freed if a later allocation fails; reserve the expected size
* llama_model_loader: ensure the model mappings have enough capacity before allocating the backend buffer
* llama_model_loader: fix map -> unordered map
* llama_split_prefix: use a clearer version: pass the destination max length instead of the split path length
Co-authored-by: Xuan Son Nguyen <redacted>
* llama : minor
ggml-ci
* llama : introduce some typedef helpers
* docs: add model shard in hot topic
* llama_model_loader: put mapping in a unique_ptr from the moment it is allocated
Co-authored-by: slaren <redacted>
* fix llama_split_prefix
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
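For reference, a sketch of the split naming scheme these helpers deal with, assuming the `<prefix>-%05d-of-%05d.gguf` convention used by gguf-split:

```cpp
#include <cstddef>
#include <cstdio>

// Build the path of shard i_split (0-based) out of n_split shards, e.g.
// split_path(buf, sizeof(buf), "model", 1, 5) -> "model-00002-of-00005.gguf"
static int split_path(char * dest, size_t maxlen,
                      const char * prefix, int i_split, int n_split) {
    return snprintf(dest, maxlen, "%s-%05d-of-%05d.gguf", prefix, i_split + 1, n_split);
}
```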
Minsoo Cheong [Fri, 22 Mar 2024 17:15:06 +0000 (02:15 +0900)]
ci: apply concurrency limit for github workflows (#6243)
Georgi Gerganov [Fri, 22 Mar 2024 13:33:38 +0000 (15:33 +0200)]
common : add HF arg helpers (#6234)
* common : add HF arg helpers
* common : remove defaults
Nexesenex [Fri, 22 Mar 2024 13:32:02 +0000 (14:32 +0100)]
llama : correction of the attn.v.weight quantization for IQ3_XS (#6209)
IQ3_XS was not mentioned, while IQ3_S and IQ3_M were present twice.
This PR corrects that, in the manner that was probably intended initially.
Olivier Chafik [Fri, 22 Mar 2024 13:09:07 +0000 (13:09 +0000)]
tests : conditional python & node json schema tests (#6207)
* json: only attempt python & node schema conversion tests if their bins are present
Tests introduced in https://github.com/ggerganov/llama.cpp/pull/5978
disabled in https://github.com/ggerganov/llama.cpp/pull/6198
* json: orange warnings when tests skipped
* json: ensure py/js schema conv tested on ubuntu-focal-make
* json: print env vars in test
Olivier Chafik [Fri, 22 Mar 2024 13:07:44 +0000 (13:07 +0000)]
json-schema-to-grammar : fix order of props + non-str const/enum (#6232)
* json: ordered json in server/schema converter to respect orig order
* json: ws nits
* json: support non-string const / enums
slaren [Fri, 22 Mar 2024 13:05:31 +0000 (14:05 +0100)]
cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)
* cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy
* add LLAMA_CUDA_NO_PEER_COPY to HIP build
Xiaoyi Chen [Fri, 22 Mar 2024 11:29:49 +0000 (04:29 -0700)]
readme : add RecurseChat to the list of UIs (#6219)
Jan Boon [Fri, 22 Mar 2024 11:12:05 +0000 (19:12 +0800)]
server : fix n_keep always showing as 0 in response (#6211)
Georgi Gerganov [Fri, 22 Mar 2024 11:08:28 +0000 (13:08 +0200)]
server : enable continuous batching by default (#6231)
Georgi Gerganov [Fri, 22 Mar 2024 09:35:53 +0000 (11:35 +0200)]
metal : proper assert for mat-mat memory alignment (#6225)
* metal : proper assert for mat-mat memory alignment
ggml-ci
* readme : add notice about the bug fix
* metal : fix the fix
ggml-ci
Vaibhav Srivastav [Fri, 22 Mar 2024 07:53:43 +0000 (08:53 +0100)]
ci : add CURL flag for the mac builds (#6214)
Georgi Gerganov [Fri, 22 Mar 2024 07:36:03 +0000 (09:36 +0200)]
metal : pad n_ctx by 32 (#6177)
* metal : require ne00 >= 128 for mat-mat kernels
ggml-ci
* llama : pad n_ctx by 32
ggml-ci
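The padding amounts to rounding n_ctx up to the next multiple of 32; a sketch (ggml's GGML_PAD macro does the equivalent):

```cpp
// Round x up to the next multiple of n, e.g. pad_to(1000, 32) == 1024.
static inline int pad_to(int x, int n) {
    return (x + n - 1) / n * n;
}
```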
Neo Zhang Jianyu [Fri, 22 Mar 2024 07:19:37 +0000 (15:19 +0800)]
add blog link (#6222)
DAN™ [Fri, 22 Mar 2024 01:32:42 +0000 (21:32 -0400)]
Fix params underscore convert to dash. (#6203)
* Fix params underscore convert to dash.
* Update common/common.cpp
---------
Co-authored-by: slaren <redacted>
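A minimal sketch of the normalization (helper name illustrative), so `--no_mmap` and `--no-mmap` are treated the same:

```cpp
#include <algorithm>
#include <string>

// Accept underscores in long option names by normalizing them to dashes,
// e.g. "--no_mmap" becomes "--no-mmap".
static std::string normalize_arg(std::string arg) {
    if (arg.rfind("--", 0) == 0) {
        std::replace(arg.begin() + 2, arg.end(), '_', '-');
    }
    return arg;
}
```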
Jan Boon [Thu, 21 Mar 2024 22:41:24 +0000 (06:41 +0800)]
server : update readme doc from `slot_id` to `id_slot` (#6213)
slaren [Thu, 21 Mar 2024 18:54:28 +0000 (19:54 +0100)]
cuda : disable host register by default (#6206)
semidark [Thu, 21 Mar 2024 17:52:35 +0000 (11:52 -0600)]
Corrected typo pointing to the wrong file (#6199)
The stated file `./devops/main-server.Dockerfile` does not exist; presumably `.devops/server-intel.Dockerfile` was meant.
Georgi Gerganov [Thu, 21 Mar 2024 14:20:05 +0000 (16:20 +0200)]
tests : disable system() calls (#6198)
ggml-ci
slaren [Thu, 21 Mar 2024 12:59:53 +0000 (13:59 +0100)]
cuda : fix LLAMA_CUDA_F16 build (#6197)
Kawrakow [Thu, 21 Mar 2024 12:59:38 +0000 (13:59 +0100)]
ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)
* Make quantize_row_iq4_nl do the same thing as quantization on CUDA
* Make quantize_row_iq4_nl do the same thing as quantization on CUDA
This time for real. backend-ops tests pass.
* Now fix test-quantize-fns
---------
Co-authored-by: Iwan Kawrakow <redacted>
Olivier Chafik [Thu, 21 Mar 2024 11:50:43 +0000 (11:50 +0000)]
json-schema-to-grammar improvements (+ added to server) (#5978)
* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* json: support union types (mostly for nullable types I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json: fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits
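With the converter ported to C++ and moved into common, usage looks roughly like this (a sketch; it assumes the json_schema_to_grammar entry point and the nlohmann ordered_json type used by the port):

```cpp
#include <cstdio>
#include <string>
#include "json.hpp"
#include "json-schema-to-grammar.h"

int main() {
    // constrain model output to an object like {"answer": <integer>}
    auto schema = nlohmann::ordered_json::parse(R"({
        "type": "object",
        "properties": { "answer": { "type": "integer" } },
        "required": ["answer"]
    })");
    const std::string grammar = json_schema_to_grammar(schema);
    printf("%s\n", grammar.c_str());
    return 0;
}
```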
Vaibhav Srivastav [Thu, 21 Mar 2024 09:30:40 +0000 (10:30 +0100)]
ci : fix indentation error (#6195)
Vaibhav Srivastav [Thu, 21 Mar 2024 09:13:12 +0000 (10:13 +0100)]
build : add mac pre-build binaries (#6182)
* Initial commit - add mac prebuilds.
* forward contribution credits for building the workflow.
* minor : remove trailing whitespaces
---------
Co-authored-by: Nicolas Patry <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Kawrakow [Thu, 21 Mar 2024 07:27:57 +0000 (08:27 +0100)]
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op
---------
Co-authored-by: Iwan Kawrakow <redacted>
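On the API side, selecting one of these K-cache types goes through the context params; a sketch assuming the type_k field of llama_context_params:

```cpp
#include "llama.h"

// Sketch: request a Q5_0 quantized K cache when creating a context.
static llama_context_params make_q5_0_k_cache_params() {
    llama_context_params cparams = llama_context_default_params();
    cparams.type_k = GGML_TYPE_Q5_0; // newly supported by this change
    // cparams.type_v keeps its default (F16)
    return cparams;
}
```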
AidanBeltonS [Thu, 21 Mar 2024 06:10:52 +0000 (06:10 +0000)]
Add nvidia and amd backends (#6157)
slaren [Thu, 21 Mar 2024 00:47:46 +0000 (01:47 +0100)]
cuda : fix conflict with std::swap (#6186)
slaren [Wed, 20 Mar 2024 20:03:26 +0000 (21:03 +0100)]
cuda : print the returned error when CUDA initialization fails (#6185)
Ziang Wu [Wed, 20 Mar 2024 15:29:51 +0000 (23:29 +0800)]
llava : update MobileVLM-README.md (#6180)
Ziang Wu [Wed, 20 Mar 2024 15:02:32 +0000 (23:02 +0800)]
llava : add MobileVLM_V2 backup (#6175)
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <redacted>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <redacted>
* clip : fix whitespace
* fix definition mistake in clip.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
slaren [Wed, 20 Mar 2024 13:42:59 +0000 (14:42 +0100)]
cuda : refactor to remove global resources (#6170)
* cuda : refactor to remove global resources
Xuan Son Nguyen [Wed, 20 Mar 2024 12:30:36 +0000 (13:30 +0100)]
Server: version bump for httplib and json (#6169)
* server: version bump for httplib and json
* fix build
* bring back content_length
Georgi Gerganov [Wed, 20 Mar 2024 12:17:34 +0000 (14:17 +0200)]
gitignore : ignore curl-related files
Georgi Gerganov [Wed, 20 Mar 2024 12:14:32 +0000 (14:14 +0200)]
server : allow to override -ngl in tests (#6170)
Georgi Gerganov [Wed, 20 Mar 2024 11:29:49 +0000 (13:29 +0200)]
Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)"
This reverts commit f8c4e745e1e728204ab26dbadf52853545e6789c.
Ziang Wu [Wed, 20 Mar 2024 11:20:37 +0000 (19:20 +0800)]
llava : add a MobileVLM_V2-1.7B backup (#6152)
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <redacted>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <redacted>
* clip : fix whitespace
---------
Co-authored-by: Georgi Gerganov <redacted>
Karthick [Wed, 20 Mar 2024 11:02:34 +0000 (16:32 +0530)]
Server: Handle n_keep parameter in the request (#6174)
Jared Van Bortel [Wed, 20 Mar 2024 05:33:49 +0000 (01:33 -0400)]
server tests : more pythonic process management; fix bare `except:` (#6146)
* server tests : remove seemingly redundant newlines in print()
* server tests : use built-in subprocess features, not os.kill and psutil
* server tests : do not catch e.g. SystemExit; use print_exc
* server tests: handle TimeoutExpired exception
* server tests: fix connect on dual-stack systems
* server: tests: add new tokens regex on windows, generated following the repeat penalties default change in #6127
* server: tests: remove the hack on windows since now we get the good socket family
* server: tests: add new tokens regex following the repeat penalties default change in #6127
* server: tests: add new tokens regex following the repeat penalties default change in #6127
---------
Co-authored-by: Pierrick HYMBERT <redacted>
Neo Zhang Jianyu [Wed, 20 Mar 2024 03:21:41 +0000 (11:21 +0800)]
update readme sycl for new update (#6151)
* update readme sycl for new update
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <redacted>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <redacted>
* update by review comments
* update w64devkit link
* update for verify device id part
* Update README-sycl.md
Co-authored-by: Meng, Hengyu <redacted>
---------
Co-authored-by: Abhilash Majumder <redacted>
Co-authored-by: AidanBeltonS <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Abhilash Majumder [Wed, 20 Mar 2024 02:58:49 +0000 (08:28 +0530)]
increase igpu cluster limit (#6159)
DAN™ [Tue, 19 Mar 2024 16:16:09 +0000 (12:16 -0400)]
Remove unneeded header file. (#6158)
Pierrick Hymbert [Tue, 19 Mar 2024 11:05:44 +0000 (12:05 +0100)]
gguf-split: split and merge gguf per batch of tensors (#6135)
* gguf-split: split and merge gguf files per tensor
* gguf-split: build with make toolchain
* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set the general.split_count KV on all splits
* split : minor style + fix compile warnings
* gguf-split: remove --upload not implemented
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Tue, 19 Mar 2024 08:21:54 +0000 (10:21 +0200)]
common : disable repeat penalties by default (#6127)
slaren [Tue, 19 Mar 2024 08:06:54 +0000 (09:06 +0100)]
ci : exempt some labels from being tagged as stale (#6140)
DAN™ [Tue, 19 Mar 2024 05:59:36 +0000 (01:59 -0400)]
common : print usage on '-h' and '--help' (#6145)
github-actions[bot] [Sun, 17 Mar 2024 06:37:44 +0000 (06:37 +0000)]
flake.lock: Update
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
→ 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
Jared Van Bortel [Mon, 18 Mar 2024 16:49:02 +0000 (12:49 -0400)]
mpt : implement backwards compatibility with duped output tensor (#6139)
Felix [Mon, 18 Mar 2024 15:40:22 +0000 (16:40 +0100)]
clip : fix memory leak (#6138)
slaren [Mon, 18 Mar 2024 15:33:44 +0000 (16:33 +0100)]
backend : set max split inputs to GGML_MAX_SRC (#6137)
Georgi Gerganov [Mon, 18 Mar 2024 11:45:38 +0000 (13:45 +0200)]
ci : disable stale issue messages (#6126)
Georgi Gerganov [Mon, 18 Mar 2024 11:45:27 +0000 (13:45 +0200)]
ci : temporary disable sanitizer builds (#6128)
slaren [Mon, 18 Mar 2024 10:03:04 +0000 (11:03 +0100)]
backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
Co-authored-by: Johannes Gäßler <redacted>
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup
* update backends
ggml-ci
---------
Co-authored-by: Johannes Gäßler <redacted>
DAN™ [Mon, 18 Mar 2024 08:27:44 +0000 (04:27 -0400)]
common : tidy-up argument parsing (#6105)
* Tidy-up argument parsing.
* Missing ref.
* common : minor
* common : add static classifier
---------
Co-authored-by: Georgi Gerganov <redacted>
Thérence [Mon, 18 Mar 2024 08:17:00 +0000 (09:17 +0100)]
convert : add support for CamembertModel architecture (#6119)
Adding support for the CamembertModel architecture used by:
https://huggingface.co/dangvantuan/sentence-camembert-large
Romain D [Mon, 18 Mar 2024 08:04:41 +0000 (09:04 +0100)]
convert : use f32 outtype for bf16 tensors (#6106)
The old behaviour is to use f16, but bf16 to f16 is not a lossless conversion.
Change the outtype to f32 to default to a lossless conversion.
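The bit-level reason: bf16 is the upper 16 bits of an IEEE-754 f32, so widening to f32 is exact, while f16's 5-bit exponent cannot cover bf16's 8-bit exponent range. A small illustration:

```cpp
#include <cstdint>
#include <cstring>

// bf16 -> f32 is lossless: shift the 16 bf16 bits into the top of an f32.
static float bf16_to_f32(uint16_t h) {
    const uint32_t bits = (uint32_t) h << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
// bf16 -> f16 is not: a bf16 value like 1e30 overflows to infinity in f16,
// whose largest finite value is 65504.
```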
Pierrick Hymbert [Sun, 17 Mar 2024 18:12:37 +0000 (19:12 +0100)]
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Sun, 17 Mar 2024 17:51:57 +0000 (19:51 +0200)]
ci : close all stale issues at once (#6115)
GainLee [Sun, 17 Mar 2024 17:12:22 +0000 (01:12 +0800)]
ggml: fix error when finding the transfer queue family index (#6094)
Co-authored-by: GainLee <redacted>
AmirAli Mirian [Sat, 16 Mar 2024 15:52:02 +0000 (11:52 -0400)]
ggml : add AVX512F SIMD (#6088)
Daniel Bevenius [Sat, 16 Mar 2024 15:46:29 +0000 (16:46 +0100)]
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm
This commit adds a suggestion for an initial README.md for the gritlm
example.
Signed-off-by: Daniel Bevenius <redacted>
* squash! gritlm: add initial README.md to examples/gritlm
Use the `scripts/hf.sh` script to download the model file.
Signed-off-by: Daniel Bevenius <redacted>
* squash! gritlm: add initial README.md to examples/gritlm
Fix editorconfig-checker error in examples/gritlm/README.md.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
Xuan Son Nguyen [Sat, 16 Mar 2024 15:42:08 +0000 (16:42 +0100)]
readme : add wllama as a wasm binding (#6100)
DAN™ [Sat, 16 Mar 2024 15:39:15 +0000 (11:39 -0400)]
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.
* Revert back and remove else's.
* Add flag to track found arguments.
Pierrick Hymbert [Sat, 16 Mar 2024 12:20:53 +0000 (13:20 +0100)]
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow
* ci: close issue, change workflow schedule time
slaren [Fri, 15 Mar 2024 21:14:16 +0000 (22:14 +0100)]
llama : fix Baichuan2 13B (#6092)
Theia Vogel [Fri, 15 Mar 2024 20:43:02 +0000 (13:43 -0700)]
llama : add support for control vectors (#5970)
* control vector api and implementation
* control-vectors : minor code style updates
* disable control vector when data == nullptr
use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)
---------
Co-authored-by: Georgi Gerganov <redacted>
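Conceptually, a control vector is a per-layer bias added to the hidden state at inference time; a minimal sketch under that assumption:

```cpp
#include <vector>

// Sketch: add a scaled control vector to one layer's hidden state.
static void apply_control_vector(std::vector<float> & hidden,
                                 const std::vector<float> & cvec,
                                 float strength) {
    const size_t n = hidden.size() < cvec.size() ? hidden.size() : cvec.size();
    for (size_t i = 0; i < n; ++i) {
        hidden[i] += strength * cvec[i];
    }
}
```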
Andrew Canis [Fri, 15 Mar 2024 20:41:22 +0000 (16:41 -0400)]
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
https://huggingface.co/CohereForAI/c4ai-command-r-v01
Based on the llama2 model with a few changes:
1) New hyper parameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
Find GGUF files here:
https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF
To convert model to GGUF format yourself:
1) Download Command-R Hugging Face safetensors:
git lfs install
git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01
2) Run:
python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
Ting Lou [Fri, 15 Mar 2024 14:31:05 +0000 (22:31 +0800)]
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <redacted>
slaren [Fri, 15 Mar 2024 12:24:03 +0000 (13:24 +0100)]
cuda : disable unused cudaLaunchHostFunc code (#6078)
Neo Zhang Jianyu [Fri, 15 Mar 2024 10:53:53 +0000 (18:53 +0800)]
fix error when setting the main GPU (#6073)
Georgi Gerganov [Fri, 15 Mar 2024 09:36:50 +0000 (11:36 +0200)]
make : ggml-metal.o depends on ggml.h
AidanBeltonS [Fri, 15 Mar 2024 09:26:20 +0000 (09:26 +0000)]
[SYCL] Fix non-intel device selection (#6042)
* Fix non-intel device selection
* Update ggml-sycl.cpp
Co-authored-by: Neo Zhang Jianyu <redacted>
* Update ggml-sycl.cpp
Co-authored-by: Neo Zhang Jianyu <redacted>
---------
Co-authored-by: Abhilash Majumder <redacted>
Co-authored-by: Neo Zhang Jianyu <redacted>
Ondřej Čertík [Fri, 15 Mar 2024 08:46:51 +0000 (02:46 -0600)]
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays
GGML currently does not support I64 or F64 arrays, and they are not often
used in machine learning. However, in case the need arises in the future, it
is worth adding them now, so that the types sit next to the other types
I8, I16, I32 in the enums, and their type numbers are reserved.
Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster,
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.
The change in this PR seems small, not significantly increasing the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C, everything seems to work.
* Fix compiler warnings
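From C, writing such an array goes through the usual gguf setters; a sketch assuming the GGUF_TYPE_INT64 value this change introduces:

```cpp
#include <cstdint>
#include "ggml.h"

// Sketch: store an I64 array in a GGUF file's metadata and write it out.
static void write_i64_array(const char * fname) {
    struct gguf_context * ctx = gguf_init_empty();
    const int64_t data[4] = {1, 2, 3, 4};
    gguf_set_arr_data(ctx, "example.i64_array", GGUF_TYPE_INT64, data, 4);
    gguf_write_to_file(ctx, fname, /*only_meta =*/ false);
    gguf_free(ctx);
}
```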
Xuan Son Nguyen [Fri, 15 Mar 2024 08:44:57 +0000 (09:44 +0100)]
llama : add Orion chat template (#6066)
slaren [Fri, 15 Mar 2024 08:22:24 +0000 (09:22 +0100)]
llama-bench : use random tokens to improve accuracy with mixtral (#6069)
Georgi Gerganov [Thu, 14 Mar 2024 20:58:41 +0000 (22:58 +0200)]
llama : fix integer overflow during quantization (#6063)
Steve Grubb [Thu, 14 Mar 2024 18:29:32 +0000 (14:29 -0400)]
gguf : fix resource leaks (#6061)
There are several places where a gguf context is allocated; a call to gguf_free
is missing in some error paths. Also, on Linux, llama-bench was missing an
fclose.
Ondřej Čertík [Thu, 14 Mar 2024 17:57:31 +0000 (11:57 -0600)]
gguf-py : bump version to 0.8.0 (#6060)
Michael Podvitskiy [Thu, 14 Mar 2024 16:21:56 +0000 (17:21 +0100)]
llama : support models without vocabulary (#5798)
* additional methods to read model and ctx parameters
* vocab size as part of the model metadata
* models without vocabulary, convert.py part
* models without vocabulary, llama.cpp part
* PR clean up
* converter script fixes
* llama_vocab_type update (renamed the new key)
* pr review fixes
* revert function renaming
* one more NoVocab assert
Georgi Gerganov [Thu, 14 Mar 2024 13:14:14 +0000 (15:14 +0200)]
embedding : add EOS token if not present (#899)
Georgi Gerganov [Thu, 14 Mar 2024 11:32:14 +0000 (13:32 +0200)]
gguf-py : fix dtype check (#6045)
Jian Liao [Thu, 14 Mar 2024 11:18:23 +0000 (04:18 -0700)]
readme : improve readme for Llava-1.6 example (#6044)
Co-authored-by: Jian Liao <redacted>
Pierrick Hymbert [Thu, 14 Mar 2024 11:15:39 +0000 (12:15 +0100)]
server: disable debug release type sanitizer, simplify trigger (#6047)
- increase time out for server
- do not fail fast
Georgi Gerganov [Thu, 14 Mar 2024 11:13:06 +0000 (13:13 +0200)]
llama : fix typo
Michael Podvitskiy [Thu, 14 Mar 2024 10:56:48 +0000 (11:56 +0100)]
llama : optimize defrag moves + fix fragmentation calculation (#6037)
* attempt to reduce the impact of a worst-case scenario
* fragmentation calculation fix
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Ondřej Čertík [Thu, 14 Mar 2024 10:40:14 +0000 (04:40 -0600)]
gguf-py : add support for I8, I16 and I32 (#6045)
* Refactor dtype handling to be extensible
This code is equivalent as before, but now it is prepared to easily add
more NumPy dtypes.
* Add support for I8, I16 and I32
These types are allowed in the GGUF specification.
* Add support for I8, I16 and I32 to gguf_writer
* Add support for I8, I16, I32 to gguf_reader
Georgi Gerganov [Thu, 14 Mar 2024 10:38:37 +0000 (12:38 +0200)]
ggml : designate enum vals for integer types (#6050)
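The point of explicit values is that these enums end up serialized in GGUF/GGML files, so entries must never shift; illustrated with a hypothetical enum:

```cpp
// Designating values explicitly keeps the on-disk encoding stable even if
// new entries are later inserted or removed in the middle of the enum.
enum example_type {
    EXAMPLE_TYPE_F32 = 0,
    EXAMPLE_TYPE_F16 = 1,
    // ... quantized types ...
    EXAMPLE_TYPE_I8  = 24,
    EXAMPLE_TYPE_I16 = 25,
    EXAMPLE_TYPE_I32 = 26,
};
```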
Georgi Gerganov [Thu, 14 Mar 2024 10:37:20 +0000 (12:37 +0200)]
embedding : print all resulting embeddings (#899)
Georgi Gerganov [Thu, 14 Mar 2024 09:55:23 +0000 (11:55 +0200)]
metal : build metallib + fix embed path (#6015)
* metal : build metallib + fix embed path
ggml-ci
* metal : fix embed build + update library load logic
ggml-ci
* metal : fix embedded library build
ggml-ci
* ci : fix iOS builds to use embedded library
Georgi Gerganov [Thu, 14 Mar 2024 08:12:29 +0000 (10:12 +0200)]
embedding : print cosine similarity (#899)
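For reference, the cosine similarity being printed is dot(a, b) / (|a| * |b|); a minimal sketch:

```cpp
#include <cmath>

// cosine similarity between two embedding vectors of length n
static float cosine_similarity(const float * a, const float * b, int n) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < n; ++i) {
        dot += (double) a[i] * b[i];
        na  += (double) a[i] * a[i];
        nb  += (double) b[i] * b[i];
    }
    return (float) (dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12));
}
```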