git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Pedro Cuenca [Tue, 26 Mar 2024 12:32:19 +0000 (13:32 +0100)]
convert-hf : fix exception in sentencepiece with added tokens (#6320)
Kawrakow [Tue, 26 Mar 2024 12:09:30 +0000 (13:09 +0100)]
quantize : be able to override metadata by key (#6321)
* quantize: be able to override metadata by key
* minor : spacing
---------
Co-authored-by: Iwan Kawrakow <redacted>
Co-authored-by: Georgi Gerganov <redacted>
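Overriding metadata by key comes down to parsing an override specification. The sketch below assumes the `KEY=TYPE:VALUE` form used by llama.cpp's `--override-kv` option (e.g. `tokenizer.ggml.add_bos_token=bool:false`). The struct, the function name and the accepted type list (`int`, `float`, `bool`) are illustrative assumptions, not the quantize tool's actual code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical representation of one metadata override.
struct KvOverride {
    std::string key;
    std::string type;   // "int", "float" or "bool" in this sketch (assumed set)
    std::string value;  // raw text, converted to the target type by the caller
};

// Parse "KEY=TYPE:VALUE"; returns std::nullopt on malformed input.
std::optional<KvOverride> parse_kv_override(const std::string & arg) {
    const size_t eq = arg.find('=');
    if (eq == std::string::npos || eq == 0) {
        return std::nullopt;  // missing or empty key
    }
    const size_t colon = arg.find(':', eq + 1);
    if (colon == std::string::npos) {
        return std::nullopt;  // missing TYPE:VALUE separator
    }
    KvOverride kv{arg.substr(0, eq), arg.substr(eq + 1, colon - eq - 1), arg.substr(colon + 1)};
    if (kv.type != "int" && kv.type != "float" && kv.type != "bool") {
        return std::nullopt;  // unknown type tag
    }
    return kv;
}
```

Splitting on the first `=` and then the first `:` after it keeps the value free to contain further colons.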
Minsoo Cheong [Tue, 26 Mar 2024 09:11:46 +0000 (18:11 +0900)]
embedding : adjust `n_ubatch` value (#6296)
* embedding: assign `n_ubatch` value, print error on `n_batch` overflow
* Update examples/embedding/embedding.cpp
Co-authored-by: Xuan Son Nguyen <redacted>
* use %ld instead of %lld
* Revert "use %ld instead of %lld"
This reverts commit ea753ede90a86a0699f65878cc8e2020ff5eabb8.
---------
Co-authored-by: Xuan Son Nguyen <redacted>
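The embedding change assigns `n_ubatch` and reports inputs that overflow `n_batch`. A minimal sketch of that check follows. It assumes the intent is that an embedding input must fit in a single micro-batch, so `n_ubatch` is set equal to `n_batch`. `BatchParams` and `prepare_embedding_batches` are hypothetical names, and the `(long long)` cast matches the `%lld` format kept by the revert above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical subset of the context parameters.
struct BatchParams {
    int32_t n_batch  = 2048;
    int32_t n_ubatch = 512;
};

// Force n_ubatch == n_batch so a sequence is never split across micro-batches,
// then check that every tokenized input fits; prints an error and returns false otherwise.
bool prepare_embedding_batches(BatchParams & params, const std::vector<std::vector<int32_t>> & inputs) {
    params.n_ubatch = params.n_batch;
    for (size_t i = 0; i < inputs.size(); ++i) {
        const int64_t n_toks = (int64_t) inputs[i].size();
        if (n_toks > params.n_batch) {
            std::fprintf(stderr, "error: input %zu has %lld tokens, exceeds batch size %d\n",
                         i, (long long) n_toks, (int) params.n_batch);
            return false;
        }
    }
    return true;
}
```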
Jan Boon [Tue, 26 Mar 2024 08:47:43 +0000 (16:47 +0800)]
server : add `n_discard` parameter (#6300)
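`n_discard` controls how many tokens a context shift drops. The sketch below works on a plain token vector. It keeps the first `n_keep` tokens, drops the next `n_discard`, and moves the rest left. The fallback of discarding half of the tokens after `n_keep` when `n_discard` is 0 is an assumption about the server's default. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch of a context shift: keep tokens[0, n_keep), drop the next n_discard
// tokens and append the remainder. n_discard <= 0 means "discard half of the
// tokens after n_keep" (assumed default for this sketch).
std::vector<int> context_shift(const std::vector<int> & tokens, int n_keep, int n_discard) {
    const int n_past = (int) tokens.size();
    n_keep = std::clamp(n_keep, 0, n_past);
    const int n_left = n_past - n_keep;
    if (n_discard <= 0) {
        n_discard = n_left / 2;
    }
    n_discard = std::min(n_discard, n_left);

    std::vector<int> out(tokens.begin(), tokens.begin() + n_keep);
    out.insert(out.end(), tokens.begin() + n_keep + n_discard, tokens.end());
    return out;
}
```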
Joseph Stahl [Tue, 26 Mar 2024 00:51:46 +0000 (20:51 -0400)]
nix: make `xcrun` visible in Nix sandbox for precompiling Metal shaders (#6118)
* Symlink to /usr/bin/xcrun so that `xcrun` binary
is usable during build (used for compiling Metal shaders)
Fixes https://github.com/ggerganov/llama.cpp/issues/6117
* cmake - copy default.metallib to install directory
When Metal files are compiled to default.metallib, CMake needs to add it to the install directory so that it's visible to llama-cpp
Also, update package.nix to use absolute path for default.metallib (it's not finding the bundle)
* add `precompileMetalShaders` flag (defaults to false) to disable precompilation of metal shader
Precompilation requires Xcode to be installed and requires disabling the sandbox on nix-darwin
slaren [Tue, 26 Mar 2024 00:16:01 +0000 (01:16 +0100)]
cuda : rename build flag to LLAMA_CUDA (#6299)
Christian Kögler [Mon, 25 Mar 2024 17:52:45 +0000 (18:52 +0100)]
nix: fix blas support (#6281)
Since no blas was provided to buildInputs, the executable is built without blas support.
This is a backport of NixOS/nixpkgs#298567
Kawrakow [Mon, 25 Mar 2024 17:33:15 +0000 (18:33 +0100)]
tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303)
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Mon, 25 Mar 2024 15:22:27 +0000 (17:22 +0200)]
flake.lock: Update (#6266)
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
→ 'github:NixOS/nixpkgs/44d0940ea560dee511026a53f0e2e2cde489b4d4' (2024-03-23)
Co-authored-by: github-actions[bot] <redacted>
slaren [Mon, 25 Mar 2024 14:43:22 +0000 (15:43 +0100)]
cuda : fix LLAMA_CUDA_F16 build (#6298)
slaren [Mon, 25 Mar 2024 12:50:23 +0000 (13:50 +0100)]
cuda : refactor into multiple files (#6269)
Xuan Son Nguyen [Mon, 25 Mar 2024 08:42:17 +0000 (09:42 +0100)]
Server: clean up OAI params parsing function (#6284)
* server: clean up oai parsing function
* fix response_format
* fix empty response_format
* minor fixes
* add TODO for logprobs
* update docs
Neo Zhang Jianyu [Mon, 25 Mar 2024 07:52:41 +0000 (15:52 +0800)]
[SYCL] fix SYCL backend build on Windows broken by LOG() error (#6290)
* fix LOG() error for SYCL, enhance error check in CI
* rollback to bash
* add newline at end of file
Minsoo Cheong [Mon, 25 Mar 2024 07:38:22 +0000 (16:38 +0900)]
examples : add "retrieval" (#6193)
* add `retrieval` example
* add README
* minor fixes
* cast filepos on print
* remove use of variable sized array
* store similarities in separate vector
* print error on insufficient batch size
* fix error message printing
* assign n_batch value to n_ubatch
* fix param definitions
* define retrieval-only parameters in retrieval.cpp
* fix `--context-file` option to be provided multiple times for multiple files
* use vector for `query_emb`
* add usage description in README
* fix merge conflict
* fix usage printing
* remove seed setting
* fix lint
* increase file read buffer size
* retrieval : minor
---------
Co-authored-by: Georgi Gerganov <redacted>
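The retrieval example ranks stored chunks against a query embedding. The sketch below assumes cosine similarity as the ranking metric and keeps the similarities in a separate vector, as the commit list mentions. `cosine_similarity` and `rank_chunks` are illustrative names, not the example's own functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Cosine similarity between two embeddings of equal length (0 if either is all zeros).
float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    assert(a.size() == b.size());
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += (double) a[i] * b[i];
        na  += (double) a[i] * a[i];
        nb  += (double) b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0) {
        return 0.0f;
    }
    return (float) (dot / (std::sqrt(na) * std::sqrt(nb)));
}

// Return chunk indices ordered from most to least similar to the query.
std::vector<size_t> rank_chunks(const std::vector<float> & query_emb,
                                const std::vector<std::vector<float>> & chunk_embs) {
    std::vector<float> sims;  // similarities stored separately from the chunks
    sims.reserve(chunk_embs.size());
    for (const auto & emb : chunk_embs) {
        sims.push_back(cosine_similarity(query_emb, emb));
    }
    std::vector<size_t> order(chunk_embs.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](size_t i, size_t j) { return sims[i] > sims[j]; });
    return order;
}
```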
Justine Tunney [Mon, 25 Mar 2024 05:39:56 +0000 (01:39 -0400)]
ggml : support AVX512VNNI (#6280)
This change causes some quants (e.g. Q4_0, Q8_0) to go faster on some
architectures (e.g. AMD Zen 4).
Rick G [Sun, 24 Mar 2024 21:45:56 +0000 (14:45 -0700)]
Fix heap corruption from wmode out-of-bound writes on windows (#6272)
* VS2022 would throw an error on GGML_FREE(wmode)
* wchar_t is usually 2 bytes, but malloc takes its size in bytes
* therefore `*wmode_p++ = (wchar_t)*mode;` could write off the end of the allocation
* Fixes error possibly introduced by https://github.com/ggerganov/llama.cpp/pull/6248
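The heap corruption comes from sizing a `wchar_t` buffer in characters instead of bytes. Below is a minimal sketch of a correct allocation for widening an ASCII `fopen` mode string. `widen_mode` is a hypothetical helper, not the ggml function. It reuses the `*wmode_p++ = (wchar_t)*mode` copy pattern quoted above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>

// Widen an ASCII fopen mode string (e.g. "rb") into a newly malloc'd wchar_t buffer.
// malloc takes a size in bytes, so the character count must be multiplied by
// sizeof(wchar_t); allocating only strlen(mode) + 1 bytes lets the copy loop
// write past the end of the allocation whenever wchar_t is wider than one byte.
wchar_t * widen_mode(const char * mode) {
    const size_t len = std::strlen(mode);
    wchar_t * wmode = static_cast<wchar_t *>(std::malloc((len + 1) * sizeof(wchar_t)));
    if (wmode == nullptr) {
        return nullptr;
    }
    wchar_t * wmode_p = wmode;
    while (*mode) {
        *wmode_p++ = (wchar_t) *mode++;
    }
    *wmode_p = L'\0';
    return wmode;  // caller releases with free()
}
```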
Georgi Gerganov [Sun, 24 Mar 2024 14:18:45 +0000 (16:18 +0200)]
imatrix : fix wname for mul_mat_id ops (#6271)
* imatrix : fix wname for mul_mat_id ops
* also filter tensor names in mul_mat_id ops
---------
Co-authored-by: slaren <redacted>
Johannes Gäßler [Sun, 24 Mar 2024 13:21:17 +0000 (14:21 +0100)]
Fixed lookup compilation issues on Windows (#6273)
Pierrick Hymbert [Sun, 24 Mar 2024 08:57:06 +0000 (09:57 +0100)]
ci : close inactive issue, increase operations per run (#6270)
Minsoo Cheong [Sun, 24 Mar 2024 08:54:07 +0000 (17:54 +0900)]
sampling : deduplicated code for probability distribution access (#6240)
* sampling: remove duplicated code for probability distribution access
* free original_logits
* fix original_logits allocation
* fixes based on review @cebtenzzre
* change function name to `llama_sampling_prepare`
Meng, Hengyu [Sun, 24 Mar 2024 04:04:25 +0000 (12:04 +0800)]
[SYCL] offload op (#6217)
* remove no USM methods
* leave the schedule to ggml_backend_sched entirely
Neo Zhang Jianyu [Sun, 24 Mar 2024 01:44:01 +0000 (09:44 +0800)]
Support build win release for SYCL (#6241)
* support release win
* fix value
* fix value
* fix value
* fix error
* fix error
* fix format
Jared Van Bortel [Sat, 23 Mar 2024 22:48:02 +0000 (18:48 -0400)]
use _wfopen instead of fopen on Windows (#6248)
also fix missing #defines before windows.h, and BPE LF token on MSVC
Georgi Gerganov [Sat, 23 Mar 2024 19:35:23 +0000 (21:35 +0200)]
gitignore : gguf-split
Pierrick Hymbert [Sat, 23 Mar 2024 17:07:00 +0000 (18:07 +0100)]
common: llama_load_model_from_url split support (#6192)
* llama: llama_split_prefix: fix strncpy not including the string terminator
common: llama_load_model_from_url:
- fix header name case sensitive
- support downloading additional split in parallel
- hide password in url
* common: EOL EOF
* common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition
* common: change max url length
* common: minor comment
* server: support HF URL options
* llama: llama_model_loader fix log
* common: use a constant for max url length
* common: clean up curl if file cannot be loaded in gguf
* server: tests: add split tests, and HF options params
* common: move llama_download_hide_password_in_url inside llama_download_file as a lambda
* server: tests: enable back Release test on PR
* spacing
Co-authored-by: Georgi Gerganov <redacted>
* spacing
Co-authored-by: Georgi Gerganov <redacted>
* spacing
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
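The `llama_split_prefix` fix concerns `strncpy`, which does not write a terminating `'\0'` when it copies exactly `n` bytes out of a longer string. The sketch below recovers a shard prefix and terminates it explicitly. The shard naming `<prefix>-%05d-of-%05d.gguf` with a 1-based shard number is an assumption. `split_prefix` is a hypothetical helper that takes the destination's maximum length rather than the split path length.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: recover "<prefix>" from "<prefix>-%05d-of-%05d.gguf" (assumed naming).
// Writes at most dest_max bytes including the '\0' and returns the prefix length,
// or 0 if the postfix does not match or the prefix does not fit.
size_t split_prefix(char * dest, size_t dest_max, const char * split_path, int split_no, int split_count) {
    char postfix[48];
    std::snprintf(postfix, sizeof(postfix), "-%05d-of-%05d.gguf", split_no + 1, split_count);

    const size_t path_len    = std::strlen(split_path);
    const size_t postfix_len = std::strlen(postfix);
    if (path_len <= postfix_len || std::strcmp(split_path + path_len - postfix_len, postfix) != 0) {
        return 0;
    }
    const size_t prefix_len = path_len - postfix_len;
    if (prefix_len + 1 > dest_max) {
        return 0;  // no room for the prefix plus its terminator
    }
    // strncpy copies prefix_len bytes of a longer string without adding '\0',
    // so the terminator has to be written explicitly.
    std::strncpy(dest, split_path, prefix_len);
    dest[prefix_len] = '\0';
    return prefix_len;
}
```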
Pierrick Hymbert [Sat, 23 Mar 2024 17:00:38 +0000 (18:00 +0100)]
server: docs: `--threads` and `--threads`, `--ubatch-size`, `--log-disable` (#6254)
Julius Arkenberg [Sat, 23 Mar 2024 16:41:53 +0000 (17:41 +0100)]
llama : add grok-1 support (#6204)
* Add support for Grok model architecture
* Revert convert-hf-to-gguf to default options
* Fixed f_norm_rms_eps bug
* Fix whitespaces
* llama : fix grok rope type
* llama : minor
---------
Co-authored-by: Georgi Gerganov <redacted>
Pierrick Hymbert [Sat, 23 Mar 2024 16:18:13 +0000 (17:18 +0100)]
split: add gguf-split in the make build target (#6262)
Pierrick Hymbert [Sat, 23 Mar 2024 12:18:45 +0000 (13:18 +0100)]
server: flush stdout after logging in both text and json layout (#6253)
Johannes Gäßler [Sat, 23 Mar 2024 00:24:36 +0000 (01:24 +0100)]
lookup: complement data from context with general text statistics (#5479)
* lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
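Lookup decoding drafts tokens by matching the most recent n-gram against earlier text. The sketch below is a generic illustration, not the llama.cpp implementation, and all names are hypothetical. It uses an n-gram cache that counts which token followed each n-gram, plus a greedy draft loop that follows the most frequent continuation. A cache built from general text statistics, as this commit describes, could be filled with the same update function and consulted alongside the context cache.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Token = int;
using NGram = std::vector<Token>;

// n-gram -> (next token -> count)
using NGramCache = std::map<NGram, std::map<Token, int>>;

// Count, for every n-gram in `tokens`, which token followed it.
void ngram_cache_update(NGramCache & cache, const std::vector<Token> & tokens, size_t n) {
    for (size_t i = 0; i + n < tokens.size(); ++i) {
        NGram key(tokens.begin() + i, tokens.begin() + i + n);
        cache[key][tokens[i + n]]++;
    }
}

// Greedily draft up to n_draft tokens by repeatedly looking up the last n tokens.
std::vector<Token> ngram_draft(const NGramCache & cache, std::vector<Token> context, size_t n, size_t n_draft) {
    std::vector<Token> draft;
    while (draft.size() < n_draft && context.size() >= n) {
        NGram key(context.end() - (std::ptrdiff_t) n, context.end());
        auto it = cache.find(key);
        if (it == cache.end()) {
            break;  // no statistics for this n-gram: stop drafting
        }
        // most frequent continuation; ties go to the smallest token id (map order)
        Token best = it->second.begin()->first;
        int best_count = 0;
        for (const auto & [tok, count] : it->second) {
            if (count > best_count) {
                best = tok;
                best_count = count;
            }
        }
        draft.push_back(best);
        context.push_back(best);
    }
    return draft;
}
```

The drafted tokens would then be verified by the model in one batch, as in speculative decoding.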
Georgi Gerganov [Fri, 22 Mar 2024 19:10:39 +0000 (21:10 +0200)]
common : default --hf-file to --model (#6234)
fraxy-v [Fri, 22 Mar 2024 18:49:06 +0000 (20:49 +0200)]
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
* convert-llama2c-to-ggml: enable conversion of multiqueries, #5608
* add test in build action
* Update build.yml
* Update build.yml
* Update build.yml
* gg patch
Kawrakow [Fri, 22 Mar 2024 18:47:14 +0000 (19:47 +0100)]
quantize: options for output and token embedding tensors qtype (#6239)
* quantize: be able to specify the output tensor type
* quantize: be able to specify the token embedding tensor type
---------
Co-authored-by: Iwan Kawrakow <redacted>
Pierrick Hymbert [Fri, 22 Mar 2024 18:00:01 +0000 (19:00 +0100)]
llama_model_loader: support multiple split/shard GGUFs (#6187)
* split: support in llama_model_loader
* avoid copying the entire vector
Co-authored-by: slaren <redacted>
* split: move llama_tensor_offset to llama_model_loader
* llama_model_loader: PR feedbacks:
- use only one gguf_context for metadata only
- store all ggml_context in a vector as the files and mappings
- store all weights in a vector along with the source tensor
- rename ctx_gguf to meta
- rename ctx_meta to contexts
* avoid copying the entire vector
* Simplify this by making these optional, switch some layer creation tensor optional
Co-authored-by: Georgi Gerganov <redacted>
* Handle optional tensors
Co-authored-by: Georgi Gerganov <redacted>
* llama_model_loader: fail if backend cannot allocate buffer
* fix mmap buffer management
* llama_model_loader: map file to backend buffer if the allocation succeeds only
* llama_model_loader: only map tensors included in the context
* llama_model_loader: minor, use same variable name for consistency, fix spacing in types cast
* llama_model_loader: fail if any of backend buffer cannot be allocated
* spacing
Co-authored-by: slaren <redacted>
* fix loop over pointer
Co-authored-by: slaren <redacted>
* llama_model_loader: if the declared n_tensors does not equal the number of tensors loaded from the splits, throw an exception instead of asserting
* llama_model_loader: ensure mappings vector has the expected size
* llama_model_loader: use at instead of operator[] if this should never add to the map.
* llama_model_loader: immediately add the backend buffer to the model buffers in order to free them if an error occurs in the next allocation. Reserve the expected size.
* llama_model_loader: be sure the model mappings has enough capacity before allocating backend buffer
* llama_model_loader: fix map -> unordered map
* llama_split_prefix: use a clearer version; pass the destination max length instead of the split path length.
Co-authored-by: Xuan Son Nguyen <redacted>
* llama : minor
ggml-ci
* llama : introduce some typedef helpers
* docs: add model shard in hot topic
* llama_model_loader: put mapping in a unique_ptr from the moment it is allocated
Co-authored-by: slaren <redacted>
* fix llama_split_prefix
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Minsoo Cheong [Fri, 22 Mar 2024 17:15:06 +0000 (02:15 +0900)]
ci: apply concurrency limit for github workflows (#6243)
Georgi Gerganov [Fri, 22 Mar 2024 13:33:38 +0000 (15:33 +0200)]
common : add HF arg helpers (#6234)
* common : add HF arg helpers
* common : remove defaults
Nexesenex [Fri, 22 Mar 2024 13:32:02 +0000 (14:32 +0100)]
llama : correction of the attn.v.weight quantization for IQ3_XS (#6209)
IQ3_XS was not mentioned, while IQ3_S and IQ3_M were present twice.
This PR corrects this in the way that was probably intended initially.
Olivier Chafik [Fri, 22 Mar 2024 13:09:07 +0000 (13:09 +0000)]
tests : conditional python & node json schema tests (#6207)
* json: only attempt python & node schema conversion tests if their bins are present
Tests introduced in https://github.com/ggerganov/llama.cpp/pull/5978
disabled in https://github.com/ggerganov/llama.cpp/pull/6198
* json: orange warnings when tests skipped
* json: ensure py/js schema conv tested on ubuntu-focal-make
* json: print env vars in test
Olivier Chafik [Fri, 22 Mar 2024 13:07:44 +0000 (13:07 +0000)]
json-schema-to-grammar : fix order of props + non-str const/enum (#6232)
* json: ordered json in server/schema converter to respect orig order
* json: ws nits
* json: support non-string const / enums
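Supporting non-string `const`/`enum` values means matching each value's serialized JSON text literally. The sketch below assumes the caller passes values already serialized (e.g. `"red"` with its quotes, `1`, `null`) and emits a GBNF alternation of escaped string literals. `gbnf_literal` and `enum_rule` are hypothetical helpers. The exact rule text produced by the real converter may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Wrap raw text in a GBNF string literal, escaping quotes, backslashes and newlines.
std::string gbnf_literal(const std::string & text) {
    std::string out = "\"";
    for (char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            default:   out += c;      break;
        }
    }
    out += "\"";
    return out;
}

// Build a rule matching exactly one of the given JSON values. Values arrive already
// serialized, so strings and non-strings (numbers, booleans, null) are handled the
// same way: the grammar matches the serialized JSON text verbatim.
std::string enum_rule(const std::string & name, const std::vector<std::string> & json_values) {
    std::string rule = name + " ::= ";
    for (size_t i = 0; i < json_values.size(); ++i) {
        if (i > 0) {
            rule += " | ";
        }
        rule += gbnf_literal(json_values[i]);
    }
    return rule;
}
```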
slaren [Fri, 22 Mar 2024 13:05:31 +0000 (14:05 +0100)]
cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208)
* cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy
* add LLAMA_CUDA_NO_PEER_COPY to HIP build
Xiaoyi Chen [Fri, 22 Mar 2024 11:29:49 +0000 (04:29 -0700)]
readme : add RecurseChat to the list of UIs (#6219)
Jan Boon [Fri, 22 Mar 2024 11:12:05 +0000 (19:12 +0800)]
server : fix n_keep always showing as 0 in response (#6211)
Georgi Gerganov [Fri, 22 Mar 2024 11:08:28 +0000 (13:08 +0200)]
server : enable continuous batching by default (#6231)
Georgi Gerganov [Fri, 22 Mar 2024 09:35:53 +0000 (11:35 +0200)]
metal : proper assert for mat-mat memory alignment (#6225)
* metal : proper assert for mat-mat memory alignment
ggml-ci
* readme : add notice about the bug fix
* metal : fix the fix
ggml-ci
Vaibhav Srivastav [Fri, 22 Mar 2024 07:53:43 +0000 (08:53 +0100)]
ci : add CURL flag for the mac builds (#6214)
Georgi Gerganov [Fri, 22 Mar 2024 07:36:03 +0000 (09:36 +0200)]
metal : pad n_ctx by 32 (#6177)
* metal : require ne00 >= 128 for mat-mat kernels
ggml-ci
* llama : pad n_ctx by 32
ggml-ci
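Padding `n_ctx` by 32 rounds the requested context size up to the next multiple of 32. For example, 4097 becomes 4128 and 4096 is unchanged. A sketch of the rounding follows, using the same bit trick as ggml's `GGML_PAD` macro, which requires the alignment to be a power of two. `pad_to` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Round x up to the next multiple of n (n must be a power of two).
constexpr uint32_t pad_to(uint32_t x, uint32_t n) {
    return (x + n - 1) & ~(n - 1);
}

static_assert(pad_to(4096, 32) == 4096, "already aligned sizes are unchanged");
static_assert(pad_to(4097, 32) == 4128, "other sizes round up");
```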
Neo Zhang Jianyu [Fri, 22 Mar 2024 07:19:37 +0000 (15:19 +0800)]
add blog link (#6222)
DAN™ [Fri, 22 Mar 2024 01:32:42 +0000 (21:32 -0400)]
Fix params underscore convert to dash. (#6203)
* Fix params underscore convert to dash.
* Update common/common.cpp
---------
Co-authored-by: slaren <redacted>
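Converting parameter underscores to dashes lets `--n_predict` be accepted as `--n-predict`. The sketch below rewrites only tokens that start with `--`, so values such as file paths keep their underscores. Restricting the rewrite this way is an assumption about the intent of the fix, and `normalize_arg` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Normalize one command-line token: long options ("--...") get '_' replaced by '-';
// anything else (e.g. a value like a model path) is returned unchanged.
std::string normalize_arg(std::string arg) {
    const std::string prefix = "--";
    if (arg.compare(0, prefix.size(), prefix) == 0) {
        std::replace(arg.begin(), arg.end(), '_', '-');
    }
    return arg;
}
```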
Jan Boon [Thu, 21 Mar 2024 22:41:24 +0000 (06:41 +0800)]
server : update readme doc from `slot_id` to `id_slot` (#6213)
slaren [Thu, 21 Mar 2024 18:54:28 +0000 (19:54 +0100)]
cuda : disable host register by default (#6206)
semidark [Thu, 21 Mar 2024 17:52:35 +0000 (11:52 -0600)]
Corrected typo to wrong file (#6199)
The stated file `./devops/main-server.Dockerfile` does not exist. I figure that `.devops/server-intel.Dockerfile` was meant.
Georgi Gerganov [Thu, 21 Mar 2024 14:20:05 +0000 (16:20 +0200)]
tests : disable system() calls (#6198)
ggml-ci
slaren [Thu, 21 Mar 2024 12:59:53 +0000 (13:59 +0100)]
cuda : fix LLAMA_CUDA_F16 build (#6197)
Kawrakow [Thu, 21 Mar 2024 12:59:38 +0000 (13:59 +0100)]
ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196)
* Make quantize_row_iq4_nl do the same thing as quantization on CUDA
* Make quantize_row_iq4_nl do the same thing as quantization on CUDA
This time for real. backend-ops tests pass.
* Now fix test-quantize-fns
---------
Co-authored-by: Iwan Kawrakow <redacted>
Olivier Chafik [Thu, 21 Mar 2024 11:50:43 +0000 (11:50 +0000)]
json-schema-to-grammar improvements (+ added to server) (#5978)
* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* json: support union types (mostly for nullable types I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json: fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits
Vaibhav Srivastav [Thu, 21 Mar 2024 09:30:40 +0000 (10:30 +0100)]
ci : fix indentation error (#6195)
Vaibhav Srivastav [Thu, 21 Mar 2024 09:13:12 +0000 (10:13 +0100)]
build : add mac pre-build binaries (#6182)
* Initial commit - add mac prebuilds.
* forward contribution credits for building the workflow.
* minor : remove trailing whitespaces
---------
Co-authored-by: Nicolas Patry <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Kawrakow [Thu, 21 Mar 2024 07:27:57 +0000 (08:27 +0100)]
Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op
---------
Co-authored-by: Iwan Kawrakow <redacted>
AidanBeltonS [Thu, 21 Mar 2024 06:10:52 +0000 (06:10 +0000)]
Add nvidia and amd backends (#6157)
slaren [Thu, 21 Mar 2024 00:47:46 +0000 (01:47 +0100)]
cuda : fix conflict with std::swap (#6186)
slaren [Wed, 20 Mar 2024 20:03:26 +0000 (21:03 +0100)]
cuda : print the returned error when CUDA initialization fails (#6185)
Ziang Wu [Wed, 20 Mar 2024 15:29:51 +0000 (23:29 +0800)]
llava : update MobileVLM-README.md (#6180)
Ziang Wu [Wed, 20 Mar 2024 15:02:32 +0000 (23:02 +0800)]
llava : add MobileVLM_V2 backup (#6175)
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <redacted>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <redacted>
* clip : fix whitespace
* fix definition mistake in clip.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
slaren [Wed, 20 Mar 2024 13:42:59 +0000 (14:42 +0100)]
cuda : refactor to remove global resources (#6170)
* cuda : refactor to remove global resources
Xuan Son Nguyen [Wed, 20 Mar 2024 12:30:36 +0000 (13:30 +0100)]
Server: version bump for httplib and json (#6169)
* server: version bump for httplib and json
* fix build
* bring back content_length
Georgi Gerganov [Wed, 20 Mar 2024 12:17:34 +0000 (14:17 +0200)]
gitignore : ignore curl-related files
Georgi Gerganov [Wed, 20 Mar 2024 12:14:32 +0000 (14:14 +0200)]
server : allow to override -ngl in tests (#6170)
Georgi Gerganov [Wed, 20 Mar 2024 11:29:49 +0000 (13:29 +0200)]
Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)"
This reverts commit f8c4e745e1e728204ab26dbadf52853545e6789c.
Ziang Wu [Wed, 20 Mar 2024 11:20:37 +0000 (19:20 +0800)]
llava : add a MobileVLM_V2-1.7B backup (#6152)
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <redacted>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <redacted>
* clip : fix whitespace
---------
Co-authored-by: Georgi Gerganov <redacted>
Karthick [Wed, 20 Mar 2024 11:02:34 +0000 (16:32 +0530)]
Server: Handle n_keep parameter in the request (#6174)
Jared Van Bortel [Wed, 20 Mar 2024 05:33:49 +0000 (01:33 -0400)]
server tests : more pythonic process management; fix bare `except:` (#6146)
* server tests : remove seemingly redundant newlines in print()
* server tests : use built-in subprocess features, not os.kill and psutil
* server tests : do not catch e.g. SystemExit; use print_exc
* server tests: handle TimeoutExpired exception
* server tests: fix connect on dual-stack systems
* server: tests: add new tokens regex on windows generated following new repeat penalties default changed in (#6127)
* server: tests: remove the hack on windows since now we get the good socket family
* server: tests: add new tokens regex following new repeat penalties default changed in (#6127)
* server: tests: add new tokens regex following new repeat penalties default changed in (#6127)
---------
Co-authored-by: Pierrick HYMBERT <redacted>
Neo Zhang Jianyu [Wed, 20 Mar 2024 03:21:41 +0000 (11:21 +0800)]
update readme sycl for new update (#6151)
* update readme sycl for new update
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: Abhilash Majumder <redacted>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <redacted>
* Update README-sycl.md
Co-authored-by: AidanBeltonS <redacted>
* update by review comments
* update w64devkit link
* update for verify device id part
* Update README-sycl.md
Co-authored-by: Meng, Hengyu <redacted>
---------
Co-authored-by: Abhilash Majumder <redacted>
Co-authored-by: AidanBeltonS <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Abhilash Majumder [Wed, 20 Mar 2024 02:58:49 +0000 (08:28 +0530)]
increase igpu cluster limit (#6159)
DAN™ [Tue, 19 Mar 2024 16:16:09 +0000 (12:16 -0400)]
Remove unneeded header file. (#6158)
Pierrick Hymbert [Tue, 19 Mar 2024 11:05:44 +0000 (12:05 +0100)]
gguf-split: split and merge gguf per batch of tensors (#6135)
* gguf-split: split and merge gguf files per tensor
* gguf-split: build with make toolchain
* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set general.split_count KV to all split
* split : minor style + fix compile warnings
* gguf-split: remove --upload not implemented
---------
Co-authored-by: Georgi Gerganov <redacted>
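Splitting "per batch of tensors" with `--split-max-tensors` amounts to partitioning the tensor list into consecutive groups, one output file per group. Each output then records the total number of splits. The sketch below shows only the partitioning step. `plan_splits` is a hypothetical name and the tensor names are placeholders.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Partition tensors into consecutive groups of at most max_tensors each;
// the number of groups is the split count every output file would record.
std::vector<std::vector<std::string>> plan_splits(const std::vector<std::string> & tensor_names, size_t max_tensors) {
    assert(max_tensors > 0);
    std::vector<std::vector<std::string>> splits;
    for (size_t i = 0; i < tensor_names.size(); ++i) {
        if (i % max_tensors == 0) {
            splits.emplace_back();  // start a new output file
        }
        splits.back().push_back(tensor_names[i]);
    }
    return splits;
}
```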
Georgi Gerganov [Tue, 19 Mar 2024 08:21:54 +0000 (10:21 +0200)]
common : disable repeat penalties by default (#6127)
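A repetition penalty of 1.0 is a no-op, which is presumably what disabling it by default amounts to. The sketch below shows the common divide/multiply rule: a repeated token's positive logit is divided by the penalty and a non-positive one is multiplied, so the token becomes less likely whenever the penalty is greater than 1. Frequency and presence penalties are omitted. `apply_repeat_penalty` is a hypothetical name.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Penalize every distinct token id that appears in last_tokens.
// penalty == 1.0f leaves the logits untouched ("disabled").
void apply_repeat_penalty(std::vector<float> & logits, const std::vector<int> & last_tokens, float penalty) {
    if (penalty == 1.0f) {
        return;
    }
    const std::unordered_set<int> seen(last_tokens.begin(), last_tokens.end());
    for (int tok : seen) {
        if (tok < 0 || tok >= (int) logits.size()) {
            continue;  // ignore ids outside the vocabulary
        }
        float & l = logits[tok];
        l = l <= 0.0f ? l * penalty : l / penalty;
    }
}
```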
slaren [Tue, 19 Mar 2024 08:06:54 +0000 (09:06 +0100)]
ci : exempt some labels from being tagged as stale (#6140)
DAN™ [Tue, 19 Mar 2024 05:59:36 +0000 (01:59 -0400)]
common : print usage on '-h' and '--help' (#6145)
github-actions[bot] [Sun, 17 Mar 2024 06:37:44 +0000 (06:37 +0000)]
flake.lock: Update
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
→ 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
Jared Van Bortel [Mon, 18 Mar 2024 16:49:02 +0000 (12:49 -0400)]
mpt : implement backwards compatibility with duped output tensor (#6139)
Felix [Mon, 18 Mar 2024 15:40:22 +0000 (16:40 +0100)]
clip : fix memory leak (#6138)
slaren [Mon, 18 Mar 2024 15:33:44 +0000 (16:33 +0100)]
backend : set max split inputs to GGML_MAX_SRC (#6137)
Georgi Gerganov [Mon, 18 Mar 2024 11:45:38 +0000 (13:45 +0200)]
ci : disable stale issue messages (#6126)
Georgi Gerganov [Mon, 18 Mar 2024 11:45:27 +0000 (13:45 +0200)]
ci : temporary disable sanitizer builds (#6128)
slaren [Mon, 18 Mar 2024 10:03:04 +0000 (11:03 +0100)]
backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU
* fix hip
* code cleanup
* fix CUDA split buffers
* Update ggml-backend-impl.h
Co-authored-by: Johannes Gäßler <redacted>
* cuda : fix memset without set_device
* imatrix : remove sched affix from weight names
* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup
* update backends
ggml-ci
---------
Co-authored-by: Johannes Gäßler <redacted>
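Offloading large batches to the GPU is a cost trade-off. Copying host-resident weights to the GPU costs about the same regardless of batch size, while the compute saved grows with the number of rows processed. The sketch below reduces the decision to a batch-size threshold. The struct, the function name and the threshold of 32 are assumptions for illustration, not the scheduler's actual interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an op whose weights live in host memory.
struct OpInfo {
    int64_t batch_size;  // number of rows (tokens) in the activations
};

// Offload only when the batch is large enough to amortize the weight transfer.
// The default threshold of 32 is an assumed value.
bool should_offload(const OpInfo & op, int64_t min_batch_size = 32) {
    return op.batch_size >= min_batch_size;
}
```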
DAN™ [Mon, 18 Mar 2024 08:27:44 +0000 (04:27 -0400)]
common : tidy-up argument parsing (#6105)
* Tidy-up argument parsing.
* Missing ref.
* common : minor
* common : add static classifier
---------
Co-authored-by: Georgi Gerganov <redacted>
Thérence [Mon, 18 Mar 2024 08:17:00 +0000 (09:17 +0100)]
convert : add support for CamembertModel architecture (#6119)
Adding support for CamembertModel architecture used by :
https://huggingface.co/dangvantuan/sentence-camembert-large
Romain D [Mon, 18 Mar 2024 08:04:41 +0000 (09:04 +0100)]
convert : use f32 outtype for bf16 tensors (#6106)
The old behaviour is to use f16, but bf16 to f16 is not a lossless conversion.
Change the outtype to f32 to default to a lossless conversion.
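Widening bf16 to f32 is lossless. bf16 keeps f32's sign bit, all 8 exponent bits and the top 7 mantissa bits, so conversion is a 16-bit shift. f16 has only 5 exponent bits (maximum 65504), so large bf16 values such as 65536 cannot be represented. A sketch of the widening follows. `bf16_to_f32` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Exact bf16 -> f32 conversion: place the 16 stored bits in the high half of an f32.
float bf16_to_f32(uint16_t h) {
    const uint32_t bits = (uint32_t) h << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));  // bit-cast without aliasing issues
    return f;
}
```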
Pierrick Hymbert [Sun, 17 Mar 2024 18:12:37 +0000 (19:12 +0100)]
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Sun, 17 Mar 2024 17:51:57 +0000 (19:51 +0200)]
ci : close all stale issues at once (#6115)
GainLee [Sun, 17 Mar 2024 17:12:22 +0000 (01:12 +0800)]
ggml : fix finding transfer queue family index error (#6094)
Co-authored-by: GainLee <redacted>
AmirAli Mirian [Sat, 16 Mar 2024 15:52:02 +0000 (11:52 -0400)]
ggml : add AVX512F SIMD (#6088)
Daniel Bevenius [Sat, 16 Mar 2024 15:46:29 +0000 (16:46 +0100)]
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm
This commit adds a suggestion for an initial README.md for the gritlm
example.
Signed-off-by: Daniel Bevenius <redacted>
* squash! gritlm: add initial README.md to examples/gritlm
Use the `scripts/hf.sh` script to download the model file.
Signed-off-by: Daniel Bevenius <redacted>
* squash! gritlm: add initial README.md to examples/gritlm
Fix editorconfig-checker error in examples/gritlm/README.md.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
Xuan Son Nguyen [Sat, 16 Mar 2024 15:42:08 +0000 (16:42 +0100)]
readme : add wllama as a wasm binding (#6100)
DAN™ [Sat, 16 Mar 2024 15:39:15 +0000 (11:39 -0400)]
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.
* Revert back and remove else's.
* Add flag to track found arguments.
Pierrick Hymbert [Sat, 16 Mar 2024 12:20:53 +0000 (13:20 +0100)]
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow
* ci: close issue, change workflow schedule time
slaren [Fri, 15 Mar 2024 21:14:16 +0000 (22:14 +0100)]
llama : fix Baichuan2 13B (#6092)
Theia Vogel [Fri, 15 Mar 2024 20:43:02 +0000 (13:43 -0700)]
llama : add support for control vectors (#5970)
* control vector api and implementation
* control-vectors : minor code style updates
* disable control vector when data == nullptr
use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)
---------
Co-authored-by: Georgi Gerganov <redacted>
Andrew Canis [Fri, 15 Mar 2024 20:41:22 +0000 (16:41 -0400)]
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
https://huggingface.co/CohereForAI/c4ai-command-r-v01
Based on the llama2 model with a few changes:
1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
self-attention and FFN layers in parallel. There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
Find GGUF files here:
https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF
To convert model to GGUF format yourself:
1) Download Command-R Hugging Face safetensors:
git lfs install
git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01
2) Run:
python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
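Points 1, 2, 3 and 5 of the Command-R description can be sketched on plain vectors. A single shared LayerNorm (no bias) feeds attention and FFN in parallel, both outputs are added to the residual stream, and the final logits are multiplied by `logit_scale`. Attention and FFN are stand-in callables here. The LayerNorm's learned scale is omitted for brevity, and the eps value is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
using Layer = std::function<Vec(const Vec &)>;

// LayerNorm without bias; the learned scale is omitted and eps is an assumed value.
Vec layer_norm(const Vec & x, float eps = 1e-5f) {
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= x.size();
    float var = 0.0f;
    for (float v : x) var += (v - mean) * (v - mean);
    var /= x.size();
    Vec y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = (x[i] - mean) / std::sqrt(var + eps);
    }
    return y;
}

// Parallel block: one shared norm feeds both branches; no post-attention norm.
Vec parallel_block(const Vec & x, const Layer & attn, const Layer & ffn) {
    const Vec h = layer_norm(x);
    const Vec a = attn(h);
    const Vec f = ffn(h);
    Vec out(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        out[i] = x[i] + a[i] + f[i];
    }
    return out;
}

// Final logits are multiplied by the logit_scale hyperparameter.
void scale_logits(Vec & logits, float logit_scale) {
    for (float & l : logits) l *= logit_scale;
}
```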
Ting Lou [Fri, 15 Mar 2024 14:31:05 +0000 (22:31 +0800)]
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <redacted>