git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Kawrakow [Sun, 20 Aug 2023 13:44:46 +0000 (16:44 +0300)]
More efficient Hellaswag implementation (#2677)
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Fri, 18 Aug 2023 21:45:36 +0000 (00:45 +0300)]
server : better default prompt (#2646)
Jhen-Jie Hong [Fri, 18 Aug 2023 21:41:32 +0000 (05:41 +0800)]
server : update xxd usage for compatibility with older versions (#2649)
* server : update xxd usage for compatibility with older versions
* remove unused $func
Adrian [Fri, 18 Aug 2023 19:39:22 +0000 (12:39 -0700)]
Add link to Clojure bindings to README. (#2659)
Georgi Gerganov [Fri, 18 Aug 2023 14:48:31 +0000 (17:48 +0300)]
readme : incoming BREAKING CHANGE
slaren [Fri, 18 Aug 2023 10:44:58 +0000 (12:44 +0200)]
llama : add benchmark example (#2626)
* llama : add benchmark example
* add to examples CMakeLists.txt
* fix msvc build
* add missing include
* add Bessel's correction to stdev calculation (see the sketch after this entry)
Co-authored-by: Johannes Gäßler <redacted>
* improve markdown formatting
* add missing include
* print warning if NDEBUG is not defined
* remove n_prompt and n_gen from the matrix, use each value separately instead
* better checks for non-optimized builds
* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call
* fix json formatting
* add sql output
* add basic cpu and gpu info (linux/cuda only)
* markdown: also show values that differ from the default
* markdown: add build id
* cleanup
* improve formatting
* formatting
---------
Co-authored-by: Johannes Gäßler <redacted>
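A minimal C++ sketch of the Bessel's correction mentioned above (function name hypothetical, not the benchmark's actual code): dividing by n - 1 instead of n removes the bias introduced by estimating the mean from the same sample.

    #include <cmath>
    #include <vector>

    // Sample standard deviation with Bessel's correction: divide the
    // sum of squared deviations by (n - 1) rather than n.
    static double sample_stdev(const std::vector<double> & xs) {
        const size_t n = xs.size();
        if (n < 2) {
            return 0.0;
        }
        double mean = 0.0;
        for (double x : xs) {
            mean += x;
        }
        mean /= n;
        double ss = 0.0;
        for (double x : xs) {
            ss += (x - mean) * (x - mean);
        }
        return std::sqrt(ss / (n - 1));
    }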
mdrokz [Fri, 18 Aug 2023 10:17:58 +0000 (15:47 +0530)]
readme : add link to Rust bindings (#2656)
Georgi Gerganov [Fri, 18 Aug 2023 09:48:55 +0000 (12:48 +0300)]
perplexity : more meaningful ETA number - 2 decimal points
Evan Jones [Thu, 17 Aug 2023 23:54:44 +0000 (19:54 -0400)]
Fix unicode in grammars (fixes #2501) (#2553)
* Fix unicode in grammars (fixes #2501)
* add more comments
* fix test-llama-grammar
staviq [Thu, 17 Aug 2023 23:34:01 +0000 (23:34 +0000)]
server : support for saving templates in browser LocalStorage (#2486)
* support for templates in browser LocalStorage
* sync accepted #2409 fix from upstream
* convert autosave invocation to useEffect
* Apply suggestions from code review
Co-authored-by: Jhen-Jie Hong <redacted>
* Regen index.html.cpp, suggested from code review
---------
Co-authored-by: Jhen-Jie Hong <redacted>
Johannes Gäßler [Thu, 17 Aug 2023 21:57:59 +0000 (23:57 +0200)]
README: fix LLAMA_CUDA_MMV_Y documentation (#2647)
Henri Vasserman [Thu, 17 Aug 2023 20:11:18 +0000 (23:11 +0300)]
[Zig] Fixing Zig build and improvements (#2554)
* Fix zig after console.o was split
* Better include and flag management
* Change LTO to option
Kerfuffle [Thu, 17 Aug 2023 13:29:44 +0000 (07:29 -0600)]
Add --cfg-negative-prompt-file option for examples (#2591)
Add --cfg-negative-prompt-file option for examples
Georgi Gerganov [Thu, 17 Aug 2023 07:47:09 +0000 (10:47 +0300)]
llama : replace (permute + reshape + view_1d) with (view_3d) (#2538)
ggml-ci
drbh [Thu, 17 Aug 2023 07:41:01 +0000 (03:41 -0400)]
tests : adds simple llama grammar tests (#2618)
* adds simple llama grammar tests
* fix lint and add Makefile
* 0 terminate code_points
* avoid dangling pointers in candidate cleanup
* cleanup grammar at end of test
Shouzheng Liu [Thu, 17 Aug 2023 07:35:53 +0000 (03:35 -0400)]
ggml-alloc : fix discrepancy between measure and eval (#2639)
The GGML memory allocator consistently places a tensor within the
optimal-fit memory block, which is the smallest block capable of
accommodating the tensor's size. During the measurement phase, the final
block is generously sized, ensuring it never qualifies as the
optimal-fit block as long as there exists another block capable of
accommodating the tensor. Nevertheless, in the evaluation phase, the
last block is constrained in size and could potentially qualify as the
optimal-fit block. Consequently, there exists the possibility of a
tensor being allocated to a different region during evaluation, leading
to more memory fragmentation in our scratch buffer.
This recent commit guarantees uniform behavior of the allocator across
both the measurement and evaluation phases, eliminating discrepancies
between the two.
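A minimal sketch of the best-fit selection described above (struct and names hypothetical, not the actual ggml-alloc code):

    #include <cstddef>

    struct free_block {
        size_t offset;
        size_t size;
    };

    // Pick the smallest free block that still fits the request. During
    // measurement the last block is made huge, so it wins only as a last
    // resort; during evaluation it is normally sized and can win the
    // best-fit comparison, which is the discrepancy the commit removes.
    static int best_fit(const free_block * blocks, int n, size_t size) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (blocks[i].size >= size &&
                (best == -1 || blocks[i].size < blocks[best].size)) {
                best = i;
            }
        }
        return best;
    }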
Kolen Cheung [Wed, 16 Aug 2023 20:09:49 +0000 (21:09 +0100)]
cmake : install ggml-meta.metal if LLAMA_METAL (#2449)
Jhen-Jie Hong [Wed, 16 Aug 2023 20:09:03 +0000 (04:09 +0800)]
metal : print error of load pipeline state (#2564)
* metal : print error of load pipeline state
* metal : return null if load pipeline failed
Shouzheng Liu [Wed, 16 Aug 2023 20:08:28 +0000 (16:08 -0400)]
metal : enable ggml-alloc (#2627)
* metal: enable ggml-alloc
Make ggml-alloc work with concurrent dispatch.
* style-fix
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Shouzheng Liu [Wed, 16 Aug 2023 20:07:04 +0000 (16:07 -0400)]
metal : matrix-matrix multiplication kernel (#2615)
* metal: matrix-matrix multiplication kernel
This commit removes MPS and uses custom matrix-matrix multiplication
kernels for all quantization types. This commit also adds grouped-query
attention to support llama2 70B.
* metal: fix performance degradation from gqa
Integers are slow on the GPU, and 64-bit divides are extremely slow.
In the context of GQA, we introduce a 64-bit divide that cannot be
optimized out by the compiler, which results in a decrease of ~8% in
inference performance. This commit fixes that issue by calculating a
part of the offset with a 32-bit divide. Naturally, this limits the
size of a single matrix to ~4GB. However, this limitation should
suffice for the near future (see the sketch after this entry).
* metal: fix bugs for GQA and perplexity test.
I mixed up ne02 and nb02 in previous commit.
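A sketch of the 32-bit offset trick described above, in plain C++ standing in for the Metal kernel (variable names hypothetical):

    #include <cstdint>

    // Which matrix in the batch a global row index belongs to. Doing the
    // divide in 32 bits is far cheaper on the GPU; the trade-off is that
    // a single matrix is capped at ~4 GB.
    static uint64_t src_offset(uint32_t gid, uint32_t rows_per_mat, uint64_t mat_stride) {
        const uint32_t im = gid / rows_per_mat; // 32-bit divide
        return (uint64_t) im * mat_stride;      // widen only at the end
    }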
Georgi Gerganov [Tue, 15 Aug 2023 07:04:58 +0000 (10:04 +0300)]
scripts : add helper script to get wikitext
Jhen-Jie Hong [Mon, 14 Aug 2023 22:14:14 +0000 (06:14 +0800)]
server : add missing /json-schema-to-grammar.mjs (#2616)
fixes #2611
Jhen-Jie Hong [Mon, 14 Aug 2023 13:37:39 +0000 (21:37 +0800)]
metal : return null instead of exit(1) (#2573)
Cheng Shao [Mon, 14 Aug 2023 13:36:42 +0000 (15:36 +0200)]
server : add --numa support (#2524)
Kamil Tomšík [Mon, 14 Aug 2023 13:35:16 +0000 (15:35 +0200)]
llama : add missing enum keyword in function signatures (#2610)
Johannes Gäßler [Mon, 14 Aug 2023 08:41:22 +0000 (10:41 +0200)]
CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596)
Jhen-Jie Hong [Mon, 14 Aug 2023 08:20:17 +0000 (16:20 +0800)]
server : fix default grammar by using an empty string in the UI (#2604)
Jhen-Jie Hong [Mon, 14 Aug 2023 07:16:54 +0000 (15:16 +0800)]
server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588)
* server : implement json-schema-to-grammar.mjs by following the Python impl
* server : add grammar support in chat.mjs
* server : implement grammar param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration
vxiiduu [Mon, 14 Aug 2023 03:59:16 +0000 (13:59 +1000)]
Enhance compatibility with Windows 7 and below. (#2592)
* Enhance Windows 7 compatibility.
* Clean away unnecessary preprocessor conditional
drbh [Sun, 13 Aug 2023 14:00:48 +0000 (10:00 -0400)]
test : add simple grammar parsing tests (#2594)
* adds simple grammar parsing tests
* adds cassert header
Johannes Gäßler [Sat, 12 Aug 2023 22:24:45 +0000 (00:24 +0200)]
CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590)
byte-6174 [Fri, 11 Aug 2023 23:17:25 +0000 (19:17 -0400)]
Adding support for llama2.c models (#2559)
Equim [Fri, 11 Aug 2023 22:35:14 +0000 (06:35 +0800)]
server: fixed wrong variable name in timing json (#2579)
* server: fixed wrong variable name in timing json
* remove redundant entry
DannyDaemonic [Thu, 10 Aug 2023 20:11:36 +0000 (13:11 -0700)]
Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.
Christian Demsar [Thu, 10 Aug 2023 14:28:27 +0000 (10:28 -0400)]
Add --n-predict -2 for stopping generation on full context (#2565)
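A hedged sketch of the stop condition a flag like this adds (names hypothetical, not the actual implementation):

    // Under the "-2 = generate until the context is full" convention,
    // stop as soon as the context window is exhausted instead of
    // continuing with truncation or shifting.
    static bool should_stop(int n_predict, int n_past, int n_ctx) {
        return n_predict == -2 && n_past >= n_ctx;
    }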
Martin Krasser [Thu, 10 Aug 2023 10:16:38 +0000 (12:16 +0200)]
Fix grammar-based sampling issue in server (#2566)
Sam Spilsbury [Wed, 9 Aug 2023 20:47:42 +0000 (23:47 +0300)]
ggml-alloc: Don't try to re-use buffers of external tensors (#2562)
* ggml-alloc: Don't try to re-use buffers of external tensors
They might be weights that came from another context, so we
have no control over them (and they might be re-used elsewhere
so writing to them would be a bad idea). See the sketch after this entry.
* ggml-alloc: >= when checking for out-of-bounds
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
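A minimal sketch of the ownership test described above (names hypothetical): only tensors whose data lives inside the allocator's own buffer are candidates for reuse.

    #include <cstddef>

    // External tensors (e.g. weights owned by another context) fail this
    // test and are never recycled. Note the half-open range: a pointer at
    // or past the end of the buffer is out of bounds.
    static bool ptr_in_buffer(const void * p, const void * buf, size_t size) {
        const char * c = (const char *) p;
        const char * b = (const char *) buf;
        return c >= b && c < b + size;
    }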
grahameth [Wed, 9 Aug 2023 20:46:40 +0000 (22:46 +0200)]
add log_callback to llama_context_params for custom logging. (#2234)
* add log_callback to llama_context_params for custom logging.
* Fix macro expansion on gcc
* Add struct llama_state for global variables and move log_callback there
* Turn log level into an enum and make some minor changes.
* Remove model_for_logging parameter (not needed anymore)
* Convert remaining fprintf(stderr, ...) calls to use new macros.
* Fix enum and initialize g_state
* Fix log calls after merge
* Fix missing static
* Add back all the new lines in the logging strings
* Add comment for llama_log_callback and replace remaining printf calls
---------
Co-authored-by: grahameth <->
Co-authored-by: Helmut <redacted>
Johannes Gäßler [Wed, 9 Aug 2023 07:42:34 +0000 (09:42 +0200)]
CUDA: tuned mul_mat_q kernels (#2546)
Martin Krasser [Tue, 8 Aug 2023 13:29:19 +0000 (15:29 +0200)]
Allow passing grammar to completion endpoint (#2532)
* Allow passing grammar to completion endpoint
Johannes Gäßler [Tue, 8 Aug 2023 12:38:16 +0000 (14:38 +0200)]
CUDA: tighter VRAM scratch size for 65b/70b (#2551)
chaihahaha [Tue, 8 Aug 2023 12:07:02 +0000 (20:07 +0800)]
llm.vim : multiline autocompletion, get rid of "^@" (#2543)
Georgi Gerganov [Tue, 8 Aug 2023 12:05:30 +0000 (15:05 +0300)]
vim : bring back simple llm.vim example
AustinMroz [Tue, 8 Aug 2023 11:44:48 +0000 (06:44 -0500)]
vim : streaming and more (#2495)
* Update Vim plugin
* Remove getbufoneline usage, Add input bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.
An additional example that explains how to add a keybind that works in
insert mode was added.
klosax [Mon, 7 Aug 2023 17:07:19 +0000 (19:07 +0200)]
Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
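A minimal sketch of the linear RoPE scaling the parameter enables (function name hypothetical): positions are compressed by the scale factor, so a model trained on a 4096-token window can address, say, 8192 positions with a scale of 2, at some cost in fidelity.

    // Linear ("position interpolation") RoPE scaling: divide the
    // position before computing the rotary embedding angles.
    static float scaled_pos(int pos, float rope_scale) {
        return (float) pos / rope_scale;
    }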
Georgi Gerganov [Mon, 7 Aug 2023 11:25:58 +0000 (14:25 +0300)]
ggml : mul mat tweaks (#2372)
* ggml : mul mat wip
ggml-ci
* ggml : alternative thread distribution for mul_mat
ggml-ci
* ggml : mul_mat block tiling attempt
* ggml : mul_mat threads yield
ggml-ci
Georgi Gerganov [Mon, 7 Aug 2023 11:24:42 +0000 (14:24 +0300)]
ggml : pad result of ggml_nbytes()
Georgi Gerganov [Mon, 7 Aug 2023 10:55:18 +0000 (13:55 +0300)]
ggml : change params pointer (style change) (#2539)
ggml-ci
Georgi Gerganov [Mon, 7 Aug 2023 10:20:09 +0000 (13:20 +0300)]
ggml : sync (custom ops) (#2537)
ggml-ci
Johannes Gäßler [Mon, 7 Aug 2023 08:09:40 +0000 (10:09 +0200)]
Fixed mmap prefetch for GPU offloading (#2529)
Georgi Gerganov [Mon, 7 Aug 2023 07:52:57 +0000 (10:52 +0300)]
metal : fix out-of-bounds access + inc concurrency nodes (#2416)
* metal : fix out-of-bounds access + style changes
* metal : increase concurrency nodes to 2*GGML_MAX_NODES
GiviMAD [Mon, 7 Aug 2023 06:21:46 +0000 (23:21 -0700)]
[Makefile] Move ARM CFLAGS before compilation (#2536)
Henri Vasserman [Mon, 7 Aug 2023 05:35:53 +0000 (08:35 +0300)]
[Zig] Rewrite build for Zig 0.11 (#2514)
* zig build fixes
* Disable LTO on Windows.
DannyDaemonic [Sun, 6 Aug 2023 06:49:34 +0000 (23:49 -0700)]
console : fix issue related to Windows 11 PowerShell console mode persistence (#2521)
Keiichi Tabata [Sun, 6 Aug 2023 06:34:05 +0000 (15:34 +0900)]
convert.py : add missing abstract methods for quantized data (#2491)
Johannes Gäßler [Sat, 5 Aug 2023 16:20:44 +0000 (18:20 +0200)]
CUDA: faster k-quant mul_mat_q kernels (#2525)
Jonas Wunderlich [Fri, 4 Aug 2023 20:16:11 +0000 (20:16 +0000)]
fix firefox autoscroll (#2519)
Cebtenzzre [Fri, 4 Aug 2023 19:00:57 +0000 (15:00 -0400)]
server: regenerate completion.js.hpp (#2515)
Cebtenzzre [Fri, 4 Aug 2023 15:35:22 +0000 (11:35 -0400)]
CUDA: use min compute capability of GPUs actually used (#2506)
Cebtenzzre [Fri, 4 Aug 2023 15:34:32 +0000 (11:34 -0400)]
CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)
Fixes #2503
DannyDaemonic [Fri, 4 Aug 2023 15:20:12 +0000 (08:20 -0700)]
Add --simple-io option for subprocesses and break out console.h and cpp (#1558)
Stephen Nichols [Fri, 4 Aug 2023 11:37:24 +0000 (06:37 -0500)]
Fixing race condition in server and partial stream handling in frontend. (#2391)
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
l3utterfly [Fri, 4 Aug 2023 11:29:52 +0000 (19:29 +0800)]
Stream save llama context data to file instead of allocating entire buffer upfront (#2488)
* added streaming of context data to file to avoid allocating unnecessary amounts of memory (see the sketch after this entry)
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - reverted the breaking change to the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitespace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <redacted>
* Apply code review suggestions
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
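For context, a sketch of the naive save path this commit replaces; llama_get_state_size and llama_copy_state_data are the public API named in the commit, while the streaming internals are not shown here:

    #include <cstdint>
    #include <cstdio>
    #include <vector>
    #include "llama.h"

    // Materializes the whole state in memory before writing it out;
    // this is exactly the large allocation the commit avoids by writing
    // each piece of state (rng, logits, embeddings, kv cache) directly.
    static bool save_state_naive(llama_context * ctx, const char * path) {
        const size_t n = llama_get_state_size(ctx);
        std::vector<uint8_t> buf(n); // can be very large
        llama_copy_state_data(ctx, buf.data());
        FILE * f = fopen(path, "wb");
        if (!f) {
            return false;
        }
        const bool ok = fwrite(buf.data(), 1, n, f) == n;
        fclose(f);
        return ok;
    }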
Borislav Stanimirov [Fri, 4 Aug 2023 10:07:21 +0000 (13:07 +0300)]
build : fix several cast and printf warnings (#2499)
Evan Jones [Thu, 3 Aug 2023 02:05:44 +0000 (22:05 -0400)]
examples : generate JSON according to schema (#1887)
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
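A hedged illustration of the kind of grammar such a generator produces, embedded as a C++ string; the production rules here are illustrative, not the generator's actual output:

    // Grammar accepting {"name": "<letters/digits/spaces>"} in the
    // GBNF-style notation llama.cpp grammars use.
    static const char * k_example_grammar = R"(
    root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
    string ::= "\"" [a-zA-Z0-9 ]* "\""
    ws     ::= [ \t\n]*
    )";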
Johannes Gäßler [Wed, 2 Aug 2023 16:04:04 +0000 (18:04 +0200)]
CUDA: faster non k-quant mul_mat_q kernels (#2483)
Johannes Gäßler [Wed, 2 Aug 2023 14:48:10 +0000 (16:48 +0200)]
CUDA: Fix models with output size != 32000 (#2480)
ldwang [Wed, 2 Aug 2023 08:21:11 +0000 (16:21 +0800)]
readme : add Aquila-7B model series to supported models (#2487)
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <redacted>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <redacted>
* Update Aquila-7B models in README.md
Signed-off-by: ldwang <redacted>
---------
Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>
Eve [Wed, 2 Aug 2023 08:06:19 +0000 (04:06 -0400)]
tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c (see the sketch after this entry)
* port c tests to c++
* use initializer list for ggml_init_params
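A minimal sketch of the memcpy technique mentioned above: reading a double's bits through a pointer cast violates strict aliasing, while memcpy is the portable bit-cast and compiles down to a single move.

    #include <cstdint>
    #include <cstring>

    // Portable bit-cast from double to its 64-bit representation.
    static uint64_t double_bits(double d) {
        uint64_t bits;
        std::memcpy(&bits, &d, sizeof(bits));
        return bits;
    }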
Yiming Cui [Wed, 2 Aug 2023 06:18:31 +0000 (14:18 +0800)]
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)
* add support for chinese llama-2 / alpaca-2
* remove white spaces
Bono Lv [Tue, 1 Aug 2023 12:54:28 +0000 (20:54 +0800)]
fix a typo in examples/server/README.md (#2478)
ebraminio [Tue, 1 Aug 2023 08:56:23 +0000 (01:56 -0700)]
server : Support dark mode (#2414)
* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh
Matteo Boschini [Tue, 1 Aug 2023 07:43:12 +0000 (09:43 +0200)]
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <redacted>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <redacted>
Johannes Gäßler [Mon, 31 Jul 2023 19:02:19 +0000 (21:02 +0200)]
CUDA: fixed LLAMA_FAST compilation option (#2473)
Johannes Gäßler [Mon, 31 Jul 2023 17:52:22 +0000 (19:52 +0200)]
CUDA: fixed cmake F16 option (#2471)
Johannes Gäßler [Mon, 31 Jul 2023 13:44:35 +0000 (15:44 +0200)]
CUDA: mmq CLI option, fixed mmq build issues (#2453)
Johannes Gäßler [Mon, 31 Jul 2023 12:32:30 +0000 (14:32 +0200)]
CUDA: Implemented row flattening for non-glm RoPE (#2468)
Johannes Gäßler [Mon, 31 Jul 2023 11:18:51 +0000 (13:18 +0200)]
CUDA: fewer memory bank conflicts for mul_mat_q (#2458)
slaren [Mon, 31 Jul 2023 09:02:53 +0000 (11:02 +0200)]
Fix Metal backend broken from the allocator changes (#2455)
* fix Metal backend broken from the allocator changes
slaren [Sun, 30 Jul 2023 13:58:01 +0000 (15:58 +0200)]
ggml : add graph tensor allocator (#2411)
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset (see the sketch after this entry)
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
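A minimal sketch of the view-with-offset guard (the struct is a stand-in for ggml_tensor): compute a view's data pointer only when the source is already allocated; otherwise the graph allocator resolves it later.

    #include <cstddef>

    struct tensor_stub {
        void * data; // NULL until the allocator places the tensor
    };

    static void init_view(tensor_stub * view, const tensor_stub * src, size_t offset) {
        view->data = src->data ? (char *) src->data + offset : nullptr;
    }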
Johannes Gäßler [Sat, 29 Jul 2023 21:04:44 +0000 (23:04 +0200)]
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds
Johannes Gäßler [Sat, 29 Jul 2023 21:04:10 +0000 (23:04 +0200)]
CUDA: faster multi GPU synchronization (#2448)
klosax [Fri, 28 Jul 2023 18:25:36 +0000 (20:25 +0200)]
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
Lee [Fri, 28 Jul 2023 18:17:45 +0000 (02:17 +0800)]
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405)
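The usual shape of this workaround, hedged (the commit's exact macro name may differ): GCC before 8 lacks _mm256_setr_m128i, so the 256-bit vector is assembled from its halves with a cast plus insert.

    #include <immintrin.h>

    // Compose a 256-bit integer vector from two 128-bit halves,
    // lo in lane 0 and hi in lane 1, without the missing intrinsic.
    #if defined(__GNUC__) && __GNUC__ < 8
    #define MM256_SETR_M128I(lo, hi) \
        _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1)
    #else
    #define MM256_SETR_M128I(lo, hi) _mm256_setr_m128i((lo), (hi))
    #endif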
eric8607242 [Fri, 28 Jul 2023 18:10:05 +0000 (02:10 +0800)]
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Fri, 28 Jul 2023 18:05:08 +0000 (21:05 +0300)]
examples : fix whitespace
nhamanasu [Fri, 28 Jul 2023 18:02:10 +0000 (03:02 +0900)]
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
Weird Constructor [Fri, 28 Jul 2023 08:44:43 +0000 (10:44 +0200)]
readme : fix the description of the Tail free sampling (TFS) method (#2431)
Rand Xie [Fri, 28 Jul 2023 08:42:53 +0000 (01:42 -0700)]
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)
niansa/tuxifan [Fri, 28 Jul 2023 01:14:11 +0000 (03:14 +0200)]
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
mj-shifu [Thu, 27 Jul 2023 20:39:17 +0000 (22:39 +0200)]
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
Georgi Gerganov [Thu, 27 Jul 2023 08:00:54 +0000 (11:00 +0300)]
metal : disable graph concurrency optimization due to bug (#2413)
slaren [Wed, 26 Jul 2023 21:57:23 +0000 (23:57 +0200)]
ggml : fix assert in ggml_set_unary_op (#2410)
Cebtenzzre [Wed, 26 Jul 2023 18:00:04 +0000 (14:00 -0400)]
make : build with -Wmissing-prototypes (#2394)
slaren [Wed, 26 Jul 2023 13:56:53 +0000 (15:56 +0200)]
ggml : allocate graphs in a context (#2392)
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <redacted>
Kawrakow [Tue, 25 Jul 2023 15:35:53 +0000 (18:35 +0300)]
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <redacted>
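A sketch of how such a compile-time default typically looks (the value shown is illustrative, not necessarily the commit's):

    // Used when the command line does not override rms-norm-eps;
    // builds can redefine it, e.g. -DLLAMA_DEFAULT_RMS_EPS=1e-5f.
    #ifndef LLAMA_DEFAULT_RMS_EPS
    #define LLAMA_DEFAULT_RMS_EPS 5e-6f
    #endif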
slaren [Tue, 25 Jul 2023 14:20:12 +0000 (16:20 +0200)]
ggml : fix ggml_flash_attn to use op_params (#2387)
* ggml : fix ggml_flash_attn to use op_params
ldwang [Tue, 25 Jul 2023 13:22:09 +0000 (21:22 +0800)]
convert.py : support bpe tokenizer (#2228)
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <redacted>
---------
Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>
Jiahao Li [Tue, 25 Jul 2023 12:58:32 +0000 (20:58 +0800)]
ggml : relax contiguous constraints in activation function (#2371)
slaren [Tue, 25 Jul 2023 12:32:20 +0000 (14:32 +0200)]
ggml : improve graph build time via hash table lookup (#2329)
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
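A minimal sketch of the visited-set idea behind the speedup (sizes and names hypothetical): an open-addressing table keyed on the tensor pointer turns the "already added this node?" check from a linear scan into an average O(1) probe.

    #include <cstddef>

    #define HASH_SIZE 4096 // must exceed the maximum node count

    static size_t hash_ptr(const void * p) {
        return ((size_t) p >> 4) % HASH_SIZE;
    }

    // Returns true if p was newly inserted, false if already present.
    static bool hash_insert(const void * table[HASH_SIZE], const void * p) {
        size_t i = hash_ptr(p);
        while (table[i] != NULL && table[i] != p) {
            i = (i + 1) % HASH_SIZE; // linear probing
        }
        if (table[i] == p) {
            return false;
        }
        table[i] = p;
        return true;
    }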