git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Kawrakow [Sun, 20 Aug 2023 13:44:46 +0000 (16:44 +0300)]
More efficient Hellaswag implementation (#2677)
Co-authored-by: Iwan Kawrakow <redacted>
Georgi Gerganov [Fri, 18 Aug 2023 21:45:36 +0000 (00:45 +0300)]
server : better default prompt (#2646)
Jhen-Jie Hong [Fri, 18 Aug 2023 21:41:32 +0000 (05:41 +0800)]
server : update xxd usage for compatibility with older versions (#2649)
* server : update xxd usage for compatibility with older versions
* remove unused $func
Adrian [Fri, 18 Aug 2023 19:39:22 +0000 (12:39 -0700)]
Add link to Clojure bindings to README. (#2659)
Georgi Gerganov [Fri, 18 Aug 2023 14:48:31 +0000 (17:48 +0300)]
readme : incoming BREAKING CHANGE
slaren [Fri, 18 Aug 2023 10:44:58 +0000 (12:44 +0200)]
llama : add benchmark example (#2626)
* llama : add benchmark example
* add to examples CMakeLists.txt
* fix msvc build
* add missing include
* add Bessel's correction to stdev calculation (see the sketch after this entry)
Co-authored-by: Johannes Gäßler <redacted>
* improve markdown formatting
* add missing include
* print warning if NDEBUG is not defined
* remove n_prompt and n_gen from the matrix, use each value separately instead
* better checks for non-optimized builds
* llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call
* fix json formatting
* add sql output
* add basic cpu and gpu info (linux/cuda only)
* markdown: also show values that differ from the default
* markdown: add build id
* cleanup
* improve formatting
* formatting
---------
Co-authored-by: Johannes Gäßler <redacted>
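A minimal C++ sketch of the Bessel's correction mentioned above (function name hypothetical, not the benchmark's actual code): dividing by n - 1 instead of n removes the bias introduced by estimating the mean from the same sample.

    #include <cmath>
    #include <vector>

    // Sample standard deviation with Bessel's correction: divide the
    // sum of squared deviations by (n - 1) rather than n.
    static double sample_stdev(const std::vector<double> & xs) {
        const size_t n = xs.size();
        if (n < 2) {
            return 0.0;
        }
        double mean = 0.0;
        for (double x : xs) {
            mean += x;
        }
        mean /= n;
        double ss = 0.0;
        for (double x : xs) {
            ss += (x - mean) * (x - mean);
        }
        return std::sqrt(ss / (n - 1));
    }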
mdrokz [Fri, 18 Aug 2023 10:17:58 +0000 (15:47 +0530)]
readme : add link to Rust bindings (#2656)
Georgi Gerganov [Fri, 18 Aug 2023 09:48:55 +0000 (12:48 +0300)]
perplexity : more meaningful ETA number - 2 decimal points
Evan Jones [Thu, 17 Aug 2023 23:54:44 +0000 (19:54 -0400)]
Fix unicode in grammars (fixes #2501) (#2553)
* Fix unicode in grammars (fixes #2501)
* add more comments
* fix test-llama-grammar
staviq [Thu, 17 Aug 2023 23:34:01 +0000 (23:34 +0000)]
server : support for saving templates in browser LocalStorage (#2486)
* support for templates in browser LocalStorage
* sync accepted #2409 fix from upstream
* convert autosave invocation to useEffect
* Apply suggestions from code review
Co-authored-by: Jhen-Jie Hong <redacted>
* Regen index.html.cpp, suggested from code review
---------
Co-authored-by: Jhen-Jie Hong <redacted>
Johannes Gäßler [Thu, 17 Aug 2023 21:57:59 +0000 (23:57 +0200)]
README: fix LLAMA_CUDA_MMV_Y documentation (#2647)
Henri Vasserman [Thu, 17 Aug 2023 20:11:18 +0000 (23:11 +0300)]
[Zig] Fixing Zig build and improvements (#2554)
* Fix zig after console.o was split
* Better include and flag management
* Change LTO to option
Kerfuffle [Thu, 17 Aug 2023 13:29:44 +0000 (07:29 -0600)]
Add --cfg-negative-prompt-file option for examples (#2591)
Add --cfg-negative-prompt-file option for examples
Georgi Gerganov [Thu, 17 Aug 2023 07:47:09 +0000 (10:47 +0300)]
llama : replace (permute + reshape + view_1d) with (view_3d) (#2538)
ggml-ci
drbh [Thu, 17 Aug 2023 07:41:01 +0000 (03:41 -0400)]
tests : adds simple llama grammar tests (#2618)
* adds simple llama grammar tests
* fix lint and add Makefile
* 0 terminate code_points
* avoid dangling pointers in candidate cleanup
* cleanup grammar at end of test
Shouzheng Liu [Thu, 17 Aug 2023 07:35:53 +0000 (03:35 -0400)]
ggml-alloc : fix discrepancy between measure and eval (#2639)
The GGML memory allocator consistently places a tensor within the
optimal-fit memory block, which is the smallest block capable of
accommodating the tensor's size. During the measurement phase, the final
block is generously sized, ensuring it never qualifies as the
optimal-fit block as long as there exists another block capable of
accommodating the tensor. Nevertheless, in the evaluation phase, the
last block is constrained in size and could potentially qualify as the
optimal-fit block. Consequently, there exists the possibility of a
tensor being allocated to a different region during evaluation, leading
to more memory fragmentation in our scratch buffer.
This recent commit guarantees uniform behavior of the allocator across
both the measurement and evaluation phases, eliminating discrepancies
between the two.
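A minimal sketch of the best-fit selection described above (struct and names hypothetical, not the actual ggml-alloc code):

    #include <cstddef>

    struct free_block {
        size_t offset;
        size_t size;
    };

    // Pick the smallest free block that still fits the request. During
    // measurement the last block is made huge, so it wins only as a last
    // resort; during evaluation it is normally sized and can win the
    // best-fit comparison, which is the discrepancy the commit removes.
    static int best_fit(const free_block * blocks, int n, size_t size) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (blocks[i].size >= size &&
                (best == -1 || blocks[i].size < blocks[best].size)) {
                best = i;
            }
        }
        return best;
    }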
Kolen Cheung [Wed, 16 Aug 2023 20:09:49 +0000 (21:09 +0100)]
cmake : install ggml-meta.metal if LLAMA_METAL (#2449)
Jhen-Jie Hong [Wed, 16 Aug 2023 20:09:03 +0000 (04:09 +0800)]
metal : print error of load pipeline state (#2564)
* metal : print error of load pipeline state
* metal : return null if load pipeline failed
Shouzheng Liu [Wed, 16 Aug 2023 20:08:28 +0000 (16:08 -0400)]
metal : enable ggml-alloc (#2627)
* metal: enable ggml-alloc
Make ggml-alloc work with concurrent dispatch.
* style-fix
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Shouzheng Liu [Wed, 16 Aug 2023 20:07:04 +0000 (16:07 -0400)]
metal : matrix-matrix multiplication kernel (#2615)
* metal: matrix-matrix multiplication kernel
This commit removes MPS and uses custom matrix-matrix multiplication
kernels for all quantization types. This commit also adds grouped-query
attention to support llama2 70B.
* metal: fix performance degradation from gqa
Integers are slow on the GPU, and 64-bit divides are extremely slow.
In the context of GQA, we introduce a 64-bit divide that cannot be
optimized out by the compiler, which results in a decrease of ~8% in
inference performance. This commit fixes that issue by calculating a
part of the offset with a 32-bit divide. Naturally, this limits the
size of a single matrix to ~4GB. However, this limitation should
suffice for the near future (see the sketch after this entry).
* metal: fix bugs for GQA and perplexity test.
I mixed up ne02 and nb02 in previous commit.
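A sketch of the 32-bit offset trick described above, in plain C++ standing in for the Metal kernel (variable names hypothetical):

    #include <cstdint>

    // Which matrix in the batch a global row index belongs to. Doing the
    // divide in 32 bits is far cheaper on the GPU; the trade-off is that
    // a single matrix is capped at ~4 GB.
    static uint64_t src_offset(uint32_t gid, uint32_t rows_per_mat, uint64_t mat_stride) {
        const uint32_t im = gid / rows_per_mat; // 32-bit divide
        return (uint64_t) im * mat_stride;      // widen only at the end
    }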
Georgi Gerganov [Tue, 15 Aug 2023 07:04:58 +0000 (10:04 +0300)]
scripts : add helper script to get wikitext
Jhen-Jie Hong [Mon, 14 Aug 2023 22:14:14 +0000 (06:14 +0800)]
server : add missing /json-schema-to-grammar.mjs (#2616)
fixes #2611
Jhen-Jie Hong [Mon, 14 Aug 2023 13:37:39 +0000 (21:37 +0800)]
metal : return null instead of exit(1) (#2573)
Cheng Shao [Mon, 14 Aug 2023 13:36:42 +0000 (15:36 +0200)]
server : add --numa support (#2524)
Kamil Tomšík [Mon, 14 Aug 2023 13:35:16 +0000 (15:35 +0200)]
llama : add missing enum keyword in function signatures (#2610)
Johannes Gäßler [Mon, 14 Aug 2023 08:41:22 +0000 (10:41 +0200)]
CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596)
Jhen-Jie Hong [Mon, 14 Aug 2023 08:20:17 +0000 (16:20 +0800)]
server : fix default grammar by using an empty string in the UI (#2604)
Jhen-Jie Hong [Mon, 14 Aug 2023 07:16:54 +0000 (15:16 +0800)]
server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588)
* server : implement json-schema-to-grammar.mjs by following the Python impl
* server : add grammar support in chat.mjs
* server : implement grammar param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration
vxiiduu [Mon, 14 Aug 2023 03:59:16 +0000 (13:59 +1000)]
Enhance compatibility with Windows 7 and below. (#2592)
* Enhance Windows 7 compatibility.
* Clean away unnecessary preprocessor conditional
drbh [Sun, 13 Aug 2023 14:00:48 +0000 (10:00 -0400)]
test : add simple grammar parsing tests (#2594)
* adds simple grammar parsing tests
* adds cassert header
Johannes Gäßler [Sat, 12 Aug 2023 22:24:45 +0000 (00:24 +0200)]
CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590)
byte-6174 [Fri, 11 Aug 2023 23:17:25 +0000 (19:17 -0400)]
Adding support for llama2.c models (#2559)
Equim [Fri, 11 Aug 2023 22:35:14 +0000 (06:35 +0800)]
server: fixed wrong variable name in timing json (#2579)
* server: fixed wrong variable name in timing json
* remove redundant entry
DannyDaemonic [Thu, 10 Aug 2023 20:11:36 +0000 (13:11 -0700)]
Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.
Christian Demsar [Thu, 10 Aug 2023 14:28:27 +0000 (10:28 -0400)]
Add --n-predict -2 for stopping generation on full context (#2565)
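A hedged sketch of the stop condition a flag like this adds (names hypothetical, not the actual implementation):

    // Under the "-2 = generate until the context is full" convention,
    // stop as soon as the context window is exhausted instead of
    // continuing with truncation or shifting.
    static bool should_stop(int n_predict, int n_past, int n_ctx) {
        return n_predict == -2 && n_past >= n_ctx;
    }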
Martin Krasser [Thu, 10 Aug 2023 10:16:38 +0000 (12:16 +0200)]
Fix grammar-based sampling issue in server (#2566)
Sam Spilsbury [Wed, 9 Aug 2023 20:47:42 +0000 (23:47 +0300)]
ggml-alloc: Don't try to re-use buffers of external tensors (#2562)
* ggml-alloc: Don't try to re-use buffers of external tensors
They might be weights that came from another context, so we
have no control over them (and they might be re-used elsewhere
so writing to them would be a bad idea). See the sketch after this entry.
* ggml-alloc: >= when checking for out-of-bounds
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
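A minimal sketch of the ownership test described above (names hypothetical): only tensors whose data lives inside the allocator's own buffer are candidates for reuse.

    #include <cstddef>

    // External tensors (e.g. weights owned by another context) fail this
    // test and are never recycled. Note the half-open range: a pointer at
    // or past the end of the buffer is out of bounds.
    static bool ptr_in_buffer(const void * p, const void * buf, size_t size) {
        const char * c = (const char *) p;
        const char * b = (const char *) buf;
        return c >= b && c < b + size;
    }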
grahameth [Wed, 9 Aug 2023 20:46:40 +0000 (22:46 +0200)]
add log_callback to llama_context_params for custom logging. (#2234)
* add log_callback to llama_context_params for custom logging.
* Fix macro expansion on gcc
* Add struct llama_state for global variables and move log_callback there
* Turn log level into an enum and make some minor changes.
* Remove model_for_logging parameter (not needed anymore)
* Convert remaining fprintf(stderr, ...) calls to use new macros.
* Fix enum and initialize g_state
* Fix log calls after merge
* Fix missing static
* Add back all the new lines in the logging strings
* Add comment for llama_log_callback and replace remaining printf calls
---------
Co-authored-by: grahameth <->
Co-authored-by: Helmut <redacted>
Johannes Gäßler [Wed, 9 Aug 2023 07:42:34 +0000 (09:42 +0200)]
CUDA: tuned mul_mat_q kernels (#2546)
Martin Krasser [Tue, 8 Aug 2023 13:29:19 +0000 (15:29 +0200)]
Allow passing grammar to completion endpoint (#2532)
* Allow passing grammar to completion endpoint
Johannes Gäßler [Tue, 8 Aug 2023 12:38:16 +0000 (14:38 +0200)]
CUDA: tighter VRAM scratch size for 65b/70b (#2551)
chaihahaha [Tue, 8 Aug 2023 12:07:02 +0000 (20:07 +0800)]
llm.vim : multiline autocompletion, get rid of "^@" (#2543)
Georgi Gerganov [Tue, 8 Aug 2023 12:05:30 +0000 (15:05 +0300)]
vim : bring back simple llm.vim example
AustinMroz [Tue, 8 Aug 2023 11:44:48 +0000 (06:44 -0500)]
vim : streaming and more (#2495)
* Update Vim plugin
* Remove getbufoneline usage, Add input bind example.
getbufoneline() appears to be a recently added function and has been
replaced with getbufline for compatibility.
An additional example that explains how to add a keybind that works in
insert mode was added.
klosax [Mon, 7 Aug 2023 17:07:19 +0000 (19:07 +0200)]
Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
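A minimal sketch of the linear RoPE scaling the parameter enables (function name hypothetical): positions are compressed by the scale factor, so a model trained on a 4096-token window can address, say, 8192 positions with a scale of 2, at some cost in fidelity.

    // Linear ("position interpolation") RoPE scaling: divide the
    // position before computing the rotary embedding angles.
    static float scaled_pos(int pos, float rope_scale) {
        return (float) pos / rope_scale;
    }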
Georgi Gerganov [Mon, 7 Aug 2023 11:25:58 +0000 (14:25 +0300)]
ggml : mul mat tweaks (#2372)
* ggml : mul mat wip
ggml-ci
* ggml : alternative thread distribution for mul_mat
ggml-ci
* ggml : mul_mat block tiling attempt
* ggml : mul_mat threads yield
ggml-ci
Georgi Gerganov [Mon, 7 Aug 2023 11:24:42 +0000 (14:24 +0300)]
ggml : pad result of ggml_nbytes()
Georgi Gerganov [Mon, 7 Aug 2023 10:55:18 +0000 (13:55 +0300)]
ggml : change params pointer (style change) (#2539)
ggml-ci
Georgi Gerganov [Mon, 7 Aug 2023 10:20:09 +0000 (13:20 +0300)]
ggml : sync (custom ops) (#2537)
ggml-ci
Johannes Gäßler [Mon, 7 Aug 2023 08:09:40 +0000 (10:09 +0200)]
Fixed mmap prefetch for GPU offloading (#2529)
Georgi Gerganov [Mon, 7 Aug 2023 07:52:57 +0000 (10:52 +0300)]
metal : fix out-of-bounds access + inc concurrency nodes (#2416)
* metal : fix out-of-bounds access + style changes
* metal : increase concurrency nodes to 2*GGML_MAX_NODES
GiviMAD [Mon, 7 Aug 2023 06:21:46 +0000 (23:21 -0700)]
[Makefile] Move ARM CFLAGS before compilation (#2536)
Henri Vasserman [Mon, 7 Aug 2023 05:35:53 +0000 (08:35 +0300)]
[Zig] Rewrite build for Zig 0.11 (#2514)
* zig build fixes
* Disable LTO on Windows.
DannyDaemonic [Sun, 6 Aug 2023 06:49:34 +0000 (23:49 -0700)]
console : fix issue related to Windows 11 PowerShell console mode persistence (#2521)
Keiichi Tabata [Sun, 6 Aug 2023 06:34:05 +0000 (15:34 +0900)]
convert.py : add missing abstract methods for quantized data (#2491)
Johannes Gäßler [Sat, 5 Aug 2023 16:20:44 +0000 (18:20 +0200)]
CUDA: faster k-quant mul_mat_q kernels (#2525)
Jonas Wunderlich [Fri, 4 Aug 2023 20:16:11 +0000 (20:16 +0000)]
fix firefox autoscroll (#2519)
Cebtenzzre [Fri, 4 Aug 2023 19:00:57 +0000 (15:00 -0400)]
server: regenerate completion.js.hpp (#2515)
Cebtenzzre [Fri, 4 Aug 2023 15:35:22 +0000 (11:35 -0400)]
CUDA: use min compute capability of GPUs actually used (#2506)
Cebtenzzre [Fri, 4 Aug 2023 15:34:32 +0000 (11:34 -0400)]
CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)
Fixes #2503
DannyDaemonic [Fri, 4 Aug 2023 15:20:12 +0000 (08:20 -0700)]
Add --simple-io option for subprocesses and break out console.h and cpp (#1558)
Stephen Nichols [Fri, 4 Aug 2023 11:37:24 +0000 (06:37 -0500)]
Fixing race condition in server and partial stream handling in frontend. (#2391)
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
l3utterfly [Fri, 4 Aug 2023 11:29:52 +0000 (19:29 +0800)]
Stream save llama context data to file instead of allocating entire buffer upfront (#2488)
* added streaming of context data to file to avoid allocating unnecessary amounts of memory (see the sketch after this entry)
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - reverted the breaking change to the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitespace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <redacted>
* Apply code review suggestions
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
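For context, a sketch of the naive save path this commit replaces; llama_get_state_size and llama_copy_state_data are the public API named in the commit, while the streaming internals are not shown here:

    #include <cstdint>
    #include <cstdio>
    #include <vector>
    #include "llama.h"

    // Materializes the whole state in memory before writing it out;
    // this is exactly the large allocation the commit avoids by writing
    // each piece of state (rng, logits, embeddings, kv cache) directly.
    static bool save_state_naive(llama_context * ctx, const char * path) {
        const size_t n = llama_get_state_size(ctx);
        std::vector<uint8_t> buf(n); // can be very large
        llama_copy_state_data(ctx, buf.data());
        FILE * f = fopen(path, "wb");
        if (!f) {
            return false;
        }
        const bool ok = fwrite(buf.data(), 1, n, f) == n;
        fclose(f);
        return ok;
    }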
Borislav Stanimirov [Fri, 4 Aug 2023 10:07:21 +0000 (13:07 +0300)]
build : fix several cast and printf warnings (#2499)
Evan Jones [Thu, 3 Aug 2023 02:05:44 +0000 (22:05 -0400)]
examples : generate JSON according to schema (#1887)
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
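A hedged illustration of the kind of grammar such a generator produces, embedded as a C++ string; the production rules here are illustrative, not the generator's actual output:

    // Grammar accepting {"name": "<letters/digits/spaces>"} in the
    // GBNF-style notation llama.cpp grammars use.
    static const char * k_example_grammar = R"(
    root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
    string ::= "\"" [a-zA-Z0-9 ]* "\""
    ws     ::= [ \t\n]*
    )";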
Johannes Gäßler [Wed, 2 Aug 2023 16:04:04 +0000 (18:04 +0200)]
CUDA: faster non k-quant mul_mat_q kernels (#2483)
Johannes Gäßler [Wed, 2 Aug 2023 14:48:10 +0000 (16:48 +0200)]
CUDA: Fix models with output size != 32000 (#2480)
ldwang [Wed, 2 Aug 2023 08:21:11 +0000 (16:21 +0800)]
readme : add Aquila-7B model series to supported models (#2487)
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <redacted>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <redacted>
* Update Aquila-7B models in README.md
Signed-off-by: ldwang <redacted>
---------
Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>
Eve [Wed, 2 Aug 2023 08:06:19 +0000 (04:06 -0400)]
tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c (see the sketch after this entry)
* port c tests to c++
* use initializer list for ggml_init_params
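A minimal sketch of the memcpy technique mentioned above: reading a double's bits through a pointer cast violates strict aliasing, while memcpy is the portable bit-cast and compiles down to a single move.

    #include <cstdint>
    #include <cstring>

    // Portable bit-cast from double to its 64-bit representation.
    static uint64_t double_bits(double d) {
        uint64_t bits;
        std::memcpy(&bits, &d, sizeof(bits));
        return bits;
    }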
Yiming Cui [Wed, 2 Aug 2023 06:18:31 +0000 (14:18 +0800)]
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)
* add support for chinese llama-2 / alpaca-2
* remove white spaces
Bono Lv [Tue, 1 Aug 2023 12:54:28 +0000 (20:54 +0800)]
fix a typo in examples/server/README.md (#2478)
ebraminio [Tue, 1 Aug 2023 08:56:23 +0000 (01:56 -0700)]
server : Support dark mode (#2414)
* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh
Matteo Boschini [Tue, 1 Aug 2023 07:43:12 +0000 (09:43 +0200)]
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <redacted>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <redacted>
Johannes Gäßler [Mon, 31 Jul 2023 19:02:19 +0000 (21:02 +0200)]
CUDA: fixed LLAMA_FAST compilation option (#2473)
Johannes Gäßler [Mon, 31 Jul 2023 17:52:22 +0000 (19:52 +0200)]
CUDA: fixed cmake F16 option (#2471)
Johannes Gäßler [Mon, 31 Jul 2023 13:44:35 +0000 (15:44 +0200)]
CUDA: mmq CLI option, fixed mmq build issues (#2453)
Johannes Gäßler [Mon, 31 Jul 2023 12:32:30 +0000 (14:32 +0200)]
CUDA: Implemented row flattening for non-glm RoPE (#2468)
Johannes Gäßler [Mon, 31 Jul 2023 11:18:51 +0000 (13:18 +0200)]
CUDA: fewer memory bank conflicts for mul_mat_q (#2458)
slaren [Mon, 31 Jul 2023 09:02:53 +0000 (11:02 +0200)]
Fix Metal backend broken from the allocator changes (#2455)
* fix Metal backend broken from the allocator changes
slaren [Sun, 30 Jul 2023 13:58:01 +0000 (15:58 +0200)]
ggml : add graph tensor allocator (#2411)
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset (see the sketch after this entry)
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
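A minimal sketch of the view-with-offset guard (the struct is a stand-in for ggml_tensor): compute a view's data pointer only when the source is already allocated; otherwise the graph allocator resolves it later.

    #include <cstddef>

    struct tensor_stub {
        void * data; // NULL until the allocator places the tensor
    };

    static void init_view(tensor_stub * view, const tensor_stub * src, size_t offset) {
        view->data = src->data ? (char *) src->data + offset : nullptr;
    }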
Johannes Gäßler [Sat, 29 Jul 2023 21:04:44 +0000 (23:04 +0200)]
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds
Johannes Gäßler [Sat, 29 Jul 2023 21:04:10 +0000 (23:04 +0200)]
CUDA: faster multi GPU synchronization (#2448)
klosax [Fri, 28 Jul 2023 18:25:36 +0000 (20:25 +0200)]
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
Lee [Fri, 28 Jul 2023 18:17:45 +0000 (02:17 +0800)]
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405)
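The usual shape of this workaround, hedged (the commit's exact macro name may differ): GCC before 8 lacks _mm256_setr_m128i, so the 256-bit vector is assembled from its halves with a cast plus insert.

    #include <immintrin.h>

    // Compose a 256-bit integer vector from two 128-bit halves,
    // lo in lane 0 and hi in lane 1, without the missing intrinsic.
    #if defined(__GNUC__) && __GNUC__ < 8
    #define MM256_SETR_M128I(lo, hi) \
        _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1)
    #else
    #define MM256_SETR_M128I(lo, hi) _mm256_setr_m128i((lo), (hi))
    #endif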
eric8607242 [Fri, 28 Jul 2023 18:10:05 +0000 (02:10 +0800)]
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Fri, 28 Jul 2023 18:05:08 +0000 (21:05 +0300)]
examples : fix whitespace
nhamanasu [Fri, 28 Jul 2023 18:02:10 +0000 (03:02 +0900)]
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
Weird Constructor [Fri, 28 Jul 2023 08:44:43 +0000 (10:44 +0200)]
readme : fix the description of the Tail free sampling (TFS) method (#2431)
Rand Xie [Fri, 28 Jul 2023 08:42:53 +0000 (01:42 -0700)]
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)
niansa/tuxifan [Fri, 28 Jul 2023 01:14:11 +0000 (03:14 +0200)]
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
mj-shifu [Thu, 27 Jul 2023 20:39:17 +0000 (22:39 +0200)]
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
Georgi Gerganov [Thu, 27 Jul 2023 08:00:54 +0000 (11:00 +0300)]
metal : disable graph concurrency optimization due to bug (#2413)
slaren [Wed, 26 Jul 2023 21:57:23 +0000 (23:57 +0200)]
ggml : fix assert in ggml_set_unary_op (#2410)
Cebtenzzre [Wed, 26 Jul 2023 18:00:04 +0000 (14:00 -0400)]
make : build with -Wmissing-prototypes (#2394)
slaren [Wed, 26 Jul 2023 13:56:53 +0000 (15:56 +0200)]
ggml : allocate graphs in a context (#2392)
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <redacted>
Kawrakow [Tue, 25 Jul 2023 15:35:53 +0000 (18:35 +0300)]
Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)
Co-authored-by: Iwan Kawrakow <redacted>
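A sketch of how such a compile-time default typically looks (the value shown is illustrative, not necessarily the commit's):

    // Used when the command line does not override rms-norm-eps;
    // builds can redefine it, e.g. -DLLAMA_DEFAULT_RMS_EPS=1e-5f.
    #ifndef LLAMA_DEFAULT_RMS_EPS
    #define LLAMA_DEFAULT_RMS_EPS 5e-6f
    #endif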
slaren [Tue, 25 Jul 2023 14:20:12 +0000 (16:20 +0200)]
ggml : fix ggml_flash_attn to use op_params (#2387)
* ggml : fix ggml_flash_attn to use op_params
ldwang [Tue, 25 Jul 2023 13:22:09 +0000 (21:22 +0800)]
convert.py : support bpe tokenizer (#2228)
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert
Signed-off-by: ldwang <redacted>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <redacted>
---------
Signed-off-by: ldwang <redacted>
Co-authored-by: ldwang <redacted>
Jiahao Li [Tue, 25 Jul 2023 12:58:32 +0000 (20:58 +0800)]
ggml : relax contiguous constraints in activation function (#2371)
slaren [Tue, 25 Jul 2023 12:32:20 +0000 (14:32 +0200)]
ggml : improve graph build time via hash table lookup (#2329)
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
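A minimal sketch of the visited-set idea behind the speedup (sizes and names hypothetical): an open-addressing table keyed on the tensor pointer turns the "already added this node?" check from a linear scan into an average O(1) probe.

    #include <cstddef>

    #define HASH_SIZE 4096 // must exceed the maximum node count

    static size_t hash_ptr(const void * p) {
        return ((size_t) p >> 4) % HASH_SIZE;
    }

    // Returns true if p was newly inserted, false if already present.
    static bool hash_insert(const void * table[HASH_SIZE], const void * p) {
        size_t i = hash_ptr(p);
        while (table[i] != NULL && table[i] != p) {
            i = (i + 1) % HASH_SIZE; // linear probing
        }
        if (table[i] == p) {
            return false;
        }
        table[i] = p;
        return true;
    }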