git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
7 months ago: flake.lock: Update (#10243)
Georgi Gerganov [Sun, 10 Nov 2024 19:45:25 +0000 (21:45 +0200)]
flake.lock: Update (#10243)

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/807e9154dcb16384b1b765ebe9cd2bba2ac287fd?narHash=sha256-l253w0XMT8nWHGXuXqyiIC/bMvh1VRszGXgdpQlfhvU=' (2024-10-29)
  → 'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum+L+0zVAWvXAGbWAuXHax3KzuejaDyo=' (2024-11-05)

Co-authored-by: github-actions[bot] <redacted>
7 months ago: server : (web UI) Add back sampler settings (#10239)
MaggotHATE [Sun, 10 Nov 2024 19:42:25 +0000 (00:42 +0500)]
server : (web UI) Add back sampler settings (#10239)

* Add back samplers to server

* Added tooltips with basic information

* Fixed stretching of input fields.

* use component for settings input, move help msg to tooltips

---------

Co-authored-by: Xuan Son Nguyen <redacted>
7 months ago: vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226)
Jeff Bolz [Sun, 10 Nov 2024 11:37:56 +0000 (05:37 -0600)]
vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226)

7 months ago: metal : reorder write loop in mul mat kernel + style (#10231)
Georgi Gerganov [Sat, 9 Nov 2024 09:53:13 +0000 (11:53 +0200)]
metal : reorder write loop in mul mat kernel + style (#10231)

* metal : reorder write loop

* metal : int -> short, style

ggml-ci

7 months ago: metal : fix build and some more comments (#10229)
Georgi Gerganov [Sat, 9 Nov 2024 09:53:02 +0000 (11:53 +0200)]
metal : fix build and some more comments (#10229)

7 months ago: metal : fix F32 accumulation in FA vec kernel (#10232)
Georgi Gerganov [Sat, 9 Nov 2024 09:52:45 +0000 (11:52 +0200)]
metal : fix F32 accumulation in FA vec kernel (#10232)

7 months ago: llama : fix Qwen model type strings
Georgi Gerganov [Sat, 9 Nov 2024 09:26:34 +0000 (11:26 +0200)]
llama : fix Qwen model type strings

7 months ago: metal : hide debug messages from normal log
Georgi Gerganov [Sat, 9 Nov 2024 09:21:49 +0000 (11:21 +0200)]
metal : hide debug messages from normal log

7 months ago: ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne...
SXX [Sat, 9 Nov 2024 07:35:46 +0000 (15:35 +0800)]
ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213)

7 months ago: ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
amritahs-ibm [Sat, 9 Nov 2024 07:17:50 +0000 (12:47 +0530)]
ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)

This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for FP32 datatype.

This change results in a consistent 90%
improvement in input processing time, and 20%
to 80% improvement in output processing time,
across various batch sizes.

The patch is tested with Meta-Llama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <redacted>
7 months ago: scripts : fix pattern and get n_tokens in one go (#10221)
haopeng [Sat, 9 Nov 2024 07:06:54 +0000 (15:06 +0800)]
scripts : fix pattern and get n_tokens in one go (#10221)

7 months ago: metal : opt-in compile flag for BF16 (#10218)
Georgi Gerganov [Fri, 8 Nov 2024 19:59:46 +0000 (21:59 +0200)]
metal : opt-in compile flag for BF16 (#10218)

* metal : opt-in compile flag for BF16

ggml-ci

* ci : use BF16

ggml-ci

* swift : switch back to v12

* metal : has_float -> use_float

ggml-ci

* metal : fix BF16 check in MSL

ggml-ci

7 months ago: metal : improve clarity (minor) (#10171)
Georgi Gerganov [Fri, 8 Nov 2024 16:37:41 +0000 (18:37 +0200)]
metal : improve clarity (minor) (#10171)

7 months ago: metal : optimize FA kernels (#10171)
Georgi Gerganov [Fri, 8 Nov 2024 11:47:22 +0000 (13:47 +0200)]
metal : optimize FA kernels (#10171)

* ggml : add ggml_flash_attn_ext_get_prec

* metal : use F16 precision in FA kernels

ggml-ci

* metal : minor clean-up

* metal : compile-guard bf16 FA kernels

ggml-ci

* build : remove obsolete compile flag [no ci]

* metal : prevent int overflows [no ci]

* cuda : disable BF16 FA

ggml-ci

* metal : fix BF16 requirement for FA kernels

ggml-ci

* make : clean-up [no ci]

7 months ago: swift : exclude ggml-metal-embed.metal (#10211)
Jhen-Jie Hong [Fri, 8 Nov 2024 09:34:06 +0000 (17:34 +0800)]
swift : exclude ggml-metal-embed.metal (#10211)

* llama.swift : exclude ggml-metal-embed.metal

* swift : exclude build/

7 months ago: server : minor UI fix (#10207)
Xuan Son Nguyen [Thu, 7 Nov 2024 22:44:38 +0000 (18:44 -0400)]
server : minor UI fix (#10207)

7 months ago: server : revamp chat UI with vuejs and daisyui (#10175)
Xuan Son Nguyen [Thu, 7 Nov 2024 21:31:10 +0000 (17:31 -0400)]
server : revamp chat UI with vuejs and daisyui (#10175)

* server : simple chat UI with vuejs and daisyui

* move old files to legacy folder

* embed deps into binary

* basic markdown support

* add conversation history, save to localStorage

* fix bg-base classes

* save theme preferences

* fix tests

* regenerate, edit, copy buttons

* small fixes

* docs: how to use legacy ui

* better error handling

* make CORS preflight more explicit

* add GET method for CORS

* fix tests

* clean up a bit

* better auto scroll

* small fixes

* use collapse-arrow

* fix closeAndSaveConfigDialog

* small fix

* remove console.log

* fix style for <pre> element

* lighter bubble color (less distracting when reading)

7 months ago: scripts : add amx to sync-ggml.sh [no ci]
Georgi Gerganov [Thu, 7 Nov 2024 21:11:36 +0000 (23:11 +0200)]
scripts : add amx to sync-ggml.sh [no ci]

7 months ago: sync : ggml
Georgi Gerganov [Thu, 7 Nov 2024 21:08:24 +0000 (23:08 +0200)]
sync : ggml

7 months ago: scripts : sync update
Georgi Gerganov [Thu, 7 Nov 2024 21:07:55 +0000 (23:07 +0200)]
scripts : sync update

7 months ago: ggml : add ggml-cpu.h to the public headers (#10204)
Diego Devesa [Thu, 7 Nov 2024 17:16:08 +0000 (18:16 +0100)]
ggml : add ggml-cpu.h to the public headers (#10204)

7 months ago: Remove identical wte/etw logic for jais (#10203)
Faisal Zaghloul [Thu, 7 Nov 2024 16:46:12 +0000 (11:46 -0500)]
Remove identical wte/etw logic for jais (#10203)

7 months ago: DRY: Fixes clone functionality (#10192)
wwoodsTM [Thu, 7 Nov 2024 15:20:25 +0000 (08:20 -0700)]
DRY: Fixes clone functionality (#10192)

7 months ago: fix q4_0_8_8 format for corrupted tokens issue (#10198)
snadampal [Thu, 7 Nov 2024 08:02:08 +0000 (02:02 -0600)]
fix q4_0_8_8 format for corrupted tokens issue (#10198)

Co-authored-by: EC2 Default User <redacted>
7 months ago: Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (...
Zhiyuan Li [Thu, 7 Nov 2024 07:19:10 +0000 (18:19 +1100)]
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (#10133)

* rwkv6: rename to wkv6

* rwkv6: support avx2 avx512 armv8 armv9

* rwkv6: update cuda file name

* rwkv6: rename params

* wkv on sycl

* sycl: add some ops

* sycl: Enhance OP support judgment

* wkv6: drop armv9 and transfer to GGML style

ggml-ci

* sync : ggml

* update the function to use appropriate types

* fix define error

* Update ggml/src/ggml-cpu.c

* add appropriate asserts

* move element-wise functions outside

* put the declaration outside the loop

* rewrite to be more inline with the common pattern for distributing threads

* use recommended way GGML_TENSOR_LOCALS

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Diego Devesa <redacted>
Co-authored-by: Plamen Minev <redacted>
Co-authored-by: Yuri Khrustalev <redacted>
Co-authored-by: Meng, Hengyu <redacted>
7 months ago: metal : add BF16 support (#8439)
Georgi Gerganov [Wed, 6 Nov 2024 17:53:51 +0000 (19:53 +0200)]
metal : add BF16 support (#8439)

* ggml : add initial BF16 support

ggml-ci

* metal : add mul_mat_id BF16 support

ggml-ci

* metal : check for bfloat support on the Metal device

ggml-ci

* metal : better var names [no ci]

* metal : do not build bfloat kernels when not supported

ggml-ci

* metal : try to fix BF16 support check

ggml-ci

* metal : this should correctly check bfloat support

7 months ago: server : remove hack for extra parallel slot (#10187)
Georgi Gerganov [Wed, 6 Nov 2024 11:29:01 +0000 (13:29 +0200)]
server : remove hack for extra parallel slot (#10187)

ggml-ci

7 months ago: metal : fix from ptr buffer name (#10189)
Diego Devesa [Wed, 6 Nov 2024 11:10:07 +0000 (12:10 +0100)]
metal : fix from ptr buffer name (#10189)

7 months ago: ggml : adjust is_first_call init value (#10193)
Georgi Gerganov [Wed, 6 Nov 2024 09:20:10 +0000 (11:20 +0200)]
ggml : adjust is_first_call init value (#10193)

ggml-ci

7 months ago: metal : add quantized FA support (#10149)
Georgi Gerganov [Wed, 6 Nov 2024 08:24:23 +0000 (10:24 +0200)]
metal : add quantized FA support (#10149)

* metal : add quantized FA (vec) support

ggml-ci

* metal : add quantized FA (non-vec) support

* metal : fix support check

ggml-ci

* metal : clean-up

* metal : clean-up (cont)

* metal : fix shared memory calc + reduce smem + comments

* metal : float-correctness

* metal : minor [no ci]

7 months ago: llama : add <|tool_call|> formatting to Granite template (#10177)
Gabe Goodhart [Tue, 5 Nov 2024 12:23:04 +0000 (05:23 -0700)]
llama : add <|tool_call|> formatting to Granite template (#10177)

Branch: GraniteToolCallTemplate

Signed-off-by: Gabe Goodhart <redacted>
7 months ago: ggml : fix arch check in bf16_to_fp32 (#10164)
Diego Devesa [Mon, 4 Nov 2024 22:17:01 +0000 (23:17 +0100)]
ggml : fix arch check in bf16_to_fp32 (#10164)

7 months ago: Q6_K AVX improvements (#10118)
Eve [Mon, 4 Nov 2024 22:06:31 +0000 (22:06 +0000)]
Q6_K AVX improvements (#10118)

* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update CI macOS version to 13 as 12 doesn't work now; 13 is still x86

7 months ago: ggml : fix gelu tables initialization (#10172)
Diego Devesa [Mon, 4 Nov 2024 19:06:58 +0000 (20:06 +0100)]
ggml : fix gelu tables initialization (#10172)

7 months ago: ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167)
Diego Devesa [Mon, 4 Nov 2024 16:34:08 +0000 (17:34 +0100)]
ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167)

7 months ago: server : clarify /slots endpoint, add is_processing (#10162)
Xuan Son Nguyen [Mon, 4 Nov 2024 15:33:29 +0000 (16:33 +0100)]
server : clarify /slots endpoint, add is_processing (#10162)

* server : clarify /slots endpoint, add is_processing

* fix tests

7 months ago: fix build break on arm64 linux (#10166)
snadampal [Mon, 4 Nov 2024 15:08:33 +0000 (09:08 -0600)]
fix build break on arm64 linux (#10166)

This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144

7 months ago: cuda : clear error after changing peer access (#10153)
Diego Devesa [Mon, 4 Nov 2024 12:10:23 +0000 (13:10 +0100)]
cuda : clear error after changing peer access (#10153)

7 months ago: metal : simplify f16 and f32 dequant kernels (#0)
Georgi Gerganov [Mon, 4 Nov 2024 11:49:34 +0000 (13:49 +0200)]
metal : simplify f16 and f32 dequant kernels (#0)

7 months ago: metal : move dequantize templates to beginning of MSL source (#0)
Georgi Gerganov [Mon, 4 Nov 2024 11:43:32 +0000 (13:43 +0200)]
metal : move dequantize templates to beginning of MSL source (#0)

7 months ago: CANN: adjust backend registry refactor. (#10158)
leo-pony [Mon, 4 Nov 2024 11:08:22 +0000 (19:08 +0800)]
CANN: adjust backend registry refactor. (#10158)

remove buffer->iface.get_name that was used in CANN, as it was removed in the backend registry refactor PR.

7 months ago: sync : ggml
Georgi Gerganov [Mon, 4 Nov 2024 08:33:37 +0000 (10:33 +0200)]
sync : ggml

7 months ago: cmake : make it possible linking ggml as external lib (ggml/1003)
Yuri Khrustalev [Sat, 2 Nov 2024 09:09:12 +0000 (05:09 -0400)]
cmake : make it possible linking ggml as external lib (ggml/1003)

7 months ago: metal : fix minor string leaks (ggml/1004)
Plamen Minev [Fri, 1 Nov 2024 14:55:10 +0000 (16:55 +0200)]
metal : fix minor string leaks (ggml/1004)

7 months ago: ggml : move CPU backend to a separate file (#10144)
Diego Devesa [Sun, 3 Nov 2024 18:34:08 +0000 (19:34 +0100)]
ggml : move CPU backend to a separate file (#10144)

7 months ago: metal : minor fixup in FA kernel (#10143)
Georgi Gerganov [Sun, 3 Nov 2024 13:18:40 +0000 (15:18 +0200)]
metal : minor fixup in FA kernel (#10143)

* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var

7 months ago: flake.lock: Update (#10146)
Georgi Gerganov [Sun, 3 Nov 2024 13:14:15 +0000 (15:14 +0200)]
flake.lock: Update (#10146)

7 months ago: Add apple arm to presets (#10134)
Christian Köhnenkamp [Sat, 2 Nov 2024 22:35:31 +0000 (23:35 +0100)]
Add apple arm to presets (#10134)

* Add apple arm to presets

* Add final new line

7 months ago: server : fix slot selection by lru (#10126)
sasha0552 [Sat, 2 Nov 2024 16:34:56 +0000 (16:34 +0000)]
server : fix slot selection by lru (#10126)

* server : fix slot selection by lru, migrate lcs to `size_t`

* minor debug log fix

7 months ago: server : fix endpoint checks (#10135)
Georgi Gerganov [Sat, 2 Nov 2024 16:34:00 +0000 (18:34 +0200)]
server : fix endpoint checks (#10135)

ggml-ci

7 months ago: llama : adjust default context size + print warnings (#10136)
Georgi Gerganov [Sat, 2 Nov 2024 13:18:56 +0000 (15:18 +0200)]
llama : adjust default context size + print warnings (#10136)

* llama : adjust default context size + print warnings

ggml-ci

* ggml-ci : add missing gpu-layers + adjust context sizes

7 months ago: simple-chat : only add bos on first prompt (#10129)
Diego Devesa [Sat, 2 Nov 2024 12:08:53 +0000 (13:08 +0100)]
simple-chat : only add bos on first prompt (#10129)

7 months ago: convert-lora : make `--base` optional (#10110)
Xuan Son Nguyen [Sat, 2 Nov 2024 11:53:17 +0000 (12:53 +0100)]
convert-lora : make `--base` optional (#10110)

* convert-lora : make `--base` optional

* lint

* handle case where base_model_name_or_path is invalid

* do not include metadata from base model

* clarify unspecified --base

* add small comment [no ci]

* trigger ci

7 months ago: llama : add simple-chat example (#10124)
Diego Devesa [Fri, 1 Nov 2024 22:50:59 +0000 (23:50 +0100)]
llama : add simple-chat example (#10124)

* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <redacted>
7 months ago: llama : use smart pointers for ggml resources (#10117)
Diego Devesa [Fri, 1 Nov 2024 22:48:26 +0000 (23:48 +0100)]
llama : use smart pointers for ggml resources (#10117)

7 months ago: vulkan : improve ggml_vk_create_buffer error handling (#9898)
Shupei Fan [Fri, 1 Nov 2024 18:33:14 +0000 (02:33 +0800)]
vulkan : improve ggml_vk_create_buffer error handling (#9898)

7 months ago: readme : update hot topics
Georgi Gerganov [Fri, 1 Nov 2024 15:31:51 +0000 (17:31 +0200)]
readme : update hot topics

7 months ago: server : fix smart selection of available slot (#10120)
sasha0552 [Fri, 1 Nov 2024 13:33:14 +0000 (13:33 +0000)]
server : fix smart selection of available slot (#10120)

* Fix smart selection of available slot

* minor fix

* replace vectors of tokens with shorthands

7 months ago: ggml : remove ggml_scratch (#10121)
Georgi Gerganov [Fri, 1 Nov 2024 10:58:45 +0000 (12:58 +0200)]
ggml : remove ggml_scratch (#10121)

ggml-ci

7 months ago: sync : ggml
Georgi Gerganov [Fri, 1 Nov 2024 08:28:24 +0000 (10:28 +0200)]
sync : ggml

7 months ago: ggml : alloc ggml_contexts on the heap (whisper/2525)
Georgi Gerganov [Fri, 1 Nov 2024 08:23:05 +0000 (10:23 +0200)]
ggml : alloc ggml_contexts on the heap (whisper/2525)

7 months ago: build: fix build error in Windows env with OneAPI setup (#10107)
Zhenwei Jin [Fri, 1 Nov 2024 03:09:59 +0000 (11:09 +0800)]
build: fix build error in Windows env with OneAPI setup (#10107)

7 months ago: llama : improve output buffer type selection (#10098)
Diego Devesa [Thu, 31 Oct 2024 23:49:53 +0000 (00:49 +0100)]
llama : improve output buffer type selection (#10098)

7 months ago: quantize : fix --keep-split (#10114)
Diego Devesa [Thu, 31 Oct 2024 23:45:34 +0000 (00:45 +0100)]
quantize : fix --keep-split (#10114)

7 months ago: llama : fix buffer checks for mamba and rwk (#10111)
Diego Devesa [Thu, 31 Oct 2024 21:54:23 +0000 (22:54 +0100)]
llama : fix buffer checks for mamba and rwk (#10111)

* llama : fix buffer checks for mamba and rwk

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE

7 months ago: loader: refactor tensor weights storage (#9935)
Zhenwei Jin [Thu, 31 Oct 2024 18:50:39 +0000 (02:50 +0800)]
loader: refactor tensor weights storage (#9935)

* loader: refactor tensor weights storage

* use sorted map, sort weights by layer

---------

Co-authored-by: slaren <redacted>
7 months ago: server : include scheme when printing URL (#10106)
Kevin Gibbons [Thu, 31 Oct 2024 13:02:35 +0000 (06:02 -0700)]
server : include scheme when printing URL (#10106)

7 months ago: ggml : check tensor name lengths in gguf files (#10100)
Diego Devesa [Thu, 31 Oct 2024 10:40:59 +0000 (11:40 +0100)]
ggml : check tensor name lengths in gguf files (#10100)

7 months ago: kompute: add mul_mat_q4_k shader (#10097)
Sergio López [Thu, 31 Oct 2024 09:09:52 +0000 (10:09 +0100)]
kompute: add mul_mat_q4_k shader (#10097)

This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <redacted>
7 months ago: kompute: add backend registry / device interfaces (#10045)
Sergio López [Wed, 30 Oct 2024 16:01:52 +0000 (17:01 +0100)]
kompute: add backend registry / device interfaces (#10045)

Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <redacted>
7 months ago: ggml : fix memory leaks when loading invalid gguf files (#10094)
Diego Devesa [Wed, 30 Oct 2024 13:51:21 +0000 (14:51 +0100)]
ggml : fix memory leaks when loading invalid gguf files (#10094)

* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file

7 months ago: readme : more lora detail in main example readme (#10064)
Rich Dougherty [Wed, 30 Oct 2024 12:22:39 +0000 (01:22 +1300)]
readme : more lora detail in main example readme (#10064)

7 months ago: convert : more detailed convert lora usage docs (#10065)
Rich Dougherty [Wed, 30 Oct 2024 12:22:21 +0000 (01:22 +1300)]
convert : more detailed convert lora usage docs (#10065)

7 months ago: ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
xctan [Wed, 30 Oct 2024 07:00:40 +0000 (15:00 +0800)]
ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)

* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM

7 months ago: llama : refactor model loader with backend registry (#10026)
Diego Devesa [Wed, 30 Oct 2024 01:01:23 +0000 (02:01 +0100)]
llama : refactor model loader with backend registry (#10026)

8 months ago: ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model...
Changyeon Kim [Tue, 29 Oct 2024 08:52:56 +0000 (17:52 +0900)]
ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)

* ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend.

- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <redacted>
* [fix] Correct the incorrect order of the parameters.

fix casting to int.

Signed-off-by: Changyeon Kim <redacted>
---------

Signed-off-by: Changyeon Kim <redacted>
8 months ago: llama : remove Tail-Free sampling (#10071)
Georgi Gerganov [Tue, 29 Oct 2024 08:42:05 +0000 (10:42 +0200)]
llama : remove Tail-Free sampling (#10071)

ggml-ci

8 months ago: llama : Add IBM granite template (#10013)
arch-btw [Mon, 28 Oct 2024 17:45:33 +0000 (10:45 -0700)]
llama : Add IBM granite template (#10013)

* Add granite template to llama.cpp

* Add granite template to test-chat-template.cpp

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <redacted>
* Update tests/test-chat-template.cpp

Co-authored-by: Xuan Son Nguyen <redacted>
* Added proper template and expected output

* Small change to \n

Small change to \n

* Add code space &

Co-authored-by: Xuan Son Nguyen <redacted>
* Fix spacing

* Apply suggestions from code review

* Update src/llama.cpp

---------

Co-authored-by: Xuan Son Nguyen <redacted>
8 months ago: flake.lock: Update (#10063)
Georgi Gerganov [Mon, 28 Oct 2024 15:41:24 +0000 (17:41 +0200)]
flake.lock: Update (#10063)

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/4c2fcb090b1f3e5b47eaa7bd33913b574a11e0a0?narHash=sha256-/uilDXvCIEs3C9l73JTACm4quuHUsIHcns1c+cHUJwA=' (2024-10-18)
  → 'github:NixOS/nixpkgs/2768c7d042a37de65bb1b5b3268fc987e534c49d?narHash=sha256-AlcmCXJZPIlO5dmFzV3V2XF6x/OpNWUV8Y/FMPGd8Z4=' (2024-10-23)

Co-authored-by: github-actions[bot] <redacted>
8 months ago: musa: workaround for Guilty Lockup in cleaning src0 (#10042)
R0CKSTAR [Mon, 28 Oct 2024 09:02:48 +0000 (17:02 +0800)]
musa: workaround for Guilty Lockup in cleaning src0 (#10042)

Signed-off-by: Xiaodong Ye <redacted>
8 months ago: server : don't overfill the batch during infill (#10018)
Georgi Gerganov [Mon, 28 Oct 2024 06:49:32 +0000 (08:49 +0200)]
server : don't overfill the batch during infill (#10018)

ggml-ci

8 months ago: llama : switch KQ multiplication to F32 precision by default (#10015)
Georgi Gerganov [Sun, 27 Oct 2024 18:59:58 +0000 (20:59 +0200)]
llama : switch KQ multiplication to F32 precision by default (#10015)

ggml-ci

8 months ago: sync : ggml
Georgi Gerganov [Sat, 26 Oct 2024 07:34:08 +0000 (10:34 +0300)]
sync : ggml

8 months ago: increase cuda_cpy block size (ggml/996)
bssrdf [Wed, 23 Oct 2024 18:34:00 +0000 (14:34 -0400)]
increase cuda_cpy block size (ggml/996)

Co-authored-by: bssrdf <redacted>
8 months ago: scripts : fix amx sync [no ci]
Georgi Gerganov [Sat, 26 Oct 2024 07:33:31 +0000 (10:33 +0300)]
scripts : fix amx sync [no ci]

8 months ago: metal : support permuted matrix multiplications (#10033)
Georgi Gerganov [Fri, 25 Oct 2024 19:26:15 +0000 (22:26 +0300)]
metal : support permuted matrix multiplications (#10033)

* metal : support permuted matrix multiplications

ggml-ci

* cont : use nb01 directly for row steps

ggml-ci

* cont : add comments [no ci]

* metal : minor refactor

* metal : minor

8 months ago: llama : add DRY sampler (#9702)
wwoodsTM [Fri, 25 Oct 2024 16:07:34 +0000 (10:07 -0600)]
llama : add DRY sampler (#9702)

* sampling : add DRY sampler (post-refactor)

* DRY: Trying to fix coauthors, removed unneeded line

* DRY: Fixed redundant code

* DRY: Fixed crash issue due to DRY being in chain but uninitialized

---------

Co-authored-by: l3utterfly <redacted>
Co-authored-by: pi6am <redacted>
8 months ago: llama: string_split fix (#10022)
Michael Podvitskiy [Fri, 25 Oct 2024 15:57:54 +0000 (17:57 +0200)]
llama: string_split fix (#10022)

* llama: Refactor string_split to use template specialization, fixing parsing of strings with spaces

* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string

8 months ago: llamafile : extend sgemm.cpp support for Q5_0 models (#10010)
Srihari-mcw [Fri, 25 Oct 2024 07:27:41 +0000 (12:57 +0530)]
llamafile : extend sgemm.cpp support for Q5_0 models (#10010)

8 months ago: server : check that the prompt fits in the slot's context (#10030)
Georgi Gerganov [Fri, 25 Oct 2024 07:13:46 +0000 (10:13 +0300)]
server : check that the prompt fits in the slot's context (#10030)

ggml-ci

8 months ago: server : refactor slot input data, move tokenizer to HTTP thread (#10023)
Xuan Son Nguyen [Thu, 24 Oct 2024 19:51:22 +0000 (21:51 +0200)]
server : refactor slot input data, move tokenizer to HTTP thread (#10023)

* server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check

* fix incorrect if branch

* fix infinite generation loop

* bring back infill validation

* add infill test

* try fixing format_infill

* fix test

* remove redundant code

* rename completion to inference

* update docs

* use llama_tokens everywhere

8 months ago: ci : fix cmake flags for SYCL
Georgi Gerganov [Thu, 24 Oct 2024 18:23:33 +0000 (21:23 +0300)]
ci : fix cmake flags for SYCL

8 months ago: CUDA: fix insufficient buffer clearing for MMQ (#10032)
Johannes Gäßler [Thu, 24 Oct 2024 12:40:23 +0000 (14:40 +0200)]
CUDA: fix insufficient buffer clearing for MMQ (#10032)

8 months ago: CUDA: fix MMQ for non-contiguous src0, add tests (#10021)
Johannes Gäßler [Thu, 24 Oct 2024 09:09:36 +0000 (11:09 +0200)]
CUDA: fix MMQ for non-contiguous src0, add tests (#10021)

* CUDA: fix MMQ for non-contiguous src0, add tests

* revise test code

8 months ago: server : samplers accept the prompt correctly (#10019)
wwoodsTM [Wed, 23 Oct 2024 19:27:51 +0000 (13:27 -0600)]
server : samplers accept the prompt correctly (#10019)

8 months ago: sync : ggml
Georgi Gerganov [Wed, 23 Oct 2024 14:23:55 +0000 (17:23 +0300)]
sync : ggml

8 months ago: llama.vim : bump generation time limit to 3s [no ci]
Georgi Gerganov [Wed, 23 Oct 2024 14:16:56 +0000 (17:16 +0300)]
llama.vim : bump generation time limit to 3s [no ci]

8 months ago: CUDA: fix 1D im2col, add tests (ggml/993)
Johannes Gäßler [Fri, 18 Oct 2024 07:24:44 +0000 (09:24 +0200)]
CUDA: fix 1D im2col, add tests (ggml/993)

8 months ago: ggml : remove redundant set of contexts used field (ggml/978)
Daniel Bevenius [Wed, 16 Oct 2024 18:10:01 +0000 (20:10 +0200)]
ggml : remove redundant set of contexts used field (ggml/978)

This commit removes the setting of the `used` field of the contexts in
the global state (g_state) in `ggml_init`.

The motivation for this change is that I believe that this additional
initialization might not be required after the changes in Commit
45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from
whisper.cpp"), which changed the initialization of the contexts field
from `{ 0 }` to `{ { 0 } }`:

```console
             g_state = (struct ggml_state) {
-                /*.contexts =*/ { 0 },
+                /*.contexts =*/ { { 0 } },
             };
```
My understanding is that the `{0}` initialization might not have
zero-initialized all the nested fields in every array element because of
compiler differences, and might have been the reason for having the
explicit setting of the `used` fields to false.

8 months ago: llama.vim : add classic vim support (#9995)
Michael Coppola [Wed, 23 Oct 2024 11:09:26 +0000 (07:09 -0400)]
llama.vim : add classic vim support (#9995)

* added classic vim support

* fixed ring update, removed blank line

* minor

* minor

* minor doc update

* removed unneeded var

* minor

* minor

* fixed job_start creating new scratch buffers

* fixed job_start creating new scratch buffers

* fixed ghost text indenting when expandtab is on

* removed unused code

* minor

* unified fim_on_exit

* minor

* vim ghost text rendering now uses pos_x and pos_y parameters

* renamed *_hlgroup to hlgroup_*

* renamed *_ghost_text to ghost_text_*, moved nvim/vim detection to llama#init()

* minor

---------

Co-authored-by: Michael Coppola <redacted>