git.djapps.eu Git - pkg/ggml/sources/ggml/log
Georgi Gerganov [Mon, 26 Aug 2024 14:55:36 +0000 (17:55 +0300)]
ggml : add SSM Metal kernels (llama/8546)
* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl
ggml-ci
slaren [Mon, 26 Aug 2024 09:08:59 +0000 (11:08 +0200)]
metal : gemma2 flash attention support (llama/9159)
Johannes Gäßler [Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)]
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
* CPU/CUDA: Gemma 2 FlashAttention support
* apply logit_softcap to scale in kernel
* disable logit softcapping tests on Metal
* remove metal check
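For context, Gemma 2's logit softcapping squashes raw attention scores with a tanh before the softmax; the note about applying it to the scale refers to folding this into the kernel. A minimal sketch of the formula, with illustrative names (not the actual kernel code):
```c
#include <math.h>

// Gemma 2 style logit softcapping: squash a raw attention score into
// (-cap, +cap); 'softcap' is an illustrative helper name
static inline float softcap(float score, float cap) {
    return cap * tanhf(score / cap);
}
```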
Akarshan Biswas [Thu, 22 Aug 2024 14:09:47 +0000 (19:39 +0530)]
Add a space to suppress a cmake warning (llama/9133)
luoyu-intel [Thu, 22 Aug 2024 04:50:10 +0000 (12:50 +0800)]
Add oneDNN primitive support (llama/9091)
* add onednn
* add sycl_f16
* add dnnl stream
* add engine map
* use dnnl for intel only
* use fp16fp16fp16
* update doc
compilade [Wed, 21 Aug 2024 21:58:11 +0000 (17:58 -0400)]
llama : simplify Mamba with advanced batch splits (llama/8526)
* llama : advanced batch splits
This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.
* llama : always make recurrent state slots contiguous
* ggml : simplify mamba operators
* llama : fix integer signedness mixing
* llama : logits_all has priority over batch->logits
Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.
* llama : apply suggestions
Co-authored-by: Georgi Gerganov <redacted>
* llama : fix t5 segfault
* llama : fix Mamba session save and restore
* llama : minor cosmetic changes
* llama : rename llama_reorder_outputs to llama_output_reorder
Also move it closer to llama_output_reserve.
* llama : fix pooled embeddings when using batches with equal_seqs
* minor : add struct members for clarity
ggml-ci
* llama : fix T5 segfault again
* llama : fix Mamba pooled embeddings with multiple sequences
Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.
* llama : add llama_model_is_recurrent to simplify figuring that out
This will make it easier to more cleanly support RWKV-v6 and Mamba-2.
* llama : fix simple splits when the batch contains embeddings
---------
Co-authored-by: Georgi Gerganov <redacted>
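The new llama_model_is_recurrent helper makes the single-sequence restriction described above easy to enforce. A hedged usage sketch (the helper is from this change; max_seqs_for_pooling and the cap of 64 are illustrative):
```c
#include <stdint.h>
#include "llama.h"

// limit pooled-embedding ubatches to one sequence for recurrent models
// such as Mamba; non-recurrent models keep a larger (illustrative) cap
static uint32_t max_seqs_for_pooling(const struct llama_model * model) {
    return llama_model_is_recurrent(model) ? 1 : 64;
}
```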
Meng, Hengyu [Tue, 20 Aug 2024 15:50:17 +0000 (23:50 +0800)]
fallback mmvq (llama/9088)
* fallback mmvq to mul_mat
* mmvq in cuda path
* Update ggml/src/ggml-sycl.cpp
Co-authored-by: Alberto Cabrera Pérez <redacted>
---------
Co-authored-by: Alberto Cabrera Pérez <redacted>
zhentaoyu [Tue, 20 Aug 2024 15:06:51 +0000 (23:06 +0800)]
Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)
* sycl: fix im2col overflow and sync with cuda
Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert overflow
Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert and dequantize
Signed-off-by: zhentaoyu <redacted>
* sycl: fix ib in dmmv
Signed-off-by: zhentaoyu <redacted>
* sycl:refine convert
Signed-off-by: zhentaoyu <redacted>
* sycl: move downsample global_range into common
Signed-off-by: zhentaoyu <redacted>
* test: add im2col and convert test cases
Signed-off-by: zhentaoyu <redacted>
* test: make new cases only in sycl
Signed-off-by: zhentaoyu <redacted>
* test: comment new test_cases for only local testing
Signed-off-by: zhentaoyu <redacted>
---------
Signed-off-by: zhentaoyu <redacted>
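The overflows fixed here are of the classic flattened-index kind: with large dims, a 32-bit intermediate product wraps around. A plain-C illustration of the pattern (not the SYCL kernel itself):
```c
#include <stdint.h>

// widen to 64 bits *before* the multiply; the commented variant can wrap
// past INT_MAX for large tensors and index the wrong element
static int64_t flat_offset(int64_t row, int64_t ne0, int64_t col) {
    // int bad = (int) row * (int) ne0 + (int) col;  // 32-bit overflow
    return row * ne0 + col;                          // 64-bit, safe
}
```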
Radoslav Gerganov [Mon, 19 Aug 2024 07:11:45 +0000 (10:11 +0300)]
rpc : print error message when failed to connect endpoint (llama/9042)
Radoslav Gerganov [Mon, 19 Aug 2024 07:10:21 +0000 (10:10 +0300)]
rpc : prevent crashes on invalid input (llama/9040)
Add more checks to prevent the RPC server from crashing when invalid
input is received from a client
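A hedged sketch of the kind of check meant here, assuming a length-prefixed message format (names and layout are illustrative, not the actual rpc code):
```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// validate a client-supplied length before copying: reject rather than crash
static bool read_payload(const uint8_t * buf, uint64_t buf_size,
                         uint64_t payload_len, uint8_t * out, uint64_t out_cap) {
    if (payload_len > buf_size || payload_len > out_cap) {
        return false; // invalid input from the client: bail out gracefully
    }
    memcpy(out, buf, (size_t) payload_len);
    return true;
}
```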
Nico Bosshard [Fri, 16 Aug 2024 02:22:55 +0000 (04:22 +0200)]
ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
* ggml : Dynamic ggml_sched_max_splits based on graph_size
* Fixed and re-added debug code
Georgi Gerganov [Wed, 14 Aug 2024 06:14:49 +0000 (09:14 +0300)]
cmake : remove unused option GGML_CURL (llama/9011)
Daniel Bevenius [Tue, 13 Aug 2024 19:13:15 +0000 (21:13 +0200)]
ggml : move rope type enum to ggml.h (llama/8949)
* ggml : move rope type enum to ggml.h
This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.
The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.
Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.
* squash! ggml : move rope type enum to ggml.h
This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h and adds them back to the llama_rope_type enum.
I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.
* squash! ggml : move rope type enum to ggml.h
This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.
* squash! ggml : move rope type enum to ggml.h
This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.
* squash! ggml : move rope type enum to ggml.h
This commit fixes the editorconfig-checker warnings.
* squash! ggml : move rope type enum to ggml.h
Update comment for ggml_rope function.
* Revert "squash! ggml : move rope type enum to ggml.h"
This reverts commit
6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6 .
* squash! ggml : move rope type enum to ggml.h
Add GGML_ROPE_TYPE_NEOX to rope_common.comp.
* remove extra line
---------
Co-authored-by: slaren <redacted>
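The end state is a bit-field style check against a define. A sketch of the resulting pattern (the define's value matches ggml.h; the dispatch function is illustrative):
```c
// rope "mode" acts as a bit field, tested with the define rather than an enum
#define GGML_ROPE_TYPE_NEOX 2

static void rope_dispatch(int mode) {
    const int is_neox = mode & GGML_ROPE_TYPE_NEOX;
    if (is_neox) {
        // GPT-NeoX style: rotate pairs split across the two halves of the head dim
    } else {
        // default style: rotate adjacent pairs
    }
}
```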
DavidKorczynski [Mon, 12 Aug 2024 12:21:41 +0000 (13:21 +0100)]
ggml: fix div-by-zero (llama/9003)
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5
Signed-off-by: David Korczynski <redacted>
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)
* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.
- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Judging by the code, which either launches compute kernels or copies tensors, adding barriers only for shader reads/writes and transfers seems to be sufficient.
* Fix small typo
---------
Co-authored-by: 0cc4m <redacted>
Johannes Gäßler [Tue, 27 Aug 2024 18:39:30 +0000 (20:39 +0200)]
feat: ref. cross entropy, add CUDA, fix grad test (#929)
Salvatore Mesoraca [Tue, 27 Aug 2024 06:25:12 +0000 (08:25 +0200)]
tests : fix memory leaks (#936)
It is annoying to run the tests using the sanitizers
because of all the uninteresting reports about the memory
leaked by the tests themselves.
Signed-off-by: Salvatore Mesoraca <redacted>
Johannes Gäßler [Sat, 24 Aug 2024 17:27:02 +0000 (19:27 +0200)]
ggml: remove bad assert (#928)
ucag.li [Thu, 22 Aug 2024 19:49:45 +0000 (03:49 +0800)]
mnist : set the ggml_context pointer to NULL to pass the assert check, in case some compiler does not set uninitialized pointers to NULL
Johannes Gäßler [Tue, 30 Jul 2024 13:56:35 +0000 (15:56 +0200)]
examples: add MNIST training + missing ops
Radoslav Gerganov [Mon, 19 Aug 2024 07:09:33 +0000 (10:09 +0300)]
yolo : add backend support (#924)
* yolo : add backend support
* metal : add sub and sqrt kernels
---------
Co-authored-by: Georgi Gerganov <redacted>
Daniel Bevenius [Thu, 15 Aug 2024 07:42:38 +0000 (09:42 +0200)]
ggml : fix typo in ggml-quants.c comment (#922)
Ronsor [Mon, 12 Aug 2024 13:02:08 +0000 (06:02 -0700)]
feat: add new `sin` and `cos` operators (#919)
* ggml : add sin/cos operators
* ggml-cuda : add sin/cos operators
* ggml : add corresponding tests for sin/cos
* ggml : add backward computation for sin/cos operators
* ggml-vulkan : add sin/cos operators
* ggml-vulkan : add sin/cos shader source
* metal : add sin, cos
---------
Co-authored-by: Georgi Gerganov <redacted>
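A hedged usage sketch of the new operators; `wave` is illustrative, while `ggml_sin`/`ggml_cos` follow the usual unary-op shape added by this change:
```c
#include "ggml.h"

// build graph nodes with the new element-wise operators
static struct ggml_tensor * wave(struct ggml_context * ctx, struct ggml_tensor * t) {
    struct ggml_tensor * s = ggml_sin(ctx, t);
    struct ggml_tensor * c = ggml_cos(ctx, t);
    return ggml_add(ctx, s, c); // sin(t) + cos(t), element-wise
}
```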
Salvatore Mesoraca [Sun, 11 Aug 2024 08:08:53 +0000 (10:08 +0200)]
ggml : support forward pass broadcasting in ggml_sub (#914)
* ggml: support forward pass broadcasting in ggml_sub
Signed-off-by: Salvatore Mesoraca <redacted>
* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32
The check is already performed in ggml_sub_impl
Signed-off-by: Salvatore Mesoraca <redacted>
---------
Signed-off-by: Salvatore Mesoraca <redacted>
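With this change ggml_sub can broadcast its second operand the way ggml_add already does. A hedged example subtracting a one-row bias from every row of a matrix (shapes in comments; not code from the patch):
```c
#include "ggml.h"

// b is broadcast across the rows of a, mirroring ggml_add's repeat semantics
static struct ggml_tensor * sub_row_bias(struct ggml_context * ctx,
                                         struct ggml_tensor * a,   // [n_cols, n_rows]
                                         struct ggml_tensor * b) { // [n_cols, 1]
    return ggml_sub(ctx, a, b);
}
```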
Georgi Gerganov [Sun, 11 Aug 2024 08:06:21 +0000 (11:06 +0300)]
sync : llama.cpp
slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (llama/8968)
Georgi Gerganov [Sat, 10 Aug 2024 06:51:19 +0000 (09:51 +0300)]
sync : llama.cpp
Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
rpc : sanitize tensor data + warnings (llama/0)
Co-authored-by: slaren <redacted>
Georgi Gerganov [Fri, 9 Aug 2024 07:03:29 +0000 (10:03 +0300)]
sync : whisper.cpp
Matt Stephenson [Tue, 16 Jul 2024 07:21:09 +0000 (03:21 -0400)]
whisper : use vulkan as gpu backend when available (whisper/2302)
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <redacted>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <redacted>
---------
Signed-off-by: Matt Stephenson <redacted>
hipudding [Thu, 8 Aug 2024 11:48:06 +0000 (14:48 +0300)]
ggml : add CANN backend (llama/0)
ggml-ci
Georgi Gerganov [Thu, 8 Aug 2024 11:46:24 +0000 (14:46 +0300)]
sync : vulkan (llama/0)
Georgi Gerganov [Thu, 8 Aug 2024 10:57:33 +0000 (13:57 +0300)]
scripts : sync sycl (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:54:39 +0000 (13:54 +0300)]
scripts : remove obsolete header (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:51:09 +0000 (13:51 +0300)]
scripts : update sync scripts (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:25:41 +0000 (13:25 +0300)]
sync : llama.cpp
ggml-ci
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (llama/8897)
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
Updated SYCL device filtering (llama/8901)
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (llama/8896)
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (llama/8884)
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (llama/8818)
Signed-off-by: Molly Sophia <redacted>
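With this change the epsilon is an explicit argument rather than a hard-coded constant. A usage sketch (the signature is from this change; the n_groups and eps values are illustrative):
```c
#include "ggml.h"

// group normalization with a caller-chosen epsilon
static struct ggml_tensor * group_norm_example(struct ggml_context * ctx,
                                               struct ggml_tensor * x) {
    return ggml_group_norm(ctx, x, /*n_groups=*/32, /*eps=*/1e-6f);
}
```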
Mengqing Cao [Tue, 6 Aug 2024 04:42:42 +0000 (12:42 +0800)]
Fix ggml_backend_cann_buffer_get_tensor (llama/8871)
* cann: fix ggml_backend_cann_buffer_get_tensor
1. fix data ptr offset
2. enable the acquisition of incomplete tensors
* fix backend cann set_tensor
wangshuai09 [Mon, 5 Aug 2024 13:10:37 +0000 (21:10 +0800)]
cann: fix buffer_num error and slow runtime speed (llama/8865)
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (llama/8866)
It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
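A minimal sketch of the negative ELU branch showing why expm1f helps (elu_f32 is an illustrative name, not the ggml function):
```c
#include <math.h>

// expm1f(x) evaluates exp(x) - 1 directly, staying accurate where the
// naive expf(x) - 1.0f misbehaves
static inline float elu_f32(float x) {
    return x > 0.0f ? x : expm1f(x);
}
```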
0cc4m [Mon, 5 Aug 2024 05:52:55 +0000 (07:52 +0200)]
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (llama/8855)
* Fix Vulkan mul mat vec invalid results when ncols < warp size
* Only run backend ops mul mat vec block size test if block size not already covered
wangshuai09 [Mon, 5 Aug 2024 04:22:30 +0000 (12:22 +0800)]
cann: support q4_0 model (llama/8822)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (llama/8709)
* ggml : reading the runtime sve config of the cpu
* change to one time init to prevent performance drop
* prefix variable to avoid possible conflicts
* revert xxhash fix and add brackets
---------
Co-authored-by: domke <redacted>
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
* add truncate_bf16
* truncate intermediate fp32 if converting bf16 to bf16
* fix masking in __compute_fp32_to_bf16
* np.int16 no longer used
* missing cast and additional numpy 2.x fix
* ggml-impl : do not flush bf16 subnormals to zero
* ggml : add reference fp32 to bf16 conversion
The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.
* gguf-py : remove flush to zero for bf16 subnormals
* gguf-py : remove float32 truncation to bf16
Rounding achieves the same thing in the cases where this was used.
* missed prototype update in merge
* merge cleanup
---------
Co-authored-by: Francis Couture-Harpin <redacted>
Mengqing Cao [Fri, 2 Aug 2024 08:50:53 +0000 (16:50 +0800)]
cann: Fix ggml_cann_im2col for 1D im2col (llama/8819)
* fix ggml_cann_im2col for 1D im2col
* fix build warning
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
Fixing wrong VDR iq4nl value (llama/8812)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (llama/8035)
* Adding support for unified memory
* adding again the documentation about unified memory
* refactoring: Moved the unified memory code in the correct location.
* Fixed compilation error when using hipblas
* cleaning up the documentation
* Updating the documentation
Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled
---------
Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
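A hedged sketch of the opt-in pattern: when the environment variable documented by this change is set, allocate managed (unified) memory instead of plain device memory. Error handling is elided and the helper name is illustrative:
```c
#include <stdlib.h>
#include <cuda_runtime.h>

// unified memory lets allocations exceed VRAM by paging between host and device
static void * alloc_device(size_t size) {
    void * ptr = NULL;
    if (getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != NULL) {
        cudaMallocManaged(&ptr, size, cudaMemAttachGlobal); // managed allocation
    } else {
        cudaMalloc(&ptr, size);                             // plain device memory
    }
    return ptr;
}
```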
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (llama/8783)
* Only enable backtrace on GLIBC linux systems
* fix missing file from copy
* use glibc macro instead of defining a custom one
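The gist of the guard, as a sketch (the exact condition in the patch may differ):
```c
// gate the include on glibc rather than assuming every Linux libc
// ships <execinfo.h> (musl, for example, does not)
#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>
#endif
```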
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X
* update asserts
* only use dmmv for supported types
* add test
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (llama/8751)
* added android implementation of ggml_print_backtrace_symbols
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (llama/8765)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
Add `TIMESTEP_EMBEDDING` OP (llama/8707)
Signed-off-by: zhentaoyu <redacted>
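For reference, the sinusoidal timestep embedding this op computes follows the standard diffusion-model formula. A plain-C sketch of the reference math (not the SYCL kernel; max_period is typically 10000):
```c
#include <math.h>

// first half of 'out' holds cosines, second half sines, with frequencies
// spaced geometrically from 1 down to 1/max_period
static void timestep_embedding(float t, float * out, int dim, int max_period) {
    const int half = dim / 2;
    for (int i = 0; i < half; i++) {
        const float freq = expf(-logf((float) max_period) * i / half);
        out[i]        = cosf(t * freq);
        out[i + half] = sinf(t * freq);
    }
}
```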
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: make inactive elements undisturbed for RISC-V vector (llama/8748)
In this code, we want inactive elements to retain the value they
previously held when mask[i] is false, so we should use the undisturbed
policy. With the default agnostic policy of the RVV intrinsics, these
values may either be retained or overwritten with all 1s.
Co-authored-by: carter.li <redacted>
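A plain-C model of the two RVV mask policies described above (illustrative code, not the vectorized kernel):
```c
// undisturbed vs agnostic, for a masked element-wise op over vl elements
static void masked_add(const int * mask, const float * a, const float * b,
                       float * dst, int vl) {
    for (int i = 0; i < vl; i++) {
        if (mask[i]) {
            dst[i] = a[i] + b[i];
        }
        // undisturbed (the fix): inactive dst[i] keeps its previous value
        // agnostic (the bug):    inactive dst[i] may instead be filled with 1s
    }
}
```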
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (llama/8746)
Signed-off-by: Xiaodong Ye <redacted>
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
add conv support (llama/8688)
R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU (llama/8383)
* Update doc for MUSA
Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile
Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake
Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA
Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4
Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure
Signed-off-by: Xiaodong Ye <redacted>
---------
Signed-off-by: Xiaodong Ye <redacted>
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (#906)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (#912)
ggml-ci
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (#905)
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (#904)
* Fix Vulkan repeat op
* Implement Vulkan concat op
* Delete old Vulkan shader generator
* Implement Vulkan im2col op
* Implement Vulkan unary gelu_quick op
* Implement Vulkan group_norm op
* Implement Vulkan timestep_embedding op
* Implement Vulkan upscale op
* Fix Vulkan vk_context tensor extra index issue
* Fix Vulkan matmul shader parameter bug
* Properly fix Vulkan matmul shader parameter bug
* Add Vulkan ADD f16 + f32 -> f16 operator support
* Implement Vulkan tanh op
* Fix Vulkan group count too large Validation error on non-Nvidia GPUs
* Throw error when too much memory is requested
* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs
* Fix matmul MMQ condition
* Implement Vulkan pad op
* Fix Vulkan crash when tensor is used multiple times in a compute graph
* Add Vulkan CONCAT f16 + f16 -> f16 op
* Add Vulkan LEAKY_RELU op
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (#901)
This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).
Signed-off-by: Daniel Bevenius <redacted>
Johannes Gäßler [Mon, 29 Jul 2024 13:03:08 +0000 (15:03 +0200)]
examples: add TensorFlow to requirements.txt (#902)
0cc4m [Sat, 27 Jul 2024 14:52:35 +0000 (17:52 +0300)]
ggml : sync vulkan shaders (#0)
ggml-ci
Georgi Gerganov [Sat, 27 Jul 2024 14:17:23 +0000 (17:17 +0300)]
ggml : resolve sync conflicts (#0)
ggml-ci
Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (#0)
Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (#0)
wangshuai09 [Sat, 27 Jul 2024 08:36:44 +0000 (16:36 +0800)]
cann: Fix Multi-NPU execution error (llama/8710)
* cann: fix multi-npu exec error
* cann: update comment for ggml_backend_cann_supports_buft
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.
This fixes it by bailing out if no context is found.
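The shape of the fix, sketched (the helper name is illustrative; ggml_init, ggml_set_no_alloc, and ggml_free are the real API):
```c
#include <stdbool.h>
#include "ggml.h"

// check ggml_init's result before touching the context
static bool with_context(struct ggml_init_params params) {
    struct ggml_context * ctx = ggml_init(params);
    if (ctx == NULL) {
        return false; // no unused context available: bail out, don't deref NULL
    }
    ggml_set_no_alloc(ctx, true); // safe now that ctx is known to be valid
    ggml_free(ctx);
    return true;
}
```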
Andreas (Andi) Kunar [Thu, 25 Jul 2024 16:01:00 +0000 (18:01 +0200)]
ggml : fix build on Windows with Snapdragon X (llama/8531)
* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
This reverts commit
bf21397ae5ea7c73d3494db3b91505599909227d .
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* Windows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <redacted>
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
fix multi-gpu issue on sycl (llama/8554)
---------
Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (llama/8664)
Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)
Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]
sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>
0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]
Vulkan IQ4_NL Support (llama/8613)
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]
Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]
fix scratch size of softmax (llama/8642)
Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]
ggml: fix compile error for RISC-V (llama/8623)
Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)
* CUDA: MMQ code deduplication + iquant support
* 1 less parallel job for CI build
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (llama/8556)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
ggml-ci
* ggml : fix q4_0
* ggml : fix q8_0
ggml-ci
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition
---------
Co-authored-by: slaren <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)
* Add additional error information when model files fail to load.
* Adding additional error information to most instances of fopen.
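The pattern being added, sketched with illustrative names (the real patch touches many call sites):
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

// include errno's description so users learn *why* the open failed
static FILE * fopen_verbose(const char * path, const char * mode) {
    FILE * f = fopen(path, mode);
    if (f == NULL) {
        fprintf(stderr, "failed to open '%s': %s\n", path, strerror(errno));
    }
    return f;
}
```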
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)
Co-authored-by: 65a <redacted>
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)
* [CANN] Add Ascend NPU backend
Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.
CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.
Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces
* Modify the code based on review comment
* Rename LLAMA_CANN to GGML_CANN
* Make ggml-common.h private
* add ggml_cann prefix for acl funcs
* Add logging for CANN backend
* Delete Trailing whitespace
---------
Co-authored-by: wangshuai09 <redacted>
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)
* lora: load to device buft
* add patch tensor function
* correct tensor patch
* llama_lora_adapter_apply
* correct ggml_backend_tensor_copy
* add llm_build_mm
* fix auto merge
* update based on review comments
* add convert script
* no more transpose A
* add f16 convert
* add metadata check
* add sanity check
* fix ftype
* add requirements
* fix requirements
* fix outfile
* conversion: only allow selected models
* fix types
* cuda : do not use dmmv if the tensor does not have enough cols
* llama : lora fixes
* do not disable mmap with lora
Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id
* convert_lora : MoE LoRA conversion support
* convert_lora : prefer safetensors, similarly to convert_hf
* convert_hf : simplify modify_tensors for InternLM2
* convert_lora : lazy conversion
* llama : load and use alpha from LoRA adapters
* llama : use llm_build_lora_mm in most model graphs
* auto scale
* Revert "auto scale"
This reverts commit
42415a4874e0f963e4aca6796ea5dfb97cd17464 .
* remove redundant params
* Apply suggestions from code review
Co-authored-by: slaren <redacted>
* change kv metadata
* move add_type to __init__
* convert_hf : move add_type to main()
* convert_lora : use the GGUFWriter from Model instead of overwriting it
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
Daniel Bevenius [Mon, 15 Jul 2024 12:48:17 +0000 (14:48 +0200)]
ggml : suppress unknown pragma 'GCC' on windows (llama/8460)
This commit adds a macro guard to pragma GCC to avoid the following
warning on Windows:
```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
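The gist of the guard, as a sketch (the specific diagnostic shown is illustrative):
```c
// only compilers in the GCC family see the GCC-specific pragma; MSVC skips it
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wpedantic"
#endif
```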
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)
* add concat through dim 1/2
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter
* Fix Vulkan op result checker build error
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)
* Add Vulkan to CMake pkg
* Add Sycl to CMake pkg
* Add OpenMP to CMake pkg
* Split generated shader file into separate translation unit
* Add CMake target for Vulkan shaders
* Update README.md
* Add make target for Vulkan shaders
* Use pkg-config to locate vulkan library
* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow
* Clean up tabs
* Move sudo to apt-key invocation
* Forward GGML_EXTRA_LIBS to CMake config pkg
* Update vulkan obj file paths
* Add shaderc to nix pkg
* Add python3 to Vulkan nix build
* Link against ggml in cmake pkg
* Remove Python dependency from Vulkan build
* code review changes
* Remove trailing newline
* Add cflags from pkg-config to fix w64devkit build
* Update README.md
* Remove trailing whitespace
* Update README.md
* Remove trailing whitespace
* Fix doc heading
* Make glslc required Vulkan component
* remove clblast from nix pkg
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)
ggml-ci
Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]
ggml : minor naming changes (llama/8433)
* ggml : minor naming changes
ggml-ci
* ggml : use PRId64 [no ci]
* ggml : revert FA K/Q names
Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]
fix the mul_mat_id ut issues (llama/8427)
* fix part of mul_mat_id
* skip the bfloat 16 sycl ut
Signed-off-by: Chen Xi <redacted>
---------
Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>
Nicholai Tukanov [Thu, 11 Jul 2024 16:49:15 +0000 (11:49 -0500)]
ggml : add NVPL BLAS support (#8329) (llama/8425)
* ggml : add NVPL BLAS support
* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`
---------
Co-authored-by: ntukanov <redacted>
Daniel Bevenius [Thu, 11 Jul 2024 15:53:42 +0000 (17:53 +0200)]
cuda : suppress 'noreturn' warn in no_device_code (llama/8414)
* cuda : suppress 'noreturn' warn in no_device_code
This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:
```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
346 | }
| ^
```
The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.
Signed-off-by: Daniel Bevenius <redacted>
* squash! cuda : suppress 'noreturn' warn in no_device_code
Update __trap macro instead of using a while loop to suppress the
warning.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
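For context, the warning arises because a function declared 'noreturn' must not be able to fall off the end. A generic sketch of the pattern (not the actual common.cuh code, which updates the `__trap` macro instead):
```c
#include <stdio.h>
#include <stdlib.h>

// ending in abort() guarantees the function never returns, which
// silences -Winvalid-noreturn
__attribute__((noreturn)) static void no_device_code_stub(const char * msg) {
    fprintf(stderr, "%s\n", msg);
    abort();
}
```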