git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
jettoblack [Tue, 29 Oct 2024 06:47:21 +0000 (02:47 -0400)]
whisper : move new-segment callback after DTW step (#2515)
KITAITI Makoto [Tue, 29 Oct 2024 06:45:37 +0000 (15:45 +0900)]
ruby : fix installation test (#2519)
KITAITI Makoto [Mon, 28 Oct 2024 17:23:23 +0000 (02:23 +0900)]
ruby : add more APIs (#2518)
* Add test for built package existence
* Add more tests for Whisper::Params
* Add more Whisper::Params attributes
* Add tests for callbacks
* Add progress and abort callback features
* [skip ci] Add prompt usage in README
* Change prompt text in example
KITAITI Makoto [Mon, 28 Oct 2024 13:43:27 +0000 (22:43 +0900)]
ruby : support new-segment callback (#2506)
* Add Params#new_segment_callback= method
* Add tests for Params#new_segment_callback=
* Group tests for #transcribe
* Don't use static for thread-safety
* Set new_segment_callback only when necessary
* Remove redundant check
* [skip ci] Add Ruby version README
* Revert "Group tests for #transcribe"
This reverts commit 71b65b00ccf1816c9ea8a247fb30f71bc09707d3.
* Revert "Add tests for Params#new_segment_callback="
This reverts commit 81e6df3bab7662da5379db51f28a989db7408c02.
* Add test for Context#full_n_segments
* Add Context#full_n_segments
* Add tests for lang API
* Add lang API
* Add tests for Context#full_lang_id API
* Add Context#full_lang_id
* Add abnormal test cases for lang
* Raise appropriate errors from lang APIs
* Add tests for Context#full_get_segment_t{0,1} API
* Add Context#full_get_segment_t{0,1}
* Add tests for Context#full_get_segment_speaker_turn_next API
* Add Context#full_get_segment_speaker_turn_next
* Add tests for Context#full_get_segment_text
* Add Context#full_get_segment_text
* Add tests for Params#new_segment_callback=
* Run new segment callback
* Split tests to multiple files
* Use container struct for new segment callback
* Add tests for Params#new_segment_callback_user_data=
* Add Whisper::Params#new_segment_callback_user_data=
* Add GC-related test for new segment callback
* Protect new segment callback related structs from GC
* Add meaningful test for build
* Rename: new_segment_callback_user_data -> new_segment_callback_container
* Add tests for Whisper::Segment
* Add Whisper::Segment and Whisper::Context#each_segment
* Extract c_ruby_whisper_callback_container_allocate()
* Add test for Whisper::Params#on_new_segment
* Add Whisper::Params#on_new_segment
* Assign symbol IDs to variables
* Make extsources.yaml simpler
* Update README
* Add document comments
* Add test for calling Whisper::Params#on_new_segment multiple times
* Add file dependencies to GitHub actions config and .gitignore
* Add more files to ext/.gitignore
KITAITI Makoto [Mon, 28 Oct 2024 11:08:09 +0000 (20:08 +0900)]
ruby : add Metal support (#2516)
Josscii [Wed, 23 Oct 2024 12:14:03 +0000 (20:14 +0800)]
whisper : fix index overflow in token-level timestamp logic (#2505)
toboil-features [Thu, 17 Oct 2024 10:25:18 +0000 (13:25 +0300)]
readme : update links and make commands (#2489)
* Update links to headers in README.md
* Add link to Vulkan section in README.md
* Add "-j" for parallelism for "make" in README.md
* Update README.md
KITAITI Makoto [Wed, 16 Oct 2024 15:44:04 +0000 (00:44 +0900)]
ruby : fix bindings (#2484)
* Improve Rakefile
* Remove intermediate files
* Remove unnecessary manipulations from extconf.rb
* Add README and LICENSE to source files
* Manage ext source files using YAML file
* Use extsources.yaml to include files into gem package file
* Add git-managed source files to build dependency
* Add test task
* Download model for test if not exists
* Add test for build
* Ignore gem package directory
* Enable GitHub action for Ruby binding
* Fix model name
* Build lib file for test
* Use extension for each platform
* Use extension for each platform on testing
* Move built lib file rather than copy
* Add intermediate files to clean targets
toboil-features [Wed, 16 Oct 2024 15:43:26 +0000 (18:43 +0300)]
readme : add Vulkan notice (#2488)
* Add Vulkan notice in README.md
* Fix formatting for Vulkan section in README.md
* Fix formatting in README.md
Georgi Gerganov [Wed, 16 Oct 2024 15:42:47 +0000 (18:42 +0300)]
make : fix GGML_VULKAN=1 build (#2485)
Rotem Dan [Tue, 15 Oct 2024 18:00:21 +0000 (21:00 +0300)]
whisper : add dtw preset for large-v3-turbo (#2481)
CrispStrobe [Mon, 14 Oct 2024 07:46:33 +0000 (09:46 +0200)]
convert : handle max_target_positions (#2477)
as needed, e.g. for
https://huggingface.co/primeline/whisper-large-v3-turbo-german/blob/main/config.json
Salman Faroz [Mon, 14 Oct 2024 07:44:57 +0000 (13:14 +0530)]
readme : update the Quick Start section (#2475)
navigating into the directory
Sandro Hanea [Tue, 8 Oct 2024 17:08:00 +0000 (19:08 +0200)]
whisper : add OpenVINO init with state (#2464)
* Fixed OpenVino init on state
* Removed an empty line
* Fixed typo
* Replaced tabs with spaces
---------
Co-authored-by: Sandro Hanea <redacted>
Georgi Gerganov [Mon, 7 Oct 2024 10:06:48 +0000 (13:06 +0300)]
release : v1.7.1
SRHMorris [Sun, 6 Oct 2024 07:34:20 +0000 (08:34 +0100)]
vulkan : retry allocation with fallback flags (#2451)
Co-authored-by: Samuel Morris <redacted>
Georgi Gerganov [Sat, 5 Oct 2024 13:43:26 +0000 (16:43 +0300)]
release : v1.7.0
Georgi Gerganov [Sat, 5 Oct 2024 13:22:53 +0000 (16:22 +0300)]
scripts : bench v3-turbo
Georgi Gerganov [Sat, 5 Oct 2024 13:13:03 +0000 (16:13 +0300)]
whisper : remove mel leftover constants (396089f)
Georgi Gerganov [Sat, 5 Oct 2024 12:22:17 +0000 (15:22 +0300)]
whisper : zero-out the KV cache upon clear (#2445)
Georgi Gerganov [Sat, 5 Oct 2024 12:18:50 +0000 (15:18 +0300)]
objc : fix build
Georgi Gerganov [Sat, 5 Oct 2024 11:33:54 +0000 (14:33 +0300)]
metal : zero-init buffer contexts (#0)
Georgi Gerganov [Sat, 5 Oct 2024 11:29:45 +0000 (14:29 +0300)]
whisper : revert mel-related changes (#0)
too much extra logic and complexity for small benefit
Georgi Gerganov [Sat, 5 Oct 2024 10:14:03 +0000 (13:14 +0300)]
whisper : adapt to latest ggml (skip) (#0)
Daniel Bevenius [Fri, 4 Oct 2024 13:46:18 +0000 (15:46 +0200)]
ggml : fix typo in example usage ggml_gallocr_new (ggml/984)
Diego Devesa [Fri, 4 Oct 2024 06:41:40 +0000 (08:41 +0200)]
ggml : fixes after sync (ggml/983)
ggml : remove test-backend-buffer
ggml : fix CUDA build warnings
Diego Devesa [Thu, 3 Oct 2024 18:25:11 +0000 (21:25 +0300)]
ggml-backend : add device and backend reg interfaces (llama/9707)
Also:
- metal : fix compute pass descriptor autorelease crash
- ggml-backend : add device description to CPU backend
- ggml: unify backend logging mechanism
Ouadie EL FAROUKI [Thu, 3 Oct 2024 06:50:44 +0000 (07:50 +0100)]
Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)
Diego Devesa [Wed, 2 Oct 2024 23:49:47 +0000 (01:49 +0200)]
ggml-backend : add device and backend reg interfaces (llama/9707)
Co-authored-by: Johannes Gäßler <redacted>
Alberto Cabrera Pérez [Wed, 2 Oct 2024 12:57:18 +0000 (13:57 +0100)]
Initial cmake support of SYCL for AMD GPUs (llama/9658)
sycl: initial cmake support of SYCL for AMD GPUs
Radoslav Gerganov [Wed, 2 Oct 2024 10:49:16 +0000 (13:49 +0300)]
vulkan : do not use tensor->extra (llama/9407)
* vulkan : do not use tensor->extra
This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.
Ref: #8536
* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (llama/2)
---------
Co-authored-by: 0cc4m <redacted>
Johannes Gäßler [Thu, 3 Oct 2024 15:29:59 +0000 (17:29 +0200)]
ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)
Johannes Gäßler [Wed, 2 Oct 2024 13:32:39 +0000 (15:32 +0200)]
ggml: refactor cross entropy loss CPU impl. (ggml/976)
Georgi Gerganov [Sat, 5 Oct 2024 10:09:36 +0000 (13:09 +0300)]
scripts : sync ggml-backend.cpp
Georgi Gerganov [Sat, 5 Oct 2024 09:36:40 +0000 (12:36 +0300)]
whisper : fix excessive memory usage (#2443)
* whisper : fix KV cache allocation
* whisper : reduce memory overhead from unused input tensors
Rahul Vadhyar [Fri, 4 Oct 2024 08:04:51 +0000 (13:34 +0530)]
examples : update dr_wav.h to newer version (#2449)
Georgi Gerganov [Wed, 2 Oct 2024 12:14:46 +0000 (15:14 +0300)]
talk-llama : sync llama.cpp
Georgi Gerganov [Wed, 2 Oct 2024 12:12:16 +0000 (15:12 +0300)]
metal : reduce command encoding overhead (llama/9698)
Georgi Gerganov [Wed, 2 Oct 2024 12:11:43 +0000 (15:11 +0300)]
sync : ggml
Johannes Gäßler [Mon, 30 Sep 2024 07:55:23 +0000 (09:55 +0200)]
test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974)
Salvatore Mesoraca [Mon, 30 Sep 2024 07:14:09 +0000 (09:14 +0200)]
vulkan : mul_mat: fix UB with small warps (ggml/952)
When the device's warp size is less than 16,
it is possible for loadstride_a (mul_mm.comp:114)
and loadstride_b (mul_mm.comp:115) to be set to 0.
Because they are calculated as: the workgroup size,
multiplied by LOAD_VEC_* (which can be 1) and divided by 16.
And the workgroup size is set to be the same as the
warp/subgroup size.
The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.
When they are 0 they cause an infinite loop.
But infinite loops without side-effects are UB and the
values of loadstride_* are known at compile time.
So, the compiler quietly optimizes all the loops away.
As a consequence, the buffers are not populated and
the multiplication result is just a matrix with all elements
set to 0.
We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).
Signed-off-by: Salvatore Mesoraca <redacted>
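The arithmetic behind this bug can be sketched in a few lines; `loadstride` here is an illustrative helper mirroring the shader computation described above, not the actual mul_mm.comp code:

```c
#include <assert.h>

/* loadstride = workgroup size * LOAD_VEC_* / 16 (integer division).
 * With a warp/workgroup size of 8 and LOAD_VEC_* == 1 this truncates
 * to 0, so a loop using it as an increment never advances. */
static int loadstride(int workgroup_size, int load_vec) {
    return workgroup_size * load_vec / 16;
}
```

Clamping the workgroup size to at least 16, as the fix does, keeps the result nonzero.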
Borislav Stanimirov [Mon, 30 Sep 2024 07:11:41 +0000 (10:11 +0300)]
ggml : fix ggml_cast (ggml/973)
Johannes Gäßler [Sun, 29 Sep 2024 21:18:02 +0000 (23:18 +0200)]
ggml: fix gradient allocation logic (ggml/966)
* ggml: fix gradient allocation logic
* gradient allocation in ggml_build_backward_expand
* fixup
* fix test-backend-ops grad
* suggestions by slaren
* fix test1.c
* fix legacy opt API
* fix test-grad0
* remove keep arg
Georgi Gerganov [Sun, 29 Sep 2024 18:18:23 +0000 (21:18 +0300)]
ggml : define missing HWCAP flags (llama/9684)
ggml-ci
Co-authored-by: Willy Tarreau <redacted>
Dan Johansson [Sat, 28 Sep 2024 12:06:16 +0000 (14:06 +0200)]
ggml : add run-time detection of neon, i8mm and sve (llama/9331)
* ggml: Added run-time detection of neon, i8mm and sve
Adds run-time detection of the Arm instructions set features
neon, i8mm and sve for Linux and Apple build targets.
* ggml: Extend feature detection to include non-aarch64 Arm arch
* ggml: Move definition of ggml_arm_arch_features to the global data section
Markus Tavenrath [Sat, 28 Sep 2024 10:05:05 +0000 (12:05 +0200)]
Enable use of the ReBAR feature to upload buffers to the device. (llama/9251)
R0CKSTAR [Thu, 26 Sep 2024 01:27:40 +0000 (09:27 +0800)]
mtgpu: enable VMM (llama/9597)
Signed-off-by: Xiaodong Ye <redacted>
Charles Xu [Wed, 25 Sep 2024 13:12:20 +0000 (15:12 +0200)]
ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (llama/9217)
* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels
* added fallback mechanism when the offline re-quantized model is not
optimized for the underlying target.
* fix for build errors
* remove prints from the low-level code
* Rebase to the latest upstream
Dou Xinpeng [Wed, 25 Sep 2024 03:30:38 +0000 (11:30 +0800)]
cann: fix crash when llama-bench is running on multiple cann devices (llama/9627)
Johannes Gäßler [Sun, 29 Sep 2024 17:56:17 +0000 (19:56 +0200)]
CUDA: remove bad assert (ggml/972)
Jeff Bolz [Sun, 29 Sep 2024 16:50:17 +0000 (11:50 -0500)]
vulkan : multithread pipeline creation (ggml/963)
Jeff Bolz [Fri, 27 Sep 2024 07:58:01 +0000 (02:58 -0500)]
vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961)
Salvatore Mesoraca [Thu, 26 Sep 2024 06:59:42 +0000 (08:59 +0200)]
vulkan : argsort barriers must be under uniform control flow (ggml/951)
A return before a barrier (that happens only in some threads in
a workgroup) leads to UB.
While the old code actually works on some devices,
it fails on others (i.e. "smaller" GPUs).
BTW, I think it would be better to set specialization constants
when the graph is built, in that way the local workgroup
could be sized appropriately.
But it would take a lot of work.
Signed-off-by: Salvatore Mesoraca <redacted>
Georgi Gerganov [Tue, 24 Sep 2024 10:23:59 +0000 (13:23 +0300)]
ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969)
gilbertgong [Wed, 2 Oct 2024 12:06:40 +0000 (05:06 -0700)]
server : ffmpeg overwrite leftover temp file (#2431)
* Remove possible leftover ffmpeg temp file from a previous failed conversion
* Revert "Remove possible leftover ffmpeg temp file from a previous failed conversion"
This reverts commit 00797403bd43ebcb1e0678989a4fc676d417b4af.
* Flag to force ffmpeg to overwrite output file if it exists
Georgi Gerganov [Tue, 1 Oct 2024 12:57:06 +0000 (15:57 +0300)]
whisper : add large-v3-turbo (#2440)
Georgi Gerganov [Fri, 27 Sep 2024 08:48:33 +0000 (11:48 +0300)]
tests : remove test-backend-ops (#2434)
Georgi Gerganov [Wed, 25 Sep 2024 07:03:34 +0000 (10:03 +0300)]
ci : disable failing CUDA and Java builds
Hugo [Tue, 24 Sep 2024 18:07:51 +0000 (20:07 +0200)]
readme : fix references to download-ggml-model.sh (#2427)
The script itself has a hashbang indicating that it is a shell script,
but the README indicates that it must be executed with `bash`.
I checked the script itself, and it seems to be valid POSIX shell. I can
confirm that it works with busybox sh.
Clarify the reference on the README, so it is clear that bash is not
actually a dependency for this script.
Georgi Gerganov [Tue, 24 Sep 2024 11:15:09 +0000 (14:15 +0300)]
make : remove "talk" target until updated
Georgi Gerganov [Tue, 24 Sep 2024 10:27:33 +0000 (13:27 +0300)]
ggml : add ggml-cpu-impl.h (skip) (#0)
Georgi Gerganov [Tue, 24 Sep 2024 10:23:04 +0000 (13:23 +0300)]
sync : ggml
Georgi Gerganov [Tue, 24 Sep 2024 10:22:55 +0000 (13:22 +0300)]
talk-llama : sync llama.cpp
Eric Zhang [Tue, 24 Sep 2024 08:03:21 +0000 (16:03 +0800)]
ggml : add AVX512DQ requirement for AVX512 builds (llama/9622)
Georgi Gerganov [Tue, 24 Sep 2024 07:15:35 +0000 (10:15 +0300)]
log : add CONT level for continuing previous log entry (llama/9610)
Max Krasnyansky [Tue, 24 Sep 2024 04:18:48 +0000 (21:18 -0700)]
threads: fix msvc build without openmp (llama/9615)
We're missing atomic_thread_fence() in MSVC builds when openmp is disabled.
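A minimal C11 illustration of the fence in question (not the actual ggml shim): publishing data, fencing, then setting a flag guarantees that readers which observe the flag also observe the data.

```c
#include <assert.h>
#include <stdatomic.h>

static int data;
static atomic_int flag;

/* Without the fence (what the MSVC build without OpenMP amounted to),
 * a reader could observe flag == 1 before observing data == v. */
static void publish(int v) {
    data = v;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}
```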
Ivan [Tue, 24 Sep 2024 00:14:24 +0000 (03:14 +0300)]
cuda: add q8_0->f32 cpy operation (llama/9571)
llama: enable K-shift for quantized KV cache
It will fail on unsupported backends or quant types.
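For context, Q8_0 dequantization is a per-block scale times an int8 quant; this simplified sketch (the real ggml block stores the scale as fp16) shows what the new q8_0->f32 copy computes per element:

```c
#include <assert.h>
#include <stdint.h>

#define QK8_0 32

/* Simplified stand-in for ggml's block_q8_0; the scale is shown as a
 * plain float here for clarity (upstream uses fp16). */
struct block_q8_0 {
    float  d;           /* per-block scale */
    int8_t qs[QK8_0];   /* quantized values */
};

/* q8_0 -> f32: dst[i] = d * qs[i] for each of the 32 elements. */
static void dequant_q8_0(const struct block_q8_0 *b, float *dst) {
    for (int i = 0; i < QK8_0; i++) {
        dst[i] = b->d * (float)b->qs[i];
    }
}
```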
Max Krasnyansky [Mon, 23 Sep 2024 18:42:43 +0000 (11:42 -0700)]
threads: improve ggml_barrier scaling with large number of threads (llama/9598)
Make sure n_barrier and n_barrier_passed do not share the cache line to avoid cache line bouncing.
This optimization shows performance improvements even for n_threads <= 8 cases.
Resurrect the TSAN (Thread Sanitizer) check so that we can avoid doing an expensive
read-modify-write in the normal case and just use a thread fence as originally intended.
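The cache-line separation described above can be sketched with C11 alignment specifiers (field names taken from the commit text; the layout is illustrative, not the actual ggml struct):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Each counter gets its own 64-byte cache line, so threads spinning
 * on n_barrier_passed do not invalidate the line holding n_barrier. */
struct barrier_counters {
    _Alignas(64) atomic_int n_barrier;
    _Alignas(64) atomic_int n_barrier_passed;
};
```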
Srihari-mcw [Mon, 23 Sep 2024 14:06:38 +0000 (19:36 +0530)]
ggml : AVX512 gemm for Q4_0_8_8 (llama/9532)
* AVX512 version of ggml_gemm_q4_0_8x8_q8_0
* Remove zero vector parameter passing
* Rename functions and rearrange order of macros
* Edit comments
* style : minor adjustments
* Update x to start from 0
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Mon, 23 Sep 2024 08:27:47 +0000 (11:27 +0300)]
metal : use F32 prec for K*Q in vec FA (llama/9595)
ggml-ci
Akarshan Biswas [Mon, 23 Sep 2024 03:28:06 +0000 (08:58 +0530)]
Revert "[SYCL] fallback mmvq (ggml/9088)" (llama/9579)
This reverts commit 50addec9a532a6518146ab837a85504850627316.
R0CKSTAR [Sun, 22 Sep 2024 14:55:49 +0000 (22:55 +0800)]
musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)
* mtgpu: add mp_21 support
Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas
Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: enable unified memory
Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)
Signed-off-by: Xiaodong Ye <redacted>
---------
Signed-off-by: Xiaodong Ye <redacted>
Molly Sophia [Sun, 22 Sep 2024 13:26:50 +0000 (21:26 +0800)]
Fix merge error in #9454 (llama/9589)
Signed-off-by: Molly Sophia <redacted>
Johannes Gäßler [Sun, 22 Sep 2024 07:34:52 +0000 (09:34 +0200)]
CUDA: enable Gemma FA for HIP/Pascal (llama/9581)
Molly Sophia [Sun, 22 Sep 2024 02:29:12 +0000 (10:29 +0800)]
RWKV v6: RWKV_WKV op CUDA implementation (llama/9454)
* ggml: CUDA unary op EXP
Signed-off-by: Molly Sophia <redacted>
* ggml: rwkv_wkv op CUDA impl
Signed-off-by: Molly Sophia <redacted>
---------
Signed-off-by: Molly Sophia <redacted>
slaren [Sat, 21 Sep 2024 12:24:23 +0000 (14:24 +0200)]
ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (llama/9573)
agray3 [Sat, 21 Sep 2024 00:41:07 +0000 (01:41 +0100)]
Update CUDA graph on scale change plus clear nodes/params (llama/9550)
* Avoid using saved CUDA graph if scale changes and reset nodes/params on update
Fixes https://github.com/ggerganov/llama.cpp/issues/9451
* clear before resize
Georgi Gerganov [Fri, 20 Sep 2024 18:50:16 +0000 (21:50 +0300)]
examples : adapt to ggml.h changes (ggml/0)
ggml-ci
Georgi Gerganov [Fri, 20 Sep 2024 18:24:06 +0000 (21:24 +0300)]
ggml : refactoring (llama/#0)
- d6a04f87
- 23e0d70b
Georgi Gerganov [Fri, 20 Sep 2024 17:12:52 +0000 (20:12 +0300)]
ggml : fix builds (llama/0)
ggml-ci
Georgi Gerganov [Fri, 20 Sep 2024 16:13:02 +0000 (19:13 +0300)]
ggml : fix trailing whitespace (llama/0)
ggml-ci
Johannes Gäßler [Fri, 20 Sep 2024 16:35:35 +0000 (18:35 +0200)]
CUDA: fix sum.cu compilation for CUDA < 11.7 (llama/9562)
slaren [Wed, 18 Sep 2024 17:13:08 +0000 (19:13 +0200)]
ggml : fix n_threads_cur initialization with one thread (llama/9538)
* ggml : fix n_threads_cur initialization with one thread
* Update ggml/src/ggml.c
---------
Co-authored-by: Max Krasnyansky <redacted>
Max Krasnyansky [Tue, 17 Sep 2024 08:19:46 +0000 (01:19 -0700)]
threadpool : skip polling for unused threads (llama/9461)
* threadpool: skip polling for unused threads
Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1).
This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).
n_threads_cur is now an atomic_int to explicitly tell the thread sanitizer that it is written
from one thread and read from other threads (not a race condition).
* threadpool: further simplify and improve ggml_barrier
Avoid using strict memory order while polling, yet make sure that all threads go through
a full memory barrier (memory fence) on ggml_barrier entrance and exit.
* threads: add simple barrier test
This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.
* threadpool: improve thread sync for new-graphs
Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order
to keep it efficient, once the new graph is detected we do full fence using read-modify-write
with strict memory order.
* threadpool: improve abort handling
Do not use threadpool->ec (exit code) to decide whether to exit the compute loop.
threadpool->ec is not atomic which makes thread-sanitizer rightfully unhappy about it.
Instead introduce atomic threadpool->abort flag used for this. This is consistent with
how we handle threadpool->stop or pause.
While at it add an explicit atomic_load for n_threads_cur for consistency.
* test-barrier: release threadpool before releasing the context
Fixes a use-after-free detected by the gcc thread sanitizer on x86-64;
for some reason the llvm sanitizer does not detect this issue.
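The skip condition at the heart of the first change can be sketched like this (only the check is shown; the surrounding threadpool loop is not reproduced):

```c
#include <assert.h>
#include <stdatomic.h>

/* Active thread count: written by one thread, read by all workers. */
static atomic_int n_threads_cur;

/* A worker with index ith at or beyond the active count skips polling
 * entirely; the relaxed load keeps the polling path cheap, matching
 * the intent described in the commit. */
static int should_poll(int ith) {
    return ith < atomic_load_explicit(&n_threads_cur, memory_order_relaxed);
}
```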
Michael Podvitskiy [Mon, 16 Sep 2024 11:06:50 +0000 (13:06 +0200)]
ggml : link MATH_LIBRARY not by its full path (llama/9339)
Georgi Gerganov [Mon, 16 Sep 2024 07:27:50 +0000 (10:27 +0300)]
cmake : do not hide GGML options + rename option (llama/9465)
* cmake : do not hide GGML options
ggml-ci
* build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS
for consistency
ggml-ci
Eve [Mon, 16 Sep 2024 06:48:24 +0000 (06:48 +0000)]
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)
* squashed
readd my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049
have ggml_vec_dot_q4_0 do two blocks per loop for avx
try out f16c ggml_vec_dot_iq4_nl, but it's not really faster. as per https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time with no issue
* shuffle
* remove f16c iq4_nl as I can't make it faster than before
Georgi Gerganov [Mon, 16 Sep 2024 06:05:56 +0000 (09:05 +0300)]
metal : handle zero-sized allocs (llama/9466)
Georgi Gerganov [Sun, 15 Sep 2024 17:46:12 +0000 (20:46 +0300)]
common : reimplement logging (llama/9418)
https://github.com/ggerganov/llama.cpp/pull/9418
Michael Podvitskiy [Sun, 15 Sep 2024 16:55:52 +0000 (18:55 +0200)]
cmake : correct order of sycl flags (llama/9497)
Michael Podvitskiy [Sun, 15 Sep 2024 07:06:38 +0000 (09:06 +0200)]
cmake : try to fix sycl+intel build (llama/9487)
Yuri Khrustalev [Sat, 14 Sep 2024 09:54:37 +0000 (05:54 -0400)]
ggml : ggml_type_name return "NONE" for invalid values (llama/9458)
When running on Windows, the quantization utility attempts to print the types that are not set which leads to a crash.
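The shape of the fix is a plain bounds check before the table lookup; this sketch uses an invented three-entry table, not ggml's actual type table:

```c
#include <assert.h>
#include <string.h>

static const char *const type_names[] = { "f32", "f16", "q4_0" };
enum { TYPE_COUNT = sizeof(type_names) / sizeof(type_names[0]) };

/* Out-of-range values (including the unset types the quantization
 * utility tried to print) map to "NONE" instead of indexing out of
 * bounds and crashing. */
static const char *type_name(int t) {
    if (t < 0 || t >= TYPE_COUNT) {
        return "NONE";
    }
    return type_names[t];
}
```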
Georgi Gerganov [Sat, 14 Sep 2024 07:55:05 +0000 (10:55 +0300)]
cmake : use list(APPEND ...) instead of set() + dedup linker (llama/9463)
* cmake : use list(APPEND ...) instead of set() + dedup linker
ggml-ci
* cmake : try fix sycl
* cmake : try to fix sycl 2
* cmake : fix sycl build (llama/9469)
* try fix sycl build
* use CMAKE_CXX_FLAGS as a string variable
---------
Co-authored-by: Georgi Gerganov <redacted>
* one more CMAKE_CXX_FLAGS fix (llama/9471)
---------
Co-authored-by: Michael Podvitskiy <redacted>
Dou Xinpeng [Thu, 12 Sep 2024 11:46:43 +0000 (19:46 +0800)]
cann: Add host buffer type for Ascend NPU (llama/9406)
* feat: Add host buffer type for Ascend NPU (CANN backend)
* fix some checking errors
* Add a few comments
Ahmad Tameem [Thu, 12 Sep 2024 11:24:31 +0000 (16:24 +0500)]
riscv : modify Makefile and add a RISCV_VECT to print log info (llama/9442)
- Added ggml_cpu_has_riscv_v() in GGML to print system info in log
- Modified Makefile to only use flag when cross compiling for RISC-V
Xinpeng Dou [Thu, 12 Sep 2024 01:02:35 +0000 (09:02 +0800)]
cann: Fix error when running a non-exist op (llama/9424)
Johannes Gäßler [Wed, 11 Sep 2024 08:22:40 +0000 (10:22 +0200)]
CUDA: fix --split-mode row race condition (llama/9413)
R0CKSTAR [Wed, 11 Sep 2024 01:46:55 +0000 (09:46 +0800)]
musa: remove Clang builtins mapping (llama/9421)
Signed-off-by: Xiaodong Ye <redacted>
Alberto Cabrera Pérez [Wed, 11 Sep 2024 00:53:42 +0000 (01:53 +0100)]
sycl : update support conditions (llama/9394)
* sycl : update support condition to im2col
Signed-off-by: Alberto Cabrera <redacted>
* Added TODO to remind supporting FP32 im2col
---------
Signed-off-by: Alberto Cabrera <redacted>
Georgi Gerganov [Tue, 10 Sep 2024 07:17:03 +0000 (10:17 +0300)]
metal : fix compile warning with GGML_METAL_NDEBUG (llama/0)