git.djapps.eu Git - pkg/ggml/sources/ggml/log
Georgi Gerganov [Wed, 3 Jan 2024 09:25:18 +0000 (11:25 +0200)]
sync : llama.cpp
ggml-ci
Georgi Gerganov [Tue, 2 Jan 2024 19:07:47 +0000 (21:07 +0200)]
metal : optimize ggml_mul_mat_id (faster Mixtral PP) (llama/4725)
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id
Georgi Gerganov [Tue, 2 Jan 2024 08:57:44 +0000 (10:57 +0200)]
metal : enable shader debugging (cmake option) (llama/4705)
* ggml : disable fast-math for Metal (cmake build only)
ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (llama/4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
ggml-ci
Georgi Gerganov [Sun, 31 Dec 2023 09:43:31 +0000 (11:43 +0200)]
ggml : add ggml_vdotq_s32 alias (llama/4715)
ggml-ci
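For context, ggml_vdotq_s32 wraps the ARM dot-product instruction so that quantized kernels also build on CPUs or compilers without it. A minimal sketch of such a fallback, assuming NEON intrinsics (my paraphrase, not code from the commit):

```c
#include <arm_neon.h>

// fallback signed-int8 dot product for targets without
// __ARM_FEATURE_DOT_PRODUCT: widen products to int16, then
// pairwise-accumulate into the int32 accumulator lanes
static inline int32x4_t ggml_vdotq_s32(int32x4_t acc, int8x16_t a, int8x16_t b) {
    const int16x8_t p0 = vmull_s8(vget_low_s8(a),  vget_low_s8(b));
    const int16x8_t p1 = vmull_s8(vget_high_s8(a), vget_high_s8(b));
    return vaddq_s32(acc, vaddq_s32(vpaddlq_s16(p0), vpaddlq_s16(p1)));
}
```

On hardware with the extension, the alias can resolve directly to the native vdotq_s32 intrinsic instead.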
Johannes Gäßler [Sat, 30 Dec 2023 12:52:01 +0000 (13:52 +0100)]
CUDA: fixed tensor cores not being used on RDNA3 (llama/4697)
automaticcat [Sat, 30 Dec 2023 08:07:48 +0000 (15:07 +0700)]
ggml : add ggml_cpu_has_avx_vnni() (llama/4589)
* feat: add avx_vnni based on intel documents
* ggml: add avx vnni based on intel document
* llama: add avx vnni information display
* docs: add more details about using oneMKL and oneAPI for intel processors
* Update ggml.c
Fix indentation update
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
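ggml_cpu_has_avx_vnni() joins the existing ggml_cpu_has_* family and returns a plain int. A hedged sketch of runtime dispatch on it; the function-pointer names are illustrative placeholders, not real ggml symbols:

```c
#include "ggml.h"

// signature of a hypothetical quantized dot-product kernel
typedef void (*dot_fn)(int n, float * s, const void * x, const void * y);

// pick the AVX-VNNI path only when the CPU actually supports it
static dot_fn select_dot(dot_fn vnni_impl, dot_fn generic_impl) {
    return ggml_cpu_has_avx_vnni() ? vnni_impl : generic_impl;
}
```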
Johannes Gäßler [Fri, 29 Dec 2023 22:12:53 +0000 (23:12 +0100)]
CUDA: fix tensor core logic for Pascal and HIP (llama/4682)
hydai [Fri, 29 Dec 2023 16:31:19 +0000 (00:31 +0800)]
cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)
Signed-off-by: hydai <redacted>
Georgi Gerganov [Wed, 3 Jan 2024 09:24:48 +0000 (11:24 +0200)]
scripts : fix sync order + metal sed
Ashraful Islam [Wed, 3 Jan 2024 09:07:30 +0000 (03:07 -0600)]
swift : add Swift Package declaration (#674)
* feat: adds swift package declaration
- allows importing ggml as a package dependency in llama.cpp and whisper.cpp
- resolves duplicate-symbol errors when importing llama.cpp and whisper.cpp as package dependencies
* fixes the src paths in Package.swift
Guillaume Wenzek [Fri, 29 Dec 2023 17:07:03 +0000 (18:07 +0100)]
ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (#639)
* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations
---------
Co-authored-by: Georgi Gerganov <redacted>
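These extended ops matter for Mixtral-style routing, where rows are gathered by expert id. A minimal, self-contained sketch of ggml_get_rows (all sizes and index values are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params ip = { 16*1024*1024, NULL, false }; // mem_size, mem_buffer, no_alloc
    struct ggml_context * ctx = ggml_init(ip);

    struct ggml_tensor * table = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4); // 4 rows of 8 floats
    struct ggml_tensor * ids   = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 2);    // gather 2 of them
    ((int32_t *) ids->data)[0] = 3;
    ((int32_t *) ids->data)[1] = 1;

    // result holds the gathered rows: ne = [8, 2]
    struct ggml_tensor * rows = ggml_get_rows(ctx, table, ids);
    (void) rows;

    ggml_free(ctx);
    return 0;
}
```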
Georgi Gerganov [Fri, 29 Dec 2023 13:17:18 +0000 (15:17 +0200)]
scripts : do not sync synced commits
Georgi Gerganov [Fri, 29 Dec 2023 13:08:38 +0000 (15:08 +0200)]
sync : whisper.cpp
Tamotsu Takahashi [Fri, 29 Dec 2023 10:23:27 +0000 (19:23 +0900)]
ci : build with CLBlast + ggml-opencl use GGML_API (whisper/1576)
* Build with CLBlast
* Declare GGML_API
After rebasing, examples/talk-llama failed:
"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
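The GGML_API annotation the commit adds is the usual shared-library export macro; without it, symbols such as ggml_cl_free_data stay hidden and the talk-llama link fails as above. A sketch of the pattern, with guard names following ggml's GGML_SHARED/GGML_BUILD convention per my reading:

```c
#ifdef GGML_SHARED
#    if defined(_WIN32) && !defined(__MINGW32__)
#        ifdef GGML_BUILD
#            define GGML_API __declspec(dllexport)  // building the DLL itself
#        else
#            define GGML_API __declspec(dllimport)  // consuming the DLL
#        endif
#    else
#        define GGML_API __attribute__ ((visibility ("default")))
#    endif
#else
#    define GGML_API
#endif
```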
Georgi Gerganov [Fri, 29 Dec 2023 13:04:01 +0000 (15:04 +0200)]
scripts : print list of sync commits
Georgi Gerganov [Fri, 29 Dec 2023 12:58:01 +0000 (14:58 +0200)]
scripts : fix format-patch range
Georgi Gerganov [Fri, 29 Dec 2023 12:32:00 +0000 (14:32 +0200)]
scripts : do not sync commits from this repo
Georgi Gerganov [Wed, 27 Dec 2023 10:07:02 +0000 (12:07 +0200)]
sync : whisper.cpp
ggml-ci
bobqianic [Sat, 23 Dec 2023 12:02:58 +0000 (12:02 +0000)]
whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG (whisper/1681)
Georgi Gerganov [Wed, 27 Dec 2023 09:56:42 +0000 (11:56 +0200)]
scripts : add sync-whisper-am.sh
bssrdf [Fri, 29 Dec 2023 08:32:31 +0000 (03:32 -0500)]
ggml : fix some mul mat cases + add tests for src1 F16 (#669)
* fixed mul-mat error for old GPUs
* style fixes
* add mul mat src1 f16 test cases, fix more cases
ggml-ci
---------
Co-authored-by: bssrdf <redacted>
Co-authored-by: slaren <redacted>
Astariul [Fri, 29 Dec 2023 07:43:41 +0000 (16:43 +0900)]
gpt-2 : update README.md (#671)
Mayank Kumar Pal [Fri, 29 Dec 2023 07:35:19 +0000 (13:05 +0530)]
readme : add steps for Android compilation and inference (#664)
* Update README.md for android instructions
* fix minor typo
Georgi Gerganov [Wed, 27 Dec 2023 09:42:45 +0000 (11:42 +0200)]
scripts : fix sed in sync-llama-am.sh
Georgi Gerganov [Wed, 27 Dec 2023 09:06:32 +0000 (11:06 +0200)]
sync : llama.cpp
ggml-ci
Georgi Gerganov [Wed, 27 Dec 2023 09:02:13 +0000 (11:02 +0200)]
ggml : fix dot product for ARM (llama/4630)
ggml-ci
slaren [Tue, 26 Dec 2023 20:23:59 +0000 (21:23 +0100)]
cuda : fix vmm pool with multi GPU (llama/4620)
* cuda : fix vmm pool with multi GPU
* hip
* use recommended granularity instead of minimum
* better error checking
* fix mixtral
* use cudaMemcpy3DPeerAsync
* use cuda_pool_alloc in ggml_cuda_op_mul_mat
* consolidate error checking in ggml_cuda_set_device
* remove unnecessary inlines
ggml-ci
* style fixes
* only use vmm for the main device
* fix scratch buffer size, re-enable vmm pool for all devices
* remove unnecessary check id != g_main_device
WillCorticesAI [Tue, 26 Dec 2023 10:42:08 +0000 (05:42 -0500)]
Update comment for AdamW implementation reference. (llama/4604)
Co-authored-by: Will Findley <redacted>
FantasyGmm [Tue, 26 Dec 2023 10:38:36 +0000 (18:38 +0800)]
Fix new CUDA10 compilation errors (llama/4635)
Georgi Gerganov [Mon, 25 Dec 2023 09:25:19 +0000 (11:25 +0200)]
sync : llama.cpp
ggml-ci
Georgi Gerganov [Mon, 25 Dec 2023 09:00:39 +0000 (11:00 +0200)]
cmake : update CUDA build to support VMM
slaren [Sun, 24 Dec 2023 13:34:22 +0000 (14:34 +0100)]
cuda : improve cuda pool efficiency using virtual memory (llama/4606)
* cuda : improve cuda pool efficiency using virtual memory
* fix mixtral
* fix cmake build
* check for vmm support, disable for hip
ggml-ci
* fix hip build
* clarify granularity
* move all caps to g_device_caps
* refactor error checking
* add cuda_pool_alloc, refactor most pool allocations
ggml-ci
* fix hip build
* CUBLAS_TF32_TENSOR_OP_MATH is not a macro
* more hip crap
* llama : fix msvc warnings
* ggml : fix msvc warnings
* minor
* minor
* cuda : fallback to CPU on host buffer alloc fail
* Update ggml-cuda.cu
Co-authored-by: Johannes Gäßler <redacted>
* Update ggml-cuda.cu
Co-authored-by: Johannes Gäßler <redacted>
* ensure allocations are always aligned
* act_size -> actual_size
---------
Co-authored-by: Johannes Gäßler <redacted>
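For background, the pool above reserves a large virtual address range once and maps physical memory into it on demand, so growing the pool never copies or invalidates existing pointers. A condensed sketch with the CUDA driver VMM API (error handling omitted; the structure is my paraphrase, not the commit's code):

```c
#include <cuda.h>

static CUdeviceptr pool_addr;     // large virtual range, reserved up front
static size_t      pool_size = 0; // bytes currently backed by physical memory

static void pool_grow(CUdevice dev, size_t reserve_size, size_t grow_by) {
    CUmemAllocationProp prop = {0};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = (int) dev;

    // round the request up to the device's recommended granularity
    size_t granularity;
    cuMemGetAllocationGranularity(&granularity, &prop, CU_MEM_ALLOC_GRANULARITY_RECOMMENDED);
    grow_by = (grow_by + granularity - 1) / granularity * granularity;

    if (pool_size == 0) {
        // reserve virtual address space once; physical pages come later
        cuMemAddressReserve(&pool_addr, reserve_size, 0, 0, 0);
    }

    // create physical memory and map it at the end of the reserved range
    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, grow_by, &prop, 0);
    cuMemMap(pool_addr + pool_size, grow_by, 0, handle, 0);

    CUmemAccessDesc access = {0};
    access.location = prop.location;
    access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(pool_addr + pool_size, grow_by, &access, 1);

    pool_size += grow_by;
}
```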
slaren [Sat, 23 Dec 2023 15:10:51 +0000 (16:10 +0100)]
fallback to CPU buffer if host buffer alloc fails (llama/4610)
Johannes Gäßler [Sat, 23 Dec 2023 08:16:33 +0000 (09:16 +0100)]
CUDA: fixed row rounding for 0 tensor splits (llama/4594)
Georgi Gerganov [Mon, 25 Dec 2023 08:58:54 +0000 (10:58 +0200)]
scripts : fix PR number parsing during sync
Georgi Gerganov [Sun, 24 Dec 2023 13:49:12 +0000 (15:49 +0200)]
scripts : improve llama sync patch
Georgi Gerganov [Sat, 23 Dec 2023 16:05:29 +0000 (18:05 +0200)]
scripts : sync tests / headers
Georgi Gerganov [Sat, 23 Dec 2023 15:54:42 +0000 (17:54 +0200)]
scripts : remove exit
Georgi Gerganov [Sat, 23 Dec 2023 15:54:07 +0000 (17:54 +0200)]
scripts : fix PR number sed
Georgi Gerganov [Sat, 23 Dec 2023 15:49:08 +0000 (17:49 +0200)]
scripts : add sync-llama-am.sh
Georgi Gerganov [Fri, 22 Dec 2023 15:53:50 +0000 (17:53 +0200)]
sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec) (#662)
* sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec)
ggml-ci
* ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203)
* llama : fix platforms without mmap (#4578)
* llama : fix platforms without mmap
* win32 : limit prefetch size to the file size
* fix win32 error clobber, unnecessary std::string in std::runtime_error
* ggml-alloc : fix ggml_tallocr_is_own
* whisper : minor
* ggml : cuda jetson + arm quants warnings
ggml-ci
---------
Co-authored-by: Herman Semenov <redacted>
Co-authored-by: slaren <redacted>
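A quick sketch of the three synced APIs named in the title, assuming ctx, x, y are an existing context and tensors (per my reading of the sync, ggml_scale now takes a plain float rather than a 1-element tensor):

```c
#include "ggml.h"

static void demo_synced_apis(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y) {
    // bytes needed for one row of 4096 elements of a (possibly quantized) type
    size_t row_bytes = ggml_row_size(GGML_TYPE_Q4_0, 4096);
    (void) row_bytes;

    // scale by a float constant instead of a scalar tensor
    struct ggml_tensor * scaled = ggml_scale(ctx, x, 0.125f);
    (void) scaled;

    // request f32 accumulation precision for the mat mul that produced y
    ggml_mul_mat_set_prec(y, GGML_PREC_F32);
}
```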
slaren [Mon, 18 Dec 2023 17:05:43 +0000 (18:05 +0100)]
cuda : fix synchronization with tensor get/set (#659)
leejet [Mon, 18 Dec 2023 16:46:10 +0000 (00:46 +0800)]
cuda : fix im2col_f32_f16 (#658)
chengchi [Thu, 14 Dec 2023 08:12:31 +0000 (16:12 +0800)]
cmake : change installation path of static libraries to standard directory (#645)
Co-authored-by: Cheng-Chi Wang <redacted>
Georgi Gerganov [Wed, 13 Dec 2023 19:53:20 +0000 (21:53 +0200)]
sync : llama (mul_mat_id + get_rows kernels, typos) (#649)
* sync : llama (mul_mat_id + get_rows kernels, typos)
ggml-ci
* cuda : restore im2col
ggml-ci
* metal : fix accuracy of dequantization kernels
* cuda : restore correct im2col kernel
ggml-ci
* metal : fix moe test by reducing the expert size
ggml-ci
* cuda : fix bin bcast when src1 and dst have different types
---------
Co-authored-by: slaren <redacted>
Steward Garcia [Wed, 13 Dec 2023 14:08:48 +0000 (09:08 -0500)]
ggml: new gpu kernels + extends ggml_leaky_relu + ggml_pad (#621)
* add new cuda kernels and new op ggml_pad
* add ggml_tanh cuda kernel
* remove old broadcast impl
* restore some changes
* cuda: optimized im2col + group_norm kernels
* extend ggml_leaky -> ggml_leaky_relu
* fix some code issues
* cuda: concat support 4 dims
* cuda: fix ggml_acc + add backends ops test
* restore ggml_pad + add backend op test
* metal : implement GGML_OP_ACC
* ggml : fix bug in ggml_upscale
* metal : add ggml_upscale
* metal : add ggml_tanh
* metal : add ggml_gelu_quick
* ggml : make ggml_pad more general purpose
* metal : add ggml_pad
* ggml_leaky_relu as regular op + fix indentation
* cuda: ggml_acc accepts all op_params
* pass negative_slope as a proper param
* metal : add ggml_leaky_relu
* metal : add ggml_group_norm
* cuda : minor
* ggml : add GGML_OP_LEAKY_RELU to ggml_compute_backward
* metal : soft max, tanh, supports_op fixes
* test-backend-ops : add sentinels between tensors to detect overflows
---------
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: slaren <redacted>
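A short sketch of the two new ops from this PR, assuming an existing context and input tensor (values are illustrative):

```c
#include "ggml.h"

static void demo_pad_leaky(struct ggml_context * ctx, struct ggml_tensor * x) {
    // append (1, 1, 0, 0) zero elements to dims 0..3 of x
    struct ggml_tensor * padded = ggml_pad(ctx, x, 1, 1, 0, 0);

    // leaky relu as a regular op with a configurable negative slope
    struct ggml_tensor * act = ggml_leaky_relu(ctx, padded, 0.1f, /*inplace =*/ false);
    (void) act;
}
```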
ariez-xyz [Wed, 13 Dec 2023 12:01:31 +0000 (13:01 +0100)]
gguf : document Mixtral changes in spec (#646)
* add new tensor names
* add new keys
* fix tensor names
* gguf : change wording a bit
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Fri, 8 Dec 2023 15:04:39 +0000 (17:04 +0200)]
sync : whisper.cpp (metal soft max fix + example prints)
Georgi Gerganov [Thu, 7 Dec 2023 20:26:34 +0000 (22:26 +0200)]
sync : llama.cpp (fused soft max, gpu cpy ops, etc.) (#640)
* sync : llama.cpp (fused soft max, gpu cpy ops, etc.)
ggml-ci
* cuda : restore accidentally deleted changes
ggml-ci
* cuda : fix rope + disable device-side dequantize
ggml-ci
* test-backend-ops : enable stablelm rope test
* cuda : remove rope assert
* sync.sh : add test-backend-ops
* ggml : fix ggml_concat + ggml_get_n_tasks logic
* sync : whisper.cpp
ggml-ci
* metal : fix assert
* ci : fix Metal path to shaders
ggml-ci
* whisper : fix bug if metal init fails
---------
Co-authored-by: slaren <redacted>
slaren [Thu, 7 Dec 2023 17:54:01 +0000 (18:54 +0100)]
ggml-backend : remove backend self-registration (#641)
slaren [Thu, 7 Dec 2023 08:51:46 +0000 (09:51 +0100)]
test-backend-ops : add performance eval mode + improve CUDA repeat and binary broadcast ops performance (#636)
* ggml-cuda : implement repeat with bin_bcast
* ggml-cuda : change supports_op for mul_mat to match compute_forward
* test-backend-ops : add performance eval mode
* improve formatting
* add sd test cases
* fix test case
* ggml-cuda : bin_bcast: better block sizes, two elements per thread
* metal : add dim3 broadcast support for mul mat
* cleanup
* typo
* metal : enable mul mat-vec for dim2 > 1
* metal : mul mat-vec support dim3 broadcasts
ggml-ci
* ggml-cuda : fix bin_bcast for ne0=1
ggml-ci
* ggml-cuda : limit block size z dim to 64
* test-backend-ops : add test cases
* test-backend-ops : add warmup run, print test type before trying to compute
* ggml-cuda : bin_bcast: collapse dimensions when possible, add fallback kernel for large tensors
ggml-ci
* test-backend-ops : avoid division by zero
---------
Co-authored-by: Georgi Gerganov <redacted>
slaren [Tue, 5 Dec 2023 15:12:15 +0000 (16:12 +0100)]
test-backend-ops : initialize ggml_argsort test with unique values to avoid ties (#634)
ggml-ci
Georgi Gerganov [Tue, 5 Dec 2023 13:17:48 +0000 (15:17 +0200)]
metal : check supported ops at runtime (#632)
* metal : check supported ops at runtime
* metal : remove TODOs
slaren [Tue, 5 Dec 2023 12:56:07 +0000 (13:56 +0100)]
ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_argsort, ggml_top_k (#625)
* ggml : support broadcasting in dim 0 in add and mul
* add cuda add/mul broadcast impl
add configurable eps to cuda norm
* add metal impl
ggml-ci
* deduplicate code in cuda impl
* try to optimize cuda impl
* ggml : support broadcasting in ggml_div
* test-backend-ops : allow filtering by op and backend
* ggml-cuda : add ggml_div impl
* ggml : add ggml_mul_mat_id, ggml_sort, ggml_top_k (CPU only)
* fix ggml_div threads
* fix ggml_div with accelerate
* ggml_sort -> ggml_argsort
* whatever
* actually fix accelerate div
* disable opencl ci
* ci : disable ctest error check temporarily until we fix backend ops test
* cmake : propagate GGML_USE_xxx compile flags with ggml target
* whisper : utilize new ggml_add broadcast for dim 0
* cmake : addendum to ee666ae9
* ggml_backend_graph_copy : fix leak
* ggml_cuda : add ggml_sum_rows impl
* metal : add ggml_div
* metal : add ggml_sum_rows
* ggml_cuda : add ggml_argsort impl
* move kernel
* metal : add ggml_argsort
* mul_mat_id : fix missing init task
* cuda/metal: fix argsort synchronization
* metal : add ggml_mul_mat_id
* ggml-cuda : add mul_mat_id for f16 + tensor cores
* test-backend-ops : add tests for quants mat mul
* ggml : fix q5_0 and q5_1 hist stats
* test-backend-ops : use smaller matrices to avoid automatic offloading, add mat-vec tests
* metal : fix alibi to match the CPU behavior
* metal : check dimensions in supports_op
* test-backend-ops : reduce error threshold for mat muls
* ggml-cuda : simplify dequantize funs, add supports_op by type for mul_mat_id
* ggml-cuda : support quantized types in mul_mat_id with cublas
* ggml-cuda : add fallback over CPU for mul_mat_id
* test-backend-ops : increase mul mat error threshold
* cleanup
ggml-ci
* test-backend-ops : fix usage
* cleanup
* ci : re-enable tests
* metal : fix compile warnings
---------
Co-authored-by: Georgi Gerganov <redacted>
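The sorting ops added here underpin expert selection in mixture-of-experts graphs. A hedged usage sketch (enum spelling as of this period; later ggml versions renamed it):

```c
#include "ggml.h"

static void demo_routing(struct ggml_context * ctx, struct ggml_tensor * logits) {
    // I32 tensor of indices that would sort each row in descending order
    struct ggml_tensor * order = ggml_argsort(ctx, logits, GGML_SORT_DESC);
    (void) order;

    // indices of the 4 largest entries per row, as used for expert routing
    struct ggml_tensor * top4 = ggml_top_k(ctx, logits, 4);
    (void) top4;
}
```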
Georgi Gerganov [Tue, 5 Dec 2023 11:36:51 +0000 (13:36 +0200)]
readme : add link to seamless_comm repo
Judd [Tue, 5 Dec 2023 10:06:32 +0000 (18:06 +0800)]
ggml : disable `fprintf` when building with NDEBUG (#631)
Co-authored-by: Judd <redacted>
slaren [Fri, 1 Dec 2023 20:05:59 +0000 (21:05 +0100)]
ggml-cuda : fix usage without CUDA devices (#627)
RiverZhou [Fri, 1 Dec 2023 08:01:31 +0000 (16:01 +0800)]
cmake : add ROCm config (#626)
slaren [Thu, 30 Nov 2023 18:03:03 +0000 (19:03 +0100)]
ggml-backend update: buffer types, backend registry, graph compare, tests (#620)
* ggml-backend update
* update metal backend
* show metal logs with ggml-backend
* move buffer types to functions
* cuda: add per-device backends
* cuda: add host buffer type
* fix metal build
* ggml_backend_alloc_ctx_tensors : ignore allocated tensors
* ggml_backend_compare_graph_backend fixes
* ci : try to fix metal build
* metal : first print device info, then build kernels
* ci : disable GGML_METAL on Github Actions
* test-backend-ops initial impl (unary and get_rows)
* more op tests
* cleanup
* print test params, add more tests cases for add and mul
* add tests for im2col
* better f16 init
* metal : add basic impl of supports_op
* add test for ggml_concat
* update im2col test params, show callstack with GGML_ASSERT on CUDA failures
* add more rope tests
* add more rope and mul_mat test cases
* add more get_rows test cases
ggml-ci
* add more norm and rms_norm test cases with different eps
* ci : fix metal resource path
ggml-ci
* tests : silence warning
* add ggml_backend_tensor_alloc and ggml_backend_view_init for initializing tensors without ggml-alloc
* add mul_mat test cases without dims 3 and 4
ggml-ci
* check for nans and infs
ggml-ci
* add diag_mask_inf test cases without dims 3 and 4
ggml-ci
* fix cuda leak while backend reg
* fix msvc issues
* remove backend_sched debug causes by default
* gpt-2 : increase graph size
ggml-ci
---------
Co-authored-by: Georgi Gerganov <redacted>
magicse [Thu, 23 Nov 2023 10:07:50 +0000 (12:07 +0200)]
tests : update test-vec0.c for mingw (#619)
For correct building under mingw64
Georgi Gerganov [Thu, 23 Nov 2023 08:13:43 +0000 (10:13 +0200)]
readme : add vit.cpp (#618)
slaren [Sun, 19 Nov 2023 08:37:08 +0000 (09:37 +0100)]
gguf : add tokenizer.chat_template documentation (#616)
Guillaume Wenzek [Fri, 17 Nov 2023 12:24:25 +0000 (07:24 -0500)]
ggml : fix ggml_set_2d_inplace (#611)
Georgi Gerganov [Fri, 17 Nov 2023 08:12:58 +0000 (10:12 +0200)]
gguf : prevent out-of-bounds-access on invalid magic (close #614)
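The gist of such a guard is to validate the 4-byte magic (and the read itself) before trusting any header field. A minimal sketch, not the repo's actual code:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// read exactly 4 bytes and compare before parsing the rest of the header
static bool gguf_magic_ok(FILE * f) {
    char magic[4];
    if (fread(magic, 1, sizeof(magic), f) != sizeof(magic)) {
        return false; // short file: bail out instead of reading past the end
    }
    return memcmp(magic, "GGUF", sizeof(magic)) == 0;
}
```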
Georgi Gerganov [Fri, 17 Nov 2023 08:00:11 +0000 (10:00 +0200)]
sync : whisper.cpp (update whisper example + minor) (#613)
ggml-ci
Georgi Gerganov [Thu, 16 Nov 2023 15:06:55 +0000 (17:06 +0200)]
sync : llama.cpp (cuda, gguf and linker fixes)
slaren [Mon, 13 Nov 2023 15:19:49 +0000 (16:19 +0100)]
update examples and tests to use ggml_allocr_new_measure_from_backend (#608)
* update examples and tests to use ggml_allocr_new_measure_from_backend
* update comments
Georgi Gerganov [Mon, 13 Nov 2023 14:54:34 +0000 (16:54 +0200)]
sync : llama.cpp (CUDA ReLU, CPU-only with CUDA, bloom fix, etc) (#607)
ggml-ci
Georgi Gerganov [Sun, 12 Nov 2023 14:35:03 +0000 (16:35 +0200)]
sync : whisper.cpp (whisper full GPU, fix warnings) (#606)
* sync : whisper.cpp (whisper full GPU, fix warnings)
ggml-ci
* ci : enable CUDA / Metal
ggml-ci
* cuda : fallback to CPU for mul mat ne03 != ne13 (fix SAM + CUDA)
ggml-ci
Steward Garcia [Sun, 12 Nov 2023 13:34:04 +0000 (08:34 -0500)]
ggml : replace conv 1D - 2D stage_0 and stage_1 with im2col and mul_mat (#564)
* added conv2d stage 0 - 1 cuda kernels
* add im2col + refactor conv1d and conv2d
* fix params invalid index
* add conv1d and conv2d unit tests
* resolve wrong values and fix mul_mat validation
* improve tests + reduce code duplication
* add cuda kernels
* more data test
* fix ggml_op_count to 70
* add temp test - gemm != mul_mat
* tests : fix test-mul-mat matrix multiplication
* test-mul-mat match gemm == ggml_mul_mat with conv2d op
* replaced gemm by ggml_mul_mat
* ggml_mul_mat cpu backend support fp16 src1
* ggml_mul_mat cuda backend fp16 fixed
* remove unnecessary ggml_cont and remove the deprecated conv1d/conv2d stage functions
* some fixes
* explain conv1d reshapes
* ggml : fix tests on Arm + do not use BLAS for F16 data
* tests : fix FP16 handling on Arm
* ggml : avoid ggml_cont and ggml_transpose in ggml_conv_xd
* ci : switch back to release
* cuda : fix wrong pointer usage
* ggml : add metal support for im2col and f16xf16 mul mat
* ggml : im2col opts
* Update src/ggml-cuda.cu
Co-authored-by: slaren <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: slaren <redacted>
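After this change, the 2D convolution entry point stays the same but lowers to im2col plus a single mat mul internally. An illustrative call, with made-up shapes (ggml layout is [W, H, C, N]; kernels are [KW, KH, Cin, Cout]):

```c
#include "ggml.h"

static struct ggml_tensor * demo_conv(struct ggml_context * ctx) {
    struct ggml_tensor * kernel = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, 3, 3, 64, 128);
    struct ggml_tensor * input  = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 64, 64, 64, 1);

    // stride 1, padding 1, dilation 1 on both axes; internally this is now
    // im2col followed by one ggml_mul_mat
    return ggml_conv_2d(ctx, kernel, input, 1, 1, 1, 1, 1, 1);
}
```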
Georgi Gerganov [Fri, 3 Nov 2023 20:26:27 +0000 (22:26 +0200)]
sync : whisper.cpp (ARM 32-bit, abort callback, wav_writer, etc.) (#602)
Georgi Gerganov [Fri, 3 Nov 2023 08:08:17 +0000 (10:08 +0200)]
sync : llama.cpp (CUDA opts, ggml-quants, YARN, etc.) (#601)
ggml-ci
Jiří Podivín [Thu, 2 Nov 2023 19:28:11 +0000 (20:28 +0100)]
sam : passing parameters and simple prompt (#598)
- most of the model hyperparameters can now be set on the CLI
- users can define their own mask prefix
- users can define their own point prompt, though only one point is supported
Signed-off-by: Jiri Podivin <redacted>
Jiří Podivín [Thu, 2 Nov 2023 19:24:10 +0000 (20:24 +0100)]
sam : update documentation to provide executable example (#596)
Also adds the example sample image to the repo to simplify replication.
Signed-off-by: Jiri Podivin <redacted>
Philpax [Wed, 1 Nov 2023 17:01:49 +0000 (18:01 +0100)]
gguf : add file format specification (#302)
* docs: gguf spec first pass
* docs(gguf): update with review comments
* docs(gguf): update with review comments
* docs(gguf): quant version optional for unquant
* docs(gguf): normalize naming, add whisper
* docs(gguf): more review updates
* docs(gguf): add norm eps and added_tokens
* docs(gguf): move padding
* docs(gguf): remove migration tool
* docs(gguf): make offset base explicit
* docs(gguf): fix replace oops
* docs(gguf): alignment metadata+tensor name len max
* docs(gguf): clarification, fixes, tensor names
* docs(gguf): clarify license
* docs(gguf): minor tweaks
* docs(gguf): data layout, GQA eq, no ft, LE GGUF
* docs(gguf): fix magic order
* docs(gguf): match impl
* docs(gguf): specify fallback alignment
* docs(gguf): remove TensorInfo::n_elements
* docs(gguf): filetype, rope base/linear scale
* docs(gguf): v2 - uint64 all the things
* docs(gguf): tweak extensibility wording
* docs(gguf): fix spec discrepancies
* docs(gguf): v3 + other fixes
* fix(editorconfig): use 2-space tabs for markdown
* docs(gguf): clarify big-endian
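Condensed from the spec this PR adds, per my reading: a GGUF file opens with a fixed little-endian header, with the counts widened to uint64 as of v2 ("uint64 all the things"). A sketch of the layout:

```c
#include <stdint.h>

// GGUF v3 header, little-endian on disk
struct gguf_header {
    uint32_t magic;     // 0x46554747, i.e. the bytes "GGUF"
    uint32_t version;   // 3
    uint64_t n_tensors; // number of tensor infos following the KV section
    uint64_t n_kv;      // number of metadata key-value pairs
};
```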
Andrei [Wed, 1 Nov 2023 12:08:28 +0000 (08:08 -0400)]
ggml-backend : use __declspec with msvc (#599)
slaren [Mon, 30 Oct 2023 20:28:09 +0000 (21:28 +0100)]
ggml-backend v2 : add ggml_backend_sched (#586)
* ggml-backend-v2 wip
* fix metal build
* ggml-alloc : use a real backend buffer in measure mode
* backend sched : ignore view ops to reduce the number of splits
* dynamic ggml_cgraph wip
* dyn graphs : remove n_tasks from ggml_cplan
* dyn graphs : update ggml_graph_import
* reset hash table in ggml_build_forward
* ggml-alloc : split into tensor and graph allocators
* add ggml_backend_sched_set_node_backend
* remove ggml_build_forward_ctx, ggml_build_backward_ctx
add ggml_opt_params::graph_size
add ggml_new_graph_custom, ggml_graph_overhead_custom
add ggml_graph_clear
* update examples and tests, fix issues
* update more examples
* update gpt-2/main-backend.cpp from master
* ggml : fix compile warning
* ci : update yolo, fix mnist, use gpt-2-backend
* ggml : fix uninit warning
* ci : switch to gpt-2-backend2
ggml-ci
* metal : skip noops early to avoid warnings from ggml_metal_get_buffer
---------
Co-authored-by: Georgi Gerganov <redacted>
Radoslav Gerganov [Mon, 30 Oct 2023 04:34:14 +0000 (06:34 +0200)]
yolo : add example implementing YOLO object detection (#576)
* Add leaky relu activation
* Add padding support in ggml_pool_2d()
* Add yolov3-tiny example
Jiří Podivín [Mon, 30 Oct 2023 04:29:41 +0000 (05:29 +0100)]
gitignore : add ggml-model-f16.bin (#597)
Signed-off-by: Jiri Podivin <redacted>
Borislav Stanimirov [Mon, 30 Oct 2023 04:28:11 +0000 (06:28 +0200)]
cmake : CUDA architectures: allow user override, set locally only if not set globally (#595)
Georgi Gerganov [Tue, 24 Oct 2023 18:51:12 +0000 (21:51 +0300)]
sync : llama.cpp (CUDA, Metal, OpenCL, gguf magic, ggml iter) (#592)
ggml-ci
PAB [Tue, 24 Oct 2023 16:37:06 +0000 (18:37 +0200)]
ggml : memset dst to 0 in `ggml_conv_transpose_1d` and `ggml_conv_transpose_2d` (#591)
* wrong indexing of kernel buffer
* memset in dst
* apply same fix to ggml_conv_transpose_2d
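As I read it, the fix matters because transposed convolution scatters partial sums into dst with +=, so any stale memory leaks into the output unless dst starts zeroed. The gist, as a fragment inside the op:

```c
#include <string.h>
#include "ggml.h"

// clear the output buffer before the accumulation loop runs
static void clear_dst(struct ggml_tensor * dst) {
    memset(dst->data, 0, ggml_nbytes(dst));
}
```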
Georgi Gerganov [Fri, 20 Oct 2023 07:12:39 +0000 (10:12 +0300)]
gpt-2 : fix allocr worst-case when n_parallel > prompt size
Georgi Gerganov [Fri, 20 Oct 2023 07:05:28 +0000 (10:05 +0300)]
gpt-2 : add ignore-eos flag
Georgi Gerganov [Fri, 20 Oct 2023 06:57:04 +0000 (09:57 +0300)]
gpt-2 : allow setting custom context size (i.e. large KV cache)
Georgi Gerganov [Mon, 16 Oct 2023 13:48:40 +0000 (16:48 +0300)]
ci : add SAM test + improve whisper test (#583)
ggml-ci
PAB [Sun, 15 Oct 2023 21:24:27 +0000 (23:24 +0200)]
tests : add ggml_conv_transpose_1d test (#582)
Georgi Gerganov [Thu, 12 Oct 2023 14:07:15 +0000 (17:07 +0300)]
ci : add gpt-2-batched test
Yavor Ivanov [Thu, 12 Oct 2023 14:08:09 +0000 (17:08 +0300)]
gpt-2 : add batched decoding example (#572)
* Initial attempt to make gpt2 do parallel decoding
* Fix crash on trying to use empty embd
* Make it work for n_parallel=1
* Add short way of passing n_parallel argument
* Move gpt-2 batched to a separate target and cpp file
* Add batched sample output to README and remove hardcoded model path and prompt
* gpt-2-batched : fix n_kv heuristic
* Free batch at end of example
* gpt-2-batched : simplify kv cache stuff (#574)
ggml-ci
* Fix not generating n_predict tokens and fix warning
* minor : readme
* Add check for end token and mark the stream as finished
---------
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: YavorGIvanov <redacted>
Georgi Gerganov [Thu, 12 Oct 2023 11:39:39 +0000 (14:39 +0300)]
ci : add M1 node (#577)
ggml-ci
Shijie [Thu, 12 Oct 2023 07:13:22 +0000 (15:13 +0800)]
readme : add qwen example (#575)
Georgi Gerganov [Wed, 11 Oct 2023 19:33:24 +0000 (22:33 +0300)]
cmake : fix string matching
slaren [Wed, 11 Oct 2023 18:52:43 +0000 (20:52 +0200)]
tests : do not build test-vec1 on systems without avx (#573)
leejet [Mon, 9 Oct 2023 15:18:47 +0000 (23:18 +0800)]
ggml : faster ggml_conv_2d using 2-stage op (#483)
* ggml : fix ggml_conv_2d impl
* ggml : make ggml_conv_2d a little faster
* ggml : reorganize ggml_conv_2d code
* ggml : make ggml_conv_2d faster
* use int64_t in conv_2d stage 0
* ggml : add TODO about im2col
---------
Co-authored-by: Georgi Gerganov <redacted>
Borislav Stanimirov [Mon, 9 Oct 2023 15:15:20 +0000 (18:15 +0300)]
cuda : int counters for device, fix fprintf warning (#560)
Yavor Ivanov [Mon, 9 Oct 2023 12:24:37 +0000 (15:24 +0300)]
Disable ggml-alloc assert for the CPU version of sam.cpp if the view doesn't have a buffer (#562)
Georgi Gerganov [Sun, 8 Oct 2023 16:44:14 +0000 (19:44 +0300)]
sync : llama.cpp (Metal + OpenCL + minor alibi) (#558)
ggml-ci
slaren [Sun, 8 Oct 2023 13:45:22 +0000 (15:45 +0200)]
fix MSVC build issues (#557)
* fix MSVC build issues
slaren [Sat, 7 Oct 2023 10:36:54 +0000 (12:36 +0200)]
ggml-alloc : fix crash when used without ggml-backend (#555)
* ggml-alloc : fix crash when used without ggml-backend
* fix regression in parent reuse that caused increased memory usage
Pierre Alexandre SCHEMBRI [Sat, 7 Oct 2023 10:29:33 +0000 (12:29 +0200)]
readme : mention Metal could be used for gpt-2 (#553)