git.djapps.eu Git - pkg/ggml/sources/ggml/log

Georgi Gerganov [Wed, 27 Dec 2023 09:42:45 +0000 (11:42 +0200)]

scripts : fix sed in sync-llama.am.sh

Georgi Gerganov [Wed, 27 Dec 2023 09:06:32 +0000 (11:06 +0200)]

sync : llama.cpp

ggml-ci

Georgi Gerganov [Wed, 27 Dec 2023 09:02:13 +0000 (11:02 +0200)]

ggml : fix dot product for ARM (llama/4630)

ggml-ci

slaren [Tue, 26 Dec 2023 20:23:59 +0000 (21:23 +0100)]

cuda : fix vmm pool with multi GPU (llama/4620)

* cuda : fix vmm pool with multi GPU

* hip

* use recommended granularity instead of minimum

* better error checking

* fix mixtral

* use cudaMemcpy3DPeerAsync

* use cuda_pool_alloc in ggml_cuda_op_mul_mat

* consolidate error checking in ggml_cuda_set_device

* remove unnecessary inlines

ggml-ci

* style fixes

* only use vmm for the main device

* fix scratch buffer size, re-enable vmm pool for all devices

* remove unnecessary check id != g_main_device

WillCorticesAI [Tue, 26 Dec 2023 10:42:08 +0000 (05:42 -0500)]

Update comment for AdamW implementation reference. (llama/4604)

Co-authored-by: Will Findley <redacted>

FantasyGmm [Tue, 26 Dec 2023 10:38:36 +0000 (18:38 +0800)]

Fix new CUDA10 compilation errors (llama/4635)

Georgi Gerganov [Mon, 25 Dec 2023 09:25:19 +0000 (11:25 +0200)]

sync : llama.cpp

ggml-ci

Georgi Gerganov [Mon, 25 Dec 2023 09:00:39 +0000 (11:00 +0200)]

cmake : update CUDA build to support VMM

slaren [Sun, 24 Dec 2023 13:34:22 +0000 (14:34 +0100)]

cuda : improve cuda pool efficiency using virtual memory (llama/4606)

* cuda : improve cuda pool efficiency using virtual memory

* fix mixtral

* fix cmake build

* check for vmm support, disable for hip

ggml-ci

* fix hip build

* clarify granularity

* move all caps to g_device_caps

* refactor error checking

* add cuda_pool_alloc, refactor most pool allocations

ggml-ci

* fix hip build

* CUBLAS_TF32_TENSOR_OP_MATH is not a macro

* more hip crap

* llama : fix msvc warnings

* ggml : fix msvc warnings

* minor

* minor

* cuda : fallback to CPU on host buffer alloc fail

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <redacted>

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <redacted>

* ensure allocations are always aligned

* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <redacted>

slaren [Sat, 23 Dec 2023 15:10:51 +0000 (16:10 +0100)]

fallback to CPU buffer if host buffer alloc fails (llama/4610)

Johannes Gäßler [Sat, 23 Dec 2023 08:16:33 +0000 (09:16 +0100)]

CUDA: fixed row rounding for 0 tensor splits (llama/4594)

Georgi Gerganov [Mon, 25 Dec 2023 08:58:54 +0000 (10:58 +0200)]

scripts : fix PR number parsing during sync

Georgi Gerganov [Sun, 24 Dec 2023 13:49:12 +0000 (15:49 +0200)]

scripts : improve llama sync patch

Georgi Gerganov [Sat, 23 Dec 2023 16:05:29 +0000 (18:05 +0200)]

scripts : sync tests / headers

Georgi Gerganov [Sat, 23 Dec 2023 15:54:42 +0000 (17:54 +0200)]

scripts : remove exit

Georgi Gerganov [Sat, 23 Dec 2023 15:54:07 +0000 (17:54 +0200)]

scripts : fix PR number sed

Georgi Gerganov [Sat, 23 Dec 2023 15:49:08 +0000 (17:49 +0200)]

scripts : add sync-llama-am.sh

Georgi Gerganov [Fri, 22 Dec 2023 15:53:50 +0000 (17:53 +0200)]

sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec) (#662)

* sync : llama.cpp (ggml_scale, ggml_row_size, ggml_mul_mat_set_prec)

ggml-ci

* ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203)

* llama : fix platforms without mmap (#4578)

* llama : fix platforms without mmap

* win32 : limit prefetch size to the file size

* fix win32 error clobber, unnecessary std::string in std::runtime_error

* ggml-alloc : fix ggml_tallocr_is_own

* whisper : minor

* ggml : cuda jetson + arm quants warnings

ggml-ci

---------

Co-authored-by: Herman Semenov <redacted>
Co-authored-by: slaren <redacted>

slaren [Mon, 18 Dec 2023 17:05:43 +0000 (18:05 +0100)]

cuda : fix synchronization with tensor get/set (#659)

leejet [Mon, 18 Dec 2023 16:46:10 +0000 (00:46 +0800)]

cuda : fix im2col_f32_f16 (#658)

chengchi [Thu, 14 Dec 2023 08:12:31 +0000 (16:12 +0800)]

cmake : change installation path of static libraries to standard directory (#645)

Co-authored-by: Cheng-Chi Wang <redacted>

Georgi Gerganov [Wed, 13 Dec 2023 19:53:20 +0000 (21:53 +0200)]

sync : llama (mul_mat_id + get_rows kernels, typos) (#649)

* sync : llama (mul_mat_id + get_rows kernels, typos)

ggml-ci

* cuda : restore im2col

ggml-ci

* metal : fix accuracy of dequantization kernels

* cuda : restore correct im2col kernel

ggml-ci

* metal : fix moe test by reducing the expert size

ggml-ci

* cuda : fix bin bcast when src1 and dst have different types

---------

Co-authored-by: slaren <redacted>

Steward Garcia [Wed, 13 Dec 2023 14:08:48 +0000 (09:08 -0500)]

ggml: new gpu kernels + extends ggml_leaky_relu + ggml_pad (#621)

* add new cuda kernels and new op ggml_pad

* add ggml_tanh cuda kernel

* remove old broadcast impl

* restore some changes

* cuda: optimized im2col + group_norm kernels

* extend ggml_leaky -> ggml_leaky_relu

* fix some code issues

* cuda: concat support 4 dims

* cuda: fix ggml_acc + add backends ops test

* restore ggml_pad + add backend op test

* metal : implement GGML_OP_ACC

* ggml : fix bug in ggml_upscale

* metal : add ggml_upscale

* metal : add ggml_tanh

* metal : add ggml_gelu_quick

* ggml : make ggml_pad more general purpose

* metal : add ggml_pad

* ggml_leaky_relu as regular op + fix indentation

* cuda: ggml_acc admit all op_params

* negative_slope better pass param

* metal : add ggml_leaky_relu

* metal : add ggml_group_norm

* cuda : minor

* ggml : add GGML_OP_LEAKY_RELU to ggml_compute_backward

* metal : soft max, tanh, supports_op fixes

* test-backend-ops : add sentinels between tensors to detect overflows

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: slaren <redacted>

ariez-xyz [Wed, 13 Dec 2023 12:01:31 +0000 (13:01 +0100)]

gguf : document Mixtral changes in spec (#646)

* add new tensor names

* add new keys

* fix tensor names

* gguf : change wording a bit

---------

Co-authored-by: Georgi Gerganov <redacted>

Georgi Gerganov [Fri, 8 Dec 2023 15:04:39 +0000 (17:04 +0200)]

sync : whisper.cpp (metal soft max fix + example prints)

Georgi Gerganov [Thu, 7 Dec 2023 20:26:34 +0000 (22:26 +0200)]

sync : llama.cpp (fused soft max, gpu cpy ops, etc.) (#640)

* sync : llama.cpp (fused soft max, gpu cpy ops, etc.)

ggml-ci

* cuda : restore accidentally deleted changes

ggml-ci

* cuda : fix rope + disable device-side dequantize

ggml-ci

* test-backend-ops : enable stablelm rope test

* cuda : remove rope assert

* sync.sh : add test-backend-ops

* ggml : fix ggml_concat + ggml_get_n_tasks logic

* sync : whisper.cpp

ggml-ci

* metal : fix assert

* ci : fix Metal path to shaders

ggml-ci

* whisper : fix bug if metal init fails

---------

Co-authored-by: slaren <redacted>

slaren [Thu, 7 Dec 2023 17:54:01 +0000 (18:54 +0100)]

ggml-backend : remove backend self-registration (#641)

slaren [Thu, 7 Dec 2023 08:51:46 +0000 (09:51 +0100)]

test-backend-ops : add performance eval mode + improve CUDA repeat and binary broadcast ops performance (#636)

* ggml-cuda : implement repeat with bin_bcast

* ggml-cuda : change supports_op for mul_mat to match compute_forward

* test-backend-ops : add performance eval mode

* improve formatting

* add sd test cases

* fix test case

* ggml-cuda : bin_bcast: better block sizes, two elements per thread

* metal : add dim3 broadcast support for mul mat

* cleanup

* typo

* metal : enable mul mat-vec for dim2 > 1

* metal : mul mat-vec support dim3 broadcasts

ggml-ci

* ggml-cuda : fix bin_bcast for ne0=1
ggml-ci

* ggml-cuda : limit block size z dim to 64

* test-backend-ops : add test cases

* test-backend-ops : add warmup run, print test type before trying to compute

* ggml-cuda : bin_bcast: collapse dimensions when possible, add fallback kernel for large tensors
ggml-ci

* test-backend-ops : avoid division by zero

---------

Co-authored-by: Georgi Gerganov <redacted>

slaren [Tue, 5 Dec 2023 15:12:15 +0000 (16:12 +0100)]

test-backend-ops : initialize ggml_argsort test with unique values to avoid ties (#634)

ggml-ci

Georgi Gerganov [Tue, 5 Dec 2023 13:17:48 +0000 (15:17 +0200)]

metal : check supported ops at runtime (#632)

* metal : check supported ops at runtime

* metal : remove TODOs

slaren [Tue, 5 Dec 2023 12:56:07 +0000 (13:56 +0100)]

ggml : full broadcast in mul, add, div + ggml_mul_mat_id, ggml_argsort, ggml_top_k (#625)

* ggml : support broadcasting in dim 0 in add and mul

* add cuda add/mul broadcast impl
add configurable eps to cuda norm

* add metal impl
ggml-ci

* deduplicate code in cuda impl

* try to optimize cuda impl

* ggml : support broadcasting in ggml_div

* test-backend-ops : allow filtering by op and backend

* ggml-cuda : add ggml_div impl

* ggml : add ggml_mul_mat_id, ggml_sort, ggml_top_k (CPU only)

* fix ggml_div threads

* fix ggml_div with accelerate

* ggml_sort -> ggml_argsort

* whatever

* actually fix accelerate div

* disable opencl ci

* ci : disable ctest error check temporarily until we fix backend ops test

* cmake : propagate GGML_USE_xxx compile flags with ggml target

* whisper : utilize new ggml_add broadcast for dim 0

* cmake : addendum to ee666ae9

* ggml_backend_graph_copy : fix leak

* ggml_cuda : add ggml_sum_rows impl

* metal : add ggml_div

* metal : add ggml_sum_rows

* ggml_cuda : add ggml_argsort impl

* move kernel

* metal : add ggml_argsort

* mul_mat_id : fix missing init task

* cuda/metal: fix argsort synchronization

* metal : add ggml_mul_mat_id

* ggml-cuda : add mul_mat_id for f16 + tensor cores

* test-backend-ops : add tests for quants mat mul

* ggml : fix q5_0 and q5_1 hist stats

* test-backend-ops : use smaller matrices to avoid automatic offloading, add mat-vec tests

* metal : fix alibi to match the CPU behavior

* metal : check dimensions in supports_op

* test-backend-ops : reduce error threshold for mat muls

* ggml-cuda : simplify dequantize funs, add supports_op by type for mul_mat_id

* ggml-cuda : support quantized types in mul_mat_id with cublas

* ggml-cuda : add fallback over CPU for mul_mat_id

* test-backend-ops : increase mul mat error threshold

* cleanup
ggml-ci

* test-backend-ops : fix usage

* cleanup

* ci : re-enable tests

* metal : fix compile warnings

---------

Co-authored-by: Georgi Gerganov <redacted>

Georgi Gerganov [Tue, 5 Dec 2023 11:36:51 +0000 (13:36 +0200)]

readme : add link to seamless_comm repo

Judd [Tue, 5 Dec 2023 10:06:32 +0000 (18:06 +0800)]

ggml : disable `fprintf` when building with NDEBUG (#631)

Co-authored-by: Judd <redacted>

slaren [Fri, 1 Dec 2023 20:05:59 +0000 (21:05 +0100)]

ggml-cuda : fix usage without CUDA devices (#627)

RiverZhou [Fri, 1 Dec 2023 08:01:31 +0000 (16:01 +0800)]

cmake : add ROCm config (#626)

slaren [Thu, 30 Nov 2023 18:03:03 +0000 (19:03 +0100)]

ggml-backend update: buffer types, backend registry, graph compare, tests (#620)

* ggml-backend update

* update metal backend

* show metal logs with ggml-backend

* move buffer types to functions

* cuda: add per-device backends

* cuda: add host buffer type

* fix metal build

* ggml_backend_alloc_ctx_tensors : ignore allocated tensors

* ggml_backend_compare_graph_backend fixes

* ci : try to fix metal build

* metal : first print device info, then build kernels

* ci : disable GGML_METAL on Github Actions

* test-backend-ops initial impl (unary and get_rows)

* more op tests

* cleanup

* print test params, add more tests cases for add and mul

* add tests for im2col

* better f16 init

* metal : add basic impl of supports_op

* add test for ggml_concat

* update im2col test params, show callstack with GGML_ASSERT on CUDA failures

* add more rope tests

* add more rope and mul_mat test cases

* add more get_rows test cases
ggml-ci

* add more norm and rms_norm test cases with different eps

* ci : fix metal resource path

ggml-ci

* tests : silence warning

* add ggml_backend_tensor_alloc and ggml_backend_view_init for initializing tensors without ggml-alloc

* add mul_mat test cases without dims 3 and 4
ggml-ci

* check for nans and infs
ggml-ci

* add diag_mask_inf test cases without dims 3 and 4
ggml-ci

* fix cuda leak while backend reg

* fix msvc issues

* remove backend_sched debug causes by default

* gpt-2 : increase graph size

ggml-ci

---------

Co-authored-by: Georgi Gerganov <redacted>

magicse [Thu, 23 Nov 2023 10:07:50 +0000 (12:07 +0200)]

tests : update test-vec0.c for mingw (#619)

For correct building under mingw64

Georgi Gerganov [Thu, 23 Nov 2023 08:13:43 +0000 (10:13 +0200)]

readme : add vit.cpp (#618)

slaren [Sun, 19 Nov 2023 08:37:08 +0000 (09:37 +0100)]

gguf : add tokenizer.chat_template documentation (#616)

Guillaume Wenzek [Fri, 17 Nov 2023 12:24:25 +0000 (07:24 -0500)]

ggml : fix ggml_set_2d_inplace (#611)

Georgi Gerganov [Fri, 17 Nov 2023 08:12:58 +0000 (10:12 +0200)]

gguf : prevent out-of-bounds-access on invalid magic (close #614)

Georgi Gerganov [Fri, 17 Nov 2023 08:00:11 +0000 (10:00 +0200)]

sync : whisper.cpp (update whisper example + minor) (#613)

ggml-ci

Georgi Gerganov [Thu, 16 Nov 2023 15:06:55 +0000 (17:06 +0200)]

sync : llama.cpp (cuda, gguf and linker fixes)

slaren [Mon, 13 Nov 2023 15:19:49 +0000 (16:19 +0100)]

update examples and tests to use ggml_allocr_new_measure_from_backend (#608)

* update examples and tests to use ggml_allocr_new_measure_from_backend

* update comments

Georgi Gerganov [Mon, 13 Nov 2023 14:54:34 +0000 (16:54 +0200)]

sync : llama.cpp (CUDA ReLU, CPU-only with CUDA, bloom fix, etc) (#607)

ggml-ci

Georgi Gerganov [Sun, 12 Nov 2023 14:35:03 +0000 (16:35 +0200)]

sync : whisper.cpp (whisper full GPU, fix warnings) (#606)

* sync : whisper.cpp (whisper full GPU, fix warnings)

ggml-ci

* ci : enable CUDA / Metal

ggml-ci

* cuda : fallback to CPU for mul mat ne03 != ne13 (fix SAM + CUDA)

ggml-ci

Steward Garcia [Sun, 12 Nov 2023 13:34:04 +0000 (08:34 -0500)]

ggml : replace conv 1D - 2D stage_0 and stage_1 with im2col and mul_mat (#564)

* added conv2d stage 0 - 1 cuda kernels

* add im2col + refactor conv1d and conv2d

* fix params invalid index

* add conv1d and conv2d unit tests

* resolving wrong values and fix mul_mat validation

* improve tests + reduce code duplication

* add cuda kernels

* more data test

* fix ggml_op_count to 70

* add temp test - gemm != mul_mat

* tests : fix test-mul-mat matrix multiplication

* test-mul-mat match gemm == ggml_mul_mat with conv2d op

* replaced gemm by ggml_mul_mat

* ggml_mul_mat cpu backend support fp16 src1

* ggml_mul_mat cuda backend fp16 fixed

* remove unnecessary ggml_cont and removed conv1d-2d functions deprecated

* some fixes

* explain conv1d reshapes

* ggml : fix tests on Arm + do not use BLAS for F16 data

* tests : fix FP16 handling on Arm

* ggml : avoid ggml_cont and ggml_transpose in ggml_conv_xd

* ci : switch back to release

* cuda : fix wrong pointer usage

* ggml : add metal support for im2col and f16xf16 mul mat

* ggml : im2col opts

* Update src/ggml-cuda.cu

Co-authored-by: slaren <redacted>

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: slaren <redacted>

Georgi Gerganov [Fri, 3 Nov 2023 20:26:27 +0000 (22:26 +0200)]

sync : whisper.cpp (ARM 32-bit, abort callback, wav_writer, etc.) (#602)

Georgi Gerganov [Fri, 3 Nov 2023 08:08:17 +0000 (10:08 +0200)]

sync : llama.cpp (CUDA opts, ggml-quants, YARN, etc.) (#601)

ggml-ci

Jiří Podivín [Thu, 2 Nov 2023 19:28:11 +0000 (20:28 +0100)]

sam : passing parameters and simple prompt (#598)

- most of the model hyperparameters can now be set on CLI
- user can define their own mask prefix
- user can define their own point prompt, although just one

Signed-off-by: Jiri Podivin <redacted>

Jiří Podivín [Thu, 2 Nov 2023 19:24:10 +0000 (20:24 +0100)]

sam : update documentation to provide executable example (#596)

Also adds the example sample image to the repo to simplify replication.

Signed-off-by: Jiri Podivin <redacted>

Philpax [Wed, 1 Nov 2023 17:01:49 +0000 (18:01 +0100)]

gguf : add file format specification (#302)

* docs: gguf spec first pass

* docs(gguf): update with review comments

* docs(gguf): update with review comments

* docs(gguf): quant version optional for unquant

* docs(gguf): normalize naming, add whisper

* docs(gguf): more review updates

* docs(gguf): add norm eps and added_tokens

* docs(gguf): move padding

* docs(gguf): remove migration tool

* docs(gguf): make offset base explicit

* docs(gguf): fix replace oops

* docs(gguf): alignment metadata+tensor name len max

* docs(gguf): clarification, fixes, tensor names

* docs(gguf): clarify license

* docs(gguf): minor tweaks

* docs(gguf): data layout, GQA eq, no ft, LE GGUF

* docs(gguf): fix magic order

* docs(gguf): match impl

* docs(gguf): specify fallback alignment

* docs(gguf): remove TensorInfo::n_elements

* docs(gguf): filetype, rope base/linear scale

* docs(gguf): v2 - uint64 all the things

* docs(gguf): tweak extensibility wording

* docs(gguf): fix spec discrepancies

* docs(gguf): v3 + other fixes

* fix(editorconfig): use 2-space tabs for markdown

* docs(gguf): clarify big-endian

Andrei [Wed, 1 Nov 2023 12:08:28 +0000 (08:08 -0400)]

ggml-backend : use __declspec with msvc (#599)

slaren [Mon, 30 Oct 2023 20:28:09 +0000 (21:28 +0100)]

ggml-backend v2 : add ggml_backend_sched (#586)

* ggml-backend-v2 wip

* fix metal build

* ggml-alloc : use a real backend buffer in measure mode

* backend sched : ignore view ops to reduce the number of splits

* dynamic ggml_cgraph wip

* dyn graphs : remove n_tasks from ggml_cplan

* dyn graphs : update ggml_graph_import

* reset hash table in ggml_build_forward

* ggml-alloc : split into tensor and graph allocators

* add ggml_backend_sched_set_node_backend

* remove ggml_build_forward_ctx, ggml_build_backward_ctx
add ggml_opt_params::graph_size
add ggml_new_graph_custom, ggml_graph_overhead_custom
add ggml_graph_clear

* update examples and tests, fix issues

* update more examples

* update gpt-2/main-backend.cpp from master

* ggml : fix compile warning

* ci : update yolo, fix mnist, use gpt-2-backend

* ggml : fix uninit warning

* ci : switch to gpt-2-backend2

ggml-ci

* metal : skip noops early to avoid warnings from ggml_metal_get_buffer

---------

Co-authored-by: Georgi Gerganov <redacted>

Radoslav Gerganov [Mon, 30 Oct 2023 04:34:14 +0000 (06:34 +0200)]

yolo : add example implementing YOLO object detection (#576)

* Add leaky relu activation

* Add padding support in ggml_pool_2d()

* Add yolov3-tiny example

Jiří Podivín [Mon, 30 Oct 2023 04:29:41 +0000 (05:29 +0100)]

gitignore : add ggml-model-f16.bin (#597)

Signed-off-by: Jiri Podivin <redacted>

Borislav Stanimirov [Mon, 30 Oct 2023 04:28:11 +0000 (06:28 +0200)]

cmake : cuda architectures: allow user override, only set local if not globally set (#595)

Georgi Gerganov [Tue, 24 Oct 2023 18:51:12 +0000 (21:51 +0300)]

sync : llama.cpp (CUDA, Metal, OpenCL, gguf magic, ggml iter) (#592)

ggml-ci

PAB [Tue, 24 Oct 2023 16:37:06 +0000 (18:37 +0200)]

ggml : memset dst to 0 in `ggml_conv_transpose_1d` and `ggml_conv_transpose_2d` (#591)

* wrong indexation of kernel buffer

* memset in dst

* apply same fix to ggml_conv_transpose_2d

Georgi Gerganov [Fri, 20 Oct 2023 07:12:39 +0000 (10:12 +0300)]

gpt-2 : fix allocr worst-case when n_parallel > prompt size

Georgi Gerganov [Fri, 20 Oct 2023 07:05:28 +0000 (10:05 +0300)]

gpt-2 : add ignore-eos flag

Georgi Gerganov [Fri, 20 Oct 2023 06:57:04 +0000 (09:57 +0300)]

gpt-2 : allow setting custom context size (i.e. large KV cache)

Georgi Gerganov [Mon, 16 Oct 2023 13:48:40 +0000 (16:48 +0300)]

ci : add SAM test + improve whisper test (#583)

ggml-ci

PAB [Sun, 15 Oct 2023 21:24:27 +0000 (23:24 +0200)]

tests : add ggml_conv_transpose_1d test (#582)

Georgi Gerganov [Thu, 12 Oct 2023 14:07:15 +0000 (17:07 +0300)]

ci : add gpt-2-batched test

Yavor Ivanov [Thu, 12 Oct 2023 14:08:09 +0000 (17:08 +0300)]

gpt-2 : add batched decoding example (#572)

* Initial attempt to make gpt2 do parallel decoding

* Fix crash on trying to use empty embd

* Make it work for n_parallel=1

* Add short way of passing n_parallel argument

* Move gpt-2 batched to a separate target and cpp file

* Add batched sample output to README and remove hardcoded model path and prompt

* gpt-2-batched : fix n_kv heuristic

* Free batch at end of example

* gpt-2-batched : simplify kv cache stuff (#574)

ggml-ci

* Fix not generating n_predict tokens and fix warn

* minor : readme

* Add check for end token and mark the stream as finished

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: YavorGIvanov <redacted>

Georgi Gerganov [Thu, 12 Oct 2023 11:39:39 +0000 (14:39 +0300)]

ci : add M1 node (#577)

ggml-ci

Shijie [Thu, 12 Oct 2023 07:13:22 +0000 (15:13 +0800)]

readme : add qwen example (#575)

Georgi Gerganov [Wed, 11 Oct 2023 19:33:24 +0000 (22:33 +0300)]

cmake : fix string matching

slaren [Wed, 11 Oct 2023 18:52:43 +0000 (20:52 +0200)]

tests : do not build test-vec1 on systems without avx (#573)

leejet [Mon, 9 Oct 2023 15:18:47 +0000 (23:18 +0800)]

ggml : faster ggml_conv_2d using 2-stage op (#483)

* ggml : fix ggml_conv_2d impl

* ggml : make ggml_conv_2d a little faster

* ggml : reorganize ggml_conv_2d code

* ggml : make ggml_conv_2d faster

* use int64_t in conv_2d stage 0

* ggml : add TODO about im2col

---------

Co-authored-by: Georgi Gerganov <redacted>

Borislav Stanimirov [Mon, 9 Oct 2023 15:15:20 +0000 (18:15 +0300)]

cuda : int counters for device, fix fprintf warning (#560)

Yavor Ivanov [Mon, 9 Oct 2023 12:24:37 +0000 (15:24 +0300)]

Disable ggml-alloc assert for CPU version of Sam.cpp if the view doesn't have a buffer (#562)

Georgi Gerganov [Sun, 8 Oct 2023 16:44:14 +0000 (19:44 +0300)]

sync : llama.cpp (Metal + OpenCL + minor alibi) (#558)

ggml-ci

slaren [Sun, 8 Oct 2023 13:45:22 +0000 (15:45 +0200)]

fix MSVC build issues (#557)

* fix MSVC build issues

slaren [Sat, 7 Oct 2023 10:36:54 +0000 (12:36 +0200)]

ggml-alloc : fix crash when used without ggml-backend (#555)

* ggml-alloc : fix crash when used without ggml-backend

* fix regression in parent reuse that caused increased memory usage

Pierre Alexandre SCHEMBRI [Sat, 7 Oct 2023 10:29:33 +0000 (12:29 +0200)]

readme : mention Metal could be used for gpt-2 (#553)

slaren [Fri, 6 Oct 2023 16:51:25 +0000 (18:51 +0200)]

ggml backends interface v1 (#547)

* ggml backends interface v1

* ggml-backend : metal (#552)

---------

Co-authored-by: Georgi Gerganov <redacted>

Hyunsung Lee [Fri, 6 Oct 2023 14:01:42 +0000 (23:01 +0900)]

ggml : delete duplicate logging macros (#531)

* remove duplicate macros

* .

Georgi Gerganov [Wed, 4 Oct 2023 12:53:05 +0000 (15:53 +0300)]

sync : llama.cpp (training, refactoring) (#548)

* sync : llama.cpp (training, refactoring)

* examples : fix ggml_rope

* ggml : better optimizer cancel handling

ggml-ci

* ggml : fix UBs

ggml-ci

* ggml : add TODO for refactoring the opt cancellation

布客飞龙 [Wed, 4 Oct 2023 09:04:16 +0000 (17:04 +0800)]

cmake : add OPENCL_LIB to solve problem [cannot resolve external symbol clxxxx ] (#527)

skirodev [Thu, 28 Sep 2023 21:10:45 +0000 (05:10 +0800)]

ggml : fix batch for ggml_conv_2d (#528)

PAB [Thu, 28 Sep 2023 21:09:51 +0000 (23:09 +0200)]

ggml : add `GGML_OP_CONV_TRANSPOSE_1D` (#524)

* introduce GGML_OP_CONV_TRANSPOSE_1D

* implementation

* increment GGML_OP_COUNT

* rename calc_conv_transpose

* fix permutation of kernel data

---------

Co-authored-by: Georgi Gerganov <redacted>

PAB [Thu, 28 Sep 2023 21:03:34 +0000 (23:03 +0200)]

ggml : complete implementation of `GGML_OP_CONV_1D` (#523)

* implementation

* fix wrong call to function

* matching closely ggml_conv_2d

* optimized conv_1d with stages 0 and 1

* working implementation

Georgi Gerganov [Fri, 15 Sep 2023 17:58:43 +0000 (20:58 +0300)]

ci : add whisper test (#525)

ggml-ci

Georgi Gerganov [Fri, 15 Sep 2023 17:46:00 +0000 (20:46 +0300)]

examples : fix compile warnings

Georgi Gerganov [Fri, 15 Sep 2023 16:07:30 +0000 (19:07 +0300)]

sync : whisper.cpp (Metal + ggml sched_yield fix + reduce ggml-alloc size) (#522)

ggml-ci

Diogo [Fri, 8 Sep 2023 16:54:30 +0000 (12:54 -0400)]

ci : add Metal build (#514)

* metal on mac

* remove apt-get

* added xcrun prefix

Diogo [Fri, 8 Sep 2023 15:07:53 +0000 (11:07 -0400)]

ci : add CLBlast build (#513)

* added clblast test to ci

* moved threads to env

* changed name

* upgraded checkout to v3

Jiahao Li [Fri, 8 Sep 2023 15:01:21 +0000 (23:01 +0800)]

cuda : suppress compiler warning of unused variables (#505)

布客飞龙 [Fri, 8 Sep 2023 15:01:02 +0000 (23:01 +0800)]

cmake : solve prob "clblast.h not found" (#506)

Cebtenzzre [Fri, 8 Sep 2023 14:58:01 +0000 (10:58 -0400)]

ggml : mark ggml_format_name as a printf-like function (#508)

Cebtenzzre [Fri, 8 Sep 2023 14:57:35 +0000 (10:57 -0400)]

ggml : gguf_context const-correctness (#509)

Georgi Gerganov [Fri, 8 Sep 2023 14:57:04 +0000 (17:57 +0300)]

sync : whisper (POSIX) (#511)

* sync : whisper (POSIX)

ggml-ci

* sync : llama (HBM + Metal + style)

ggml-ci

YavorGIvanov [Fri, 8 Sep 2023 13:17:44 +0000 (16:17 +0300)]

Fix SAM example mask output with latest ggml

- I am not sure why this inplace removal causes the output to turn
  correct again. I spent some time debugging and trying different
  things, but my assumption is that some dependency is not properly
  propagated and the allocator doesn't know about some tensor and
  therefore decided to free it and overwrite its memory
- I also added a commented-out build_forward_expand, which also fixes
  the output, but I am still not sure why
- Additionally, I am still trying to figure out why the
  ggml_allocr_alloc(..) calls after the ggml_conv_transpose_2d_p0 are
  needed
- I guess I have to spend some time debugging the ggml allocator and
  figure out what is going wrong in these operations. Probably
  something is wrong in the operation implementation that I am unable
  to notice.

Fixes #510.

Jiahao Li [Tue, 5 Sep 2023 18:11:11 +0000 (02:11 +0800)]

cuda : support flattened GLM-style rope to reduce kernel launch (#477)

Georgi Gerganov [Tue, 5 Sep 2023 13:37:55 +0000 (16:37 +0300)]

whisper : minor sync

Yavor Ivanov [Tue, 5 Sep 2023 11:40:17 +0000 (14:40 +0300)]

sam : remove ggml_repeat and use inplace operation (#493)

Georgi Gerganov [Tue, 5 Sep 2023 11:38:30 +0000 (14:38 +0300)]

ggml : sync llama.cpp (view_src + alloc improvements) (#504)

ggml-ci

Georgi Gerganov [Tue, 5 Sep 2023 10:55:06 +0000 (13:55 +0300)]

whisper : sync (match OpenAI input, convert, new features) (#495)

ggml-ci

Packaging of ggml-org/ggml
