git.djapps.eu Git - pkg/ggml/sources/ggml/commit
author Yavor Ivanov <redacted>
Mon, 21 Aug 2023 12:31:27 +0000 (15:31 +0300)
committer GitHub <redacted>
Mon, 21 Aug 2023 12:31:27 +0000 (15:31 +0300)
commit 170388d1e14e358e275b8cac564124fb048f89c4
tree 5232c628c8394c6ff79a9b5bf239ffd680862e0f
parent 08c57df1b98ff94d065c0ee2f42294a5bde6bb7b
ggml : improve ADD_REL_POS perf in SAM by doing it inplace + broadcast BLAS mul_mat (#466)

* Improve ADD_REL_POS perf in SAM by doing it inplace

- Add unit tests for the ADD_REL_POS operation
- I am not sure whether this is a valid implementation, as we reuse the
  src0 memory in order to avoid copying it
- When running SAM with the "Example output" command, image, point and
  16 threads, this reduces the cumulative time of the ADD_REL_POS operation
  from 1000-1100 ms to 180-200 ms
- There is further room for optimization in the access patterns used in
  the implementation of the operation
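
For readers unfamiliar with the operation, the in-place idea can be sketched in plain C. This is not the ggml kernel from this commit. It assumes the SAM formulation in which a height term and a width term are added to each attention block. The function name, argument order and contiguous row-major layout are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical standalone sketch, not the ggml implementation.
 * Assumed contiguous row-major layouts:
 *   attn : [n][qh][qw][kh][kw]  (updated in place, no copy of the input)
 *   rel_h: [n][qh][qw][kh]
 *   rel_w: [n][qh][qw][kw]
 * Computes attn[..., i, j] += rel_h[..., i] + rel_w[..., j].
 */
static void add_rel_pos_inplace(float *attn, const float *rel_h, const float *rel_w,
                                size_t n, size_t qh, size_t qw, size_t kh, size_t kw) {
    const size_t rows = n * qh * qw; /* one (batch, query position) per row */
    for (size_t r = 0; r < rows; ++r) {
        float       *dst = attn  + r * kh * kw; /* kh x kw block for this query */
        const float *h   = rel_h + r * kh;
        const float *w   = rel_w + r * kw;
        for (size_t i = 0; i < kh; ++i) {
            for (size_t j = 0; j < kw; ++j) {
                dst[i * kw + j] += h[i] + w[j];
            }
        }
    }
}
```

Writing into the input buffer avoids allocating and copying a second tensor of the same size, which is the saving described above. The caveat is that the original values are overwritten, so this is only safe when no other consumer still needs them.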

* Add non-inplace version for the GGML_OP_ADD_REL_POS

* Fix map_unary warnings and refactor LayerNorm2d + remove ggml_cont in it

* Fix Mac printf format warnings

* sam : add ggml_graph_print() comment

* ggml : add broadcast support for BLAS ggml_mul_mat() (#460)

* Remove unneeded build_forward_expand from add-rel-pos unit test
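
As a rough illustration of what broadcast support in a batched mul_mat means, the plain-C sketch below reuses a smaller batch of left-hand matrices across a larger batch of right-hand matrices. It is not the code from this commit. The function name, the row-major layout and the integer-division mapping from output batch to input batch are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical sketch, not the ggml implementation.
 * a: na matrices of size m x k, b: nb matrices of size k x n,
 * c: nb matrices of size m x n, all contiguous and row-major.
 * Requires nb to be a multiple of na; each a matrix then serves
 * r = nb / na consecutive batches of b (assumed mapping: ib / r).
 * Returns 0 on success, -1 if the batch counts cannot be broadcast.
 */
static int batched_matmul_broadcast(const float *a, size_t na,
                                    const float *b, size_t nb,
                                    float *c, size_t m, size_t k, size_t n) {
    if (na == 0 || nb % na != 0) {
        return -1;
    }
    const size_t r = nb / na; /* broadcast factor */
    for (size_t ib = 0; ib < nb; ++ib) {
        const float *ab = a + (ib / r) * m * k; /* reused across r batches */
        const float *bb = b + ib * k * n;
        float       *cb = c + ib * m * n;
        for (size_t i = 0; i < m; ++i) {
            for (size_t j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (size_t p = 0; p < k; ++p) {
                    sum += ab[i * k + p] * bb[p * n + j];
                }
                cb[i * n + j] = sum;
            }
        }
    }
    return 0;
}
```

In a BLAS-backed path the three inner loops would presumably be replaced by one GEMM call per batch, with only the batch index mapping handled by the caller.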

---------

Co-authored-by: Georgi Gerganov <redacted>
examples/sam/main.cpp
include/ggml/ggml.h
src/ggml.c
tests/CMakeLists.txt
tests/test-rel-pos.c [new file with mode: 0644]