git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CANN: implement LRU cache for ACL graphs (#15814)
author Chenguang Li <redacted>
Wed, 10 Sep 2025 07:29:12 +0000 (15:29 +0800)
committer GitHub <redacted>
Wed, 10 Sep 2025 07:29:12 +0000 (15:29 +0800)
commit 28b5f190ef1dbea5edf82dbc8b4407b721fadd13
tree 2528d063595ed28fc2fd520bb1c8581131682675
parent 86587da03bd78df8f4e7d8b111a0c1d2494d6ed0

* CANN: implement LRU cache for ACL graphs in CANN backend

- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Load graphs on demand and evict them under an LRU policy when capacity is exceeded.
- Update the push, move_to_front, and clear methods to manage cached graphs efficiently.
- Reuse cached graphs to reduce graph reconstruction overhead in the CANN backend.
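
The caching scheme described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: `example_graph`, `example_graph_lru_cache`, the `key` field, and the `find` helper are hypothetical names. Only the method names `push`, `move_to_front`, and `clear` come from the commit message, and the real `ggml_cann_graph` holds ACL graph state that is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached graph. The real ggml_cann_graph
// stores ACL graph state; a string key is used here only for matching.
struct example_graph {
    std::string key;
};

// Minimal LRU cache sketch: the most recently used graph sits at the
// front of the list, the least recently used one at the back.
struct example_graph_lru_cache {
    std::size_t capacity;
    std::list<std::unique_ptr<example_graph>> cache_list;

    explicit example_graph_lru_cache(std::size_t cap) : capacity(cap) {
        assert(cap > 0);  // a zero-capacity cache could never hold a graph
    }

    // Insert a new graph at the front. If the cache is already full,
    // evict the least recently used entry (the back) first.
    void push(std::unique_ptr<example_graph> graph) {
        if (cache_list.size() >= capacity) {
            cache_list.pop_back();
        }
        cache_list.push_front(std::move(graph));
    }

    // Mark an already cached graph as most recently used.
    void move_to_front(example_graph * graph) {
        for (auto it = cache_list.begin(); it != cache_list.end(); ++it) {
            if (it->get() == graph) {
                // splice relinks the node without copying or reallocating
                cache_list.splice(cache_list.begin(), cache_list, it);
                return;
            }
        }
    }

    // Hypothetical lookup helper: returns nullptr on a cache miss.
    example_graph * find(const std::string & key) {
        for (auto & graph : cache_list) {
            if (graph->key == key) {
                return graph.get();
            }
        }
        return nullptr;
    }

    // Drop every cached graph.
    void clear() { cache_list.clear(); }
};
```

In this sketch a caller would look up a graph with `find`, call `move_to_front` on a hit, and `push` a newly built graph on a miss. A `std::list` of `std::unique_ptr` keeps eviction and reordering cheap once the node is located, at the cost of a linear search, which is reasonable when the capacity is small.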

* fix typo

* The LRU cache capacity can be configured via an environment variable
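
One way such a capacity setting could be read is sketched below. The helper name `example_cache_capacity` is hypothetical, and the variable name and fallback value are left to the caller because this page does not state the actual ones. The sketch assumes that a missing, empty, non-numeric, or zero value should fall back to the default.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: read a positive cache capacity from the
// environment variable `env_name`, or return `fallback` when the
// variable is unset, empty, not a plain decimal number, or zero.
inline std::size_t example_cache_capacity(const char * env_name, std::size_t fallback) {
    assert(env_name != nullptr);

    const char * value = std::getenv(env_name);
    if (value == nullptr || !std::isdigit(static_cast<unsigned char>(value[0]))) {
        return fallback;  // unset, empty, signed, or non-numeric
    }

    char * end = nullptr;
    unsigned long parsed = std::strtoul(value, &end, 10);
    if (*end != '\0' || parsed == 0) {
        return fallback;  // trailing characters or a zero capacity
    }
    return static_cast<std::size_t>(parsed);
}
```

The result could then be passed to the cache constructor once at backend initialization, so the capacity stays fixed for the lifetime of the cache.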

Signed-off-by: noemotiovon <redacted>

* refactor ACL graph

* refactor && fix review comments

Signed-off-by: noemotiovon <redacted>
---------

Signed-off-by: noemotiovon <redacted>
docs/backend/CANN.md
ggml/src/ggml-cann/common.h
ggml/src/ggml-cann/ggml-cann.cpp