git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CANN: implement LRU cache for ACL graphs (llama/15814)
author Chenguang Li <redacted>
Wed, 10 Sep 2025 07:29:12 +0000 (15:29 +0800)
committer Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:33:50 +0000 (13:33 +0300)
commit 711433a7e6607b12c0bbff6d108a63e963bb24fc
tree e47bfa97e1000e5fb146fc2715d63809ad04df4c
parent 97ec0e5ba79b56231715bc972f363160556636ee
* CANN: implement LRU cache for ACL graphs in CANN backend

- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.
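
The bullets above describe a most-recently-used-first list with eviction from the tail. A minimal sketch of that idea follows; it is not the code from this commit. The names `example_graph`, `example_graph_lru_cache`, the string `key` field, and the `find` helper are hypothetical, and the real `ggml_cann_graph` and `ggml_cann_graph_lru_cache` in src/ggml-cann/common.h may be structured differently.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a captured graph. The real ggml_cann_graph holds
// backend-specific state that this commit message does not describe.
struct example_graph {
    std::string key;  // illustrative identifier used to match a cached graph
};

// Sketch of an LRU cache: the most recently used graph sits at the front,
// the least recently used one at the back.
struct example_graph_lru_cache {
    size_t capacity;
    std::list<std::unique_ptr<example_graph>> entries;

    explicit example_graph_lru_cache(size_t cap) : capacity(cap) {}

    // Insert a new graph at the front; evict from the back while over capacity.
    void push(std::unique_ptr<example_graph> g) {
        entries.push_front(std::move(g));
        while (entries.size() > capacity) {
            entries.pop_back();
        }
    }

    // Mark an existing entry as most recently used. splice relinks the node,
    // so the graph object itself is neither copied nor moved.
    void move_to_front(std::list<std::unique_ptr<example_graph>>::iterator it) {
        entries.splice(entries.begin(), entries, it);
    }

    // Assumed lookup helper: return the matching graph and promote it,
    // or return nullptr on a cache miss.
    example_graph * find(const std::string & key) {
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if ((*it)->key == key) {
                move_to_front(it);
                return entries.front().get();
            }
        }
        return nullptr;
    }

    // Drop every cached graph.
    void clear() { entries.clear(); }
};
```

A `std::list` is used here because `splice` promotes an entry in constant time; the linear scan in `find` is acceptable only under the assumption that the cache holds a small number of graphs.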

* fix typo

* The LRU cache capacity can be configured via an env variable
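
As an illustration of reading a capacity from the environment, the sketch below parses a variable and falls back to a default on missing or invalid input. The function name `example_cache_capacity_from_env` and the variable name `EXAMPLE_GRAPH_CACHE_CAPACITY` are hypothetical; this commit message does not state the actual variable name or its default value.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Read a cache capacity from the environment variable `var_name`.
// Returns `default_capacity` when the variable is unset, empty, not a plain
// positive decimal integer, or out of range.
size_t example_cache_capacity_from_env(const char * var_name, size_t default_capacity) {
    const char * value = std::getenv(var_name);
    if (value == nullptr || !std::isdigit(static_cast<unsigned char>(value[0]))) {
        return default_capacity;  // unset, empty, signed, or non-numeric
    }

    errno = 0;
    char * end = nullptr;
    unsigned long long parsed = std::strtoull(value, &end, 10);
    if (errno == ERANGE || *end != '\0' || parsed == 0) {
        return default_capacity;  // overflow, trailing garbage, or zero
    }
    return static_cast<size_t>(parsed);
}
```

With this sketch, running a program as `EXAMPLE_GRAPH_CACHE_CAPACITY=8 ./app` would yield a capacity of 8, while an unset or malformed value yields the caller's default.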

Signed-off-by: noemotiovon <redacted>

* refactor ACL graph

* refactor and fix review comments

Signed-off-by: noemotiovon <redacted>
---------

Signed-off-by: noemotiovon <redacted>
src/ggml-cann/common.h
src/ggml-cann/ggml-cann.cpp