author    Georgi Gerganov <redacted>
          Thu, 13 Mar 2025 10:35:44 +0000 (12:35 +0200)
committer GitHub <redacted>
          Thu, 13 Mar 2025 10:35:44 +0000 (12:35 +0200)
commit e0dbec0bc6cd4b6230cda7a6ed1e9dac08d1600b
tree   e3ee4e085042df7a76d51f691ae46450f656860b
parent 2048b5913d51beab82dfe29955f9008130b936c0
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
46 files changed:
common/common.cpp
common/speculative.cpp
examples/batched-bench/batched-bench.cpp
examples/batched.swift/Sources/main.swift
examples/cvector-generator/cvector-generator.cpp
examples/embedding/embedding.cpp
examples/gritlm/gritlm.cpp
examples/imatrix/imatrix.cpp
examples/infill/infill.cpp
examples/llama-bench/llama-bench.cpp
examples/llama.android/llama/src/main/cpp/llama-android.cpp
examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
examples/llava/gemma3-cli.cpp
examples/lookahead/lookahead.cpp
examples/lookup/lookup.cpp
examples/main/main.cpp
examples/parallel/parallel.cpp
examples/passkey/passkey.cpp
examples/perplexity/perplexity.cpp
examples/quantize-stats/quantize-stats.cpp
examples/retrieval/retrieval.cpp
examples/run/run.cpp
examples/save-load-state/save-load-state.cpp
examples/server/server.cpp
examples/server/tests/utils.py
examples/simple-chat/simple-chat.cpp
examples/speculative-simple/speculative-simple.cpp
examples/speculative/speculative.cpp
include/llama.h
src/CMakeLists.txt
src/llama-adapter.cpp
src/llama-adapter.h
src/llama-batch.h
src/llama-context.cpp
src/llama-context.h
src/llama-graph.cpp [new file with mode: 0644]
src/llama-graph.h [new file with mode: 0644]
src/llama-io.cpp [new file with mode: 0644]
src/llama-io.h [new file with mode: 0644]
src/llama-kv-cache.cpp
src/llama-kv-cache.h
src/llama-memory.cpp [new file with mode: 0644]
src/llama-memory.h [new file with mode: 0644]
src/llama-model.cpp
src/llama-model.h
src/llama.cpp