git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
fix: prevent segfault in tokenizer on highly repetitive input (#17786)
author Pascal <redacted>
Fri, 5 Dec 2025 11:52:23 +0000 (12:52 +0100)
committer GitHub <redacted>
Fri, 5 Dec 2025 11:52:23 +0000 (13:52 +0200)
commit 1be97831e44a6335aca9c3f4f3edbb0e35bea98f
tree cfef281976b9eb1e1692f9fa6d23150c141459ab
parent a6cfc212ed21b1cf6746827390160ba26c160ee9

Add nosubs|optimize flags to std::regex constructors to prevent
catastrophic backtracking when processing prompts with repeated
identical characters (e.g., 'A' * 10000).

The nosubs flag disables sub-expression capture, which significantly
reduces memory usage and backtracking on uniform token sequences.
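
A minimal sketch of the flag change described above, not the actual
diff to src/unicode.cpp. The helper names make_split_regex and
find_spans and the pattern "(A+)|(B+)" are hypothetical and chosen
only for illustration. std::regex::nosubs and std::regex::optimize
are the standard syntax options named in the message. With nosubs,
parenthesized groups are treated as non-capturing, so mark_count()
is 0 while the overall match position and length stay available.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the commit): compile a pattern with the
// flags named in the commit message. ECMAScript is the default grammar
// and is spelled out here only for clarity.
inline std::regex make_split_regex(const std::string & pattern) {
    return std::regex(pattern,
        std::regex::ECMAScript | std::regex::nosubs | std::regex::optimize);
}

// Hypothetical helper: collect the (offset, length) of every match.
// Only the whole match is read, so capture groups are never needed.
inline std::vector<std::pair<std::size_t, std::size_t>>
find_spans(const std::string & text, const std::regex & re) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        spans.emplace_back(static_cast<std::size_t>(it->position()),
                           static_cast<std::size_t>(it->length()));
    }
    return spans;
}
```

A caller would build the regex once with make_split_regex and reuse it
across inputs. The sketch is only meant to show that splitting by match
offsets works without captures. It does not reproduce the crash, and
how much the flags help on very long repetitive input depends on the
standard library implementation.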
src/unicode.cpp