git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml : use a simple std::thread in AMX without OpenMP (llama/20074)
author Adrien Gallouët <redacted>
Wed, 4 Mar 2026 10:57:09 +0000 (11:57 +0100)
committer Georgi Gerganov <redacted>
Mon, 16 Mar 2026 11:10:15 +0000 (13:10 +0200)
commit b1b018dfd11060e7f5f633ff57d2714966b38e5b
tree 46854c428e76bc36c82543aca765ff44df69b429
parent 169d723fa000f0325e0464585418a6f6a260152e

Disabling OpenMP generally provides better inference performance (at
least in my testing) but the loading becomes slightly slower.
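
For readers unfamiliar with the pattern, here is a minimal sketch of what a
`std::thread`-based parallel loop can look like when OpenMP is unavailable. It
is an illustration only, not the code from this commit: the names
`split_range` and `parallel_for_threads` are hypothetical, and the choice of
`std::thread::hardware_concurrency()` as the thread count is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: split [0, n) into nth near-equal chunks and return
// the half-open range [begin, end) owned by chunk ith.
inline void split_range(int n, int nth, int ith, int & begin, int & end) {
    const int base = n / nth;
    const int rem  = n % nth;
    begin = ith * base + std::min(ith, rem);
    end   = begin + base + (ith < rem ? 1 : 0);
}

// Hypothetical parallel loop: calls f(begin, end) once per worker thread.
// Assumption: f does not throw; a throwing f would need extra join handling.
template <typename F>
void parallel_for_threads(int n, const F & f) {
    assert(n >= 0);
    const unsigned hw = std::thread::hardware_concurrency(); // may return 0
    const int nth = std::max(1, std::min<int>(n, hw == 0 ? 1 : (int) hw));
    if (nth == 1) {
        f(0, n); // nothing to parallelize
        return;
    }

    std::vector<std::thread> workers;
    workers.reserve(nth - 1);
    for (int ith = 1; ith < nth; ++ith) {
        int b, e;
        split_range(n, nth, ith, b, e);
        workers.emplace_back([&f, b, e] { f(b, e); });
    }

    int b, e;
    split_range(n, nth, 0, b, e);
    f(b, e); // the calling thread handles chunk 0

    for (auto & t : workers) {
        t.join();
    }
}
```

Spawning threads on every call has a fixed cost, which is one plausible reason
small shapes can get slower while large ones get much faster.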

Benchmark results for `convert_B_packed_format()`:

Before this commit:

         N      K |  No OpenMP     OpenMP |    Diff |  Speedup
    ------------------------------------------------------------
       512   2880 |    640.9us    263.5us |  -58.9% |    0.41x
      2880   4096 |     2.55ms    261.7us |  -89.8% |    0.10x
    201088   2880 |   256.44ms    21.61ms |  -91.6% |    0.08x
    ------------------------------------------------------------

    Total: 325.43ms vs 31.05ms

After:

         N      K |  No OpenMP     OpenMP |    Diff |  Speedup
    ------------------------------------------------------------
       512   2880 |     1.49ms    263.5us |  -82.3% |    0.18x
      2880   4096 |     1.55ms    261.7us |  -83.1% |    0.17x
    201088   2880 |    24.03ms    21.61ms |  -10.1% |    0.90x
    ------------------------------------------------------------

    Total: 78.97ms vs 31.05ms
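
The Diff and Speedup columns in both tables are consistent with the following
arithmetic, shown here as a small sketch. The function names are hypothetical,
and the formulas are inferred from the printed rows rather than taken from the
benchmark source.

```cpp
#include <cassert>

// Relative difference of the OpenMP time versus the no-OpenMP time, in
// percent. Negative means the OpenMP build was faster.
inline double diff_percent(double no_omp_us, double omp_us) {
    assert(no_omp_us > 0.0);
    return (omp_us - no_omp_us) / no_omp_us * 100.0;
}

// Ratio of OpenMP time to no-OpenMP time. A value below 1.0 means the
// no-OpenMP build is slower than the OpenMP build.
inline double speedup(double no_omp_us, double omp_us) {
    assert(no_omp_us > 0.0);
    return omp_us / no_omp_us;
}
```

For example, the first "Before" row (640.9us vs 263.5us) gives roughly -58.9%
and 0.41x, matching the table.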

Tested with unsloth/gpt-oss-20b-GGUF:Q4_K_M.

Signed-off-by: Adrien Gallouët <redacted>
ggml/src/ggml-cpu/amx/common.h