git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
mtmd: mtmd_audio_streaming_istft (#18645)
author Tarek Dakhran <redacted>
Tue, 6 Jan 2026 20:00:29 +0000 (21:00 +0100)
committer GitHub <redacted>
Tue, 6 Jan 2026 20:00:29 +0000 (21:00 +0100)
commit ccbc84a5374bab7a01f68b129411772ddd8e7c79
tree bf5dc84b777dd773c7b1716b81ae9c649444d67d
parent 68b4d516c305325d31e698c4673b691d2a9d879f

This change is decoupled from https://github.com/ggml-org/llama.cpp/pull/18641.

[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B)
needs a streaming ISTFT to generate output audio.

* add a streaming ISTFT class (`mtmd_audio_streaming_istft`) that uses overlap-add for audio reconstruction
* replace the global audio cache with a per-instance cache; the model requires
  two independent caches, one for preprocessing (audio input) and one for ISTFT
  (audio output)
* unify the forward (FFT) and inverse (IFFT) transforms in a single templated implementation
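
For readers unfamiliar with the technique, here is a minimal sketch of streaming overlap-add reconstruction with per-instance state and a transform templated on direction. It is not the code from `tools/mtmd/mtmd-audio.cpp`: `dft`, `cvec`, and `streaming_istft_sketch` are hypothetical names, a naive O(N^2) DFT stands in for the real FFT, and the sketch assumes a periodic Hann window, a full complex spectrum per frame, and normalization by the accumulated squared window.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// All names below are hypothetical; they do not mirror the mtmd API.
using cvec = std::vector<std::complex<float>>;

// Naive O(N^2) DFT (stand-in for an FFT). The template flag selects the
// direction: false = forward, true = inverse (with 1/N scaling).
template <bool inverse>
cvec dft(const cvec & in) {
    const size_t n    = in.size();
    const double sign = inverse ? 1.0 : -1.0;
    const double pi   = std::acos(-1.0);
    cvec out(n);
    for (size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (size_t t = 0; t < n; ++t) {
            const double ang = sign * 2.0 * pi * double(k * t) / double(n);
            acc += std::complex<double>(in[t]) *
                   std::complex<double>(std::cos(ang), std::sin(ang));
        }
        if (inverse) {
            acc /= double(n);
        }
        out[k] = std::complex<float>(acc);
    }
    return out;
}

// Streaming ISTFT sketch: each instance owns its overlap state, so separate
// instances can be used side by side without sharing a global cache.
struct streaming_istft_sketch {
    size_t n_fft;
    size_t hop;
    std::vector<float> window; // periodic Hann window (assumption)
    std::vector<float> acc;    // overlap-add accumulator
    std::vector<float> norm;   // accumulated squared window, for normalization

    streaming_istft_sketch(size_t n_fft_, size_t hop_)
        : n_fft(n_fft_), hop(hop_), window(n_fft_), acc(n_fft_, 0.0f), norm(n_fft_, 0.0f) {
        const double pi = std::acos(-1.0);
        for (size_t i = 0; i < n_fft; ++i) {
            window[i] = float(0.5 - 0.5 * std::cos(2.0 * pi * double(i) / double(n_fft)));
        }
    }

    // Consume one spectrum of n_fft bins and return `hop` finished samples.
    std::vector<float> push(const cvec & spectrum) {
        const cvec frame = dft<true>(spectrum);
        for (size_t i = 0; i < n_fft; ++i) {
            acc[i]  += frame[i].real() * window[i];
            norm[i] += window[i] * window[i];
        }
        // The first `hop` samples receive no further contributions.
        std::vector<float> out(hop);
        for (size_t i = 0; i < hop; ++i) {
            out[i] = norm[i] > 1e-8f ? acc[i] / norm[i] : 0.0f;
        }
        // Shift the state left by one hop and zero the new tail.
        const std::ptrdiff_t h = std::ptrdiff_t(hop);
        acc.erase(acc.begin(), acc.begin() + h);
        acc.resize(n_fft, 0.0f);
        norm.erase(norm.begin(), norm.begin() + h);
        norm.resize(n_fft, 0.0f);
        return out;
    }
};
```

In this sketch, a caller would window each input frame with the same Hann window, run `dft<false>` on it, and pass the result to `push`; the concatenated outputs then approximate the original signal wherever the accumulated window energy is non-zero. Two independent instances would correspond to the two caches the commit message describes.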
tools/mtmd/mtmd-audio.cpp
tools/mtmd/mtmd-audio.h