2024-06-26 |
Johannes Gäßler | CUDA: fix misaligned shared memory read (#8123) |
2024-06-26 |
Eddie-Wang | llama : extend llm_build_ffn() to support _scale tensor... |
2024-06-26 |
Olivier Chafik | `json`: better support for "type" unions (e.g. nullable... |
2024-06-26 |
Olivier Chafik | `json`: fix additionalProperties, allow space after... |
2024-06-25 |
jukofyork | fixes #7999 (adds control vectors to all `build_XXX... |
2024-06-25 |
fairydreaming | llama : implement Unigram tokenizer needed by T5 and... |
2024-06-25 |
Daniel Bevenius | llama : return nullptr from llama_grammar_init (#8093) |
2024-06-25 |
Olivier Chafik | `json`: support integer minimum, maximum, exclusiveMini... |
2024-06-25 |
slaren | disable docker CI on pull requests (#8110) |
2024-06-25 |
joecryptotoo | Add healthchecks to llama-server containers (#8081) |
2024-06-25 |
Brian | Gguf dump start data offset via --data-offset and some... |
2024-06-25 |
Xuan Son Nguyen | cvector: better prompt handling, add "mean vector"... |
2024-06-25 |
Xuan Son Nguyen | Add chat template support for llama-cli (#8068) |
2024-06-25 |
HanishKVC | SimpleChat v3.1: Boolean chat request options in Settin... |
2024-06-25 |
HatsuneMikuUwU33 | Update control vector help (#8104) |
2024-06-25 |
Meng, Hengyu | [SYCL] Re-enabled mul_mat_batched_sycl (#8095) |
2024-06-24 |
Johannes Gäßler | CUDA: fix matrix multiplication algorithm choice (... |
2024-06-24 |
Johannes Gäßler | CUDA: fix MMQ writeback for int8 tensor cores (#8100) |
2024-06-24 |
Johannes Gäßler | CUDA: use MMQ instead of cuBLAS by default (#8075) |
2024-06-24 |
fairydreaming | gguf-py : fix tensor groups for encoder-decoder models... |
2024-06-24 |
Johannes Gäßler | CUDA: optimize MMQ int8 tensor core performance (#8062) |
2024-06-24 |
Christian Zhou... | Option to split during conversion (#6942) |
2024-06-24 |
slaren | disable publishing the full-rocm docker image (#8083) |
2024-06-24 |
Yann Follet | embedding : more cli arguments (#7458) |
2024-06-24 |
fairydreaming | gguf-py, convert-hf : model conversion support for... |
2024-06-24 |
slaren | ggml : remove ggml_task_type and GGML_PERF (#8017) |
2024-06-23 |
Eddie-Wang | llama : add support for BitnetForCausalLM (#7931) |
2024-06-23 |
Aarni Koskela | server : fix JSON-Scheme typo (#7975) |
2024-06-23 |
Daniel Bevenius | Fix typo in llama_set_embeddings comment (#8077) |
2024-06-23 |
slaren | fix CI failures (#8066) |
2024-06-23 |
0cc4m | Refactor Vulkan backend to allow multiple contexts... |
2024-06-22 |
Clint Herron | Removing extra blank lines that were breaking Lint... |
2024-06-22 |
Xuan Son Nguyen | cvector: fix CI + correct help message (#8064) |
2024-06-22 |
HatsuneMikuUwU33 | cvector-generator: Moe Moe Fixie-Fixie for Lots of... |
2024-06-22 |
0xspringtime | convert-hf : change assert to exception (#8015) |
2024-06-22 |
ddh0 | Update llama-quantize ppl/file size output from LLaMA... |
2024-06-22 |
Clint Herron | JSON Schema to GBNF integration tests (#7790) |
2024-06-21 |
k.h.lai | vulkan: detect multiple devices by deviceUUID instead... |
2024-06-21 |
Eve | ggml : AVX IQ quants (#7845) |
2024-06-21 |
Georgi Gerganov | llama : optimize long word tokenization with WPM (... |
2024-06-21 |
Douglas Hanley | llama : allow pooled embeddings on any model (#7477) |
2024-06-21 |
Shuichi Tsutsumi | swiftui : enable stream updating (#7754) |
2024-06-20 |
Hamdoud Hakem | requirements : Bump torch and numpy for python3.12... |
2024-06-20 |
Hamdoud Hakem | convert-hf : Fix the encoding in the convert-hf-to... |
2024-06-20 |
Johannes Gäßler | common: fix warning (#8036) |
2024-06-20 |
luoyu-intel | [SYCL] Fix windows build and inference (#8003) |
2024-06-20 |
Johannes Gäßler | CUDA: stream-k decomposition for MMQ (#8018) |
2024-06-20 |
Michael de... | metal : fix `ggml_metal_supports_op` for BF16 (#8021) |
2024-06-19 |
sasha0552 | server : fix smart slot selection (#8020) |
2024-06-19 |
Michael de... | un-ignore `build-info.cmake` and `build-info.sh` (... |
2024-06-19 |
slaren | ggml : synchronize threads using barriers (#7993) |
2024-06-19 |
Georgi Gerganov | codecov : remove (#8004) |
2024-06-19 |
Meng, Hengyu | [SYCL] refactor (#6408) |
2024-06-18 |
jaime-m-p | tokenizer : BPE fixes (#7530) |
2024-06-18 |
Sigbjørn Skjæret | Only use FIM middle token if it exists (#7648) |
2024-06-18 |
jojorne | Fix no gcc pragma on Windows (#7751) |
2024-06-18 |
Ulrich Drepper | Allow compiling with CUDA without CUDA runtime installe... |
2024-06-18 |
Frank Mai | chore: clean useless beam search param (#7985) |
2024-06-18 |
Abheek Gulati | readme : update UI list (#7943) |
2024-06-18 |
Georgi Gerganov | ggml : sync |
2024-06-18 |
Georgi Gerganov | whisper : use ggml_backend_sched (whisper/2239) |
2024-06-17 |
Ștefan-Gabriel... | update: support Qwen2-57B-A14B (#7835) |
2024-06-17 |
Srihari-mcw | Make updates to type cast based on compiler instead... |
2024-06-17 |
Georgi Gerganov | llama : disable FA if KV head size do not match (#7982) |
2024-06-17 |
Bryan Honof | Add Nix and Flox install instructions (#7899) |
2024-06-17 |
slaren | sched : offload_op also requires supports_op (#7977) |
2024-06-17 |
Frank Mai | fix: divide 0 exception in mamba (#7932) |
2024-06-17 |
Markus Tavenrath | Implement non-mapped async IO for CUDA on Windows.... |
2024-06-17 |
Georgi Gerganov | rpc : fix load/store misaligned addresses (#7948) |
2024-06-17 |
Brian | gguf-dump.py: add --markdown dump output (#7853) |
2024-06-17 |
Neo Zhang | [SYCL] Update README-sycl.md for Chapter "Recommended... |
2024-06-16 |
Calvin Laurenson | Add support for sqrt on CUDA (#7953) |
2024-06-16 |
Georgi Gerganov | cuda : fix bounds check for src0 rows in MMVQ kernel... |
2024-06-16 |
Hong Bo PENG | ggml : fix and optimize ppc64le (ggml/849) |
2024-06-16 |
Daniel Bevenius | ggml : remove duplicate include of ggml-common.h (ggml... |
2024-06-16 |
Georgi Gerganov | flake.lock: Update (#7951) |
2024-06-16 |
Georgi Gerganov | unicode : avoid char32_t (#7957) |
2024-06-16 |
hopkins385 | readme : update UI list [no ci] (#7958) |
2024-06-16 |
Georgi Gerganov | ggml : fix handling of zero blocks in IQ quants (#7955) |
2024-06-16 |
Georgi Gerganov | github : update pr template |
2024-06-16 |
0cc4m | Vulkan Shader Refactor, Memory Debugging Option (#7947) |
2024-06-15 |
Xuan Son Nguyen | Add `cvector-generator` example (#7514) |
2024-06-15 |
Meng, Hengyu | [SYCL] remove global variables (#7710) |
2024-06-14 |
olexiyb | ci : fix macos x86 build (#7940) |
2024-06-14 |
Johannes Gäßler | CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) |
2024-06-14 |
Georgi Gerganov | metal : utilize max shared memory for mul_mat_id (... |
2024-06-14 |
Radoslav Gerganov | llama-bench : fix RPC indication (#7936) |
2024-06-14 |
Sigbjørn Skjæret | llama : more checks before assuming FIM tokens (#7644) |
2024-06-14 |
Elaine | convert : add Poro-34B-chat tokenizer support (#7713) |
2024-06-13 |
Radoslav Gerganov | rpc : fix ggml_backend_rpc_supports_buft() (#7918) |
2024-06-13 |
Galunid | readme : Remove outdated instructions from README.md... |
2024-06-13 |
slaren | move BLAS to a separate backend (#6210) |
2024-06-12 |
Olivier Chafik | `build`: rename main → llama-cli, server → llama-server... |
2024-06-12 |
Johannes Gäßler | CUDA: fix broken oob check for FA vec f32 kernel (... |
2024-06-12 |
Georgi Gerganov | tests : add non-cont unary tests (#7857) |
2024-06-12 |
Georgi Gerganov | ggml : improve ggml_is_contiguous logic (#7856) |
2024-06-12 |
Georgi Gerganov | server : restore numeric prompts (#7883) |
2024-06-12 |
Meng, Hengyu | update intel docker oneapi-basekit to 2024.1.1-devel... |
2024-06-12 |
Patrice Ferlet | Fix a typo and add Fedora 40 pacakge to install for... |
2024-06-11 |
k.h.lai | vulkan: select only one device for single gpu with... |