2024-02-18 | Pierrick Hymbert | server : enhanced health endpoint (#5548)
2024-02-18 | Pierrick Hymbert | server : --n-predict option document and cap to max...
2024-02-18 | Daniel Hiltgen | server : graceful server shutdown (#5244)
2024-02-18 | Georgi Gerganov | common : fix ub (#5530)
2024-02-18 | Herman Semenov | ggml, common, examples, tests : fixed type arguments...
2024-02-18 | Daniel Bevenius | llava : update surgery script to not remove tensors...
2024-02-18 | Kawrakow | 1.5 bit quantization (#5453)
2024-02-18 | github-actions... | flake.lock: Update
2024-02-17 | Georgi Gerganov | ggml : add ALiBi support for ggml_soft_max_ext (#5488)
2024-02-17 | Ananta Bastola | ci : add an option to fail on compile warning (#3952)
2024-02-17 | clibdev | gitignore : update for CLion IDE (#5544)
2024-02-16 | Georgi Gerganov | cmake : fix VULKAN and ROCm builds (#5525)
2024-02-16 | Georgi Gerganov | scripts : add helpers script for bench comparing commit...
2024-02-16 | Herman Semenov | llava : removed excess free(NULL) operation (#5531)
2024-02-16 | Herman Semenov | llama : minor fixed return int value (#5529)
2024-02-16 | Alexey Parfenov | server : add "samplers" param to control the samplers...
2024-02-16 | Rőczey Barnabás | server : fix system prompt cli (#5516)
2024-02-16 | bmwl | ggml : add numa options (#5377)
2024-02-16 | Daniel Bevenius | llava : fix clip-model-is-vision flag in README.md...
2024-02-16 | Georgi Gerganov | ci : fix BERT model download and convert
2024-02-15 | Douglas Hanley | Use correct type of pooling for embedding models (...
2024-02-15 | Georgi Gerganov | clip : fix wrong loop condition
2024-02-15 | slaren | cuda : print message when initialization fails (#5512)
2024-02-15 | Georgi Gerganov | scripts : add hf.sh helper script (#5501)
2024-02-15 | Michaël de... | fix(gguf-py): special tokens are no longer skipped...
2024-02-15 | Elbios | llava : fix memory management bug (#5491)
2024-02-15 | John | llaba : hotfix for llava-1.6 image number (#5495)
2024-02-15 | Neuman Vong | vulkan: Find optimal memory type but with fallback...
2024-02-14 | Rune | readme : fix typo (#5490)
2024-02-14 | John | llava : update README.md (#5489)
2024-02-14 | Michael Podvitskiy | cmake : ARM intrinsics detection for MSVC (#5401)
2024-02-14 | John | llava : support v1.6 (#5267)
2024-02-13 | AT | Early return for zero size calls to get_tensor. (#5482)
2024-02-13 | John | gguf : add python reader example (#5216)
2024-02-13 | Jared Van Bortel | llama : add support for Nomic Embed (#5468)
2024-02-13 | Aarni Koskela | llama : allow raw byte in SPM vocabs; don't crash on...
2024-02-13 | Aarni Koskela | llama : make load error reporting more granular (#5477)
2024-02-13 | Daniel Bevenius | finetune : rename feed-forward tensors (w1/w2/w3) ...
2024-02-13 | Georgi Gerganov | tests : multi-thread the tokenizer tests (#5474)
2024-02-13 | Douglas Hanley | llama : support batched embeddings (#5466)
2024-02-13 | Johannes Gäßler | make: add error message for bad CUDA version (#5444)
2024-02-13 | Georgi Gerganov | bert : add tests + fix quantization (#5475)
2024-02-13 | Georgi Gerganov | tests : disable moe test (#5473)
2024-02-13 | Kawrakow | ggml-quants : fix compiler warnings (shadow variable...
2024-02-12 | Georgi Gerganov | llama : fix quantization when tensors are missing ...
2024-02-12 | Georgi Gerganov | swift : package no longer use ggml dependency (#5465)
2024-02-12 | Lee | py : fix persimmon `n_rot` conversion (#5460)
2024-02-12 | Abhilash Majumder | ggml-sycl: Replace 3d ops with macro (#5458)
2024-02-12 | Daniel Bevenius | llava : remove prog parameter from ArgumentParser ...
2024-02-12 | Georgi Gerganov | sync : ggml (#5452)
2024-02-11 | Johannes Gäßler | CUDA: mul_mat_vec_q tiling, refactor mul mat logic...
2024-02-11 | Douglas Hanley | Add support for BERT embedding models (#5423)
2024-02-11 | github-actions... | flake.lock: Update
2024-02-11 | Sergio López | vulkan: only use M-sized matmul on Apple GPUs (#5412)
2024-02-11 | Alexey Parfenov | common : use enums for sampler types (#5418)
2024-02-11 | Alexey Parfenov | server : allow to specify tokens as strings in logit_bi...
2024-02-11 | Georgi Gerganov | main : ctrl+C print timing in non-interactive mode...
2024-02-11 | Georgi Gerganov | common : fix compile warning
2024-02-11 | Georgi Gerganov | ggml : fix compile warnings (unused vars) (#4966)
2024-02-11 | snadampal | ggml : add mmla kernels for quantized GEMM (#4966)
2024-02-11 | Johannes Gäßler | lookup: add print for drafting performance (#5450)
2024-02-11 | Xuan Son Nguyen | server : add llama2 chat template (#5425)
2024-02-10 | Ian Bull | metal : use autoreleasepool to avoid memory leaks ...
2024-02-10 | Georgi Gerganov | scripts : update sync scripts with new backends
2024-02-10 | Georgi Gerganov | sync : ggml
2024-02-10 | Michael Podvitskiy | ggml : add abort_callback for cpu backend (ggml/725)
2024-02-09 | Neuman Vong | vulkan: Set limit for task concurrency (#5427)
2024-02-09 | Daniel Bevenius | llava : add requirements.txt and update README.md ...
2024-02-09 | Riley Stewart | server : fix prompt caching for repeated prompts (...
2024-02-09 | Paul Tsochantaris | llama : do not cap thread count when MoE on CPU (#5419)
2024-02-09 | Marko Tasic | readme : add JavaScript/Wasm repo (#5415)
2024-02-09 | Michael Podvitskiy | ggml : fix `error C2078: too many initializers` for...
2024-02-09 | 0cc4m | Fix Vulkan crash on APUs with very little device memory...
2024-02-08 | Johannes Gäßler | CUDA: more warps for mmvq on NVIDIA (#5394)
2024-02-08 | slaren | llama : do not print "offloading layers" message in...
2024-02-08 | Abhilash Majumder | Fix f16_sycl cpy call from Arc (#5411)
2024-02-08 | Daniel Bevenius | llava : add missing .py, and fix paths in README.md...
2024-02-08 | Johannes Gäßler | fix trailing whitespace (#5407)
2024-02-08 | runfuture | llama : fix MiniCPM (#5392)
2024-02-08 | Daniel Bevenius | llava: fix typo/formatting in README.md (#5405)
2024-02-08 | Johannes Gäßler | sampling: fix top_k <= 0 (#5388)
2024-02-08 | Georgi Gerganov | tests : .gitignore obj files
2024-02-07 | Michael Podvitskiy | CMAKE_OSX_ARCHITECTURES for MacOS cross compilation...
2024-02-07 | Ebey Abraham | fix typo in readme (#5399)
2024-02-07 | Kamil Tomšík | Add Ava in the list of llama.cpp UIs (#4362)
2024-02-07 | Johannes Gäßler | CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (...
2024-02-07 | Neo Zhang Jianyu | [SYCL] update install make by w64devkit (#5297)
2024-02-07 | Xiao-Yong Jin | llava-cli : always tokenize special tokens (#5382)
2024-02-07 | 0cc4m | Basic Vulkan Multi-GPU implementation (#5321)
2024-02-07 | Eve | readme : modernize (#5379)
2024-02-07 | Ben Williams | readme : update ui list (#5354)
2024-02-07 | runfuture | llama : add MiniCPM support (#5346)
2024-02-07 | Justin Parker | server : update `/props` with "total_slots" value ...
2024-02-07 | Sang-Kil Park | convert : fix TypeError on GPT-2 vocab.json (#5288)
2024-02-06 | Alexey Parfenov | server : remove model.json endpoint (#5371)
2024-02-06 | Johannes Gäßler | CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)
2024-02-06 | Kawrakow | Update README.md (#5366)
2024-02-06 | Kawrakow | Slight quantization improvement for Q4_K and Q5_K ...
2024-02-06 | BarfingLemurs | readme : add phi, orion 14b, internlm2, and yi-VL to...
2024-02-06 | Johannes Gäßler | CUDA: mul_mat_vec_q for batch sizes > 1 (#5351)