git.djapps.eu Git - pkg/ggml/sources/llama.cpp/shortlog
2024-06-14  Johannes Gäßler  CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)
2024-06-14  Georgi Gerganov  metal : utilize max shared memory for mul_mat_id (...
2024-06-14  Radoslav Gerganov  llama-bench : fix RPC indication (#7936)
2024-06-14  Sigbjørn Skjæret  llama : more checks before assuming FIM tokens (#7644)
2024-06-14  Elaine  convert : add Poro-34B-chat tokenizer support (#7713)
2024-06-13  Radoslav Gerganov  rpc : fix ggml_backend_rpc_supports_buft() (#7918)
2024-06-13  Galunid  readme : Remove outdated instructions from README.md...
2024-06-13  slaren  move BLAS to a separate backend (#6210)
2024-06-12  Olivier Chafik  `build`: rename main → llama-cli, server → llama-server...
2024-06-12  Johannes Gäßler  CUDA: fix broken oob check for FA vec f32 kernel (...
2024-06-12  Georgi Gerganov  tests : add non-cont unary tests (#7857)
2024-06-12  Georgi Gerganov  ggml : improve ggml_is_contiguous logic (#7856)
2024-06-12  Georgi Gerganov  server : restore numeric prompts (#7883)
2024-06-12  Meng, Hengyu  update intel docker oneapi-basekit to 2024.1.1-devel...
2024-06-12  Patrice Ferlet  Fix a typo and add Fedora 40 pacakge to install for...
2024-06-11  k.h.lai  vulkan: select only one device for single gpu with...
2024-06-11  0cc4m  Update Vulkan RoPE implementation (#7818)
2024-06-11  Deven Mistry  fix broken link in pr template (#7880) [no ci]
2024-06-11  Brian  github: move PR template to .github/ root (#7868)
2024-06-11  Johannes Gäßler  llama-bench: more compact markdown tables (#7879)
2024-06-11  Georgi Gerganov  tests : check the Python version (#7872)
2024-06-11  Johannes Gäßler  CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K)...
2024-06-11  slaren  fix CUDA CI by using a windows-2019 image (#7861)
2024-06-11  Olivier Chafik  json: refine constraint for whitespace to avoid runaway...
2024-06-11  Olivier Chafik  `json`: document schema conversion in GBNF readme,...
2024-06-10  Jared Van Bortel  cmake : fix CMake requirement for CUDA (#7821)
2024-06-10  slaren  ci : try win-2019 on server windows test (#7854)
2024-06-10  Georgi Gerganov  examples : remove --instruct remnants (#7846)
2024-06-10  Georgi Gerganov  server : improve "prompt" handling (#7847)
2024-06-10  Johannes Gäßler  CUDA: use tensor cores for MMQ (#7676)
2024-06-10  Ben Ashbaugh  use the correct SYCL context for host USM allocations...
2024-06-09  Georgi Gerganov  flake.lock: Update (#7838)
2024-06-09  Georgi Gerganov  imatrix : handle partial entries (#7833)
2024-06-09  Nicolás Pérez  docs: Added initial PR template with directions for...
2024-06-09  mgroeber9110  server: do not remove whitespace at the start of a...
2024-06-09  Johannes Gäßler  CUDA: revise q8_1 data layout for mul_mat_q (#7824)
2024-06-09  sasha0552  convert-hf : set the model name based on cli arg, if...
2024-06-09  compilade  convert-hf : match model part name prefix and suffix...
2024-06-09  compilade  gguf-py : decouple adding metadata from writing in...
2024-06-08  slaren  Revert "[SYCL] Update rpc-server.cpp to include SYCL...
2024-06-08  Olivier Chafik  url: save -mu downloads to new cache location (#7826)
2024-06-08  sasha0552  server : smart slot selection using Longest Common...
2024-06-07  slaren  vulkan : reuse parent extra for views (#7806)
2024-06-07  Christian Zhou...  gguf-split : change binary multi-byte units to decimal...
2024-06-07  intelmatt  cmake : fix BUILD_SHARED_LIBS=ON build (#7784)
2024-06-07  Johannes Gäßler  server: update cache_prompt documentation [no ci] ...
2024-06-07  woodx  server : do not get prompt in infill mode (#7286)
2024-06-07  pengxin99  [SYCL] fix softmax r2r result wrong issue (#7811)
2024-06-07  slaren  check for nans in imatrix and quantize (#7807)
2024-06-06  Georgi Gerganov  server : fix --threads-http arg (#7801)
2024-06-06  Georgi Gerganov  imatrix : migrate to gpt_params (#7771)
2024-06-06  Clint Herron  Added support for . (any character) token in grammar...
2024-06-06  Mattheus Chediak  README minor fixes (#7798) [no ci]
2024-06-06  Olivier Chafik  grammars: x{min,max} repetition operator (#6640)
2024-06-06  Joan Fontanals  llama : add jina v2 base code (#7596)
2024-06-06  slaren  docker : build only main and server in their images...
2024-06-06  slaren  docker : add openmp lib (#7780)
2024-06-05  Galunid  Fix encoding in python scripts (#7733)
2024-06-05  Johannes Gäßler  CUDA: refactor mmq, dmmv, mmvq (#7716)
2024-06-05  Georgi Gerganov  ggml : refactor rope norm/neox (#7634)
2024-06-05  arch-btw  readme : remove -ins (#7759)
2024-06-04  jaime-m-p  Fix per token atrributes bits (#7749)
2024-06-04  agray3  Allow number of nodes in CUDA graph to change (#7738)
2024-06-04  Georgi Gerganov  common : refactor cli arg parsing (#7675)
2024-06-04  Georgi Gerganov  ggml : remove OpenCL (#7735)
2024-06-04  Georgi Gerganov  llama : remove beam search (#7736)
2024-06-04  Georgi Gerganov  readme : remove obsolete Zig instructions (#7471)
2024-06-04  slaren  llama-bench : allow using a different printer for stder...
2024-06-04  Daniele  Improve hipBLAS support in CMake (#7696)
2024-06-04  zhouwg  refine .gitignore (#7688)
2024-06-04  jaime-m-p  Per token attributes (#7685)
2024-06-04  Georgi Gerganov  ggml : prevent builds with -ffinite-math-only (#7726)
2024-06-03  Radoslav Gerganov  llama : offload to RPC in addition to other backends...
2024-06-03  Masaya, Kato  ggml : use OpenMP as a thread pool (#7606)
2024-06-03  Johannes Gäßler  make: fix debug options not being applied to NVCC ...
2024-06-03  0cc4m  Vulkan Mixture of Experts (MoE) support (#7628)
2024-06-03  Andy Tai  cmake : add pkg-config spec file for llama.cpp (#7702)
2024-06-03  zhangkaihuo  llama : MiniCPM support tied embeddings (#7664)
2024-06-03  Georgi Gerganov  llama : avoid double token-to-piece cache (#7654)
2024-06-03  woachk  kompute : implement op_getrows_f32 (#6403)
2024-06-02  Dave Airlie  fix bug introduced in using calloc (#7701)
2024-06-02  Georgi Gerganov  flake.lock: Update (#7686)
2024-06-02  Austin  chore : add ignore rule for generated server themes...
2024-06-02  nickp27  [SYCL] Update rpc-server.cpp to include SYCL backend...
2024-06-01  Johannes Gäßler  Fix FlashAttention debug test, FP32 assert (#7684)
2024-06-01  Yazan Agha...  server : new UI (#7633)
2024-06-01  HanishKVC  SimpleChat: Simple histogram/repeatMatching driven...
2024-06-01  Johannes Gäßler  CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8...
2024-06-01  Johannes Gäßler  CUDA: quantized KV support for FA vec (#7527)
2024-05-31  Georgi Gerganov  server : update js (#7670)
2024-05-31  Galunid  convert-hf : Handle NotImplementedError in convert...
2024-05-31  Johannes Gäßler  scripts: update compare_llama_bench.py [no ci] (#7673)
2024-05-31  Daniele  Improve HIP compatibility (#7672)
2024-05-31  Georgi Gerganov  readme : link homebrew discussion
2024-05-31  Georgi Gerganov  ggml : fix loongson compile warnings (#7537)
2024-05-31  Galunid  Somehow '**' got lost (#7663)
2024-05-31  Galunid  Add convert.py removal to hot topics (#7662)
2024-05-30  Sertaç Özercan  [no ci] docs: add aikit to readme (#7650)
2024-05-30  JohnnyB  Fixed painfully slow single process builds. (#7326)
2024-05-30  Georgi Gerganov  llama : cache llama_token_to_piece (#7587)