git.djapps.eu Git - pkg/ggml/sources/llama.cpp/shortlog
2024-02-12  Georgi Gerganov  sync : ggml (#5452)
2024-02-11  Johannes Gäßler  CUDA: mul_mat_vec_q tiling, refactor mul mat logic...
2024-02-11  Douglas Hanley  Add support for BERT embedding models (#5423)
2024-02-11  github-actions...  flake.lock: Update
2024-02-11  Sergio López  vulkan: only use M-sized matmul on Apple GPUs (#5412)
2024-02-11  Alexey Parfenov  common : use enums for sampler types (#5418)
2024-02-11  Alexey Parfenov  server : allow to specify tokens as strings in logit_bi...
2024-02-11  Georgi Gerganov  main : ctrl+C print timing in non-interactive mode...
2024-02-11  Georgi Gerganov  common : fix compile warning
2024-02-11  Georgi Gerganov  ggml : fix compile warnings (unused vars) (#4966)
2024-02-11  snadampal  ggml : add mmla kernels for quantized GEMM (#4966)
2024-02-11  Johannes Gäßler  lookup: add print for drafting performance (#5450)
2024-02-11  Xuan Son Nguyen  server : add llama2 chat template (#5425)
2024-02-10  Ian Bull  metal : use autoreleasepool to avoid memory leaks ...
2024-02-10  Georgi Gerganov  scripts : update sync scripts with new backends
2024-02-10  Georgi Gerganov  sync : ggml
2024-02-10  Michael Podvitskiy  ggml : add abort_callback for cpu backend (ggml/725)
2024-02-09  Neuman Vong  vulkan: Set limit for task concurrency (#5427)
2024-02-09  Daniel Bevenius  llava : add requirements.txt and update README.md ...
2024-02-09  Riley Stewart  server : fix prompt caching for repeated prompts (...
2024-02-09  Paul Tsochantaris  llama : do not cap thread count when MoE on CPU (#5419)
2024-02-09  Marko Tasic  readme : add JavaScript/Wasm repo (#5415)
2024-02-09  Michael Podvitskiy  ggml : fix `error C2078: too many initializers` for...
2024-02-09  0cc4m  Fix Vulkan crash on APUs with very little device memory...
2024-02-08  Johannes Gäßler  CUDA: more warps for mmvq on NVIDIA (#5394)
2024-02-08  slaren  llama : do not print "offloading layers" message in...
2024-02-08  Abhilash Majumder  Fix f16_sycl cpy call from Arc (#5411)
2024-02-08  Daniel Bevenius  llava : add missing .py, and fix paths in README.md...
2024-02-08  Johannes Gäßler  fix trailing whitespace (#5407)
2024-02-08  runfuture  llama : fix MiniCPM (#5392)
2024-02-08  Daniel Bevenius  llava: fix typo/formatting in README.md (#5405)
2024-02-08  Johannes Gäßler  sampling: fix top_k <= 0 (#5388)
2024-02-08  Georgi Gerganov  tests : .gitignore obj files
2024-02-07  Michael Podvitskiy  CMAKE_OSX_ARCHITECTURES for MacOS cross compilation...
2024-02-07  Ebey Abraham  fix typo in readme (#5399)
2024-02-07  Kamil Tomšík  Add Ava in the list of llama.cpp UIs (#4362)
2024-02-07  Johannes Gäßler  CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (...
2024-02-07  Neo Zhang Jianyu  [SYCL] update install make by w64devkit (#5297)
2024-02-07  Xiao-Yong Jin  llava-cli : always tokenize special tokens (#5382)
2024-02-07  0cc4m  Basic Vulkan Multi-GPU implementation (#5321)
2024-02-07  Eve  readme : modernize (#5379)
2024-02-07  Ben Williams  readme : update ui list (#5354)
2024-02-07  runfuture  llama : add MiniCPM support (#5346)
2024-02-07  Justin Parker  server : update `/props` with "total_slots" value ...
2024-02-07  Sang-Kil Park  convert : fix TypeError on GPT-2 vocab.json (#5288)
2024-02-06  Alexey Parfenov  server : remove model.json endpoint (#5371)
2024-02-06  Johannes Gäßler  CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370)
2024-02-06  Kawrakow  Update README.md (#5366)
2024-02-06  Kawrakow  Slight quantization improvement for Q4_K and Q5_K ...
2024-02-06  BarfingLemurs  readme : add phi, orion 14b, internlm2, and yi-VL to...
2024-02-06  Johannes Gäßler  CUDA: mul_mat_vec_q for batch sizes > 1 (#5351)
2024-02-06  Justin Parker  server : include total "num_slots" in props endpoint...
2024-02-06  Michael Coppola  server : add `dynatemp_range` and `dynatemp_exponent...
2024-02-06  Niall Coates  server : various fixes for the prompt field in /complet...
2024-02-06  Georgi Gerganov  py : handle byte tokens in `get_token_type` (#5341)
2024-02-05  Johannes Gäßler  make: Use ccache for faster compilation (#5318)
2024-02-05  Johannes Gäßler  README: updated introduction (#5343)
2024-02-05  Kawrakow  ggml : make use of ggml-quants.h possible in C++ code...
2024-02-05  Dr. Tom Murphy...  ggml : avoid duplicating function calls using MIN/MAX...
2024-02-05  Kawrakow  iq3_xxs: quards for the no-imatrix situation (#5334)
2024-02-05  Guoteng  py : fix internlm2-hf convert to gguf (#5305)
2024-02-05  Kawrakow  iq2_xxs: tune quantization (#5320)
2024-02-05  Alexey Parfenov  server : allow to get default generation settings for...
2024-02-05  l3utterfly  common : add dynamic temperature parameters to main...
2024-02-05  Georgi Gerganov  scripts : fix typos, cleanup (#5303)
2024-02-05  Нияз Гарифзянов  scripts : add non-interactive server-llm.sh (#5303)
2024-02-05  chiranko  readme : add CodeShell models to the supported models...
2024-02-05  AidanBeltonS  [SYCL] Fix cpy with dims of 3 (#5289)
2024-02-04  github-actions...  flake.lock: Update
2024-02-04  Kawrakow  Adding some imatrix tools (#5302)
2024-02-04  Welby Seely  cmake : use set() for LLAMA_WIN_VER (#5298)
2024-02-03  Johannes Gäßler  make: add nvcc info print (#5310)
2024-02-03  Johannes Gäßler  make: fix nvcc optimization flags for host code (#5309)
2024-02-03  Martin Schwaighofer  add Vulkan support to Nix flake
2024-02-03  0cc4m  Vulkan Intel Fixes, Optimizations and Debugging Flags...
2024-02-03  Michael Klimenko  refactor : switch to emplace_back to avoid extra object...
2024-02-03  Jared Van Bortel  YaRN : store rope scaling type as int32_t in memory...
2024-02-03  BADR  readme : add tenere in the ui tools list (#5284)
2024-02-03  AidanBeltonS  Fix im2col with 32fp (#5286)
2024-02-02  kalomaze  perplexity : fix KL divergence calculations on Windows...
2024-02-02  Georgi Gerganov  scripts : parse wtype in server-llm.sh (#5167)
2024-02-02  Mirror Azure  py : add check for '.attn.masked_bias' layers to GPT2mo...
2024-02-02  AidanBeltonS  Tidy ggml-sycl (#5261)
2024-02-02  Xuan Son Nguyen  docker : add build for SYCL, Vulkan + update readme...
2024-02-02  Meng, Hengyu  [SYCL] get MAX_MEM_ALLOC from device property (#5270)
2024-02-02  Neo Zhang Jianyu  [SYCL] update guide of SYCL backend (#5254)
2024-02-02  Ian Bull  llama : fix memory leak in llama_batch_free (#5252)
2024-02-01  Neo Zhang Jianyu  add --no-mmap in llama-bench (#5257)
2024-02-01  0cc4m  Vulkan Phi Fix for AMD Proprietary Drivers (#5260)
2024-02-01  slaren  cuda : fix LLAMA_CUDA_F16 (#5262)
2024-02-01  Ali Nehzat  make : generate .a library for static linking (#5205)
2024-02-01  Guoteng  llama : support InternLM2 (#5184)
2024-01-31  Eve  Fix broken Vulkan Cmake (properly) (#5230)
2024-01-31  Georgi Gerganov  llama : reorder build_orion() at correct place (#5118)
2024-01-31  Georgi Gerganov  llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU...
2024-01-31  Georgi Gerganov  metal : add im2col F32 dst support (#5132)
2024-01-31  JidongZhang-THU  llava : add MobileVLM support (#5132)
2024-01-31  Neo Zhang Jianyu  format license text, restore apache license by legal...
2024-01-31  slaren  ggml : limit n_threads to the max n_tasks (#5238)
2024-01-31  0cc4m  Vulkan Fixes (#5223)