Projects with this topic
LLM quantization and benchmarking on GPU: GGUF, GPTQ, AWQ, bitsandbytes
Extreme KV Cache Compression for LLM Inference — C++17/CUDA implementation of TurboQuant (arXiv 2504.19874). 7.5x compression, <2% quality loss.
Dependency-free, multiplatform CLI tool for manipulating palettes in images and animations.
Project accompanying the paper "Kyber Under Tightening: Threshold Safety, Failure Margins, and Robustness".
Implementation of a keyword spotting (KWS) algorithm using deep neural networks on the edge.