Projects with this topic
Official repository of the Ruhr University Neural Network Energy Representation (RuNNer).
VRAM to RAM Offloader for AI and vLLM - High-Performance C++23 KV Cache Engine with Multi-Stream GPU Transfers
Extreme KV Cache Compression for LLM Inference — C++17/CUDA implementation of TurboQuant (arXiv 2504.19874). 7.5x compression, <2% quality loss.
A "Lab" neural network library for controlled research into network-based machine learning.