A high-throughput and memory-efficient inference and serving engine for LLMs (Python, 66k stars, 12.1k forks)
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (Python, 2.5k stars, 332 forks)
Common recipes to run vLLM (Jupyter Notebook, 300 stars, 108 forks)
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM (Python, 173 stars, 22 forks)
Intelligent router for mixture-of-models serving (Go, 2.6k stars, 356 forks)
TPU inference for vLLM, with unified JAX and PyTorch support
Community-maintained hardware plugin for vLLM on Ascend
Community-maintained hardware plugin for vLLM on Spyre
Daily summaries of merged vLLM pull requests
A framework for efficient model inference with omni-modality models
Community-maintained hardware plugin for vLLM on Intel Gaudi
Community-maintained hardware plugin for vLLM on Apple Silicon