mistralrs-community

Community hub for pre-quantized models in UQFF format, ready to run with mistral.rs.

What is mistral.rs?

mistral.rs is a fast, flexible, Rust-native LLM inference engine supporting text, image, video, and audio, with built-in quantization, tool calling, web search, and MCP client support.

Quick Start

Install (Linux/macOS):

curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.sh | sh

Install (Windows):

irm https://raw.githubusercontent.com/EricLBuehler/mistral.rs/master/install.ps1 | iex

Run a UQFF model:

mistralrs run -m mistralrs-community/gemma-3n-E4B-it-UQFF --from-uqff 4

Or quantize any model on the fly with ISQ (in-situ quantization):

mistralrs run -m google/gemma-3n-E4B-it --isq 4
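
Beyond the interactive `run` mode, mistral.rs can also expose an OpenAI-compatible HTTP API in its server mode, so any OpenAI-style client can talk to a local UQFF model. The sketch below builds a chat-completions request payload for such an endpoint; the host, port, and model id are assumptions, so adjust them to match your local server configuration.

```python
import json

# Assumed local endpoint for an OpenAI-compatible mistral.rs server;
# change host/port to wherever your server is actually listening.
BASE_URL = "http://localhost:1234/v1/chat/completions"

# Standard OpenAI-style chat-completions payload. The "model" value is
# an assumption; use the id your server reports for the loaded model.
payload = {
    "model": "default",
    "messages": [
        {"role": "user", "content": "Explain UQFF in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload).encode("utf-8")
print(body.decode("utf-8"))

# To actually send the request (requires a running server), something like:
# import urllib.request
# req = urllib.request.Request(
#     BASE_URL, data=body, headers={"Content-Type": "application/json"}
# )
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Because the endpoint follows the OpenAI wire format, the same payload works unchanged with generic OpenAI client libraries pointed at the local base URL.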