Runyard is a free hardware-aware AI model browser. You enter your CPU, GPU, and VRAM and it instantly shows every local LLM that will run on your machine, ranked by speed and quality.

How much VRAM do I need to run local LLMs?

8GB of VRAM runs 7B to 8B models such as Mistral 7B and Llama 3.1 8B at Q4 quantization. 16GB unlocks 13B models. 24GB lets you run Mixtral 8x7B and Llama 3 70B at lower-bit quantization.
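
A rough way to sanity-check these tiers is to estimate the memory the weights need from the parameter count and the bits stored per weight. The sketch below assumes about 4.5 bits per weight for Q4 and a flat 1.5 GB allowance for the KV cache and runtime buffers. Both numbers and the function names are illustrative assumptions, not Runyard's actual formula.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,        # assumed average for Q4
                     overhead_gb: float = 1.5) -> float:  # assumed KV cache + buffers
    """Rough VRAM estimate in GB: quantized weights plus a flat overhead."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return weights_gb + overhead_gb


def fits(params_billion: float, vram_gb: float, **kwargs) -> bool:
    """True if the estimated footprint fits in the given VRAM."""
    return estimate_vram_gb(params_billion, **kwargs) <= vram_gb


if __name__ == "__main__":
    for size in (7, 8, 13, 70):
        print(f"{size}B at Q4: ~{estimate_vram_gb(size):.1f} GB")
```

Under these assumptions an 8B model comes to about 6 GB, which fits in 8GB, while a 13B model comes to about 8.8 GB and needs the 16GB tier. Longer context windows grow the KV cache, so real usage can be higher.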

What is the best local LLM for my GPU?

Use Runyard at runyard.dev. Enter your GPU and VRAM, and the Model Radar will rank every compatible LLM for your exact hardware, showing estimated tokens per second for each model.
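
The estimated tokens per second shown in the comparison on this page fall off roughly in inverse proportion to model size. As a sketch of that relationship, the snippet below takes a few dense-model rows copied from the RTX 4070 listing and checks that size times speed stays near a constant of about 220. The constant and the helper names are inferred from those rows for illustration. They are not a documented Runyard formula, and the MoE rows in the listing do not follow the same constant.

```python
# (size in billions of parameters, estimated tok/s) copied from dense rows of the listing
ROWS = [
    (7.0, 31.4),    # Qwen2.5-VL-7B
    (8.0, 27.5),    # Llama-3.1-8B-Instruct
    (6.7, 32.8),    # Llama-2-7b-hf
    (7.2, 30.6),    # Mistral-7B-v0.1
    (1.2, 183.3),   # LFM2-1.2B
    (0.5, 440.0),   # Qwen2.5-0.5B-Instruct
]


def fitted_constant(rows):
    """Average of size * speed; near-constant if speed is proportional to 1/size."""
    return sum(size * speed for size, speed in rows) / len(rows)


def predict_tok_s(size_billion, k):
    """Predicted tok/s for a dense model under the inverse-size assumption."""
    return round(k / size_billion, 1)


if __name__ == "__main__":
    k = fitted_constant(ROWS)
    print(f"fitted constant: {k:.1f}")
    for size, listed in ROWS:
        print(f"{size}B: listed {listed}, predicted {predict_tok_s(size, k)}")
```

An inverse relation like this is what one would expect if generation speed is limited by how fast the weights can be read from memory, which is why the device panels list bandwidth.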

Can I run Llama 3 locally?

Yes. Llama 3.1 8B at Q4 runs on any GPU with 8GB of VRAM. Llama 3.1 70B needs around 40GB of VRAM at Q4, or an Apple Silicon Mac with 64GB or more of unified memory.

Compare Devices

Pick two devices · Device B can be wrapped with TurboQuant

Device A: Normal · Bandwidth ~504 GB/s

Device B: ✦ TurboQuant · Bandwidth ~504 GB/s

A: 30 · Tie: 361 · B: 175

Model | RTX 4070 · Normal | RTX 4070 · ✦ TQ
✦ gemma-3n-E4B-it (8B, UNLOCKED, Multimodal) | RUNS GREAT 73/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 73/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Qwen3-8B (8B, UNLOCKED, Reasoning) | RUNS GREAT 72/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 72/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ DeepSeek-R1-0528-Qwen3-8B (8.2B, UNLOCKED, Reasoning) | RUNS GREAT 72/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 73/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ granite-3.1-8b-instruct (8B, UNLOCKED, General) | RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Qwen-7B (7.7B, UNLOCKED, General) | RUNS GREAT 69/100, ~28.6 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.6 tok/s, ctx 8K→32K ✦
✦ Qwen1.5-7B (7.7B, UNLOCKED, General) | RUNS GREAT 69/100, ~28.6 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.6 tok/s, ctx 8K→32K ✦
✦ EXAONE-Deep-7.8B (7.8B, UNLOCKED, General) | RUNS GREAT 69/100, ~28.2 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.2 tok/s, ctx 8K→30K ✦
✦ MiMo-7B-Base (7.8B, UNLOCKED, General) | RUNS GREAT 69/100, ~28.2 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.2 tok/s, ctx 8K→30K ✦
✦ saiga_llama3_8b (8B, UNLOCKED, General) | RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K→8K ✦
✦ Hermes-3-Llama-3.1-8B (8B, UNLOCKED, General) | RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Llama-3.1-Nemotron-Nano-8B-v1 (8B, UNLOCKED, General) | RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Meta-Llama-3.1-8B-FP8 (8B, UNLOCKED, General) | RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ llava-onevision-qwen2-7b-ov (8B, UNLOCKED, General) | RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 69/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ gemma-2-9b-it (9B, UNLOCKED, General) | RUNS GREAT 68/100, ~24.4 tok/s, ctx 4K | ✦ RUNS GREAT 69/100, ~24.4 tok/s, ctx 4K→8K ✦
✦ Qwen3-14B-NVFP4 (8.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ Qwen3-8B-Base (8.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ Qwen3-8B-AWQ (8.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ Qwen3-8B-FP8 (8.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ Qwen3-8B.w8a8 (8.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ Qwen3-8B-FP8-dynamic (8.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ LFM2-8B-A1B (8.3B, MoE, UNLOCKED, General) | RUNS GREAT 68/100, ~21.2 tok/s, ctx 6K | ✦ RUNS GREAT 69/100, ~21.2 tok/s, ctx 6K→25K ✦
✦ NVIDIA-Nemotron-Nano-9B-v2 (8.9B, UNLOCKED, General) | RUNS GREAT 68/100, ~24.7 tok/s, ctx 5K | ✦ RUNS GREAT 69/100, ~24.7 tok/s, ctx 5K→18K ✦
✦ NVIDIA-Nemotron-Nano-9B-v2-Japanese (8.9B, UNLOCKED, General) | RUNS GREAT 68/100, ~24.7 tok/s, ctx 5K | ✦ RUNS GREAT 69/100, ~24.7 tok/s, ctx 5K→18K ✦
✦ NVIDIA-Nemotron-Nano-9B-v2-Base (8.9B, UNLOCKED, General) | RUNS GREAT 68/100, ~24.7 tok/s, ctx 5K | ✦ RUNS GREAT 69/100, ~24.7 tok/s, ctx 5K→18K ✦
✦ NVIDIA-Nemotron-Nano-9B-v2-FP8 (8.9B, UNLOCKED, General) | RUNS GREAT 68/100, ~24.7 tok/s, ctx 5K | ✦ RUNS GREAT 69/100, ~24.7 tok/s, ctx 5K→18K ✦
✦ QwQ-32B-MLX-8bit (9.2B, UNLOCKED, General) | RUNS GREAT 68/100, ~23.9 tok/s, ctx 4K | ✦ RUNS GREAT 69/100, ~23.9 tok/s, ctx 4K→15K ✦
✦ glm-4-9b (9.4B, UNLOCKED, General) | RUNS GREAT 68/100, ~23.4 tok/s, ctx 3K | ✦ RUNS GREAT 69/100, ~23.4 tok/s, ctx 3K→8K ✦
✦ Qwen2.5-Coder-32B-Instruct-MLX-8bit (9.2B, UNLOCKED, Coding) | RUNS GREAT 67/100, ~23.9 tok/s, ctx 4K | ✦ RUNS GREAT 68/100, ~23.9 tok/s, ctx 4K→15K ✦
✦ Qwen3-Coder-30B-A3B-Instruct-gptq-8bit (9.3B, MoE, UNLOCKED, Coding) | RUNS GREAT 66/100, ~18.9 tok/s, ctx 4K | ✦ RUNS GREAT 67/100, ~18.9 tok/s, ctx 4K→15K ✦
✦ Llama-3.1-8B-Instruct (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Qwen-7B-Chat (7.7B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~28.6 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.6 tok/s, ctx 8K→32K ✦
✦ salamandra-7b-instruct (7.8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~28.2 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.2 tok/s, ctx 8K→8K ✦
✦ Ministral-8B-Instruct-2410 (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Meta-Llama-3.1-8B-Instruct (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Llama-3.1-8B-Instruct-FP8 (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Llama-3-Patronus-Lynx-8B-Instruct-v1.1 (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Meta-Llama-3.1-8B-Instruct-FP8 (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Meta-Llama-3.1-8B-Instruct-quantized.w4a16 (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ Meta-Llama-3.1-8B-Instruct-FP8-dynamic (8B, UNLOCKED, Chat) | RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K | ✦ RUNS GREAT 63/100, ~27.5 tok/s, ctx 7K→28K ✦
✦ granite-3.3-8b-instruct (8.2B, UNLOCKED, Chat) | RUNS GREAT 62/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 63/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ SDAR-8B-Chat-b32 (8.2B, UNLOCKED, Chat) | RUNS GREAT 62/100, ~26.8 tok/s, ctx 6K | ✦ RUNS GREAT 63/100, ~26.8 tok/s, ctx 6K→26K ✦
✦ Qwen2.5-VL-7B-Instruct (8.3B, UNLOCKED, Chat) | RUNS GREAT 62/100, ~26.5 tok/s, ctx 6K | ✦ RUNS GREAT 63/100, ~26.5 tok/s, ctx 6K→25K ✦
✦ rnj-1-instruct (8.3B, UNLOCKED, Chat) | RUNS GREAT 62/100, ~26.5 tok/s, ctx 6K | ✦ RUNS GREAT 63/100, ~26.5 tok/s, ctx 6K→25K ✦
✦ Mistral-NeMo-Minitron-8B-Instruct (8.4B, UNLOCKED, Chat) | RUNS GREAT 62/100, ~26.2 tok/s, ctx 6K | ✦ RUNS GREAT 63/100, ~26.2 tok/s, ctx 6K→8K ✦
✦ Qwen3-30B-A3B-Instruct-2507-AWQ-8bit (9B, MoE, UNLOCKED, Chat) | RUNS GREAT 61/100, ~19.6 tok/s, ctx 4K | ✦ RUNS GREAT 62/100, ~19.6 tok/s, ctx 4K→17K ✦
✦ glm-4-9b-chat-hf (9.4B, UNLOCKED, Chat) | RUNS GREAT 61/100, ~23.4 tok/s, ctx 3K | ✦ RUNS GREAT 63/100, ~23.4 tok/s, ctx 3K→14K ✦
✦ glm-4-9b-chat (9.4B, UNLOCKED, Chat) | RUNS GREAT 61/100, ~23.4 tok/s, ctx 3K | ✦ RUNS GREAT 63/100, ~23.4 tok/s, ctx 3K→14K ✦
✦ Qwen3.5-9B (9.7B, UNLOCKED, General) | RUNS WELL 49/100, ~22.7 tok/s, ctx 3K | ✦ RUNS WELL 51/100, ~22.7 tok/s, ctx 3K→11K ✦
✦ Qwen3.5-9B-Base (9.7B, UNLOCKED, General) | RUNS WELL 49/100, ~22.7 tok/s, ctx 3K | ✦ RUNS WELL 51/100, ~22.7 tok/s, ctx 3K→11K ✦
Qwen2.5-VL-7B (7B, Multimodal) | RUNS GREAT 73/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 73/100, ~31.4 tok/s, ctx 10K→32K ✦
DeepSeek-R1-7B (7B, Reasoning) | RUNS GREAT 72/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 72/100, ~31.4 tok/s, ctx 10K→42K ✦
MiMo-7B-RL (7B, Reasoning) | RUNS GREAT 72/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 72/100, ~31.4 tok/s, ctx 10K→33K ✦
DeepSeek-R1-Distill-Qwen-7B (7.6B, Reasoning) | RUNS GREAT 72/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 72/100, ~28.9 tok/s, ctx 8K→33K ✦
Orca-2-7B (7B, Reasoning) | RUNS GREAT 71/100, ~31.4 tok/s, ctx 4K | RUNS GREAT 71/100, ~31.4 tok/s, ctx 4K
Qwen2.5-7B-Instruct (7B, General) | RUNS GREAT 69/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 69/100, ~31.4 tok/s, ctx 10K→42K ✦
Qwen3-32B-quantized.w4a16 (5.7B, General) | RUNS GREAT 69/100, ~38.6 tok/s, ctx 17K | ✦ RUNS GREAT 69/100, ~38.6 tok/s, ctx 17K→41K ✦
zephyr-7b-beta (7.2B, General) | RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K→33K ✦
Mistral-7B-v0.1 (7.2B, General) | RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K→33K ✦
prometheus-7b-v2.0 (7.2B, General) | RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K→33K ✦
xLAM-7b-r (7.2B, General) | RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K→33K ✦
dolphin-2.6-mistral-7b (7.2B, General) | RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 69/100, ~30.6 tok/s, ctx 10K→33K ✦
Olmo-3-1025-7B (7.3B, General) | RUNS GREAT 69/100, ~30.1 tok/s, ctx 9K | ✦ RUNS GREAT 69/100, ~30.1 tok/s, ctx 9K→37K ✦
Qwen2.5-7B (7.6B, General) | RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K→33K ✦
SWE-agent-LM-7B (7.6B, General) | RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K→33K ✦
Qwen2-7B (7.6B, General) | RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K→33K ✦
VulnLLM-R-7B (7.6B, General) | RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 69/100, ~28.9 tok/s, ctx 8K→33K ✦
Qwen2.5-Coder-7B (7B, Coding) | RUNS GREAT 68/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 68/100, ~31.4 tok/s, ctx 10K→42K ✦
granite-3.0-8b-instruct (8B, General) | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K
CodeLlama-7B-Instruct (7B, Coding) | RUNS GREAT 68/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 68/100, ~31.4 tok/s, ctx 10K→16K ✦
StarCoder2-7B (7B, Coding) | RUNS GREAT 68/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 68/100, ~31.4 tok/s, ctx 10K→16K ✦
GLM-4.7-Flash-AWQ-4bit (6.4B, MoE, General) | RUNS GREAT 68/100, ~27.5 tok/s, ctx 13K | ✦ RUNS GREAT 68/100, ~27.5 tok/s, ctx 13K→52K ✦
Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC (6.7B, General) | RUNS GREAT 68/100, ~32.8 tok/s, ctx 4K | RUNS GREAT 67/100, ~32.8 tok/s, ctx 4K
Nous-Hermes-llama-2-7b (6.7B, General) | RUNS GREAT 68/100, ~32.8 tok/s, ctx 4K | RUNS GREAT 67/100, ~32.8 tok/s, ctx 4K
Llama-2-7b-hf (6.7B, General) | RUNS GREAT 68/100, ~32.8 tok/s, ctx 4K | RUNS GREAT 67/100, ~32.8 tok/s, ctx 4K
CodeLlama-7b-hf (6.7B, Coding) | RUNS GREAT 68/100, ~32.8 tok/s, ctx 12K | RUNS GREAT 67/100, ~32.8 tok/s, ctx 12K→16K ✦
deepseek-coder-6.7b-instruct (6.7B, Coding) | RUNS GREAT 68/100, ~32.8 tok/s, ctx 12K | RUNS GREAT 67/100, ~32.8 tok/s, ctx 12K→16K ✦
deepseek-coder-6.7b-base (6.7B, Coding) | RUNS GREAT 68/100, ~32.8 tok/s, ctx 12K | RUNS GREAT 67/100, ~32.8 tok/s, ctx 12K→16K ✦
Orca-2-7b (7B, General) | RUNS GREAT 68/100, ~31.4 tok/s, ctx 4K | RUNS GREAT 68/100, ~31.4 tok/s, ctx 4K
Tarsier-7b (7.1B, General) | RUNS GREAT 68/100, ~31 tok/s, ctx 4K | RUNS GREAT 68/100, ~31 tok/s, ctx 4K
starcoder2-7b (7.2B, Coding) | RUNS GREAT 68/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 68/100, ~30.6 tok/s, ctx 10K→16K ✦
falcon-7b (7.2B, General) | RUNS GREAT 68/100, ~30.6 tok/s, ctx 4K | RUNS GREAT 67/100, ~30.6 tok/s, ctx 4K
wildguard (7.2B, General) | RUNS GREAT 68/100, ~30.6 tok/s, ctx 4K | RUNS GREAT 67/100, ~30.6 tok/s, ctx 4K
starcoder2-7b-GPTQ (7.4B, Coding) | RUNS GREAT 68/100, ~29.7 tok/s, ctx 9K | ✦ RUNS GREAT 68/100, ~29.7 tok/s, ctx 9K→16K ✦
Qwen2.5-Math-7B (7.6B, General) | RUNS GREAT 68/100, ~28.9 tok/s, ctx 4K | RUNS GREAT 68/100, ~28.9 tok/s, ctx 4K
Llama-3.1-8B (8B, General) | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K
Meta-Llama-3-8B (8B, General) | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K
Llama-Guard-3-8B (8B, General) | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K
llama-3.1-8b-bias-reduced (8B, General) | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K | RUNS GREAT 68/100, ~27.5 tok/s, ctx 4K
Qwen3-235B-A22B (235B, MoE, Reasoning) | RUNS WELL 67/100, ~0.6 tok/s, ctx 927 | ✦ RUNS WELL 68/100, ~0.6 tok/s, ctx 927→4K ✦
CodeLlama-7b-Instruct-hf (6.7B, Coding) | RUNS GREAT 67/100, ~32.8 tok/s, ctx 4K | RUNS GREAT 66/100, ~32.8 tok/s, ctx 4K
OLMoE-1B-7B-0125 (6.9B, MoE, General) | RUNS GREAT 67/100, ~25.5 tok/s, ctx 4K | RUNS GREAT 67/100, ~25.5 tok/s, ctx 4K
Qwen2.5-Coder-7B-Instruct (7.6B, Coding) | RUNS GREAT 67/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 68/100, ~28.9 tok/s, ctx 8K→33K ✦
Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 (7.6B, Coding) | RUNS GREAT 67/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 68/100, ~28.9 tok/s, ctx 8K→33K ✦
Qwen2.5-Coder-7B-Instruct-AWQ (7.6B, Coding) | RUNS GREAT 67/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 68/100, ~28.9 tok/s, ctx 8K→33K ✦
hf-moshiko (7.8B, General) | RUNS GREAT 67/100, ~28.2 tok/s, ctx 3K | RUNS GREAT 67/100, ~28.2 tok/s, ctx 3K
Qwen3-30B-A3B (30B, MoE, Reasoning) | RUNS WELL 66/100, ~5.9 tok/s, ctx 47K | ✦ RUNS WELL 66/100, ~5.9 tok/s, ctx 47K→128K ✦
OLMo-7B-Instruct (7B, General) | RUNS GREAT 66/100, ~31.4 tok/s, ctx 2K | ✦ RUNS GREAT 67/100, ~31.4 tok/s, ctx 2K
Falcon-7B-Instruct (7B, General) | RUNS GREAT 66/100, ~31.4 tok/s, ctx 2K | ✦ RUNS GREAT 67/100, ~31.4 tok/s, ctx 2K
Amber (6.7B, General) | RUNS GREAT 66/100, ~32.8 tok/s, ctx 2K | RUNS GREAT 66/100, ~32.8 tok/s, ctx 2K
llama-7b (6.7B, General) | RUNS GREAT 66/100, ~32.8 tok/s, ctx 2K | RUNS GREAT 66/100, ~32.8 tok/s, ctx 2K
pythia-6.9b (7B, General) | RUNS GREAT 66/100, ~31.4 tok/s, ctx 2K | ✦ RUNS GREAT 67/100, ~31.4 tok/s, ctx 2K
Llama-3.2-1B-Instruct (1B, Chat) | RUNS WELL 65/100, ~220 tok/s, ctx 128K | RUNS WELL 65/100, ~220 tok/s, ctx 128K
gemma-3-1b-it (1B, Chat) | RUNS WELL 65/100, ~220 tok/s, ctx 33K | RUNS WELL 65/100, ~220 tok/s, ctx 33K
Qwen3-4B-Instruct-2507-MLX-8bit (1.1B, Chat) | RUNS WELL 65/100, ~200 tok/s, ctx 158K | ✦ RUNS WELL 65/100, ~200 tok/s, ctx 158K→262K ✦
Qwen3-4B-Instruct-2507-MLX-5bit (0.8B, Chat) | RUNS WELL 64/100, ~275 tok/s, ctx 223K | ✦ RUNS WELL 64/100, ~275 tok/s, ctx 223K→262K ✦
Qwen3-4B-Instruct-2507-MLX-6bit (0.9B, Chat) | RUNS WELL 64/100, ~244.4 tok/s, ctx 196K | ✦ RUNS WELL 65/100, ~244.4 tok/s, ctx 196K→262K ✦
Mistral-7B-Instruct (7B, Chat) | RUNS GREAT 63/100, ~31.4 tok/s, ctx 10K | ✦ RUNS GREAT 64/100, ~31.4 tok/s, ctx 10K→32K ✦
Qwen2.5-0.5B-Instruct (0.5B, Chat) | RUNS WELL 63/100, ~440 tok/s, ctx 128K | RUNS WELL 63/100, ~440 tok/s, ctx 128K
Qwen1.5-0.5B-Chat (0.6B, Chat) | RUNS WELL 63/100, ~366.7 tok/s, ctx 33K | RUNS WELL 63/100, ~366.7 tok/s, ctx 33K
Qwen3-4B-Instruct-2507-MLX-4bit (0.6B, Chat) | RUNS WELL 63/100, ~366.7 tok/s, ctx 262K | RUNS WELL 63/100, ~366.7 tok/s, ctx 262K
TinyLlama-1.1B-Chat-v1.0 (1.1B, Chat) | RUNS WELL 63/100, ~200 tok/s, ctx 2K | RUNS WELL 63/100, ~200 tok/s, ctx 2K
tinyllama-oneshot-w8w8-test-static-shape-change (1.1B, Chat) | RUNS WELL 63/100, ~200 tok/s, ctx 2K | RUNS WELL 63/100, ~200 tok/s, ctx 2K
LFM2.5-1.2B-Instruct (1.2B, Chat) | RUNS WELL 63/100, ~183.3 tok/s, ctx 128K | RUNS WELL 63/100, ~183.3 tok/s, ctx 128K
Vikhr-Llama-3.2-1B-Instruct (1.2B, Chat) | RUNS WELL 63/100, ~183.3 tok/s, ctx 131K | RUNS WELL 63/100, ~183.3 tok/s, ctx 131K
openchat-3.5-0106 (7B, Chat) | RUNS GREAT 63/100, ~31.4 tok/s, ctx 8K | RUNS GREAT 63/100, ~31.4 tok/s, ctx 8K
Mistral-7B-Instruct-v0.2 (7.2B, Chat) | RUNS GREAT 63/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 63/100, ~30.6 tok/s, ctx 10K→33K ✦
Mistral-7B-Instruct-v0.3 (7.2B, Chat) | RUNS GREAT 63/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 63/100, ~30.6 tok/s, ctx 10K→33K ✦
Mistral-7B-Instruct-v0.3-GPTQ (7.2B, Chat) | RUNS GREAT 63/100, ~30.6 tok/s, ctx 10K | ✦ RUNS GREAT 63/100, ~30.6 tok/s, ctx 10K→33K ✦
Olmo-3-7B-Instruct-SFT (7.3B, Chat) | RUNS GREAT 63/100, ~30.1 tok/s, ctx 9K | ✦ RUNS GREAT 63/100, ~30.1 tok/s, ctx 9K→37K ✦
Falcon3-7B-Instruct (7.5B, Chat) | RUNS GREAT 63/100, ~29.3 tok/s, ctx 9K | ✦ RUNS GREAT 64/100, ~29.3 tok/s, ctx 9K→33K ✦
Qwen2-7B-Instruct (7.6B, Chat) | RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K→33K ✦
XCurOS-0.1-8B-Instruct (7.6B, Chat) | RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K→33K ✦
Dream-v0-Instruct-7B (7.6B, Chat) | RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K→33K ✦
Qwen2.5-7B-Instruct-GPTQ-Int4 (7.6B, Chat) | RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K→33K ✦
Qwen2.5-7B-Instruct-1M (7.6B, Chat) | RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K | ✦ RUNS GREAT 63/100, ~28.9 tok/s, ctx 8K→33K ✦
Yi-6B-Chat (6.1B, Chat) | RUNS GREAT 62/100, ~36.1 tok/s, ctx 4K | RUNS GREAT 62/100, ~36.1 tok/s, ctx 4K
vicuna-7b-v1.5 (6.7B, Chat) | RUNS GREAT 62/100, ~32.8 tok/s, ctx 4K | RUNS GREAT 62/100, ~32.8 tok/s, ctx 4K
Llama-2-7b-chat-hf (6.7B, Chat) | RUNS GREAT 62/100, ~32.8 tok/s, ctx 4K | RUNS GREAT 62/100, ~32.8 tok/s, ctx 4K
granite-4.0-h-tiny (6.9B, MoE, Chat) | RUNS GREAT 62/100, ~25.5 tok/s, ctx 11K | ✦ RUNS GREAT 62/100, ~25.5 tok/s, ctx 11K→43K ✦
falcon-7b-instruct (7.2B, Chat) | RUNS GREAT 62/100, ~30.6 tok/s, ctx 4K | RUNS GREAT 62/100, ~30.6 tok/s, ctx 4K
falcon-mamba-7b-instruct (7.3B, Chat) | RUNS GREAT 62/100, ~30.1 tok/s, ctx 4K | RUNS GREAT 62/100, ~30.1 tok/s, ctx 4K
Qwen2.5-Math-7B-Instruct (7.6B, Chat) | RUNS GREAT 62/100, ~28.9 tok/s, ctx 4K | RUNS GREAT 62/100, ~28.9 tok/s, ctx 4K
Meta-Llama-3-8B-Instruct (8B, Chat) | RUNS GREAT 62/100, ~27.5 tok/s, ctx 4K | RUNS GREAT 62/100, ~27.5 tok/s, ctx 4K
Zamba2-1.2B-instruct (1.2B, Chat) | RUNS WELL 61/100, ~183.3 tok/s, ctx 4K | RUNS WELL 61/100, ~183.3 tok/s, ctx 4K
Abliterated-Llama-3.2-1B-Instruct (1.2B, Chat) | RUNS WELL 61/100, ~183.3 tok/s, ctx 4K | RUNS WELL 61/100, ~183.3 tok/s, ctx 4K
OLMoE-1B-7B-0125-Instruct (6.9B, MoE, Chat) | RUNS GREAT 61/100, ~25.5 tok/s, ctx 4K | RUNS GREAT 61/100, ~25.5 tok/s, ctx 4K
Phi-mini-MoE-instruct (7.6B, MoE, Chat) | RUNS GREAT 61/100, ~23.2 tok/s, ctx 4K | RUNS GREAT 61/100, ~23.2 tok/s, ctx 4K
Qwen3-4B-Thinking-2507-MLX-8bit (1.1B, General) | RUNS WELL 60/100, ~200 tok/s, ctx 158K | ✦ RUNS WELL 60/100, ~200 tok/s, ctx 158K→262K ✦
Qwen3-0.6B (0.8B, General) | RUNS WELL 59/100, ~275 tok/s, ctx 41K | RUNS WELL 59/100, ~275 tok/s, ctx 41K
Qwen3Guard-Gen-0.6B (0.8B, General) | RUNS WELL 59/100, ~275 tok/s, ctx 33K | RUNS WELL 59/100, ~275 tok/s, ctx 33K
Qwen3-0.6B-FP8 (0.8B, General) | RUNS WELL 59/100, ~275 tok/s, ctx 41K | RUNS WELL 59/100, ~275 tok/s, ctx 41K
Qwen3.5-0.8B (0.9B, General) | RUNS WELL 59/100, ~244.4 tok/s, ctx 196K | ✦ RUNS WELL 60/100, ~244.4 tok/s, ctx 196K→262K ✦
Qwen3.5-0.8B-Base (0.9B, General) | RUNS WELL 59/100, ~244.4 tok/s, ctx 196K | ✦ RUNS WELL 60/100, ~244.4 tok/s, ctx 196K→262K ✦
Qwen3-4B-Thinking-2507-MLX-6bit (0.9B, General) | RUNS WELL 59/100, ~244.4 tok/s, ctx 196K | ✦ RUNS WELL 60/100, ~244.4 tok/s, ctx 196K→262K ✦
LFM2-1.2B (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2.5-1.2B-Thinking (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2.5-1.2B-JP (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2-1.2B-Tool (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2-1.2B-RAG (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2-1.2B-Extract (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2.5-1.2B-Base (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
LFM2-1.2B-MLX-bf16 (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K | RUNS WELL 59/100, ~183.3 tok/s, ctx 128K
Ilama-3.2-1B (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 131K | RUNS WELL 59/100, ~183.3 tok/s, ctx 131K
CyberXP_Agent_Llama_3.2_1B (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 131K | RUNS WELL 59/100, ~183.3 tok/s, ctx 131K
Orpo-Llama-3.2-1B-15k (1.2B, General) | RUNS WELL 59/100, ~183.3 tok/s, ctx 131K | RUNS WELL 59/100, ~183.3 tok/s, ctx 131K
Qwen3-4B-MLX-4bit (0.6B, General) | RUNS WELL 58/100, ~366.7 tok/s, ctx 66K | RUNS WELL 58/100, ~366.7 tok/s, ctx 66K
Qwen1.5-0.5B (0.6B, General) | RUNS WELL 58/100, ~366.7 tok/s, ctx 33K | RUNS WELL 58/100, ~366.7 tok/s, ctx 33K
Qwen3-4B-Thinking-2507-MLX-4bit (0.6B, General) | RUNS WELL 58/100, ~366.7 tok/s, ctx 262K | RUNS WELL 58/100, ~366.7 tok/s, ctx 262K
LFM2-700M (0.7B, General) | RUNS WELL 58/100, ~314.3 tok/s, ctx 128K | RUNS WELL 58/100, ~314.3 tok/s, ctx 128K
Qwen3-8B-speculator.eagle3 (1B, General) | RUNS WELL 58/100, ~220 tok/s, ctx 4K | ✦ RUNS WELL 59/100, ~220 tok/s, ctx 4K
pythia-1b (1.1B, General) | RUNS WELL 58/100, ~200 tok/s, ctx 2K | RUNS WELL 58/100, ~200 tok/s, ctx 2K
Qwen2.5-1.5B-Instruct (1.5B, Chat) | RUNS WELL 57/100, ~146.7 tok/s, ctx 111K | ✦ RUNS WELL 57/100, ~146.7 tok/s, ctx 111K→128K ✦
Qwen3-4B (4B, Reasoning) | RUNS WELL 57/100, ~55 tok/s, ctx 31K | RUNS WELL 56/100, ~55 tok/s, ctx 31K→124K ✦
Falcon-H1-0.5B-Base (0.5B, General) | RUNS WELL 57/100, ~440 tok/s, ctx 16K | RUNS WELL 57/100, ~440 tok/s, ctx 16K
Qwen3-4B-DFlash-b16 (0.5B, General) | RUNS WELL 57/100, ~440 tok/s, ctx 41K | RUNS WELL 57/100, ~440 tok/s, ctx 41K
h2ovl-mississippi-800m (0.8B, General) | RUNS WELL 57/100, ~275 tok/s, ctx 4K | RUNS WELL 57/100, ~275 tok/s, ctx 4K
ELM (0.9B, General) | RUNS WELL 57/100, ~244.4 tok/s, ctx 2K | RUNS WELL 57/100, ~244.4 tok/s, ctx 2K
Llama-3.2-1B (1.2B, General) | RUNS WELL 57/100, ~183.3 tok/s, ctx 4K | RUNS WELL 57/100, ~183.3 tok/s, ctx 4K
Jan-nano-AWQ (1.3B, General) | RUNS WELL 57/100, ~169.2 tok/s, ctx 41K | ✦ RUNS WELL 58/100, ~169.2 tok/s, ctx 41K
EXAONE-4.0-1.2B (1.3B, General) | RUNS WELL 57/100, ~169.2 tok/s, ctx 66K | ✦ RUNS WELL 58/100, ~169.2 tok/s, ctx 66K
Qwen3-8B-MLX-4bit (1.3B, General) | RUNS WELL 57/100, ~169.2 tok/s, ctx 41K | ✦ RUNS WELL 58/100, ~169.2 tok/s, ctx 41K
plamo-2-1b (1.3B, General) | RUNS WELL 57/100, ~169.2 tok/s, ctx 131K | ✦ RUNS WELL 58/100, ~169.2 tok/s, ctx 131K→523K ✦
Llama-3.2-1B-Instruct-FP8 (1.5B, Chat) | RUNS WELL 57/100, ~146.7 tok/s, ctx 111K | ✦ RUNS WELL 57/100, ~146.7 tok/s, ctx 111K→131K ✦
Llama-3.2-1B-Instruct-FP8-dynamic (1.5B, Chat) | RUNS WELL 57/100, ~146.7 tok/s, ctx 111K | ✦ RUNS WELL 57/100, ~146.7 tok/s, ctx 111K→131K ✦
Qwen2-1.5B-Instruct (1.5B, Chat) | RUNS WELL 57/100, ~146.7 tok/s, ctx 33K | RUNS WELL 57/100, ~146.7 tok/s, ctx 33K
Qwen2-1.5B-Instruct-FP8 (1.5B, Chat) | RUNS WELL 57/100, ~146.7 tok/s, ctx 33K | RUNS WELL 57/100, ~146.7 tok/s, ctx 33K
bloom-560m (0.6B, General) | RUNS WELL 56/100, ~366.7 tok/s, ctx 4K | RUNS WELL 56/100, ~366.7 tok/s, ctx 4K
GA_Guard_Lite (0.6B, General) | RUNS WELL 56/100, ~366.7 tok/s, ctx 4K | RUNS WELL 56/100, ~366.7 tok/s, ctx 4K
gpt_bigcode-santacoder (1.1B, Coding) | RUNS WELL 56/100, ~200 tok/s, ctx 2K | RUNS WELL 56/100, ~200 tok/s, ctx 2K
OLMo-1B-hf (1.2B, General) | RUNS WELL 56/100, ~183.3 tok/s, ctx 2K | RUNS WELL 56/100, ~183.3 tok/s, ctx 2K
llama-3.2-1b-code-instruct (1.2B, Coding) | RUNS WELL 56/100, ~183.3 tok/s, ctx 131K | ✦ RUNS WELL 57/100, ~183.3 tok/s, ctx 131K
starvector-1b-im2svg (1.4B, General) | RUNS WELL 56/100, ~157.1 tok/s, ctx 8K | RUNS WELL 56/100, ~157.1 tok/s, ctx 8K
OLMo-2-0425-1B-Instruct (1.5B, Chat) | RUNS WELL 56/100, ~146.7 tok/s, ctx 4K | RUNS WELL 56/100, ~146.7 tok/s, ctx 4K
Qwen2.5-1.5B (1.5B, General) | RUNS WELL 56/100, ~146.7 tok/s, ctx 111K | RUNS WELL 55/100, ~146.7 tok/s, ctx 111K→131K ✦
Qwen2-1.5B (1.5B, General) | RUNS WELL 56/100, ~146.7 tok/s, ctx 111K | RUNS WELL 55/100, ~146.7 tok/s, ctx 111K→131K ✦
Qwen2.5-Math-1.5B-Instruct (1.5B, Chat) | RUNS WELL 56/100, ~146.7 tok/s, ctx 4K | RUNS WELL 56/100, ~146.7 tok/s, ctx 4K
xLAM-2-1b-fc-r (1.5B, General) | RUNS WELL 56/100, ~146.7 tok/s, ctx 33K | RUNS WELL 55/100, ~146.7 tok/s, ctx 33K
qwen-base-invoicev1.01-1.5B (1.5B, General) | RUNS WELL 56/100, ~146.7 tok/s, ctx 33K | RUNS WELL 55/100, ~146.7 tok/s, ctx 33K
Phi-4-mini-reasoning (3.8B, Reasoning) | RUNS WELL 56/100, ~57.9 tok/s, ctx 16K | RUNS WELL 56/100, ~57.9 tok/s, ctx 16K
Llama-4-Maverick (400B, MoE, General) | RUNS WELL 55/100, ~0.4 tok/s, ctx 2K | ✦ RUNS WELL 57/100, ~0.4 tok/s, ctx 2K→7K ✦
bge-m3 (0.57B, Embedding) | RUNS WELL 55/100, ~386 tok/s, ctx 8K | RUNS WELL 55/100, ~386 tok/s, ctx 8K
pythia-410m (0.5B, General) | RUNS WELL 55/100, ~440 tok/s, ctx 2K | RUNS WELL 55/100, ~440 tok/s, ctx 2K
pythia-410m-deduped (0.5B, General) | RUNS WELL 55/100, ~440 tok/s, ctx 2K | RUNS WELL 55/100, ~440 tok/s, ctx 2K
bloomz-560m (0.6B, General) | RUNS WELL 55/100, ~366.7 tok/s, ctx 2K | RUNS WELL 55/100, ~366.7 tok/s, ctx 2K
LFM2-VL-1.6B (1.6B, General) | RUNS WELL 55/100, ~137.5 tok/s, ctx 103K | ✦ RUNS WELL 55/100, ~137.5 tok/s, ctx 103K→128K ✦
LFM2.5-VL-1.6B (1.6B, General) | RUNS WELL 55/100, ~137.5 tok/s, ctx 103K | ✦ RUNS WELL 55/100, ~137.5 tok/s, ctx 103K→128K ✦
stablelm-2-1_6b-chat (1.6B, Chat) | RUNS WELL 55/100, ~137.5 tok/s, ctx 4K | RUNS WELL 55/100, ~137.5 tok/s, ctx 4K
Qwen3.5-4B (4.7B, General) | RUNS WELL 55/100, ~46.8 tok/s, ctx 24K | RUNS WELL 54/100, ~46.8 tok/s, ctx 24K→95K ✦
Qwen3.5-4B-Base (4.7B, General) | RUNS WELL 55/100, ~46.8 tok/s, ctx 24K | RUNS WELL 54/100, ~46.8 tok/s, ctx 24K→95K ✦
Qwen3-8B-NVFP4 (4.7B, General) | RUNS WELL 55/100, ~46.8 tok/s, ctx 24K | RUNS WELL 54/100, ~46.8 tok/s, ctx 24K→41K ✦
NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ (5.1B, General) | RUNS WELL 55/100, ~43.1 tok/s, ctx 21K | ✦ RUNS WELL 55/100, ~43.1 tok/s, ctx 21K→83K ✦
Qwen3-32B-MLX-4bit (5.1B, General) | RUNS WELL 55/100, ~43.1 tok/s, ctx 21K | ✦ RUNS WELL 55/100, ~43.1 tok/s, ctx 21K→41K ✦
QwQ-32B-MLX-4bit (5.1B, General) | RUNS WELL 55/100, ~43.1 tok/s, ctx 21K | ✦ RUNS WELL 55/100, ~43.1 tok/s, ctx 21K→83K ✦
DeepSeek-V2-Chat (236B, MoE, General) | RUNS WELL 54/100, ~0.6 tok/s, ctx 1K | ✦ RUNS WELL 56/100, ~0.6 tok/s, ctx 1K→5K ✦
DeepSeek-R1-0528-Qwen3-8B-MLX-4bit (1.3B, Reasoning) | RUNS WELL 54/100, ~169.2 tok/s, ctx 131K | ✦ RUNS WELL 54/100, ~169.2 tok/s, ctx 131K→131K ✦
gpt-neo-1.3B (1.4B, General) | RUNS WELL 54/100, ~157.1 tok/s, ctx 2K | RUNS WELL 54/100, ~157.1 tok/s, ctx 2K
phi-1_5 (1.4B, General) | RUNS WELL 54/100, ~157.1 tok/s, ctx 2K | RUNS WELL 54/100, ~157.1 tok/s, ctx 2K
LFM2-Audio-1.5B (1.5B, General) | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K
LFM2.5-Audio-1.5B (1.5B, General) | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K
OLMo-2-0425-1B (1.5B, General) | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K
Qwen2.5-Math-1.5B (1.5B, General) | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K | RUNS WELL 54/100, ~146.7 tok/s, ctx 4K
SmolLM2-1.7B (1.7B, General) | RUNS WELL 54/100, ~129.4 tok/s, ctx 8K | RUNS WELL 54/100, ~129.4 tok/s, ctx 8K
Nanbeige4.1-3B-AWQ-8bit (1.7B, General) | RUNS WELL 54/100, ~129.4 tok/s, ctx 96K | ✦ RUNS WELL 54/100, ~129.4 tok/s, ctx 96K→262K ✦
Qwen3-1.7B-Base (1.7B, General) | RUNS WELL 54/100, ~129.4 tok/s, ctx 33K | RUNS WELL 54/100, ~129.4 tok/s, ctx 33K
Qwen3-1.7B-MLX-bf16 (1.7B, General) | RUNS WELL 54/100, ~129.4 tok/s, ctx 41K | RUNS WELL 54/100, ~129.4 tok/s, ctx 41K
Qwen2.5-1.5B-Instruct-AWQ (1.8B, Chat) | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K
Qwen2-1.5B-Instruct-AWQ (1.8B, Chat) | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K
Qwen2-1.5B-Instruct-GPTQ-Int4 (1.8B, Chat) | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K
Qwen2.5-1.5B-quantized.w8a8 (1.8B, General) | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K
Qwen1.5-1.8B-Chat (1.8B, Chat) | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K | RUNS WELL 54/100, ~122.2 tok/s, ctx 33K
gemma-3n-E2B-it (4B, Multimodal) | RUNS WELL 54/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 54/100, ~55 tok/s, ctx 31K→124K ✦
Qwen3-14B-MLX-8bit (4.2B, General) | RUNS WELL 54/100, ~52.4 tok/s, ctx 29K | ✦ RUNS WELL 54/100, ~52.4 tok/s, ctx 29K→41K ✦
Qwen3-4B-SafeRL (4.4B, General) | RUNS WELL 54/100, ~50 tok/s, ctx 27K | ✦ RUNS WELL 54/100, ~50 tok/s, ctx 27K→41K ✦
Qwen3-4B-FP8 (4.4B, General) | RUNS WELL 54/100, ~50 tok/s, ctx 27K | ✦ RUNS WELL 54/100, ~50 tok/s, ctx 27K→41K ✦
Nemotron-H-4B-Base-8K (4.5B, General) | RUNS WELL 54/100, ~48.9 tok/s, ctx 8K | RUNS WELL 54/100, ~48.9 tok/s, ctx 8K
Qwen2.5-Coder-32B-Instruct-MLX-4bit (5.1B, Coding) | RUNS WELL 54/100, ~43.1 tok/s, ctx 21K | ✦ RUNS WELL 54/100, ~43.1 tok/s, ctx 21K→33K ✦
Qwen2.5-Coder-1.5B (1.5B, Coding) | RUNS WELL 53/100, ~146.7 tok/s, ctx 111K | ✦ RUNS WELL 53/100, ~146.7 tok/s, ctx 111K→128K ✦
gemma-2-2b-it (2B, Chat) | RUNS WELL 53/100, ~110 tok/s, ctx 8K | RUNS WELL 53/100, ~110 tok/s, ctx 8K
granite-3.1-2b-instruct (2B, General) | RUNS WELL 53/100, ~110 tok/s, ctx 79K | ✦ RUNS WELL 53/100, ~110 tok/s, ctx 79K→128K ✦
nomic-embed-text-v1.5 (0.14B, Embedding) | RUNS WELL 53/100, ~1571.4 tok/s, ctx 8K | RUNS WELL 53/100, ~1571.4 tok/s, ctx 8K
pythia-1.4b (1.5B, General) | RUNS WELL 53/100, ~146.7 tok/s, ctx 2K | RUNS WELL 53/100, ~146.7 tok/s, ctx 2K
Qwen2.5-Coder-1.5B-Instruct (1.5B, Coding) | RUNS WELL 53/100, ~146.7 tok/s, ctx 33K | RUNS WELL 53/100, ~146.7 tok/s, ctx 33K
Minnow-Math-1.5B (1.6B, General) | RUNS WELL 53/100, ~137.5 tok/s, ctx 4K | ✦ RUNS WELL 54/100, ~137.5 tok/s, ctx 4K
QVikhr-3-1.7B-Instruction-noreasoning (1.7B, Reasoning) | RUNS WELL 53/100, ~129.4 tok/s, ctx 41K | RUNS WELL 53/100, ~129.4 tok/s, ctx 41K
bloom-1b7 (1.7B, General) | RUNS WELL 53/100, ~129.4 tok/s, ctx 4K | RUNS WELL 53/100, ~129.4 tok/s, ctx 4K
DeepSeek-R1-Distill-Qwen-1.5B (1.8B, Reasoning) | RUNS WELL 53/100, ~122.2 tok/s, ctx 90K | ✦ RUNS WELL 53/100, ~122.2 tok/s, ctx 90K→131K ✦
Qwen3-1.7B (2B, General) | RUNS WELL 53/100, ~110 tok/s, ctx 41K | RUNS WELL 53/100, ~110 tok/s, ctx 41K
Qwen3-1.7B-FP8 (2B, General) | RUNS WELL 53/100, ~110 tok/s, ctx 41K | RUNS WELL 53/100, ~110 tok/s, ctx 41K
Phi-4-reasoning-plus-MLX-4bit (2.3B, Reasoning) | RUNS WELL 53/100, ~95.7 tok/s, ctx 33K | RUNS WELL 53/100, ~95.7 tok/s, ctx 33K
DeepSeek-R1-0528-Qwen3-8B-MLX-8bit (2.3B, Reasoning) | RUNS WELL 53/100, ~95.7 tok/s, ctx 66K | ✦ RUNS WELL 53/100, ~95.7 tok/s, ctx 66K→131K ✦
HTML-Pruner-Phi-3.8B (3.8B, General) | RUNS WELL 53/100, ~57.9 tok/s, ctx 34K | ✦ RUNS WELL 53/100, ~57.9 tok/s, ctx 34K→131K ✦
Nanbeige4.1-3B (3.9B, General) | RUNS WELL 53/100, ~56.4 tok/s, ctx 32K | ✦ RUNS WELL 53/100, ~56.4 tok/s, ctx 32K→129K ✦
Qwen3-4B-Base (4B, General) | RUNS WELL 53/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 53/100, ~55 tok/s, ctx 31K→33K ✦
Qwen3-4B-Thinking-2507 (4B, General) | RUNS WELL 53/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 53/100, ~55 tok/s, ctx 31K→124K ✦
Qwen3-4B-AWQ (4B, General) | RUNS WELL 53/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 53/100, ~55 tok/s, ctx 31K→41K ✦
Jan-v1-4B (4B, General) | RUNS WELL 53/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 53/100, ~55 tok/s, ctx 31K→124K ✦
Jan-nano-128k (4B, General) | RUNS WELL 53/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 53/100, ~55 tok/s, ctx 31K→124K ✦
VLM2Vec-Full (4.1B, General) | RUNS WELL 53/100, ~53.7 tok/s, ctx 30K | ✦ RUNS WELL 54/100, ~53.7 tok/s, ctx 30K→119K ✦
Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit (5.3B, MoE, Coding) | RUNS WELL 53/100, ~33.2 tok/s, ctx 19K | ✦ RUNS WELL 53/100, ~33.2 tok/s, ctx 19K→77K ✦
Llama-4-Scout (17B, MoE, General) | RUNS WELL 52/100, ~10.4 tok/s, ctx 47K | ✦ RUNS WELL 52/100, ~10.4 tok/s, ctx 47K→188K ✦
Mixtral-8x7B-Instruct (46.7B, MoE, General) | RUNS WELL 52/100, ~3.7 tok/s, ctx 2K | ✦ RUNS WELL 55/100, ~3.7 tok/s, ctx 2K→7K ✦
granite-3.0-2b-instruct (2B, General) | RUNS WELL 52/100, ~110 tok/s, ctx 4K | RUNS WELL 52/100, ~110 tok/s, ctx 4K
Phi-4-mini-reasoning-MLX-4bit (0.6B, Reasoning) | RUNS WELL 52/100, ~366.7 tok/s, ctx 131K | RUNS WELL 52/100, ~366.7 tok/s, ctx 131K
SmolLM-1.7B (1.7B, General) | RUNS WELL 52/100, ~129.4 tok/s, ctx 2K | RUNS WELL 52/100, ~129.4 tok/s, ctx 2K
Qwen2.5-Coder-1.5B-Instruct-AWQ (1.8B, Coding) | RUNS WELL 52/100, ~122.2 tok/s, ctx 33K | RUNS WELL 52/100, ~122.2 tok/s, ctx 33K
Qwen3.5-2B (2.3B, General) | RUNS WELL 52/100, ~95.7 tok/s, ctx 66K | ✦ RUNS WELL 52/100, ~95.7 tok/s, ctx 66K→262K ✦
Qwen3.5-2B-Base (2.3B, General) | RUNS WELL 52/100, ~95.7 tok/s, ctx 66K | ✦ RUNS WELL 52/100, ~95.7 tok/s, ctx 66K→262K ✦
Qwen3-8B-MLX-8bit (2.3B, General) | RUNS WELL 52/100, ~95.7 tok/s, ctx 41K | RUNS WELL 52/100, ~95.7 tok/s, ctx 41K
Qwen3-14B-MLX-4bit (2.3B, General) | RUNS WELL 52/100, ~95.7 tok/s, ctx 41K | RUNS WELL 52/100, ~95.7 tok/s, ctx 41K
LFM2-2.6B (2.6B, General) | RUNS WELL 52/100, ~84.6 tok/s, ctx 57K | ✦ RUNS WELL 52/100, ~84.6 tok/s, ctx 57K→128K ✦
LFM2-2.6B-Exp (2.6B, General) | RUNS WELL 52/100, ~84.6 tok/s, ctx 57K | ✦ RUNS WELL 52/100, ~84.6 tok/s, ctx 57K→128K ✦
LFM2-2.6B-Transcript (2.6B, General) | RUNS WELL 52/100, ~84.6 tok/s, ctx 57K | ✦ RUNS WELL 52/100, ~84.6 tok/s, ctx 57K→128K ✦
T-lite-it-1.0_Q4_0 (2.9B, General) | RUNS WELL 52/100, ~75.9 tok/s, ctx 33K | RUNS WELL 52/100, ~75.9 tok/s, ctx 33K
LFM2-VL-3B (3B, General) | RUNS WELL 52/100, ~73.3 tok/s, ctx 47K | ✦ RUNS WELL 52/100, ~73.3 tok/s, ctx 47K→128K ✦
SmolLM3-3B (3.1B, General) | RUNS WELL 52/100, ~71 tok/s, ctx 45K | ✦ RUNS WELL 52/100, ~71 tok/s, ctx 45K→66K ✦
SmolLM3-3B-Base (3.1B, General) | RUNS WELL 52/100, ~71 tok/s, ctx 45K | ✦ RUNS WELL 52/100, ~71 tok/s, ctx 45K→66K ✦
Qwen2.5-3B (3.1B, General) | RUNS WELL 52/100, ~71 tok/s, ctx 33K | RUNS WELL 52/100, ~71 tok/s, ctx 33K
xLAM-2-3b-fc-r (3.1B, General) | RUNS WELL 52/100, ~71 tok/s, ctx 33K | RUNS WELL 52/100, ~71 tok/s, ctx 33K
Hermes-3-Llama-3.2-3B (3.2B, General) | RUNS WELL 52/100, ~68.8 tok/s, ctx 43K | ✦ RUNS WELL 52/100, ~68.8 tok/s, ctx 43K→131K ✦
Qwen2.5-Coder-14B-Instruct-MLX-8bit (4.2B, Coding) | RUNS WELL 52/100, ~52.4 tok/s, ctx 29K | ✦ RUNS WELL 52/100, ~52.4 tok/s, ctx 29K→33K ✦
xflux_text_encoders (4.8B, Coding) | RUNS WELL 52/100, ~45.8 tok/s, ctx 4K | RUNS WELL 52/100, ~45.8 tok/s, ctx 4K
bge-large-en-v1.5 (0.34B, Embedding) | RUNS WELL 51/100, ~647.1 tok/s, ctx 512 | RUNS WELL 51/100, ~647.1 tok/s, ctx 512
h2ovl-mississippi-2b (2.2B, General) | RUNS WELL 51/100, ~100 tok/s, ctx 4K | RUNS WELL 51/100, ~100 tok/s, ctx 4K
EXAONE-3.5-2.4B-Instruct (2.4B, Chat) | RUNS WELL 51/100, ~91.7 tok/s, ctx 33K | RUNS WELL 51/100, ~91.7 tok/s, ctx 33K
gemma-1.1-2b-it (2.5B, General) | RUNS WELL 51/100, ~88 tok/s, ctx 4K | RUNS WELL 51/100, ~88 tok/s, ctx 4K
gemma-2-2b-jpn-it (2.6B, General) | RUNS WELL 51/100, ~84.6 tok/s, ctx 4K | RUNS WELL 50/100, ~84.6 tok/s, ctx 4K
stablelm-3b-4e1t (2.8B, General) | RUNS WELL 51/100, ~78.6 tok/s, ctx 4K | RUNS WELL 51/100, ~78.6 tok/s, ctx 4K
granite-4.0-h-micro (3.2B, MoE, General) | RUNS WELL 51/100, ~55 tok/s, ctx 43K | ✦ RUNS WELL 51/100, ~55 tok/s, ctx 43K→131K ✦
Llama-3.2-3B (3.2B, General) | RUNS WELL 51/100, ~68.8 tok/s, ctx 4K | RUNS WELL 51/100, ~68.8 tok/s, ctx 4K
PowerLM-3b (3.5B, General) | RUNS WELL 51/100, ~62.9 tok/s, ctx 4K | RUNS WELL 51/100, ~62.9 tok/s, ctx 4K
Qwen3-Coder-30B-A3B-Instruct-AWQ (4.6B, MoE, Coding) | RUNS WELL 51/100, ~38.3 tok/s, ctx 25K | ✦ RUNS WELL 51/100, ~38.3 tok/s, ctx 25K→99K ✦
Qwen2.5-VL-7B-Instruct-NVFP4 (5B, Chat) | RUNS WELL 51/100, ~44 tok/s, ctx 21K | RUNS WELL 50/100, ~44 tok/s, ctx 21K→86K ✦
Phi-4-multimodal-instruct (5.6B, Chat) | RUNS WELL 51/100, ~39.3 tok/s, ctx 17K | ✦ RUNS WELL 52/100, ~39.3 tok/s, ctx 17K→69K ✦
Llama-3.2-3B-Instruct (3B, Chat) | RUNS WELL 50/100, ~73.3 tok/s, ctx 47K | ✦ RUNS WELL 50/100, ~73.3 tok/s, ctx 47K→128K ✦
Qwen2.5-3B-Instruct (3B, Chat) | RUNS WELL 50/100, ~73.3 tok/s, ctx 47K | ✦ RUNS WELL 50/100, ~73.3 tok/s, ctx 47K→128K ✦
gemma-3-4b-it (4B, Chat) | RUNS WELL 50/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 50/100, ~55 tok/s, ctx 31K→124K ✦
Phi-3.5-mini-instruct (3.8B, Chat) | RUNS WELL 50/100, ~57.9 tok/s, ctx 34K | ✦ RUNS WELL 50/100, ~57.9 tok/s, ctx 34K→128K ✦
Phi-4-mini (3.8B, Chat) | RUNS WELL 50/100, ~57.9 tok/s, ctx 34K | ✦ RUNS WELL 50/100, ~57.9 tok/s, ctx 34K→128K ✦
DeepSeek-Coder-V2-16B (16B, MoE, Coding) | RUNS WELL 50/100, ~11 tok/s, ctx 63K | ✦ RUNS WELL 50/100, ~11 tok/s, ctx 63K→128K ✦
StarCoder2-3B (3B, Coding) | RUNS WELL 50/100, ~73.3 tok/s, ctx 16K | RUNS WELL 50/100, ~73.3 tok/s, ctx 16K
Qwen2.5-Coder-14B-Instruct-MLX-4bit (2.3B, Coding) | RUNS WELL 50/100, ~95.7 tok/s, ctx 33K | RUNS WELL 50/100, ~95.7 tok/s, ctx 33K
gpt-neo-2.7B (2.7B, General) | RUNS WELL 50/100, ~81.5 tok/s, ctx 2K | RUNS WELL 50/100, ~81.5 tok/s, ctx 2K
phi-2 (2.8B, General) | RUNS WELL 50/100, ~78.6 tok/s, ctx 2K | RUNS WELL 49/100, ~78.6 tok/s, ctx 2K
pythia-2.8b (2.9B, General) | RUNS WELL 50/100, ~75.9 tok/s, ctx 2K | RUNS WELL 50/100, ~75.9 tok/s, ctx 2K
starcoder2-3b (3B, Coding) | RUNS WELL 50/100, ~73.3 tok/s, ctx 16K | RUNS WELL 50/100, ~73.3 tok/s, ctx 16K
Qwen2.5-Coder-3B-Instruct (3.1B, Coding) | RUNS WELL 50/100, ~71 tok/s, ctx 33K | RUNS WELL 50/100, ~71 tok/s, ctx 33K
Qwen2.5-Coder-3B (3.1B, Coding) | RUNS WELL 50/100, ~71 tok/s, ctx 33K | RUNS WELL 50/100, ~71 tok/s, ctx 33K
Qwen2.5-VL-3B-Instruct (3.8B, Chat) | RUNS WELL 50/100, ~57.9 tok/s, ctx 34K | ✦ RUNS WELL 50/100, ~57.9 tok/s, ctx 34K→128K ✦
Qwen3-4B-Instruct-2507 (4B, Chat) | RUNS WELL 50/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 50/100, ~55 tok/s, ctx 31K→124K ✦
Qwen3-4B-Instruct-2507-GPTQ-Int4 (4B, Chat) | RUNS WELL 50/100, ~55 tok/s, ctx 31K | ✦ RUNS WELL 50/100, ~55 tok/s, ctx 31K→124K ✦
Qwen3-4B-Instruct-2507-FP84.4B

Chat

RUNS WELL50/100~50 tok/s

ctx27K

✦ RUNS WELL50/100~50 tok/s

27K→107K✦

Nemotron-H-4B-Instruct-128K4.5B

Chat

RUNS WELL50/100~48.9 tok/s

ctx26K

✦ RUNS WELL50/100~48.9 tok/s

26K→103K✦

Qwen3-30B-A3B-Instruct-2507-AWQ-4bit5.3BMoE

Chat

RUNS WELL50/100~33.2 tok/s

ctx19K

✦ RUNS WELL50/100~33.2 tok/s

19K→77K✦

granite-4.0-h-tiny-AWQ-4bit2BMoE

Chat

RUNS WELL49/100~88 tok/s

ctx79K

✦ RUNS WELL49/100~88 tok/s

79K→131K✦

PowerMoE-3b3.4BMoE

General

RUNS WELL49/100~51.8 tok/s

ctx4K

✦ RUNS WELL50/100~51.8 tok/s

ctx4K

Qwen2.5-3B-Instruct-AWQ3.4B

Chat

RUNS WELL49/100~64.7 tok/s

ctx33K

RUNS WELL49/100~64.7 tok/s

ctx33K

granite-3b-code-base-2k3.5B

Coding

RUNS WELL49/100~62.9 tok/s

ctx2K

RUNS WELL49/100~62.9 tok/s

ctx2K

Llama-3.2-3B-Instruct-FP83.6B

Chat

RUNS WELL49/100~61.1 tok/s

ctx36K

✦ RUNS WELL50/100~61.1 tok/s

36K→131K✦

Qwen3-32B32B

Reasoning

DECENT48/100~2.9 tok/s

ctx—

DECENT48/100~2.9 tok/s

ctx—

Phi-3-mini-4k3.8B

Chat

RUNS WELL48/100~57.9 tok/s

ctx4K

RUNS WELL48/100~57.9 tok/s

ctx4K

DeepSeek-R1-32B32B

Reasoning

DECENT48/100~2.9 tok/s

ctx—

DECENT48/100~2.9 tok/s

ctx—

phi-3-mini-4k-instruct3.8B

Chat

RUNS WELL48/100~57.9 tok/s

ctx4K

RUNS WELL48/100~57.9 tok/s

ctx4K

Phi-3-mini-4k-instruct-AWQ3.8B

Chat

RUNS WELL48/100~57.9 tok/s

ctx4K

RUNS WELL48/100~57.9 tok/s

ctx4K

Phi-3-mini-4k-instruct-gptq-4bit3.8B

Chat

RUNS WELL48/100~57.9 tok/s

ctx4K

RUNS WELL48/100~57.9 tok/s

ctx4K

Qwen3-30B-A3B-Instruct-2507-AWQ4.6BMoE

Chat

RUNS WELL48/100~38.3 tok/s

ctx25K

✦ RUNS WELL48/100~38.3 tok/s

25K→99K✦

Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled27.8B

Reasoning

DECENT48/100~3.6 tok/s

ctx—

DECENT48/100~3.6 tok/s

ctx—

DeepSeek-R1-Distill-Qwen-32B32.8B

Reasoning

DECENT48/100~2.9 tok/s

ctx—

✦ DECENT49/100~2.9 tok/s

ctx—

OpenReasoning-Nemotron-32B32.8B

Reasoning

DECENT48/100~2.9 tok/s

ctx—

✦ DECENT49/100~2.9 tok/s

ctx—

Phi-tiny-MoE-instruct3.8BMoE

Chat

RUNS WELL46/100~46.3 tok/s

ctx4K

RUNS WELL46/100~46.3 tok/s

ctx4K

DeepSeek-R1-Distill-Qwen-14B14.8B

Reasoning

DECENT46/100~7.4 tok/s

ctx—

DECENT46/100~7.4 tok/s

ctx—

Qwen3-14B14B

Reasoning

DECENT45/100~7.9 tok/s

ctx—

DECENT45/100~7.9 tok/s

ctx—

Phi-414B

Reasoning

DECENT45/100~7.9 tok/s

ctx—

DECENT45/100~7.9 tok/s

ctx—

DeepSeek-R1-14B14B

Reasoning

DECENT45/100~7.9 tok/s

ctx—

DECENT45/100~7.9 tok/s

ctx—

Orca-2-13B13B

Reasoning

DECENT45/100~8.5 tok/s

ctx—

DECENT45/100~8.5 tok/s

ctx—

Phi-4-reasoning14B

Reasoning

DECENT45/100~7.9 tok/s

ctx—

DECENT45/100~7.9 tok/s

ctx—

HyperCLOVAX-SEED-Omni-8B10.7B

General

RUNS WELL43/100~20.6 tok/s

ctx943

✦ RUNS WELL46/100~20.6 tok/s

943→4K✦

Llama-3.2-11B-Vision11B

Multimodal

DECENT40/100~10 tok/s

ctx454

✦ DECENT42/100~10 tok/s

454→2K✦

Llama-3.2-11B-Vision-Instruct10.7B

Chat

RUNS WELL37/100~20.6 tok/s

ctx943

✦ RUNS WELL39/100~20.6 tok/s

943→4K✦

SOLAR-10.7B-Instruct-v1.010.7B

Chat

RUNS WELL37/100~20.6 tok/s

ctx943

✦ RUNS WELL39/100~20.6 tok/s

943→4K✦

CodeLlama-34B-Instruct34B

Coding

DECENT36/100~2.8 tok/s

ctx—

DECENT35/100~2.8 tok/s

ctx—

CodeLlama-34b-Instruct-hf33.7B

Coding

DECENT36/100~2.8 tok/s

ctx—

DECENT35/100~2.8 tok/s

ctx—

Qwen2.5-Coder-32B32B

Coding

DECENT35/100~2.9 tok/s

ctx—

DECENT35/100~2.9 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct30.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-MLX-4bit30.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-MLX-5bit30.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-MLX-8bit30.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-MLX-6bit30.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-gptq-4bit30.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-FP830.5BMoE

Coding

DECENT35/100~2.5 tok/s

ctx—

DECENT35/100~2.5 tok/s

ctx—

Qwen2.5-Coder-32B-Instruct32.8B

Coding

DECENT35/100~2.9 tok/s

ctx—

DECENT35/100~2.9 tok/s

ctx—

Qwen2.5-Coder-32B-Instruct-AWQ32.8B

Coding

DECENT35/100~2.9 tok/s

ctx—

DECENT35/100~2.9 tok/s

ctx—

StarCoder2-15B15B

Coding

DECENT34/100~7.3 tok/s

ctx—

DECENT34/100~7.3 tok/s

ctx—

Qwen3-Coder-Next-AWQ-4bit14.4B

Coding

DECENT34/100~7.6 tok/s

ctx—

DECENT34/100~7.6 tok/s

ctx—

Qwen2.5-Coder-14B-Instruct14.8B

Coding

DECENT34/100~7.4 tok/s

ctx—

DECENT34/100~7.4 tok/s

ctx—

Qwen2.5-Coder-14B-Instruct-AWQ14.8B

Coding

DECENT34/100~7.4 tok/s

ctx—

DECENT34/100~7.4 tok/s

ctx—

WizardCoder-15B-V1.015.5B

Coding

DECENT34/100~7.1 tok/s

ctx—

DECENT34/100~7.1 tok/s

ctx—

Qwen3-Coder-30B-A3B-Instruct-FP415.6BMoE

Coding

DECENT34/100~5.6 tok/s

ctx—

DECENT34/100~5.6 tok/s

ctx—

starcoder2-15b15.7B

Coding

DECENT34/100~7 tok/s

ctx—

DECENT34/100~7 tok/s

ctx—

DeepSeek-Coder-V2-Lite-Instruct15.7BMoE

Coding

DECENT34/100~5.6 tok/s

ctx—

DECENT34/100~5.6 tok/s

ctx—

DeepSeek-Coder-V2-Lite-Instruct-FP815.7BMoE

Coding

DECENT34/100~5.6 tok/s

ctx—

DECENT34/100~5.6 tok/s

ctx—

Qwen2.5-Coder-14B14B

Coding

DECENT33/100~7.9 tok/s

ctx—

DECENT33/100~7.9 tok/s

ctx—

CodeLlama-13B-Instruct13B

Coding

DECENT33/100~8.5 tok/s

ctx—

DECENT33/100~8.5 tok/s

ctx—

CodeLlama-13b-Instruct-hf13B

Coding

DECENT33/100~8.5 tok/s

ctx—

DECENT33/100~8.5 tok/s

ctx—

xLAM-8x7b-r46.7BMoE

General

DECENT32/100~1.5 tok/s

ctx—

DECENT32/100~1.5 tok/s

ctx—

Nous-Hermes-2-Mixtral-8x7B-DPO46.7BMoE

General

DECENT32/100~1.5 tok/s

ctx—

DECENT32/100~1.5 tok/s

ctx—

Llama-3_3-Nemotron-Super-49B-v1_549.9B

General

DECENT32/100~1.7 tok/s

ctx—

DECENT32/100~1.7 tok/s

ctx—

Llama-3_3-Nemotron-Super-49B-v1_5-FP849.9B

General

DECENT32/100~1.7 tok/s

ctx—

DECENT32/100~1.7 tok/s

ctx—

Llama-3_3-Nemotron-Super-49B-v149.9B

General

DECENT32/100~1.7 tok/s

ctx—

DECENT32/100~1.7 tok/s

ctx—

Mistral-Small-24B24B

General

DECENT31/100~4.3 tok/s

ctx—

DECENT31/100~4.3 tok/s

ctx—

Qwen2.5-32B-Instruct32B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

gemma-2-27b-it27B

General

DECENT31/100~3.7 tok/s

ctx—

DECENT31/100~3.7 tok/s

ctx—

gemma-3-27b-it27B

General

DECENT31/100~3.7 tok/s

ctx—

DECENT31/100~3.7 tok/s

ctx—

Command-R35B

General

DECENT31/100~2.7 tok/s

ctx—

DECENT31/100~2.7 tok/s

ctx—

Falcon-40B-Instruct40B

General

DECENT31/100~2.1 tok/s

ctx—

DECENT31/100~2.1 tok/s

ctx—

t5gemma-9b-9b-ul220.3B

General

DECENT31/100~5.3 tok/s

ctx—

DECENT30/100~5.3 tok/s

ctx—

gpt-oss-20b21.5B

General

DECENT31/100~5 tok/s

ctx—

DECENT31/100~5 tok/s

ctx—

ERNIE-4.5-21B-A3B-MLX-4bit21.8BMoE

General

DECENT31/100~3.9 tok/s

ctx—

DECENT31/100~3.9 tok/s

ctx—

ERNIE-4.5-21B-A3B-MLX-6bit21.8BMoE

General

DECENT31/100~3.9 tok/s

ctx—

DECENT31/100~3.9 tok/s

ctx—

ERNIE-4.5-21B-A3B-MLX-8bit21.8BMoE

General

DECENT31/100~3.9 tok/s

ctx—

DECENT31/100~3.9 tok/s

ctx—

LFM2-24B-A2B-MLX-4bit23.8BMoE

General

DECENT31/100~3.5 tok/s

ctx—

DECENT31/100~3.5 tok/s

ctx—

LFM2-24B-A2B-MLX-6bit23.8BMoE

General

DECENT31/100~3.5 tok/s

ctx—

DECENT31/100~3.5 tok/s

ctx—

LFM2-24B-A2B-MLX-8bit23.8BMoE

General

DECENT31/100~3.5 tok/s

ctx—

DECENT31/100~3.5 tok/s

ctx—

LFM2-24B-A2B-MLX-5bit23.8BMoE

General

DECENT31/100~3.5 tok/s

ctx—

DECENT31/100~3.5 tok/s

ctx—

LFM2-24B-A2B23.8BMoE

General

DECENT31/100~3.5 tok/s

ctx—

DECENT31/100~3.5 tok/s

ctx—

Qwen3.5-27B27.8B

General

DECENT31/100~3.6 tok/s

ctx—

DECENT31/100~3.6 tok/s

ctx—

GLM-4.7-Flash-MLX-8bit29.9BMoE

General

DECENT31/100~2.6 tok/s

ctx—

DECENT31/100~2.6 tok/s

ctx—

GLM-4.7-Flash-MLX-6bit29.9BMoE

General

DECENT31/100~2.6 tok/s

ctx—

DECENT31/100~2.6 tok/s

ctx—

Qwen3-30B-A3B-Thinking-250730.5BMoE

General

DECENT31/100~2.5 tok/s

ctx—

DECENT30/100~2.5 tok/s

ctx—

Qwen3-30B-A3B-GPTQ-Int430.5BMoE

General

DECENT31/100~2.5 tok/s

ctx—

DECENT30/100~2.5 tok/s

ctx—

Qwen3-30B-A3B-Base30.5BMoE

General

DECENT31/100~2.5 tok/s

ctx—

DECENT30/100~2.5 tok/s

ctx—

Qwen3-30B-A3B-AWQ30.5BMoE

General

DECENT31/100~2.5 tok/s

ctx—

DECENT30/100~2.5 tok/s

ctx—

GLM-4.7-Flash31.2BMoE

General

DECENT31/100~2.4 tok/s

ctx—

DECENT31/100~2.4 tok/s

ctx—

GLM-4.7-Flash-AWQ31.2BMoE

General

DECENT31/100~2.4 tok/s

ctx—

DECENT31/100~2.4 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-4bit31.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8bit31.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-6bit31.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-5bit31.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-BF1631.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF1631.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-FP831.6B

General

DECENT31/100~3 tok/s

ctx—

DECENT31/100~3 tok/s

ctx—

EXAONE-4.0-32B32B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

EXAONE-4.0.1-32B32B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

EXAONE-4.0-32B-FP832B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

sarvam-30b32.2BMoE

General

DECENT31/100~2.3 tok/s

ctx—

DECENT31/100~2.3 tok/s

ctx—

Olmo-3-1125-32B32.2B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

Qwen3-32B-AWQ32.8B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

Qwen2.5-32B32.8B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

QwQ-32B-AWQ32.8B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

Baichuan-M2-32B32.8B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

QwQ-32B32.8B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

xLAM-2-32b-fc-r32.8B

General

DECENT31/100~2.9 tok/s

ctx—

DECENT31/100~2.9 tok/s

ctx—

HyperCLOVAX-SEED-Think-32B33.3B

General

DECENT31/100~2.8 tok/s

ctx—

DECENT31/100~2.8 tok/s

ctx—

dolphin-2.9.1-yi-1.5-34b34.4B

General

DECENT31/100~2.7 tok/s

ctx—

DECENT31/100~2.7 tok/s

ctx—

c4ai-command-r-v0135B

General

DECENT31/100~2.7 tok/s

ctx—

DECENT31/100~2.7 tok/s

ctx—

Qwen3.5-35B-A3B36BMoE

General

DECENT31/100~2.1 tok/s

ctx—

DECENT31/100~2.1 tok/s

ctx—

Bielik-11B-v3.0-Instruct11.2B

Chat

DECENT30/100~9.8 tok/s

ctx142

✦ DECENT32/100~9.8 tok/s

142→568✦

Qwen3-Next-80B-A3B-Thinking-AWQ-4bit14.7B

General

DECENT30/100~7.5 tok/s

ctx—

DECENT30/100~7.5 tok/s

ctx—

HyperCLOVAX-SEED-Think-14B-GPTQ14.7B

General

DECENT30/100~7.5 tok/s

ctx—

DECENT30/100~7.5 tok/s

ctx—

Qwen3-14B-AWQ14.8B

General

DECENT30/100~7.4 tok/s

ctx—

DECENT30/100~7.4 tok/s

ctx—

Qwen3-14B-Base14.8B

General

DECENT30/100~7.4 tok/s

ctx—

DECENT30/100~7.4 tok/s

ctx—

Qwen2.5-14B14.8B

General

DECENT30/100~7.4 tok/s

ctx—

DECENT30/100~7.4 tok/s

ctx—

Qwen3-30B-A3B-NVFP415.6BMoE

General

DECENT30/100~5.6 tok/s

ctx—

DECENT30/100~5.6 tok/s

ctx—

DeepSeek-V2-Lite15.7BMoE

General

DECENT30/100~5.6 tok/s

ctx—

DECENT30/100~5.6 tok/s

ctx—

Moonlight-16B-A3B16BMoE

General

DECENT30/100~5.5 tok/s

ctx—

DECENT30/100~5.5 tok/s

ctx—

deepseek-moe-16b-base16.4B

General

DECENT30/100~6.7 tok/s

ctx—

DECENT30/100~6.7 tok/s

ctx—

Ling-lite16.8BMoE

General

DECENT30/100~5.2 tok/s

ctx—

DECENT30/100~5.2 tok/s

ctx—

Qwen3-32B-NVFP417.2B

General

DECENT30/100~6.2 tok/s

ctx—

DECENT30/100~6.2 tok/s

ctx—

NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP418.2B

General

DECENT30/100~5.9 tok/s

ctx—

DECENT30/100~5.9 tok/s

ctx—

Mistral-Nemo-12B12B

General

DECENT29/100~9.2 tok/s

ctx—

DECENT29/100~9.2 tok/s

ctx—

Qwen2.5-14B-Instruct14B

General

DECENT29/100~7.9 tok/s

ctx—

DECENT29/100~7.9 tok/s

ctx—

gemma-3-12b-it12B

General

DECENT29/100~9.2 tok/s

ctx—

DECENT29/100~9.2 tok/s

ctx—

Phi-3-medium-128k14B

General

DECENT29/100~7.9 tok/s

ctx—

DECENT29/100~7.9 tok/s

ctx—

OLMo-2-13B-Instruct13B

General

DECENT29/100~8.5 tok/s

ctx—

DECENT29/100~8.5 tok/s

ctx—

pythia-12b12B

General

DECENT29/100~9.2 tok/s

ctx—

DECENT29/100~9.2 tok/s

ctx—

Orca-2-13b13B

General

DECENT29/100~8.5 tok/s

ctx—

DECENT29/100~8.5 tok/s

ctx—

HarmBench-Llama-2-13b-cls13B

General

DECENT29/100~8.5 tok/s

ctx—

DECENT29/100~8.5 tok/s

ctx—

llm-jp-3.1-13b13.7B

General

DECENT29/100~8 tok/s

ctx—

DECENT29/100~8 tok/s

ctx—

phi-414B

General

DECENT29/100~7.9 tok/s

ctx—

DECENT29/100~7.9 tok/s

ctx—

Phi-3-medium-14b-instruct14B

General

DECENT29/100~7.9 tok/s

ctx—

DECENT29/100~7.9 tok/s

ctx—

Qwen1.5-MoE-A2.7B14.3BMoE

General

DECENT29/100~6.2 tok/s

ctx—

DECENT29/100~6.2 tok/s

ctx—

Yi-34B-Chat34.4B

Chat

DECENT23/100~2.7 tok/s

ctx—

DECENT22/100~2.7 tok/s

ctx—

Seed-OSS-36B-Instruct-MLX-8bit36.2BMoE

Chat

DECENT23/100~2.1 tok/s

ctx—

DECENT23/100~2.1 tok/s

ctx—

Seed-OSS-36B-Instruct-MLX-4bit36.2BMoE

Chat

DECENT23/100~2.1 tok/s

ctx—

DECENT23/100~2.1 tok/s

ctx—

Seed-OSS-36B-Instruct-MLX-5bit36.2BMoE

Chat

DECENT23/100~2.1 tok/s

ctx—

DECENT23/100~2.1 tok/s

ctx—

Seed-OSS-36B-Instruct-MLX-6bit36.2BMoE

Chat

DECENT23/100~2.1 tok/s

ctx—

DECENT23/100~2.1 tok/s

ctx—

MiniMax-M2.5-AWQ-4bit36.8BMoE

Chat

DECENT23/100~2 tok/s

ctx—

DECENT23/100~2 tok/s

ctx—

Mixtral-8x7B-Instruct-v0.146.7BMoE

Chat

DECENT23/100~1.5 tok/s

ctx—

DECENT23/100~1.5 tok/s

ctx—

Kimi-Linear-48B-A3B-Instruct49.1B

Chat

DECENT23/100~1.7 tok/s

ctx—

DECENT23/100~1.7 tok/s

ctx—

vicuna-13b-v1.513B

Chat

DECENT22/100~8.5 tok/s

ctx—

DECENT21/100~8.5 tok/s

ctx—

WizardLM-13B-V1.213B

Chat

DECENT22/100~8.5 tok/s

ctx—

DECENT21/100~8.5 tok/s

ctx—

llm-jp-3.1-13b-instruct413.7B

Chat

DECENT22/100~8 tok/s

ctx—

DECENT22/100~8 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct-AWQ-4bit14.7B

Chat

DECENT22/100~7.5 tok/s

ctx—

DECENT22/100~7.5 tok/s

ctx—

Qwen3-14B-Instruct14.8B

Chat

DECENT22/100~7.4 tok/s

ctx—

DECENT22/100~7.4 tok/s

ctx—

Qwen2.5-14B-Instruct-AWQ14.8B

Chat

DECENT22/100~7.4 tok/s

ctx—

DECENT22/100~7.4 tok/s

ctx—

Qwen2.5-14B-Instruct-GPTQ-Int414.8B

Chat

DECENT22/100~7.4 tok/s

ctx—

DECENT22/100~7.4 tok/s

ctx—

Qwen2.5-14B-Instruct-1M14.8B

Chat

DECENT22/100~7.4 tok/s

ctx—

DECENT22/100~7.4 tok/s

ctx—

Qwen2.5-14B-Instruct-GPTQ-Int814.8B

Chat

DECENT22/100~7.4 tok/s

ctx—

DECENT22/100~7.4 tok/s

ctx—

Qwen3-30B-A3B-Instruct-2507-FP415.6BMoE

Chat

DECENT22/100~5.6 tok/s

ctx—

DECENT22/100~5.6 tok/s

ctx—

DeepSeek-V2-Lite-Chat15.7BMoE

Chat

DECENT22/100~5.6 tok/s

ctx—

DECENT22/100~5.6 tok/s

ctx—

Moonlight-16B-A3B-Instruct16BMoE

Chat

DECENT22/100~5.5 tok/s

ctx—

DECENT22/100~5.5 tok/s

ctx—

LLaDA2.0-mini16.3BMoE

Chat

DECENT22/100~5.4 tok/s

ctx—

DECENT22/100~5.4 tok/s

ctx—

LLaDA2.1-mini16.3BMoE

Chat

DECENT22/100~5.4 tok/s

ctx—

DECENT22/100~5.4 tok/s

ctx—

deepseek-moe-16b-chat16.4B

Chat

DECENT22/100~6.7 tok/s

ctx—

DECENT22/100~6.7 tok/s

ctx—

Mistral-Small-24B-Instruct-2501-AWQ23.6B

Chat

DECENT22/100~4.4 tok/s

ctx—

DECENT22/100~4.4 tok/s

ctx—

Mistral-Small-24B-Instruct-2501-FP8-dynamic23.6B

Chat

DECENT22/100~4.4 tok/s

ctx—

DECENT22/100~4.4 tok/s

ctx—

Mistral-Small-24B-Instruct-250124B

Chat

DECENT22/100~4.3 tok/s

ctx—

DECENT22/100~4.3 tok/s

ctx—

Qwen3-30B-A3B-Instruct-2507-MLX-4bit30.5BMoE

Chat

DECENT22/100~2.5 tok/s

ctx—

DECENT22/100~2.5 tok/s

ctx—

Qwen3-30B-A3B-Instruct-2507-MLX-8bit30.5BMoE

Chat

DECENT22/100~2.5 tok/s

ctx—

DECENT22/100~2.5 tok/s

ctx—

Qwen3-30B-A3B-Instruct-2507-MLX-6bit30.5BMoE

Chat

DECENT22/100~2.5 tok/s

ctx—

DECENT22/100~2.5 tok/s

ctx—

Qwen3-30B-A3B-Instruct-2507-FP830.5BMoE

Chat

DECENT22/100~2.5 tok/s

ctx—

DECENT22/100~2.5 tok/s

ctx—

Qwen3-VL-30B-A3B-Instruct-AWQ31.1BMoE

Chat

DECENT22/100~2.4 tok/s

ctx—

DECENT22/100~2.4 tok/s

ctx—

OLMo-2-0325-32B-Instruct32.2B

Chat

DECENT22/100~2.9 tok/s

ctx—

DECENT22/100~2.9 tok/s

ctx—

Qwen2.5-32B-Instruct-AWQ32.8B

Chat

DECENT22/100~2.9 tok/s

ctx—

DECENT22/100~2.9 tok/s

ctx—

Qwen2.5-32B-Instruct-GPTQ-Int432.8B

Chat

DECENT22/100~2.9 tok/s

ctx—

DECENT22/100~2.9 tok/s

ctx—

Qwen2.5-32B-Instruct-GPTQ-Int832.8B

Chat

DECENT22/100~2.9 tok/s

ctx—

DECENT22/100~2.9 tok/s

ctx—

MiniMax-M2.5-BF16-INT4-AWQ39.1BMoE

Chat

DECENT22/100~1.8 tok/s

ctx—

DECENT22/100~1.8 tok/s

ctx—

falcon-40b-instruct40B

Chat

DECENT22/100~2.1 tok/s

ctx—

DECENT22/100~2.1 tok/s

ctx—

GigaChat3-10B-A1.8B11.5BMoE

Chat

DECENT21/100~7.7 tok/s

ctx—

DECENT21/100~7.7 tok/s

ctx—

Mistral-Nemo-Instruct-240712.2B

Chat

DECENT21/100~9 tok/s

ctx—

✦ DECENT22/100~9 tok/s

ctx—

mistral-nemo-instruct-2407-awq12.2B

Chat

DECENT21/100~9 tok/s

ctx—

✦ DECENT22/100~9 tok/s

ctx—

MiniMax-M2.7230B

Reasoning

TOO HEAVY5/100~0.7 tok/s

ctx—

TOO HEAVY5/100~0.7 tok/s

ctx—

DeepSeek-R1-0528-NVFP4-v2393.6BMoE

Reasoning

TOO HEAVY5/100~0.3 tok/s

ctx—

TOO HEAVY5/100~0.3 tok/s

ctx—

DeepSeek-R1-NVFP4396.8BMoE

Reasoning

TOO HEAVY5/100~0.3 tok/s

ctx—

TOO HEAVY5/100~0.3 tok/s

ctx—

DeepSeek-R1684.5BMoE

Reasoning

TOO HEAVY5/100~0.2 tok/s

ctx—

TOO HEAVY5/100~0.2 tok/s

ctx—

DeepSeek-R1-0528684.5BMoE

Reasoning

TOO HEAVY5/100~0.2 tok/s

ctx—

TOO HEAVY5/100~0.2 tok/s

ctx—

DeepSeek-V3.2-Speciale685BMoE

Reasoning

TOO HEAVY5/100~0.2 tok/s

ctx—

TOO HEAVY5/100~0.2 tok/s

ctx—

DeepSeek-R1-70B70B

Reasoning

TOO HEAVY3/100~2.5 tok/s

ctx—

TOO HEAVY3/100~2.5 tok/s

ctx—

DeepSeek-R1-Distill-Llama-70B-FP8-dynamic70.6B

Reasoning

TOO HEAVY3/100~2.4 tok/s

ctx—

TOO HEAVY3/100~2.4 tok/s

ctx—

Llama-3.1-70B-Instruct70B

General

TOO HEAVY0/100~2.5 tok/s

ctx—

TOO HEAVY0/100~2.5 tok/s

ctx—

Llama-3.1-405B-Instruct405B

General

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

Llama-3.2-90B-Vision90B

Multimodal

TOO HEAVY0/100~1.9 tok/s

ctx—

TOO HEAVY0/100~1.9 tok/s

ctx—

Mixtral-8x22B-Instruct141BMoE

General

TOO HEAVY0/100~1 tok/s

ctx—

TOO HEAVY0/100~1 tok/s

ctx—

Qwen2.5-72B-Instruct72B

General

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2.5-VL-72B72B

Multimodal

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

DeepSeek-V3671BMoE

General

TOO HEAVY0/100~0.2 tok/s

ctx—

TOO HEAVY0/100~0.2 tok/s

ctx—

Command-R+104B

General

TOO HEAVY0/100~1.7 tok/s

ctx—

TOO HEAVY0/100~1.7 tok/s

ctx—

Qwen3.5-122B-A10B-NVFP464.4BMoE

General

TOO HEAVY0/100~2.1 tok/s

ctx—

TOO HEAVY0/100~2.1 tok/s

ctx—

NVIDIA-Nemotron-3-Super-120B-A12B-NVFP467.2B

General

TOO HEAVY0/100~2.6 tok/s

ctx—

TOO HEAVY0/100~2.6 tok/s

ctx—

Llama-3.3-70B-Instruct70.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

llama-3.3-70b-instruct-awq70.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Llama-3.3-70B-Instruct-AWQ70.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

L3.3-GeneticLemonade-Final-v2-70B70.6B

General

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Meta-Llama-3.3-70B-Instruct-AWQ-INT470.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Meta-Llama-3.1-70B-Instruct-quantized.w4a1670.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Meta-Llama-3-70B-Instruct70.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Llama-3.1-70B70.6B

General

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Llama-3.1-Swallow-70B-Instruct-v0.370.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Meta-Llama-3.1-70B-Instruct-FP870.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Llama-3.3-70B-Instruct-FP8-dynamic70.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

jais-adapted-70b-chat-4bit-bnb71.6B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2.5-72B-Instruct-abliterated72.7B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2.5-72B72.7B

General

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2-72B-Instruct72.7B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2-72B72.7B

General

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2.5-72B-Instruct-AWQ73B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen2.5-72B-Instruct-GPTQ-Int473B

Chat

TOO HEAVY0/100~2.4 tok/s

ctx—

TOO HEAVY0/100~2.4 tok/s

ctx—

Qwen3-Coder-Next-8bit79.7B

Coding

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct-MLX-4bit79.7B

Chat

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct-MLX-8bit79.7B

Chat

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct-MLX-6bit79.7B

Chat

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct-MLX-5bit79.7B

Chat

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Coder-Next79.7B

Coding

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Coder-Next-FP879.7B

Coding

TOO HEAVY0/100~2.2 tok/s

ctx—

TOO HEAVY0/100~2.2 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct81.3B

Chat

TOO HEAVY0/100~2.1 tok/s

ctx—

TOO HEAVY0/100~2.1 tok/s

ctx—

Qwen3-Next-80B-A3B-Instruct-FP881.3B

Chat

TOO HEAVY0/100~2.1 tok/s

ctx—

TOO HEAVY0/100~2.1 tok/s

ctx—

GLM-4.5-Air110.5BMoE

General

TOO HEAVY0/100~1.2 tok/s

ctx—

TOO HEAVY0/100~1.2 tok/s

ctx—

Qwen1.5-110B-Chat-AWQ111.2B

Chat

TOO HEAVY0/100~1.5 tok/s

ctx—

TOO HEAVY0/100~1.5 tok/s

ctx—

gpt-oss-120b-MLX-8bit116.8B

General

TOO HEAVY0/100~1.5 tok/s

ctx—

TOO HEAVY0/100~1.5 tok/s

ctx—

gpt-oss-120b-heretic116.8B

General

TOO HEAVY0/100~1.5 tok/s

ctx—

TOO HEAVY0/100~1.5 tok/s

ctx—

gpt-oss-120b120.4B

General

TOO HEAVY0/100~1.4 tok/s

ctx—

TOO HEAVY0/100~1.4 tok/s

ctx—

XORTRON.CriminalComputing.LARGE.2026.3122.6B

General

TOO HEAVY0/100~1.4 tok/s

ctx—

TOO HEAVY0/100~1.4 tok/s

ctx—

NVIDIA-Nemotron-3-Super-120B-A12B-FP8123.6B

General

TOO HEAVY0/100~1.4 tok/s

ctx—

TOO HEAVY0/100~1.4 tok/s

ctx—

NVIDIA-Nemotron-3-Super-120B-A12B-BF16123.6B

General

TOO HEAVY0/100~1.4 tok/s

ctx—

TOO HEAVY0/100~1.4 tok/s

ctx—

Qwen3.5-122B-A10B125.1BMoE

General

TOO HEAVY0/100~1.1 tok/s

ctx—

TOO HEAVY0/100~1.1 tok/s

ctx—

Mixtral-8x22B-Instruct-v0.1140.6BMoE

Chat

TOO HEAVY0/100~1 tok/s

ctx—

TOO HEAVY0/100~1 tok/s

ctx—

dots.llm1.inst142.8BMoE

General

TOO HEAVY0/100~1 tok/s

ctx—

TOO HEAVY0/100~1 tok/s

ctx—

bloom176.2B

General

TOO HEAVY0/100~1 tok/s

ctx—

TOO HEAVY0/100~1 tok/s

ctx—

GLM-4.7-NVFP4177.2BMoE

General

TOO HEAVY0/100~0.8 tok/s

ctx—

TOO HEAVY0/100~0.8 tok/s

ctx—

falcon-180B-chat179.5B

Chat

TOO HEAVY0/100~1 tok/s

ctx—

TOO HEAVY0/100~1 tok/s

ctx—

Step-3.5-Flash199.4BMoE

General

TOO HEAVY0/100~0.7 tok/s

ctx—

TOO HEAVY0/100~0.7 tok/s

ctx—

Step-3.5-Flash-FP8199.4BMoE

General

TOO HEAVY0/100~0.7 tok/s

ctx—

TOO HEAVY0/100~0.7 tok/s

ctx—

MiniMax-M2.5-MLX-8bit228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2.5-MLX-4bit228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2.5-MLX-6bit228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2-AWQ228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2.5-AWQ228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2.5228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

MiniMax-M2.1228.7BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

Qwen3-235B-A22B-Instruct-2507-FP8235.1BMoE

Chat

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

Qwen3-235B-A22B-Thinking-2507-FP8235.1BMoE

General

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

Qwen3-235B-A22B-FP8235.1BMoE

General

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

deepseek-coder-v2-instruct-awq235.7BMoE

Coding

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

DeepSeek-V2.5-1210-FP8235.7BMoE

General

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

K-EXAONE-236B-A23B237.1BMoE

General

TOO HEAVY0/100~0.6 tok/s

ctx—

TOO HEAVY0/100~0.6 tok/s

ctx—

ERNIE-4.5-300B-A47B-Paddle300.5BMoE

General

TOO HEAVY0/100~0.5 tok/s

ctx—

TOO HEAVY0/100~0.5 tok/s

ctx—

MiMo-V2-Flash309.8BMoE

General

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

GLM-4.6356.8BMoE

General

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

GLM-4.7358.3BMoE

General

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

GLM-4.5358.3BMoE

General

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

DeepSeek-V3.2-NVFP4394.5BMoE

General

TOO HEAVY0/100~0.3 tok/s

ctx—

TOO HEAVY0/100~0.3 tok/s

ctx—

DeepSeek-V3-0324-NVFP4396.8BMoE

General

TOO HEAVY0/100~0.3 tok/s

ctx—

TOO HEAVY0/100~0.3 tok/s

ctx—

Llama-4-Maverick-17B-128E-Instruct401.6B

Chat

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

Qwen3.5-397B-A17B403.4BMoE

General

TOO HEAVY0/100~0.3 tok/s

ctx—

TOO HEAVY0/100~0.3 tok/s

ctx—

Llama-3.1-405B405.9B

General

TOO HEAVY0/100~0.4 tok/s

ctx—

TOO HEAVY0/100~0.4 tok/s

ctx—

Qwen3-Coder-480B-A35B-Instruct480.2BMoE

Coding

TOO HEAVY0/100~0.3 tok/s

ctx—

TOO HEAVY0/100~0.3 tok/s

ctx—

LongCat-Flash-Chat561.9B

Chat

TOO HEAVY0/100~0.3 tok/s

ctx—

TOO HEAVY0/100~0.3 tok/s

ctx—

DeepSeek-V3-0324684.5BMoE

General

TOO HEAVY0/100~0.2 tok/s

ctx—

TOO HEAVY0/100~0.2 tok/s

ctx—

DeepSeek-V3.2-AWQ685BMoE

General

TOO HEAVY0/100~0.2 tok/s

ctx—

TOO HEAVY0/100~0.2 tok/s

ctx—

DeepSeek-V3.2685.4BMoE

General

TOO HEAVY0/100~0.2 tok/s

ctx—

TOO HEAVY0/100~0.2 tok/s

ctx—

GLM-5753.9BMoE

General

TOO HEAVY0/100~0.2 tok/s

ctx—

TOO HEAVY0/100~0.2 tok/s

ctx—

GLM-5-FP8753.9BMoE

General

TOO HEAVY0/100~0.2 tok/s

ctx—

TOO HEAVY0/100~0.2 tok/s

ctx—

Kimi-K2-Instruct1026.5BMoE

Chat

TOO HEAVY0/100~0.1 tok/s

ctx—

TOO HEAVY0/100~0.1 tok/s

ctx—

Kimi-K2-Instruct-09051026.5BMoE

Chat

TOO HEAVY0/100~0.1 tok/s

ctx—

TOO HEAVY0/100~0.1 tok/s

ctx—

Kimi-K2-Thinking1058.1BMoE

General

TOO HEAVY0/100~0.1 tok/s

ctx—

TOO HEAVY0/100~0.1 tok/s

ctx—

Kimi-K2.51058.6BMoE

General

TOO HEAVY0/100~0.1 tok/s

ctx—

TOO HEAVY0/100~0.1 tok/s

ctx—

RUNS GREAT · RUNS WELL · DECENT · TOO HEAVY

✦ TurboQuant — 4× KV compression at 32K ctx
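
The context figures in the TurboQuant column (for example 7K→28K, or 142→568) look consistent with a simple rule: multiply the baseline context by the 4× KV compression factor, then cap the result at the model's maximum context length. The sketch below illustrates that reading only. The function names, the cap values, and the rounding rule are assumptions made for illustration, not Runyard's actual formula.

```python
def extended_context(base_ctx: int, model_max_ctx: int, kv_compression: float = 4.0) -> int:
    """Estimate usable context after KV-cache compression.

    Hypothetical helper: assumes the context that fits in memory scales
    linearly with the KV compression factor and is capped by the model's
    own maximum context length.
    """
    if base_ctx <= 0 or model_max_ctx <= 0 or kv_compression < 1:
        raise ValueError("contexts must be positive and compression >= 1")
    return min(int(base_ctx * kv_compression), model_max_ctx)


def fmt_ctx(tokens: int) -> str:
    """Format a token count the way the listing appears to (assumed rounding)."""
    return f"{round(tokens / 1000)}K" if tokens >= 1000 else str(tokens)


# Illustrative inputs; the model maximums here are assumed, not taken from the page.
print(fmt_ctx(extended_context(7_000, 131_072)))   # 4x fits under the cap
print(fmt_ctx(extended_context(45_000, 65_536)))   # 4x exceeds the cap, so the cap wins
print(fmt_ctx(extended_context(142, 8_192)))       # small contexts stay unabbreviated
```

Under this reading, rows that show no extended figure would be ones where the baseline context already equals the model maximum, or where the model does not fit in VRAM at all (the `ctx—` rows).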