RUNYARD.DEV / COMPARE

Compare Devices

Pick two devices · Device B can be wrapped with TurboQuant

Device A · Normal
Bandwidth: ~504 GB/s
Device B · ✦ TurboQuant
Bandwidth: ~504 GB/s
A: 30 · Tie: 361 · B: 175
Model
RTX 4070 · Normal
RTX 4070 · ✦ TQ
gemma-3n-E4B-it8BUNLOCKED
Multimodal
RUNS GREAT73/100~27.5 tok/s
ctx7K
RUNS GREAT73/100~27.5 tok/s
7K28K
Qwen3-8B8BUNLOCKED
Reasoning
RUNS GREAT72/100~27.5 tok/s
ctx7K
RUNS GREAT72/100~27.5 tok/s
7K28K
DeepSeek-R1-0528-Qwen3-8B8.2BUNLOCKED
Reasoning
RUNS GREAT72/100~26.8 tok/s
ctx6K
RUNS GREAT73/100~26.8 tok/s
6K26K
granite-3.1-8b-instruct8BUNLOCKED
General
RUNS GREAT69/100~27.5 tok/s
ctx7K
RUNS GREAT69/100~27.5 tok/s
7K28K
Qwen-7B7.7BUNLOCKED
General
RUNS GREAT69/100~28.6 tok/s
ctx8K
RUNS GREAT69/100~28.6 tok/s
8K32K
Qwen1.5-7B7.7BUNLOCKED
General
RUNS GREAT69/100~28.6 tok/s
ctx8K
RUNS GREAT69/100~28.6 tok/s
8K32K
EXAONE-Deep-7.8B7.8BUNLOCKED
General
RUNS GREAT69/100~28.2 tok/s
ctx8K
RUNS GREAT69/100~28.2 tok/s
8K30K
MiMo-7B-Base7.8BUNLOCKED
General
RUNS GREAT69/100~28.2 tok/s
ctx8K
RUNS GREAT69/100~28.2 tok/s
8K30K
saiga_llama3_8b8BUNLOCKED
General
RUNS GREAT69/100~27.5 tok/s
ctx7K
RUNS GREAT69/100~27.5 tok/s
7K8K
Hermes-3-Llama-3.1-8B8BUNLOCKED
General
RUNS GREAT69/100~27.5 tok/s
ctx7K
RUNS GREAT69/100~27.5 tok/s
7K28K
Llama-3.1-Nemotron-Nano-8B-v18BUNLOCKED
General
RUNS GREAT69/100~27.5 tok/s
ctx7K
RUNS GREAT69/100~27.5 tok/s
7K28K
Meta-Llama-3.1-8B-FP88BUNLOCKED
General
RUNS GREAT69/100~27.5 tok/s
ctx7K
RUNS GREAT69/100~27.5 tok/s
7K28K
llava-onevision-qwen2-7b-ov8BUNLOCKED
General
RUNS GREAT69/100~27.5 tok/s
ctx7K
RUNS GREAT69/100~27.5 tok/s
7K28K
gemma-2-9b-it9BUNLOCKED
General
RUNS GREAT68/100~24.4 tok/s
ctx4K
RUNS GREAT69/100~24.4 tok/s
4K8K
Qwen3-14B-NVFP48.2BUNLOCKED
General
RUNS GREAT68/100~26.8 tok/s
ctx6K
RUNS GREAT69/100~26.8 tok/s
6K26K
Qwen3-8B-Base8.2BUNLOCKED
General
RUNS GREAT68/100~26.8 tok/s
ctx6K
RUNS GREAT69/100~26.8 tok/s
6K26K
Qwen3-8B-AWQ8.2BUNLOCKED
General
RUNS GREAT68/100~26.8 tok/s
ctx6K
RUNS GREAT69/100~26.8 tok/s
6K26K
Qwen3-8B-FP88.2BUNLOCKED
General
RUNS GREAT68/100~26.8 tok/s
ctx6K
RUNS GREAT69/100~26.8 tok/s
6K26K
Qwen3-8B.w8a88.2BUNLOCKED
General
RUNS GREAT68/100~26.8 tok/s
ctx6K
RUNS GREAT69/100~26.8 tok/s
6K26K
Qwen3-8B-FP8-dynamic8.2BUNLOCKED
General
RUNS GREAT68/100~26.8 tok/s
ctx6K
RUNS GREAT69/100~26.8 tok/s
6K26K
LFM2-8B-A1B8.3BMoEUNLOCKED
General
RUNS GREAT68/100~21.2 tok/s
ctx6K
RUNS GREAT69/100~21.2 tok/s
6K25K
NVIDIA-Nemotron-Nano-9B-v28.9BUNLOCKED
General
RUNS GREAT68/100~24.7 tok/s
ctx5K
RUNS GREAT69/100~24.7 tok/s
5K18K
NVIDIA-Nemotron-Nano-9B-v2-Japanese8.9BUNLOCKED
General
RUNS GREAT68/100~24.7 tok/s
ctx5K
RUNS GREAT69/100~24.7 tok/s
5K18K
NVIDIA-Nemotron-Nano-9B-v2-Base8.9BUNLOCKED
General
RUNS GREAT68/100~24.7 tok/s
ctx5K
RUNS GREAT69/100~24.7 tok/s
5K18K
NVIDIA-Nemotron-Nano-9B-v2-FP88.9BUNLOCKED
General
RUNS GREAT68/100~24.7 tok/s
ctx5K
RUNS GREAT69/100~24.7 tok/s
5K18K
QwQ-32B-MLX-8bit9.2BUNLOCKED
General
RUNS GREAT68/100~23.9 tok/s
ctx4K
RUNS GREAT69/100~23.9 tok/s
4K15K
glm-4-9b9.4BUNLOCKED
General
RUNS GREAT68/100~23.4 tok/s
ctx3K
RUNS GREAT69/100~23.4 tok/s
3K8K
Qwen2.5-Coder-32B-Instruct-MLX-8bit9.2BUNLOCKED
Coding
RUNS GREAT67/100~23.9 tok/s
ctx4K
RUNS GREAT68/100~23.9 tok/s
4K15K
Qwen3-Coder-30B-A3B-Instruct-gptq-8bit9.3BMoEUNLOCKED
Coding
RUNS GREAT66/100~18.9 tok/s
ctx4K
RUNS GREAT67/100~18.9 tok/s
4K15K
Llama-3.1-8B-Instruct8BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Qwen-7B-Chat7.7BUNLOCKED
Chat
RUNS GREAT63/100~28.6 tok/s
ctx8K
RUNS GREAT63/100~28.6 tok/s
8K32K
salamandra-7b-instruct7.8BUNLOCKED
Chat
RUNS GREAT63/100~28.2 tok/s
ctx8K
RUNS GREAT63/100~28.2 tok/s
8K8K
Ministral-8B-Instruct-24108BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Meta-Llama-3.1-8B-Instruct8BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Llama-3.1-8B-Instruct-FP88BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Llama-3-Patronus-Lynx-8B-Instruct-v1.18BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Meta-Llama-3.1-8B-Instruct-FP88BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Meta-Llama-3.1-8B-Instruct-quantized.w4a168BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
Meta-Llama-3.1-8B-Instruct-FP8-dynamic8BUNLOCKED
Chat
RUNS GREAT63/100~27.5 tok/s
ctx7K
RUNS GREAT63/100~27.5 tok/s
7K28K
granite-3.3-8b-instruct8.2BUNLOCKED
Chat
RUNS GREAT62/100~26.8 tok/s
ctx6K
RUNS GREAT63/100~26.8 tok/s
6K26K
SDAR-8B-Chat-b328.2BUNLOCKED
Chat
RUNS GREAT62/100~26.8 tok/s
ctx6K
RUNS GREAT63/100~26.8 tok/s
6K26K
Qwen2.5-VL-7B-Instruct8.3BUNLOCKED
Chat
RUNS GREAT62/100~26.5 tok/s
ctx6K
RUNS GREAT63/100~26.5 tok/s
6K25K
rnj-1-instruct8.3BUNLOCKED
Chat
RUNS GREAT62/100~26.5 tok/s
ctx6K
RUNS GREAT63/100~26.5 tok/s
6K25K
Mistral-NeMo-Minitron-8B-Instruct8.4BUNLOCKED
Chat
RUNS GREAT62/100~26.2 tok/s
ctx6K
RUNS GREAT63/100~26.2 tok/s
6K8K
Qwen3-30B-A3B-Instruct-2507-AWQ-8bit9BMoEUNLOCKED
Chat
RUNS GREAT61/100~19.6 tok/s
ctx4K
RUNS GREAT62/100~19.6 tok/s
4K17K
glm-4-9b-chat-hf9.4BUNLOCKED
Chat
RUNS GREAT61/100~23.4 tok/s
ctx3K
RUNS GREAT63/100~23.4 tok/s
3K14K
glm-4-9b-chat9.4BUNLOCKED
Chat
RUNS GREAT61/100~23.4 tok/s
ctx3K
RUNS GREAT63/100~23.4 tok/s
3K14K
Qwen3.5-9B9.7BUNLOCKED
General
RUNS WELL49/100~22.7 tok/s
ctx3K
RUNS WELL51/100~22.7 tok/s
3K11K
Qwen3.5-9B-Base9.7BUNLOCKED
General
RUNS WELL49/100~22.7 tok/s
ctx3K
RUNS WELL51/100~22.7 tok/s
3K11K
Qwen2.5-VL-7B7B
Multimodal
RUNS GREAT73/100~31.4 tok/s
ctx10K
RUNS GREAT73/100~31.4 tok/s
10K32K
DeepSeek-R1-7B7B
Reasoning
RUNS GREAT72/100~31.4 tok/s
ctx10K
RUNS GREAT72/100~31.4 tok/s
10K42K
MiMo-7B-RL7B
Reasoning
RUNS GREAT72/100~31.4 tok/s
ctx10K
RUNS GREAT72/100~31.4 tok/s
10K33K
DeepSeek-R1-Distill-Qwen-7B7.6B
Reasoning
RUNS GREAT72/100~28.9 tok/s
ctx8K
RUNS GREAT72/100~28.9 tok/s
8K33K
Orca-2-7B7B
Reasoning
RUNS GREAT71/100~31.4 tok/s
ctx4K
RUNS GREAT71/100~31.4 tok/s
ctx4K
Qwen2.5-7B-Instruct7B
General
RUNS GREAT69/100~31.4 tok/s
ctx10K
RUNS GREAT69/100~31.4 tok/s
10K42K
Qwen3-32B-quantized.w4a165.7B
General
RUNS GREAT69/100~38.6 tok/s
ctx17K
RUNS GREAT69/100~38.6 tok/s
17K41K
zephyr-7b-beta7.2B
General
RUNS GREAT69/100~30.6 tok/s
ctx10K
RUNS GREAT69/100~30.6 tok/s
10K33K
Mistral-7B-v0.17.2B
General
RUNS GREAT69/100~30.6 tok/s
ctx10K
RUNS GREAT69/100~30.6 tok/s
10K33K
prometheus-7b-v2.07.2B
General
RUNS GREAT69/100~30.6 tok/s
ctx10K
RUNS GREAT69/100~30.6 tok/s
10K33K
xLAM-7b-r7.2B
General
RUNS GREAT69/100~30.6 tok/s
ctx10K
RUNS GREAT69/100~30.6 tok/s
10K33K
dolphin-2.6-mistral-7b7.2B
General
RUNS GREAT69/100~30.6 tok/s
ctx10K
RUNS GREAT69/100~30.6 tok/s
10K33K
Olmo-3-1025-7B7.3B
General
RUNS GREAT69/100~30.1 tok/s
ctx9K
RUNS GREAT69/100~30.1 tok/s
9K37K
Qwen2.5-7B7.6B
General
RUNS GREAT69/100~28.9 tok/s
ctx8K
RUNS GREAT69/100~28.9 tok/s
8K33K
SWE-agent-LM-7B7.6B
General
RUNS GREAT69/100~28.9 tok/s
ctx8K
RUNS GREAT69/100~28.9 tok/s
8K33K
Qwen2-7B7.6B
General
RUNS GREAT69/100~28.9 tok/s
ctx8K
RUNS GREAT69/100~28.9 tok/s
8K33K
VulnLLM-R-7B7.6B
General
RUNS GREAT69/100~28.9 tok/s
ctx8K
RUNS GREAT69/100~28.9 tok/s
8K33K
Qwen2.5-Coder-7B7B
Coding
RUNS GREAT68/100~31.4 tok/s
ctx10K
RUNS GREAT68/100~31.4 tok/s
10K42K
granite-3.0-8b-instruct8B
General
RUNS GREAT68/100~27.5 tok/s
ctx4K
RUNS GREAT68/100~27.5 tok/s
ctx4K
CodeLlama-7B-Instruct7B
Coding
RUNS GREAT68/100~31.4 tok/s
ctx10K
RUNS GREAT68/100~31.4 tok/s
10K16K
StarCoder2-7B7B
Coding
RUNS GREAT68/100~31.4 tok/s
ctx10K
RUNS GREAT68/100~31.4 tok/s
10K16K
GLM-4.7-Flash-AWQ-4bit6.4BMoE
General
RUNS GREAT68/100~27.5 tok/s
ctx13K
RUNS GREAT68/100~27.5 tok/s
13K52K
Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC6.7B
General
RUNS GREAT68/100~32.8 tok/s
ctx4K
RUNS GREAT67/100~32.8 tok/s
ctx4K
Nous-Hermes-llama-2-7b6.7B
General
RUNS GREAT68/100~32.8 tok/s
ctx4K
RUNS GREAT67/100~32.8 tok/s
ctx4K
Llama-2-7b-hf6.7B
General
RUNS GREAT68/100~32.8 tok/s
ctx4K
RUNS GREAT67/100~32.8 tok/s
ctx4K
CodeLlama-7b-hf6.7B
Coding
RUNS GREAT68/100~32.8 tok/s
ctx12K
RUNS GREAT67/100~32.8 tok/s
12K16K
deepseek-coder-6.7b-instruct6.7B
Coding
RUNS GREAT68/100~32.8 tok/s
ctx12K
RUNS GREAT67/100~32.8 tok/s
12K16K
deepseek-coder-6.7b-base6.7B
Coding
RUNS GREAT68/100~32.8 tok/s
ctx12K
RUNS GREAT67/100~32.8 tok/s
12K16K
Orca-2-7b7B
General
RUNS GREAT68/100~31.4 tok/s
ctx4K
RUNS GREAT68/100~31.4 tok/s
ctx4K
Tarsier-7b7.1B
General
RUNS GREAT68/100~31 tok/s
ctx4K
RUNS GREAT68/100~31 tok/s
ctx4K
starcoder2-7b7.2B
Coding
RUNS GREAT68/100~30.6 tok/s
ctx10K
RUNS GREAT68/100~30.6 tok/s
10K16K
falcon-7b7.2B
General
RUNS GREAT68/100~30.6 tok/s
ctx4K
RUNS GREAT67/100~30.6 tok/s
ctx4K
wildguard7.2B
General
RUNS GREAT68/100~30.6 tok/s
ctx4K
RUNS GREAT67/100~30.6 tok/s
ctx4K
starcoder2-7b-GPTQ7.4B
Coding
RUNS GREAT68/100~29.7 tok/s
ctx9K
RUNS GREAT68/100~29.7 tok/s
9K16K
Qwen2.5-Math-7B7.6B
General
RUNS GREAT68/100~28.9 tok/s
ctx4K
RUNS GREAT68/100~28.9 tok/s
ctx4K
Llama-3.1-8B8B
General
RUNS GREAT68/100~27.5 tok/s
ctx4K
RUNS GREAT68/100~27.5 tok/s
ctx4K
Meta-Llama-3-8B8B
General
RUNS GREAT68/100~27.5 tok/s
ctx4K
RUNS GREAT68/100~27.5 tok/s
ctx4K
Llama-Guard-3-8B8B
General
RUNS GREAT68/100~27.5 tok/s
ctx4K
RUNS GREAT68/100~27.5 tok/s
ctx4K
llama-3.1-8b-bias-reduced8B
General
RUNS GREAT68/100~27.5 tok/s
ctx4K
RUNS GREAT68/100~27.5 tok/s
ctx4K
Qwen3-235B-A22B235BMoE
Reasoning
RUNS WELL67/100~0.6 tok/s
ctx927
RUNS WELL68/100~0.6 tok/s
9274K
CodeLlama-7b-Instruct-hf6.7B
Coding
RUNS GREAT67/100~32.8 tok/s
ctx4K
RUNS GREAT66/100~32.8 tok/s
ctx4K
OLMoE-1B-7B-01256.9BMoE
General
RUNS GREAT67/100~25.5 tok/s
ctx4K
RUNS GREAT67/100~25.5 tok/s
ctx4K
Qwen2.5-Coder-7B-Instruct7.6B
Coding
RUNS GREAT67/100~28.9 tok/s
ctx8K
RUNS GREAT68/100~28.9 tok/s
8K33K
Qwen2.5-Coder-7B-Instruct-GPTQ-Int47.6B
Coding
RUNS GREAT67/100~28.9 tok/s
ctx8K
RUNS GREAT68/100~28.9 tok/s
8K33K
Qwen2.5-Coder-7B-Instruct-AWQ7.6B
Coding
RUNS GREAT67/100~28.9 tok/s
ctx8K
RUNS GREAT68/100~28.9 tok/s
8K33K
hf-moshiko7.8B
General
RUNS GREAT67/100~28.2 tok/s
ctx3K
RUNS GREAT67/100~28.2 tok/s
ctx3K
Qwen3-30B-A3B30BMoE
Reasoning
RUNS WELL66/100~5.9 tok/s
ctx47K
RUNS WELL66/100~5.9 tok/s
47K128K
OLMo-7B-Instruct7B
General
RUNS GREAT66/100~31.4 tok/s
ctx2K
RUNS GREAT67/100~31.4 tok/s
ctx2K
Falcon-7B-Instruct7B
General
RUNS GREAT66/100~31.4 tok/s
ctx2K
RUNS GREAT67/100~31.4 tok/s
ctx2K
Amber6.7B
General
RUNS GREAT66/100~32.8 tok/s
ctx2K
RUNS GREAT66/100~32.8 tok/s
ctx2K
llama-7b6.7B
General
RUNS GREAT66/100~32.8 tok/s
ctx2K
RUNS GREAT66/100~32.8 tok/s
ctx2K
pythia-6.9b7B
General
RUNS GREAT66/100~31.4 tok/s
ctx2K
RUNS GREAT67/100~31.4 tok/s
ctx2K
Llama-3.2-1B-Instruct1B
Chat
RUNS WELL65/100~220 tok/s
ctx128K
RUNS WELL65/100~220 tok/s
ctx128K
gemma-3-1b-it1B
Chat
RUNS WELL65/100~220 tok/s
ctx33K
RUNS WELL65/100~220 tok/s
ctx33K
Qwen3-4B-Instruct-2507-MLX-8bit1.1B
Chat
RUNS WELL65/100~200 tok/s
ctx158K
RUNS WELL65/100~200 tok/s
158K262K
Qwen3-4B-Instruct-2507-MLX-5bit0.8B
Chat
RUNS WELL64/100~275 tok/s
ctx223K
RUNS WELL64/100~275 tok/s
223K262K
Qwen3-4B-Instruct-2507-MLX-6bit0.9B
Chat
RUNS WELL64/100~244.4 tok/s
ctx196K
RUNS WELL65/100~244.4 tok/s
196K262K
Mistral-7B-Instruct7B
Chat
RUNS GREAT63/100~31.4 tok/s
ctx10K
RUNS GREAT64/100~31.4 tok/s
10K32K
Qwen2.5-0.5B-Instruct0.5B
Chat
RUNS WELL63/100~440 tok/s
ctx128K
RUNS WELL63/100~440 tok/s
ctx128K
Qwen1.5-0.5B-Chat0.6B
Chat
RUNS WELL63/100~366.7 tok/s
ctx33K
RUNS WELL63/100~366.7 tok/s
ctx33K
Qwen3-4B-Instruct-2507-MLX-4bit0.6B
Chat
RUNS WELL63/100~366.7 tok/s
ctx262K
RUNS WELL63/100~366.7 tok/s
ctx262K
TinyLlama-1.1B-Chat-v1.01.1B
Chat
RUNS WELL63/100~200 tok/s
ctx2K
RUNS WELL63/100~200 tok/s
ctx2K
tinyllama-oneshot-w8w8-test-static-shape-change1.1B
Chat
RUNS WELL63/100~200 tok/s
ctx2K
RUNS WELL63/100~200 tok/s
ctx2K
LFM2.5-1.2B-Instruct1.2B
Chat
RUNS WELL63/100~183.3 tok/s
ctx128K
RUNS WELL63/100~183.3 tok/s
ctx128K
Vikhr-Llama-3.2-1B-Instruct1.2B
Chat
RUNS WELL63/100~183.3 tok/s
ctx131K
RUNS WELL63/100~183.3 tok/s
ctx131K
openchat-3.5-01067B
Chat
RUNS GREAT63/100~31.4 tok/s
ctx8K
RUNS GREAT63/100~31.4 tok/s
ctx8K
Mistral-7B-Instruct-v0.27.2B
Chat
RUNS GREAT63/100~30.6 tok/s
ctx10K
RUNS GREAT63/100~30.6 tok/s
10K33K
Mistral-7B-Instruct-v0.37.2B
Chat
RUNS GREAT63/100~30.6 tok/s
ctx10K
RUNS GREAT63/100~30.6 tok/s
10K33K
Mistral-7B-Instruct-v0.3-GPTQ7.2B
Chat
RUNS GREAT63/100~30.6 tok/s
ctx10K
RUNS GREAT63/100~30.6 tok/s
10K33K
Olmo-3-7B-Instruct-SFT7.3B
Chat
RUNS GREAT63/100~30.1 tok/s
ctx9K
RUNS GREAT63/100~30.1 tok/s
9K37K
Falcon3-7B-Instruct7.5B
Chat
RUNS GREAT63/100~29.3 tok/s
ctx9K
RUNS GREAT64/100~29.3 tok/s
9K33K
Qwen2-7B-Instruct7.6B
Chat
RUNS GREAT63/100~28.9 tok/s
ctx8K
RUNS GREAT63/100~28.9 tok/s
8K33K
XCurOS-0.1-8B-Instruct7.6B
Chat
RUNS GREAT63/100~28.9 tok/s
ctx8K
RUNS GREAT63/100~28.9 tok/s
8K33K
Dream-v0-Instruct-7B7.6B
Chat
RUNS GREAT63/100~28.9 tok/s
ctx8K
RUNS GREAT63/100~28.9 tok/s
8K33K
Qwen2.5-7B-Instruct-GPTQ-Int47.6B
Chat
RUNS GREAT63/100~28.9 tok/s
ctx8K
RUNS GREAT63/100~28.9 tok/s
8K33K
Qwen2.5-7B-Instruct-1M7.6B
Chat
RUNS GREAT63/100~28.9 tok/s
ctx8K
RUNS GREAT63/100~28.9 tok/s
8K33K
Yi-6B-Chat6.1B
Chat
RUNS GREAT62/100~36.1 tok/s
ctx4K
RUNS GREAT62/100~36.1 tok/s
ctx4K
vicuna-7b-v1.56.7B
Chat
RUNS GREAT62/100~32.8 tok/s
ctx4K
RUNS GREAT62/100~32.8 tok/s
ctx4K
Llama-2-7b-chat-hf6.7B
Chat
RUNS GREAT62/100~32.8 tok/s
ctx4K
RUNS GREAT62/100~32.8 tok/s
ctx4K
granite-4.0-h-tiny6.9BMoE
Chat
RUNS GREAT62/100~25.5 tok/s
ctx11K
RUNS GREAT62/100~25.5 tok/s
11K43K
falcon-7b-instruct7.2B
Chat
RUNS GREAT62/100~30.6 tok/s
ctx4K
RUNS GREAT62/100~30.6 tok/s
ctx4K
falcon-mamba-7b-instruct7.3B
Chat
RUNS GREAT62/100~30.1 tok/s
ctx4K
RUNS GREAT62/100~30.1 tok/s
ctx4K
Qwen2.5-Math-7B-Instruct7.6B
Chat
RUNS GREAT62/100~28.9 tok/s
ctx4K
RUNS GREAT62/100~28.9 tok/s
ctx4K
Meta-Llama-3-8B-Instruct8B
Chat
RUNS GREAT62/100~27.5 tok/s
ctx4K
RUNS GREAT62/100~27.5 tok/s
ctx4K
Zamba2-1.2B-instruct1.2B
Chat
RUNS WELL61/100~183.3 tok/s
ctx4K
RUNS WELL61/100~183.3 tok/s
ctx4K
Abliterated-Llama-3.2-1B-Instruct1.2B
Chat
RUNS WELL61/100~183.3 tok/s
ctx4K
RUNS WELL61/100~183.3 tok/s
ctx4K
OLMoE-1B-7B-0125-Instruct6.9BMoE
Chat
RUNS GREAT61/100~25.5 tok/s
ctx4K
RUNS GREAT61/100~25.5 tok/s
ctx4K
Phi-mini-MoE-instruct7.6BMoE
Chat
RUNS GREAT61/100~23.2 tok/s
ctx4K
RUNS GREAT61/100~23.2 tok/s
ctx4K
Qwen3-4B-Thinking-2507-MLX-8bit1.1B
General
RUNS WELL60/100~200 tok/s
ctx158K
RUNS WELL60/100~200 tok/s
158K262K
Qwen3-0.6B0.8B
General
RUNS WELL59/100~275 tok/s
ctx41K
RUNS WELL59/100~275 tok/s
ctx41K
Qwen3Guard-Gen-0.6B0.8B
General
RUNS WELL59/100~275 tok/s
ctx33K
RUNS WELL59/100~275 tok/s
ctx33K
Qwen3-0.6B-FP80.8B
General
RUNS WELL59/100~275 tok/s
ctx41K
RUNS WELL59/100~275 tok/s
ctx41K
Qwen3.5-0.8B0.9B
General
RUNS WELL59/100~244.4 tok/s
ctx196K
RUNS WELL60/100~244.4 tok/s
196K262K
Qwen3.5-0.8B-Base0.9B
General
RUNS WELL59/100~244.4 tok/s
ctx196K
RUNS WELL60/100~244.4 tok/s
196K262K
Qwen3-4B-Thinking-2507-MLX-6bit0.9B
General
RUNS WELL59/100~244.4 tok/s
ctx196K
RUNS WELL60/100~244.4 tok/s
196K262K
LFM2-1.2B1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2.5-1.2B-Thinking1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2.5-1.2B-JP1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2-1.2B-Tool1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2-1.2B-RAG1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2-1.2B-Extract1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2.5-1.2B-Base1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
LFM2-1.2B-MLX-bf161.2B
General
RUNS WELL59/100~183.3 tok/s
ctx128K
RUNS WELL59/100~183.3 tok/s
ctx128K
Ilama-3.2-1B1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx131K
RUNS WELL59/100~183.3 tok/s
ctx131K
CyberXP_Agent_Llama_3.2_1B1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx131K
RUNS WELL59/100~183.3 tok/s
ctx131K
Orpo-Llama-3.2-1B-15k1.2B
General
RUNS WELL59/100~183.3 tok/s
ctx131K
RUNS WELL59/100~183.3 tok/s
ctx131K
Qwen3-4B-MLX-4bit0.6B
General
RUNS WELL58/100~366.7 tok/s
ctx66K
RUNS WELL58/100~366.7 tok/s
ctx66K
Qwen1.5-0.5B0.6B
General
RUNS WELL58/100~366.7 tok/s
ctx33K
RUNS WELL58/100~366.7 tok/s
ctx33K
Qwen3-4B-Thinking-2507-MLX-4bit0.6B
General
RUNS WELL58/100~366.7 tok/s
ctx262K
RUNS WELL58/100~366.7 tok/s
ctx262K
LFM2-700M0.7B
General
RUNS WELL58/100~314.3 tok/s
ctx128K
RUNS WELL58/100~314.3 tok/s
ctx128K
Qwen3-8B-speculator.eagle31B
General
RUNS WELL58/100~220 tok/s
ctx4K
RUNS WELL59/100~220 tok/s
ctx4K
pythia-1b1.1B
General
RUNS WELL58/100~200 tok/s
ctx2K
RUNS WELL58/100~200 tok/s
ctx2K
Qwen2.5-1.5B-Instruct1.5B
Chat
RUNS WELL57/100~146.7 tok/s
ctx111K
RUNS WELL57/100~146.7 tok/s
111K128K
Qwen3-4B4B
Reasoning
RUNS WELL57/100~55 tok/s
ctx31K
RUNS WELL56/100~55 tok/s
31K124K
Falcon-H1-0.5B-Base0.5B
General
RUNS WELL57/100~440 tok/s
ctx16K
RUNS WELL57/100~440 tok/s
ctx16K
Qwen3-4B-DFlash-b160.5B
General
RUNS WELL57/100~440 tok/s
ctx41K
RUNS WELL57/100~440 tok/s
ctx41K
h2ovl-mississippi-800m0.8B
General
RUNS WELL57/100~275 tok/s
ctx4K
RUNS WELL57/100~275 tok/s
ctx4K
ELM0.9B
General
RUNS WELL57/100~244.4 tok/s
ctx2K
RUNS WELL57/100~244.4 tok/s
ctx2K
Llama-3.2-1B1.2B
General
RUNS WELL57/100~183.3 tok/s
ctx4K
RUNS WELL57/100~183.3 tok/s
ctx4K
Jan-nano-AWQ1.3B
General
RUNS WELL57/100~169.2 tok/s
ctx41K
RUNS WELL58/100~169.2 tok/s
ctx41K
EXAONE-4.0-1.2B1.3B
General
RUNS WELL57/100~169.2 tok/s
ctx66K
RUNS WELL58/100~169.2 tok/s
ctx66K
Qwen3-8B-MLX-4bit1.3B
General
RUNS WELL57/100~169.2 tok/s
ctx41K
RUNS WELL58/100~169.2 tok/s
ctx41K
plamo-2-1b1.3B
General
RUNS WELL57/100~169.2 tok/s
ctx131K
RUNS WELL58/100~169.2 tok/s
131K523K
Llama-3.2-1B-Instruct-FP81.5B
Chat
RUNS WELL57/100~146.7 tok/s
ctx111K
RUNS WELL57/100~146.7 tok/s
111K131K
Llama-3.2-1B-Instruct-FP8-dynamic1.5B
Chat
RUNS WELL57/100~146.7 tok/s
ctx111K
RUNS WELL57/100~146.7 tok/s
111K131K
Qwen2-1.5B-Instruct1.5B
Chat
RUNS WELL57/100~146.7 tok/s
ctx33K
RUNS WELL57/100~146.7 tok/s
ctx33K
Qwen2-1.5B-Instruct-FP81.5B
Chat
RUNS WELL57/100~146.7 tok/s
ctx33K
RUNS WELL57/100~146.7 tok/s
ctx33K
bloom-560m0.6B
General
RUNS WELL56/100~366.7 tok/s
ctx4K
RUNS WELL56/100~366.7 tok/s
ctx4K
GA_Guard_Lite0.6B
General
RUNS WELL56/100~366.7 tok/s
ctx4K
RUNS WELL56/100~366.7 tok/s
ctx4K
gpt_bigcode-santacoder1.1B
Coding
RUNS WELL56/100~200 tok/s
ctx2K
RUNS WELL56/100~200 tok/s
ctx2K
OLMo-1B-hf1.2B
General
RUNS WELL56/100~183.3 tok/s
ctx2K
RUNS WELL56/100~183.3 tok/s
ctx2K
llama-3.2-1b-code-instruct1.2B
Coding
RUNS WELL56/100~183.3 tok/s
ctx131K
RUNS WELL57/100~183.3 tok/s
ctx131K
starvector-1b-im2svg1.4B
General
RUNS WELL56/100~157.1 tok/s
ctx8K
RUNS WELL56/100~157.1 tok/s
ctx8K
OLMo-2-0425-1B-Instruct1.5B
Chat
RUNS WELL56/100~146.7 tok/s
ctx4K
RUNS WELL56/100~146.7 tok/s
ctx4K
Qwen2.5-1.5B1.5B
General
RUNS WELL56/100~146.7 tok/s
ctx111K
RUNS WELL55/100~146.7 tok/s
111K131K
Qwen2-1.5B1.5B
General
RUNS WELL56/100~146.7 tok/s
ctx111K
RUNS WELL55/100~146.7 tok/s
111K131K
Qwen2.5-Math-1.5B-Instruct1.5B
Chat
RUNS WELL56/100~146.7 tok/s
ctx4K
RUNS WELL56/100~146.7 tok/s
ctx4K
xLAM-2-1b-fc-r1.5B
General
RUNS WELL56/100~146.7 tok/s
ctx33K
RUNS WELL55/100~146.7 tok/s
ctx33K
qwen-base-invoicev1.01-1.5B1.5B
General
RUNS WELL56/100~146.7 tok/s
ctx33K
RUNS WELL55/100~146.7 tok/s
ctx33K
Phi-4-mini-reasoning3.8B
Reasoning
RUNS WELL56/100~57.9 tok/s
ctx16K
RUNS WELL56/100~57.9 tok/s
ctx16K
Llama-4-Maverick400BMoE
General
RUNS WELL55/100~0.4 tok/s
ctx2K
RUNS WELL57/100~0.4 tok/s
2K7K
bge-m30.57B
Embedding
RUNS WELL55/100~386 tok/s
ctx8K
RUNS WELL55/100~386 tok/s
ctx8K
pythia-410m0.5B
General
RUNS WELL55/100~440 tok/s
ctx2K
RUNS WELL55/100~440 tok/s
ctx2K
pythia-410m-deduped0.5B
General
RUNS WELL55/100~440 tok/s
ctx2K
RUNS WELL55/100~440 tok/s
ctx2K
bloomz-560m0.6B
General
RUNS WELL55/100~366.7 tok/s
ctx2K
RUNS WELL55/100~366.7 tok/s
ctx2K
LFM2-VL-1.6B1.6B
General
RUNS WELL55/100~137.5 tok/s
ctx103K
RUNS WELL55/100~137.5 tok/s
103K128K
LFM2.5-VL-1.6B1.6B
General
RUNS WELL55/100~137.5 tok/s
ctx103K
RUNS WELL55/100~137.5 tok/s
103K128K
stablelm-2-1_6b-chat1.6B
Chat
RUNS WELL55/100~137.5 tok/s
ctx4K
RUNS WELL55/100~137.5 tok/s
ctx4K
Qwen3.5-4B4.7B
General
RUNS WELL55/100~46.8 tok/s
ctx24K
RUNS WELL54/100~46.8 tok/s
24K95K
Qwen3.5-4B-Base4.7B
General
RUNS WELL55/100~46.8 tok/s
ctx24K
RUNS WELL54/100~46.8 tok/s
24K95K
Qwen3-8B-NVFP44.7B
General
RUNS WELL55/100~46.8 tok/s
ctx24K
RUNS WELL54/100~46.8 tok/s
24K41K
NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ5.1B
General
RUNS WELL55/100~43.1 tok/s
ctx21K
RUNS WELL55/100~43.1 tok/s
21K83K
Qwen3-32B-MLX-4bit5.1B
General
RUNS WELL55/100~43.1 tok/s
ctx21K
RUNS WELL55/100~43.1 tok/s
21K41K
QwQ-32B-MLX-4bit5.1B
General
RUNS WELL55/100~43.1 tok/s
ctx21K
RUNS WELL55/100~43.1 tok/s
21K83K
DeepSeek-V2-Chat236BMoE
General
RUNS WELL54/100~0.6 tok/s
ctx1K
RUNS WELL56/100~0.6 tok/s
1K5K
DeepSeek-R1-0528-Qwen3-8B-MLX-4bit1.3B
Reasoning
RUNS WELL54/100~169.2 tok/s
ctx131K
RUNS WELL54/100~169.2 tok/s
131K131K
gpt-neo-1.3B1.4B
General
RUNS WELL54/100~157.1 tok/s
ctx2K
RUNS WELL54/100~157.1 tok/s
ctx2K
phi-1_51.4B
General
RUNS WELL54/100~157.1 tok/s
ctx2K
RUNS WELL54/100~157.1 tok/s
ctx2K
LFM2-Audio-1.5B1.5B
General
RUNS WELL54/100~146.7 tok/s
ctx4K
RUNS WELL54/100~146.7 tok/s
ctx4K
LFM2.5-Audio-1.5B1.5B
General
RUNS WELL54/100~146.7 tok/s
ctx4K
RUNS WELL54/100~146.7 tok/s
ctx4K
OLMo-2-0425-1B1.5B
General
RUNS WELL54/100~146.7 tok/s
ctx4K
RUNS WELL54/100~146.7 tok/s
ctx4K
Qwen2.5-Math-1.5B1.5B
General
RUNS WELL54/100~146.7 tok/s
ctx4K
RUNS WELL54/100~146.7 tok/s
ctx4K
SmolLM2-1.7B1.7B
General
RUNS WELL54/100~129.4 tok/s
ctx8K
RUNS WELL54/100~129.4 tok/s
ctx8K
Nanbeige4.1-3B-AWQ-8bit1.7B
General
RUNS WELL54/100~129.4 tok/s
ctx96K
RUNS WELL54/100~129.4 tok/s
96K262K
Qwen3-1.7B-Base1.7B
General
RUNS WELL54/100~129.4 tok/s
ctx33K
RUNS WELL54/100~129.4 tok/s
ctx33K
Qwen3-1.7B-MLX-bf161.7B
General
RUNS WELL54/100~129.4 tok/s
ctx41K
RUNS WELL54/100~129.4 tok/s
ctx41K
Qwen2.5-1.5B-Instruct-AWQ1.8B
Chat
RUNS WELL54/100~122.2 tok/s
ctx33K
RUNS WELL54/100~122.2 tok/s
ctx33K
Qwen2-1.5B-Instruct-AWQ1.8B
Chat
RUNS WELL54/100~122.2 tok/s
ctx33K
RUNS WELL54/100~122.2 tok/s
ctx33K
Qwen2-1.5B-Instruct-GPTQ-Int41.8B
Chat
RUNS WELL54/100~122.2 tok/s
ctx33K
RUNS WELL54/100~122.2 tok/s
ctx33K
Qwen2.5-1.5B-quantized.w8a81.8B
General
RUNS WELL54/100~122.2 tok/s
ctx33K
RUNS WELL54/100~122.2 tok/s
ctx33K
Qwen1.5-1.8B-Chat1.8B
Chat
RUNS WELL54/100~122.2 tok/s
ctx33K
RUNS WELL54/100~122.2 tok/s
ctx33K
gemma-3n-E2B-it4B
Multimodal
RUNS WELL54/100~55 tok/s
ctx31K
RUNS WELL54/100~55 tok/s
31K124K
Qwen3-14B-MLX-8bit4.2B
General
RUNS WELL54/100~52.4 tok/s
ctx29K
RUNS WELL54/100~52.4 tok/s
29K41K
Qwen3-4B-SafeRL4.4B
General
RUNS WELL54/100~50 tok/s
ctx27K
RUNS WELL54/100~50 tok/s
27K41K
Qwen3-4B-FP84.4B
General
RUNS WELL54/100~50 tok/s
ctx27K
RUNS WELL54/100~50 tok/s
27K41K
Nemotron-H-4B-Base-8K4.5B
General
RUNS WELL54/100~48.9 tok/s
ctx8K
RUNS WELL54/100~48.9 tok/s
ctx8K
Qwen2.5-Coder-32B-Instruct-MLX-4bit5.1B
Coding
RUNS WELL54/100~43.1 tok/s
ctx21K
RUNS WELL54/100~43.1 tok/s
21K33K
Qwen2.5-Coder-1.5B1.5B
Coding
RUNS WELL53/100~146.7 tok/s
ctx111K
RUNS WELL53/100~146.7 tok/s
111K128K
gemma-2-2b-it2B
Chat
RUNS WELL53/100~110 tok/s
ctx8K
RUNS WELL53/100~110 tok/s
ctx8K
granite-3.1-2b-instruct2B
General
RUNS WELL53/100~110 tok/s
ctx79K
RUNS WELL53/100~110 tok/s
79K128K
nomic-embed-text-v1.50.14B
Embedding
RUNS WELL53/100~1571.4 tok/s
ctx8K
RUNS WELL53/100~1571.4 tok/s
ctx8K
pythia-1.4b1.5B
General
RUNS WELL53/100~146.7 tok/s
ctx2K
RUNS WELL53/100~146.7 tok/s
ctx2K
Qwen2.5-Coder-1.5B-Instruct1.5B
Coding
RUNS WELL53/100~146.7 tok/s
ctx33K
RUNS WELL53/100~146.7 tok/s
ctx33K
Minnow-Math-1.5B1.6B
General
RUNS WELL53/100~137.5 tok/s
ctx4K
RUNS WELL54/100~137.5 tok/s
ctx4K
QVikhr-3-1.7B-Instruction-noreasoning1.7B
Reasoning
RUNS WELL53/100~129.4 tok/s
ctx41K
RUNS WELL53/100~129.4 tok/s
ctx41K
bloom-1b71.7B
General
RUNS WELL53/100~129.4 tok/s
ctx4K
RUNS WELL53/100~129.4 tok/s
ctx4K
DeepSeek-R1-Distill-Qwen-1.5B1.8B
Reasoning
RUNS WELL53/100~122.2 tok/s
ctx90K
RUNS WELL53/100~122.2 tok/s
90K131K
Qwen3-1.7B2B
General
RUNS WELL53/100~110 tok/s
ctx41K
RUNS WELL53/100~110 tok/s
ctx41K
Qwen3-1.7B-FP82B
General
RUNS WELL53/100~110 tok/s
ctx41K
RUNS WELL53/100~110 tok/s
ctx41K
Phi-4-reasoning-plus-MLX-4bit2.3B
Reasoning
RUNS WELL53/100~95.7 tok/s
ctx33K
RUNS WELL53/100~95.7 tok/s
ctx33K
DeepSeek-R1-0528-Qwen3-8B-MLX-8bit2.3B
Reasoning
RUNS WELL53/100~95.7 tok/s
ctx66K
RUNS WELL53/100~95.7 tok/s
66K131K
HTML-Pruner-Phi-3.8B3.8B
General
RUNS WELL53/100~57.9 tok/s
ctx34K
RUNS WELL53/100~57.9 tok/s
34K131K
Nanbeige4.1-3B3.9B
General
RUNS WELL53/100~56.4 tok/s
ctx32K
RUNS WELL53/100~56.4 tok/s
32K129K
Qwen3-4B-Base4B
General
RUNS WELL53/100~55 tok/s
ctx31K
RUNS WELL53/100~55 tok/s
31K33K
Qwen3-4B-Thinking-25074B
General
RUNS WELL53/100~55 tok/s
ctx31K
RUNS WELL53/100~55 tok/s
31K124K
Qwen3-4B-AWQ4B
General
RUNS WELL53/100~55 tok/s
ctx31K
RUNS WELL53/100~55 tok/s
31K41K
Jan-v1-4B4B
General
RUNS WELL53/100~55 tok/s
ctx31K
RUNS WELL53/100~55 tok/s
31K124K
Jan-nano-128k4B
General
RUNS WELL53/100~55 tok/s
ctx31K
RUNS WELL53/100~55 tok/s
31K124K
VLM2Vec-Full4.1B
General
RUNS WELL53/100~53.7 tok/s
ctx30K
RUNS WELL54/100~53.7 tok/s
30K119K
Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit5.3BMoE
Coding
RUNS WELL53/100~33.2 tok/s
ctx19K
RUNS WELL53/100~33.2 tok/s
19K77K
Llama-4-Scout17BMoE
General
RUNS WELL52/100~10.4 tok/s
ctx47K
RUNS WELL52/100~10.4 tok/s
47K188K
Mixtral-8x7B-Instruct46.7BMoE
General
RUNS WELL52/100~3.7 tok/s
ctx2K
RUNS WELL55/100~3.7 tok/s
2K7K
granite-3.0-2b-instruct2B
General
RUNS WELL52/100~110 tok/s
ctx4K
RUNS WELL52/100~110 tok/s
ctx4K
Phi-4-mini-reasoning-MLX-4bit0.6B
Reasoning
RUNS WELL52/100~366.7 tok/s
ctx131K
RUNS WELL52/100~366.7 tok/s
ctx131K
SmolLM-1.7B1.7B
General
RUNS WELL52/100~129.4 tok/s
ctx2K
RUNS WELL52/100~129.4 tok/s
ctx2K
Qwen2.5-Coder-1.5B-Instruct-AWQ1.8B
Coding
RUNS WELL52/100~122.2 tok/s
ctx33K
RUNS WELL52/100~122.2 tok/s
ctx33K
Qwen3.5-2B2.3B
General
RUNS WELL52/100~95.7 tok/s
ctx66K
RUNS WELL52/100~95.7 tok/s
66K262K
Qwen3.5-2B-Base2.3B
General
RUNS WELL52/100~95.7 tok/s
ctx66K
RUNS WELL52/100~95.7 tok/s
66K262K
Qwen3-8B-MLX-8bit2.3B
General
RUNS WELL52/100~95.7 tok/s
ctx41K
RUNS WELL52/100~95.7 tok/s
ctx41K
Qwen3-14B-MLX-4bit2.3B
General
RUNS WELL52/100~95.7 tok/s
ctx41K
RUNS WELL52/100~95.7 tok/s
ctx41K
LFM2-2.6B2.6B
General
RUNS WELL52/100~84.6 tok/s
ctx57K
RUNS WELL52/100~84.6 tok/s
57K128K
LFM2-2.6B-Exp2.6B
General
RUNS WELL52/100~84.6 tok/s
ctx57K
RUNS WELL52/100~84.6 tok/s
57K128K
LFM2-2.6B-Transcript2.6B
General
RUNS WELL52/100~84.6 tok/s
ctx57K
RUNS WELL52/100~84.6 tok/s
57K128K
T-lite-it-1.0_Q4_02.9B
General
RUNS WELL52/100~75.9 tok/s
ctx33K
RUNS WELL52/100~75.9 tok/s
ctx33K
LFM2-VL-3B3B
General
RUNS WELL52/100~73.3 tok/s
ctx47K
RUNS WELL52/100~73.3 tok/s
47K128K
SmolLM3-3B3.1B
General
RUNS WELL52/100~71 tok/s
ctx45K
RUNS WELL52/100~71 tok/s
45K66K
SmolLM3-3B-Base3.1B
General
RUNS WELL52/100~71 tok/s
ctx45K
RUNS WELL52/100~71 tok/s
45K66K
Qwen2.5-3B3.1B
General
RUNS WELL52/100~71 tok/s
ctx33K
RUNS WELL52/100~71 tok/s
ctx33K
xLAM-2-3b-fc-r3.1B
General
RUNS WELL52/100~71 tok/s
ctx33K
RUNS WELL52/100~71 tok/s
ctx33K
Hermes-3-Llama-3.2-3B3.2B
General
RUNS WELL52/100~68.8 tok/s
ctx43K
RUNS WELL52/100~68.8 tok/s
43K131K
Qwen2.5-Coder-14B-Instruct-MLX-8bit4.2B
Coding
RUNS WELL52/100~52.4 tok/s
ctx29K
RUNS WELL52/100~52.4 tok/s
29K33K
xflux_text_encoders4.8B
Coding
RUNS WELL52/100~45.8 tok/s
ctx4K
RUNS WELL52/100~45.8 tok/s
ctx4K
bge-large-en-v1.50.34B
Embedding
RUNS WELL51/100~647.1 tok/s
ctx512
RUNS WELL51/100~647.1 tok/s
ctx512
h2ovl-mississippi-2b2.2B
General
RUNS WELL51/100~100 tok/s
ctx4K
RUNS WELL51/100~100 tok/s
ctx4K
EXAONE-3.5-2.4B-Instruct2.4B
Chat
RUNS WELL51/100~91.7 tok/s
ctx33K
RUNS WELL51/100~91.7 tok/s
ctx33K
gemma-1.1-2b-it2.5B
General
RUNS WELL51/100~88 tok/s
ctx4K
RUNS WELL51/100~88 tok/s
ctx4K
gemma-2-2b-jpn-it2.6B
General
RUNS WELL51/100~84.6 tok/s
ctx4K
RUNS WELL50/100~84.6 tok/s
ctx4K
stablelm-3b-4e1t2.8B
General
RUNS WELL51/100~78.6 tok/s
ctx4K
RUNS WELL51/100~78.6 tok/s
ctx4K
granite-4.0-h-micro3.2BMoE
General
RUNS WELL51/100~55 tok/s
ctx43K
RUNS WELL51/100~55 tok/s
43K131K
Llama-3.2-3B3.2B
General
RUNS WELL51/100~68.8 tok/s
ctx4K
RUNS WELL51/100~68.8 tok/s
ctx4K
PowerLM-3b3.5B
General
RUNS WELL51/100~62.9 tok/s
ctx4K
RUNS WELL51/100~62.9 tok/s
ctx4K
Qwen3-Coder-30B-A3B-Instruct-AWQ4.6BMoE
Coding
RUNS WELL51/100~38.3 tok/s
ctx25K
RUNS WELL51/100~38.3 tok/s
25K99K
Qwen2.5-VL-7B-Instruct-NVFP45B
Chat
RUNS WELL51/100~44 tok/s
ctx21K
RUNS WELL50/100~44 tok/s
21K86K
Phi-4-multimodal-instruct5.6B
Chat
RUNS WELL51/100~39.3 tok/s
ctx17K
RUNS WELL52/100~39.3 tok/s
17K69K
Llama-3.2-3B-Instruct3B
Chat
RUNS WELL50/100~73.3 tok/s
ctx47K
RUNS WELL50/100~73.3 tok/s
47K128K
Qwen2.5-3B-Instruct3B
Chat
RUNS WELL50/100~73.3 tok/s
ctx47K
RUNS WELL50/100~73.3 tok/s
47K128K
gemma-3-4b-it4B
Chat
RUNS WELL50/100~55 tok/s
ctx31K
RUNS WELL50/100~55 tok/s
31K124K
Phi-3.5-mini-instruct3.8B
Chat
RUNS WELL50/100~57.9 tok/s
ctx34K
RUNS WELL50/100~57.9 tok/s
34K128K
Phi-4-mini3.8B
Chat
RUNS WELL50/100~57.9 tok/s
ctx34K
RUNS WELL50/100~57.9 tok/s
34K128K
DeepSeek-Coder-V2-16B16BMoE
Coding
RUNS WELL50/100~11 tok/s
ctx63K
RUNS WELL50/100~11 tok/s
63K128K
StarCoder2-3B3B
Coding
RUNS WELL50/100~73.3 tok/s
ctx16K
RUNS WELL50/100~73.3 tok/s
ctx16K
Qwen2.5-Coder-14B-Instruct-MLX-4bit2.3B
Coding
RUNS WELL50/100~95.7 tok/s
ctx33K
RUNS WELL50/100~95.7 tok/s
ctx33K
gpt-neo-2.7B2.7B
General
RUNS WELL50/100~81.5 tok/s
ctx2K
RUNS WELL50/100~81.5 tok/s
ctx2K
phi-22.8B
General
RUNS WELL50/100~78.6 tok/s
ctx2K
RUNS WELL49/100~78.6 tok/s
ctx2K
pythia-2.8b2.9B
General
RUNS WELL50/100~75.9 tok/s
ctx2K
RUNS WELL50/100~75.9 tok/s
ctx2K
starcoder2-3b3B
Coding
RUNS WELL50/100~73.3 tok/s
ctx16K
RUNS WELL50/100~73.3 tok/s
ctx16K
Qwen2.5-Coder-3B-Instruct3.1B
Coding
RUNS WELL50/100~71 tok/s
ctx33K
RUNS WELL50/100~71 tok/s
ctx33K
Qwen2.5-Coder-3B3.1B
Coding
RUNS WELL50/100~71 tok/s
ctx33K
RUNS WELL50/100~71 tok/s
ctx33K
Qwen2.5-VL-3B-Instruct3.8B
Chat
RUNS WELL50/100~57.9 tok/s
ctx34K
RUNS WELL50/100~57.9 tok/s
34K128K
Qwen3-4B-Instruct-25074B
Chat
RUNS WELL50/100~55 tok/s
ctx31K
RUNS WELL50/100~55 tok/s
31K124K
Qwen3-4B-Instruct-2507-GPTQ-Int44B
Chat
RUNS WELL50/100~55 tok/s
ctx31K
RUNS WELL50/100~55 tok/s
31K124K
Qwen3-4B-Instruct-2507-FP84.4B
Chat
RUNS WELL50/100~50 tok/s
ctx27K
RUNS WELL50/100~50 tok/s
27K107K
Nemotron-H-4B-Instruct-128K4.5B
Chat
RUNS WELL50/100~48.9 tok/s
ctx26K
RUNS WELL50/100~48.9 tok/s
26K103K
Qwen3-30B-A3B-Instruct-2507-AWQ-4bit5.3BMoE
Chat
RUNS WELL50/100~33.2 tok/s
ctx19K
RUNS WELL50/100~33.2 tok/s
19K77K
granite-4.0-h-tiny-AWQ-4bit2BMoE
Chat
RUNS WELL49/100~88 tok/s
ctx79K
RUNS WELL49/100~88 tok/s
79K131K
PowerMoE-3b3.4BMoE
General
RUNS WELL49/100~51.8 tok/s
ctx4K
RUNS WELL50/100~51.8 tok/s
ctx4K
Qwen2.5-3B-Instruct-AWQ3.4B
Chat
RUNS WELL49/100~64.7 tok/s
ctx33K
RUNS WELL49/100~64.7 tok/s
ctx33K
granite-3b-code-base-2k3.5B
Coding
RUNS WELL49/100~62.9 tok/s
ctx2K
RUNS WELL49/100~62.9 tok/s
ctx2K
Llama-3.2-3B-Instruct-FP83.6B
Chat
RUNS WELL49/100~61.1 tok/s
ctx36K
RUNS WELL50/100~61.1 tok/s
36K131K
Qwen3-32B32B
Reasoning
DECENT48/100~2.9 tok/s
ctx
DECENT48/100~2.9 tok/s
ctx
Phi-3-mini-4k3.8B
Chat
RUNS WELL48/100~57.9 tok/s
ctx4K
RUNS WELL48/100~57.9 tok/s
ctx4K
DeepSeek-R1-32B32B
Reasoning
DECENT48/100~2.9 tok/s
ctx
DECENT48/100~2.9 tok/s
ctx
phi-3-mini-4k-instruct3.8B
Chat
RUNS WELL48/100~57.9 tok/s
ctx4K
RUNS WELL48/100~57.9 tok/s
ctx4K
Phi-3-mini-4k-instruct-AWQ3.8B
Chat
RUNS WELL48/100~57.9 tok/s
ctx4K
RUNS WELL48/100~57.9 tok/s
ctx4K
Phi-3-mini-4k-instruct-gptq-4bit3.8B
Chat
RUNS WELL48/100~57.9 tok/s
ctx4K
RUNS WELL48/100~57.9 tok/s
ctx4K
Qwen3-30B-A3B-Instruct-2507-AWQ4.6BMoE
Chat
RUNS WELL48/100~38.3 tok/s
ctx25K
RUNS WELL48/100~38.3 tok/s
25K99K
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled27.8B
Reasoning
DECENT48/100~3.6 tok/s
ctx
DECENT48/100~3.6 tok/s
ctx
DeepSeek-R1-Distill-Qwen-32B32.8B
Reasoning
DECENT48/100~2.9 tok/s
ctx
DECENT49/100~2.9 tok/s
ctx
OpenReasoning-Nemotron-32B32.8B
Reasoning
DECENT48/100~2.9 tok/s
ctx
DECENT49/100~2.9 tok/s
ctx
Phi-tiny-MoE-instruct3.8BMoE
Chat
RUNS WELL46/100~46.3 tok/s
ctx4K
RUNS WELL46/100~46.3 tok/s
ctx4K
DeepSeek-R1-Distill-Qwen-14B14.8B
Reasoning
DECENT46/100~7.4 tok/s
ctx
DECENT46/100~7.4 tok/s
ctx
Qwen3-14B14B
Reasoning
DECENT45/100~7.9 tok/s
ctx
DECENT45/100~7.9 tok/s
ctx
Phi-414B
Reasoning
DECENT45/100~7.9 tok/s
ctx
DECENT45/100~7.9 tok/s
ctx
DeepSeek-R1-14B14B
Reasoning
DECENT45/100~7.9 tok/s
ctx
DECENT45/100~7.9 tok/s
ctx
Orca-2-13B13B
Reasoning
DECENT45/100~8.5 tok/s
ctx
DECENT45/100~8.5 tok/s
ctx
Phi-4-reasoning14B
Reasoning
DECENT45/100~7.9 tok/s
ctx
DECENT45/100~7.9 tok/s
ctx
HyperCLOVAX-SEED-Omni-8B10.7B
General
RUNS WELL43/100~20.6 tok/s
ctx943
RUNS WELL46/100~20.6 tok/s
943 → 4K
Llama-3.2-11B-Vision11B
Multimodal
DECENT40/100~10 tok/s
ctx454
DECENT42/100~10 tok/s
454 → 2K
Llama-3.2-11B-Vision-Instruct10.7B
Chat
RUNS WELL37/100~20.6 tok/s
ctx943
RUNS WELL39/100~20.6 tok/s
943 → 4K
SOLAR-10.7B-Instruct-v1.010.7B
Chat
RUNS WELL37/100~20.6 tok/s
ctx943
RUNS WELL39/100~20.6 tok/s
943 → 4K
CodeLlama-34B-Instruct34B
Coding
DECENT36/100~2.8 tok/s
ctx
DECENT35/100~2.8 tok/s
ctx
CodeLlama-34b-Instruct-hf33.7B
Coding
DECENT36/100~2.8 tok/s
ctx
DECENT35/100~2.8 tok/s
ctx
Qwen2.5-Coder-32B32B
Coding
DECENT35/100~2.9 tok/s
ctx
DECENT35/100~2.9 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct30.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-MLX-4bit30.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-MLX-5bit30.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-MLX-8bit30.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-MLX-6bit30.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-gptq-4bit30.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-FP830.5BMoE
Coding
DECENT35/100~2.5 tok/s
ctx
DECENT35/100~2.5 tok/s
ctx
Qwen2.5-Coder-32B-Instruct32.8B
Coding
DECENT35/100~2.9 tok/s
ctx
DECENT35/100~2.9 tok/s
ctx
Qwen2.5-Coder-32B-Instruct-AWQ32.8B
Coding
DECENT35/100~2.9 tok/s
ctx
DECENT35/100~2.9 tok/s
ctx
StarCoder2-15B15B
Coding
DECENT34/100~7.3 tok/s
ctx
DECENT34/100~7.3 tok/s
ctx
Qwen3-Coder-Next-AWQ-4bit14.4B
Coding
DECENT34/100~7.6 tok/s
ctx
DECENT34/100~7.6 tok/s
ctx
Qwen2.5-Coder-14B-Instruct14.8B
Coding
DECENT34/100~7.4 tok/s
ctx
DECENT34/100~7.4 tok/s
ctx
Qwen2.5-Coder-14B-Instruct-AWQ14.8B
Coding
DECENT34/100~7.4 tok/s
ctx
DECENT34/100~7.4 tok/s
ctx
WizardCoder-15B-V1.015.5B
Coding
DECENT34/100~7.1 tok/s
ctx
DECENT34/100~7.1 tok/s
ctx
Qwen3-Coder-30B-A3B-Instruct-FP415.6BMoE
Coding
DECENT34/100~5.6 tok/s
ctx
DECENT34/100~5.6 tok/s
ctx
starcoder2-15b15.7B
Coding
DECENT34/100~7 tok/s
ctx
DECENT34/100~7 tok/s
ctx
DeepSeek-Coder-V2-Lite-Instruct15.7BMoE
Coding
DECENT34/100~5.6 tok/s
ctx
DECENT34/100~5.6 tok/s
ctx
DeepSeek-Coder-V2-Lite-Instruct-FP815.7BMoE
Coding
DECENT34/100~5.6 tok/s
ctx
DECENT34/100~5.6 tok/s
ctx
Qwen2.5-Coder-14B14B
Coding
DECENT33/100~7.9 tok/s
ctx
DECENT33/100~7.9 tok/s
ctx
CodeLlama-13B-Instruct13B
Coding
DECENT33/100~8.5 tok/s
ctx
DECENT33/100~8.5 tok/s
ctx
CodeLlama-13b-Instruct-hf13B
Coding
DECENT33/100~8.5 tok/s
ctx
DECENT33/100~8.5 tok/s
ctx
xLAM-8x7b-r46.7BMoE
General
DECENT32/100~1.5 tok/s
ctx
DECENT32/100~1.5 tok/s
ctx
Nous-Hermes-2-Mixtral-8x7B-DPO46.7BMoE
General
DECENT32/100~1.5 tok/s
ctx
DECENT32/100~1.5 tok/s
ctx
Llama-3_3-Nemotron-Super-49B-v1_549.9B
General
DECENT32/100~1.7 tok/s
ctx
DECENT32/100~1.7 tok/s
ctx
Llama-3_3-Nemotron-Super-49B-v1_5-FP849.9B
General
DECENT32/100~1.7 tok/s
ctx
DECENT32/100~1.7 tok/s
ctx
Llama-3_3-Nemotron-Super-49B-v149.9B
General
DECENT32/100~1.7 tok/s
ctx
DECENT32/100~1.7 tok/s
ctx
Mistral-Small-24B24B
General
DECENT31/100~4.3 tok/s
ctx
DECENT31/100~4.3 tok/s
ctx
Qwen2.5-32B-Instruct32B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
gemma-2-27b-it27B
General
DECENT31/100~3.7 tok/s
ctx
DECENT31/100~3.7 tok/s
ctx
gemma-3-27b-it27B
General
DECENT31/100~3.7 tok/s
ctx
DECENT31/100~3.7 tok/s
ctx
Command-R35B
General
DECENT31/100~2.7 tok/s
ctx
DECENT31/100~2.7 tok/s
ctx
Falcon-40B-Instruct40B
General
DECENT31/100~2.1 tok/s
ctx
DECENT31/100~2.1 tok/s
ctx
t5gemma-9b-9b-ul220.3B
General
DECENT31/100~5.3 tok/s
ctx
DECENT30/100~5.3 tok/s
ctx
gpt-oss-20b21.5B
General
DECENT31/100~5 tok/s
ctx
DECENT31/100~5 tok/s
ctx
ERNIE-4.5-21B-A3B-MLX-4bit21.8BMoE
General
DECENT31/100~3.9 tok/s
ctx
DECENT31/100~3.9 tok/s
ctx
ERNIE-4.5-21B-A3B-MLX-6bit21.8BMoE
General
DECENT31/100~3.9 tok/s
ctx
DECENT31/100~3.9 tok/s
ctx
ERNIE-4.5-21B-A3B-MLX-8bit21.8BMoE
General
DECENT31/100~3.9 tok/s
ctx
DECENT31/100~3.9 tok/s
ctx
LFM2-24B-A2B-MLX-4bit23.8BMoE
General
DECENT31/100~3.5 tok/s
ctx
DECENT31/100~3.5 tok/s
ctx
LFM2-24B-A2B-MLX-6bit23.8BMoE
General
DECENT31/100~3.5 tok/s
ctx
DECENT31/100~3.5 tok/s
ctx
LFM2-24B-A2B-MLX-8bit23.8BMoE
General
DECENT31/100~3.5 tok/s
ctx
DECENT31/100~3.5 tok/s
ctx
LFM2-24B-A2B-MLX-5bit23.8BMoE
General
DECENT31/100~3.5 tok/s
ctx
DECENT31/100~3.5 tok/s
ctx
LFM2-24B-A2B23.8BMoE
General
DECENT31/100~3.5 tok/s
ctx
DECENT31/100~3.5 tok/s
ctx
Qwen3.5-27B27.8B
General
DECENT31/100~3.6 tok/s
ctx
DECENT31/100~3.6 tok/s
ctx
GLM-4.7-Flash-MLX-8bit29.9BMoE
General
DECENT31/100~2.6 tok/s
ctx
DECENT31/100~2.6 tok/s
ctx
GLM-4.7-Flash-MLX-6bit29.9BMoE
General
DECENT31/100~2.6 tok/s
ctx
DECENT31/100~2.6 tok/s
ctx
Qwen3-30B-A3B-Thinking-250730.5BMoE
General
DECENT31/100~2.5 tok/s
ctx
DECENT30/100~2.5 tok/s
ctx
Qwen3-30B-A3B-GPTQ-Int430.5BMoE
General
DECENT31/100~2.5 tok/s
ctx
DECENT30/100~2.5 tok/s
ctx
Qwen3-30B-A3B-Base30.5BMoE
General
DECENT31/100~2.5 tok/s
ctx
DECENT30/100~2.5 tok/s
ctx
Qwen3-30B-A3B-AWQ30.5BMoE
General
DECENT31/100~2.5 tok/s
ctx
DECENT30/100~2.5 tok/s
ctx
GLM-4.7-Flash31.2BMoE
General
DECENT31/100~2.4 tok/s
ctx
DECENT31/100~2.4 tok/s
ctx
GLM-4.7-Flash-AWQ31.2BMoE
General
DECENT31/100~2.4 tok/s
ctx
DECENT31/100~2.4 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-4bit31.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8bit31.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-6bit31.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-5bit31.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-BF1631.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF1631.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-FP831.6B
General
DECENT31/100~3 tok/s
ctx
DECENT31/100~3 tok/s
ctx
EXAONE-4.0-32B32B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
EXAONE-4.0.1-32B32B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
EXAONE-4.0-32B-FP832B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
sarvam-30b32.2BMoE
General
DECENT31/100~2.3 tok/s
ctx
DECENT31/100~2.3 tok/s
ctx
Olmo-3-1125-32B32.2B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
Qwen3-32B-AWQ32.8B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
Qwen2.5-32B32.8B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
QwQ-32B-AWQ32.8B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
Baichuan-M2-32B32.8B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
QwQ-32B32.8B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
xLAM-2-32b-fc-r32.8B
General
DECENT31/100~2.9 tok/s
ctx
DECENT31/100~2.9 tok/s
ctx
HyperCLOVAX-SEED-Think-32B33.3B
General
DECENT31/100~2.8 tok/s
ctx
DECENT31/100~2.8 tok/s
ctx
dolphin-2.9.1-yi-1.5-34b34.4B
General
DECENT31/100~2.7 tok/s
ctx
DECENT31/100~2.7 tok/s
ctx
c4ai-command-r-v0135B
General
DECENT31/100~2.7 tok/s
ctx
DECENT31/100~2.7 tok/s
ctx
Qwen3.5-35B-A3B36BMoE
General
DECENT31/100~2.1 tok/s
ctx
DECENT31/100~2.1 tok/s
ctx
Bielik-11B-v3.0-Instruct11.2B
Chat
DECENT30/100~9.8 tok/s
ctx142
DECENT32/100~9.8 tok/s
142 → 568
Qwen3-Next-80B-A3B-Thinking-AWQ-4bit14.7B
General
DECENT30/100~7.5 tok/s
ctx
DECENT30/100~7.5 tok/s
ctx
HyperCLOVAX-SEED-Think-14B-GPTQ14.7B
General
DECENT30/100~7.5 tok/s
ctx
DECENT30/100~7.5 tok/s
ctx
Qwen3-14B-AWQ14.8B
General
DECENT30/100~7.4 tok/s
ctx
DECENT30/100~7.4 tok/s
ctx
Qwen3-14B-Base14.8B
General
DECENT30/100~7.4 tok/s
ctx
DECENT30/100~7.4 tok/s
ctx
Qwen2.5-14B14.8B
General
DECENT30/100~7.4 tok/s
ctx
DECENT30/100~7.4 tok/s
ctx
Qwen3-30B-A3B-NVFP415.6BMoE
General
DECENT30/100~5.6 tok/s
ctx
DECENT30/100~5.6 tok/s
ctx
DeepSeek-V2-Lite15.7BMoE
General
DECENT30/100~5.6 tok/s
ctx
DECENT30/100~5.6 tok/s
ctx
Moonlight-16B-A3B16BMoE
General
DECENT30/100~5.5 tok/s
ctx
DECENT30/100~5.5 tok/s
ctx
deepseek-moe-16b-base16.4B
General
DECENT30/100~6.7 tok/s
ctx
DECENT30/100~6.7 tok/s
ctx
Ling-lite16.8BMoE
General
DECENT30/100~5.2 tok/s
ctx
DECENT30/100~5.2 tok/s
ctx
Qwen3-32B-NVFP417.2B
General
DECENT30/100~6.2 tok/s
ctx
DECENT30/100~6.2 tok/s
ctx
NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP418.2B
General
DECENT30/100~5.9 tok/s
ctx
DECENT30/100~5.9 tok/s
ctx
Mistral-Nemo-12B12B
General
DECENT29/100~9.2 tok/s
ctx
DECENT29/100~9.2 tok/s
ctx
Qwen2.5-14B-Instruct14B
General
DECENT29/100~7.9 tok/s
ctx
DECENT29/100~7.9 tok/s
ctx
gemma-3-12b-it12B
General
DECENT29/100~9.2 tok/s
ctx
DECENT29/100~9.2 tok/s
ctx
Phi-3-medium-128k14B
General
DECENT29/100~7.9 tok/s
ctx
DECENT29/100~7.9 tok/s
ctx
OLMo-2-13B-Instruct13B
General
DECENT29/100~8.5 tok/s
ctx
DECENT29/100~8.5 tok/s
ctx
pythia-12b12B
General
DECENT29/100~9.2 tok/s
ctx
DECENT29/100~9.2 tok/s
ctx
Orca-2-13b13B
General
DECENT29/100~8.5 tok/s
ctx
DECENT29/100~8.5 tok/s
ctx
HarmBench-Llama-2-13b-cls13B
General
DECENT29/100~8.5 tok/s
ctx
DECENT29/100~8.5 tok/s
ctx
llm-jp-3.1-13b13.7B
General
DECENT29/100~8 tok/s
ctx
DECENT29/100~8 tok/s
ctx
phi-414B
General
DECENT29/100~7.9 tok/s
ctx
DECENT29/100~7.9 tok/s
ctx
Phi-3-medium-14b-instruct14B
General
DECENT29/100~7.9 tok/s
ctx
DECENT29/100~7.9 tok/s
ctx
Qwen1.5-MoE-A2.7B14.3BMoE
General
DECENT29/100~6.2 tok/s
ctx
DECENT29/100~6.2 tok/s
ctx
Yi-34B-Chat34.4B
Chat
DECENT23/100~2.7 tok/s
ctx
DECENT22/100~2.7 tok/s
ctx
Seed-OSS-36B-Instruct-MLX-8bit36.2BMoE
Chat
DECENT23/100~2.1 tok/s
ctx
DECENT23/100~2.1 tok/s
ctx
Seed-OSS-36B-Instruct-MLX-4bit36.2BMoE
Chat
DECENT23/100~2.1 tok/s
ctx
DECENT23/100~2.1 tok/s
ctx
Seed-OSS-36B-Instruct-MLX-5bit36.2BMoE
Chat
DECENT23/100~2.1 tok/s
ctx
DECENT23/100~2.1 tok/s
ctx
Seed-OSS-36B-Instruct-MLX-6bit36.2BMoE
Chat
DECENT23/100~2.1 tok/s
ctx
DECENT23/100~2.1 tok/s
ctx
MiniMax-M2.5-AWQ-4bit36.8BMoE
Chat
DECENT23/100~2 tok/s
ctx
DECENT23/100~2 tok/s
ctx
Mixtral-8x7B-Instruct-v0.146.7BMoE
Chat
DECENT23/100~1.5 tok/s
ctx
DECENT23/100~1.5 tok/s
ctx
Kimi-Linear-48B-A3B-Instruct49.1B
Chat
DECENT23/100~1.7 tok/s
ctx
DECENT23/100~1.7 tok/s
ctx
vicuna-13b-v1.513B
Chat
DECENT22/100~8.5 tok/s
ctx
DECENT21/100~8.5 tok/s
ctx
WizardLM-13B-V1.213B
Chat
DECENT22/100~8.5 tok/s
ctx
DECENT21/100~8.5 tok/s
ctx
llm-jp-3.1-13b-instruct413.7B
Chat
DECENT22/100~8 tok/s
ctx
DECENT22/100~8 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct-AWQ-4bit14.7B
Chat
DECENT22/100~7.5 tok/s
ctx
DECENT22/100~7.5 tok/s
ctx
Qwen3-14B-Instruct14.8B
Chat
DECENT22/100~7.4 tok/s
ctx
DECENT22/100~7.4 tok/s
ctx
Qwen2.5-14B-Instruct-AWQ14.8B
Chat
DECENT22/100~7.4 tok/s
ctx
DECENT22/100~7.4 tok/s
ctx
Qwen2.5-14B-Instruct-GPTQ-Int414.8B
Chat
DECENT22/100~7.4 tok/s
ctx
DECENT22/100~7.4 tok/s
ctx
Qwen2.5-14B-Instruct-1M14.8B
Chat
DECENT22/100~7.4 tok/s
ctx
DECENT22/100~7.4 tok/s
ctx
Qwen2.5-14B-Instruct-GPTQ-Int814.8B
Chat
DECENT22/100~7.4 tok/s
ctx
DECENT22/100~7.4 tok/s
ctx
Qwen3-30B-A3B-Instruct-2507-FP415.6BMoE
Chat
DECENT22/100~5.6 tok/s
ctx
DECENT22/100~5.6 tok/s
ctx
DeepSeek-V2-Lite-Chat15.7BMoE
Chat
DECENT22/100~5.6 tok/s
ctx
DECENT22/100~5.6 tok/s
ctx
Moonlight-16B-A3B-Instruct16BMoE
Chat
DECENT22/100~5.5 tok/s
ctx
DECENT22/100~5.5 tok/s
ctx
LLaDA2.0-mini16.3BMoE
Chat
DECENT22/100~5.4 tok/s
ctx
DECENT22/100~5.4 tok/s
ctx
LLaDA2.1-mini16.3BMoE
Chat
DECENT22/100~5.4 tok/s
ctx
DECENT22/100~5.4 tok/s
ctx
deepseek-moe-16b-chat16.4B
Chat
DECENT22/100~6.7 tok/s
ctx
DECENT22/100~6.7 tok/s
ctx
Mistral-Small-24B-Instruct-2501-AWQ23.6B
Chat
DECENT22/100~4.4 tok/s
ctx
DECENT22/100~4.4 tok/s
ctx
Mistral-Small-24B-Instruct-2501-FP8-dynamic23.6B
Chat
DECENT22/100~4.4 tok/s
ctx
DECENT22/100~4.4 tok/s
ctx
Mistral-Small-24B-Instruct-250124B
Chat
DECENT22/100~4.3 tok/s
ctx
DECENT22/100~4.3 tok/s
ctx
Qwen3-30B-A3B-Instruct-2507-MLX-4bit30.5BMoE
Chat
DECENT22/100~2.5 tok/s
ctx
DECENT22/100~2.5 tok/s
ctx
Qwen3-30B-A3B-Instruct-2507-MLX-8bit30.5BMoE
Chat
DECENT22/100~2.5 tok/s
ctx
DECENT22/100~2.5 tok/s
ctx
Qwen3-30B-A3B-Instruct-2507-MLX-6bit30.5BMoE
Chat
DECENT22/100~2.5 tok/s
ctx
DECENT22/100~2.5 tok/s
ctx
Qwen3-30B-A3B-Instruct-2507-FP830.5BMoE
Chat
DECENT22/100~2.5 tok/s
ctx
DECENT22/100~2.5 tok/s
ctx
Qwen3-VL-30B-A3B-Instruct-AWQ31.1BMoE
Chat
DECENT22/100~2.4 tok/s
ctx
DECENT22/100~2.4 tok/s
ctx
OLMo-2-0325-32B-Instruct32.2B
Chat
DECENT22/100~2.9 tok/s
ctx
DECENT22/100~2.9 tok/s
ctx
Qwen2.5-32B-Instruct-AWQ32.8B
Chat
DECENT22/100~2.9 tok/s
ctx
DECENT22/100~2.9 tok/s
ctx
Qwen2.5-32B-Instruct-GPTQ-Int432.8B
Chat
DECENT22/100~2.9 tok/s
ctx
DECENT22/100~2.9 tok/s
ctx
Qwen2.5-32B-Instruct-GPTQ-Int832.8B
Chat
DECENT22/100~2.9 tok/s
ctx
DECENT22/100~2.9 tok/s
ctx
MiniMax-M2.5-BF16-INT4-AWQ39.1BMoE
Chat
DECENT22/100~1.8 tok/s
ctx
DECENT22/100~1.8 tok/s
ctx
falcon-40b-instruct40B
Chat
DECENT22/100~2.1 tok/s
ctx
DECENT22/100~2.1 tok/s
ctx
GigaChat3-10B-A1.8B11.5BMoE
Chat
DECENT21/100~7.7 tok/s
ctx
DECENT21/100~7.7 tok/s
ctx
Mistral-Nemo-Instruct-240712.2B
Chat
DECENT21/100~9 tok/s
ctx
DECENT22/100~9 tok/s
ctx
mistral-nemo-instruct-2407-awq12.2B
Chat
DECENT21/100~9 tok/s
ctx
DECENT22/100~9 tok/s
ctx
MiniMax-M2.7230B
Reasoning
TOO HEAVY5/100~0.7 tok/s
ctx
TOO HEAVY5/100~0.7 tok/s
ctx
DeepSeek-R1-0528-NVFP4-v2393.6BMoE
Reasoning
TOO HEAVY5/100~0.3 tok/s
ctx
TOO HEAVY5/100~0.3 tok/s
ctx
DeepSeek-R1-NVFP4396.8BMoE
Reasoning
TOO HEAVY5/100~0.3 tok/s
ctx
TOO HEAVY5/100~0.3 tok/s
ctx
DeepSeek-R1684.5BMoE
Reasoning
TOO HEAVY5/100~0.2 tok/s
ctx
TOO HEAVY5/100~0.2 tok/s
ctx
DeepSeek-R1-0528684.5BMoE
Reasoning
TOO HEAVY5/100~0.2 tok/s
ctx
TOO HEAVY5/100~0.2 tok/s
ctx
DeepSeek-V3.2-Speciale685BMoE
Reasoning
TOO HEAVY5/100~0.2 tok/s
ctx
TOO HEAVY5/100~0.2 tok/s
ctx
DeepSeek-R1-70B70B
Reasoning
TOO HEAVY3/100~2.5 tok/s
ctx
TOO HEAVY3/100~2.5 tok/s
ctx
DeepSeek-R1-Distill-Llama-70B-FP8-dynamic70.6B
Reasoning
TOO HEAVY3/100~2.4 tok/s
ctx
TOO HEAVY3/100~2.4 tok/s
ctx
Llama-3.1-70B-Instruct70B
General
TOO HEAVY0/100~2.5 tok/s
ctx
TOO HEAVY0/100~2.5 tok/s
ctx
Llama-3.1-405B-Instruct405B
General
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
Llama-3.2-90B-Vision90B
Multimodal
TOO HEAVY0/100~1.9 tok/s
ctx
TOO HEAVY0/100~1.9 tok/s
ctx
Mixtral-8x22B-Instruct141BMoE
General
TOO HEAVY0/100~1 tok/s
ctx
TOO HEAVY0/100~1 tok/s
ctx
Qwen2.5-72B-Instruct72B
General
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2.5-VL-72B72B
Multimodal
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
DeepSeek-V3671BMoE
General
TOO HEAVY0/100~0.2 tok/s
ctx
TOO HEAVY0/100~0.2 tok/s
ctx
Command-R+104B
General
TOO HEAVY0/100~1.7 tok/s
ctx
TOO HEAVY0/100~1.7 tok/s
ctx
Qwen3.5-122B-A10B-NVFP464.4BMoE
General
TOO HEAVY0/100~2.1 tok/s
ctx
TOO HEAVY0/100~2.1 tok/s
ctx
NVIDIA-Nemotron-3-Super-120B-A12B-NVFP467.2B
General
TOO HEAVY0/100~2.6 tok/s
ctx
TOO HEAVY0/100~2.6 tok/s
ctx
Llama-3.3-70B-Instruct70.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
llama-3.3-70b-instruct-awq70.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Llama-3.3-70B-Instruct-AWQ70.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
L3.3-GeneticLemonade-Final-v2-70B70.6B
General
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Meta-Llama-3.3-70B-Instruct-AWQ-INT470.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Meta-Llama-3.1-70B-Instruct-quantized.w4a1670.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Meta-Llama-3-70B-Instruct70.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Llama-3.1-70B70.6B
General
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Llama-3.1-Swallow-70B-Instruct-v0.370.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Meta-Llama-3.1-70B-Instruct-FP870.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Llama-3.3-70B-Instruct-FP8-dynamic70.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
jais-adapted-70b-chat-4bit-bnb71.6B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2.5-72B-Instruct-abliterated72.7B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2.5-72B72.7B
General
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2-72B-Instruct72.7B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2-72B72.7B
General
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2.5-72B-Instruct-AWQ73B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen2.5-72B-Instruct-GPTQ-Int473B
Chat
TOO HEAVY0/100~2.4 tok/s
ctx
TOO HEAVY0/100~2.4 tok/s
ctx
Qwen3-Coder-Next-8bit79.7B
Coding
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct-MLX-4bit79.7B
Chat
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct-MLX-8bit79.7B
Chat
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct-MLX-6bit79.7B
Chat
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct-MLX-5bit79.7B
Chat
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Coder-Next79.7B
Coding
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Coder-Next-FP879.7B
Coding
TOO HEAVY0/100~2.2 tok/s
ctx
TOO HEAVY0/100~2.2 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct81.3B
Chat
TOO HEAVY0/100~2.1 tok/s
ctx
TOO HEAVY0/100~2.1 tok/s
ctx
Qwen3-Next-80B-A3B-Instruct-FP881.3B
Chat
TOO HEAVY0/100~2.1 tok/s
ctx
TOO HEAVY0/100~2.1 tok/s
ctx
GLM-4.5-Air110.5BMoE
General
TOO HEAVY0/100~1.2 tok/s
ctx
TOO HEAVY0/100~1.2 tok/s
ctx
Qwen1.5-110B-Chat-AWQ111.2B
Chat
TOO HEAVY0/100~1.5 tok/s
ctx
TOO HEAVY0/100~1.5 tok/s
ctx
gpt-oss-120b-MLX-8bit116.8B
General
TOO HEAVY0/100~1.5 tok/s
ctx
TOO HEAVY0/100~1.5 tok/s
ctx
gpt-oss-120b-heretic116.8B
General
TOO HEAVY0/100~1.5 tok/s
ctx
TOO HEAVY0/100~1.5 tok/s
ctx
gpt-oss-120b120.4B
General
TOO HEAVY0/100~1.4 tok/s
ctx
TOO HEAVY0/100~1.4 tok/s
ctx
XORTRON.CriminalComputing.LARGE.2026.3122.6B
General
TOO HEAVY0/100~1.4 tok/s
ctx
TOO HEAVY0/100~1.4 tok/s
ctx
NVIDIA-Nemotron-3-Super-120B-A12B-FP8123.6B
General
TOO HEAVY0/100~1.4 tok/s
ctx
TOO HEAVY0/100~1.4 tok/s
ctx
NVIDIA-Nemotron-3-Super-120B-A12B-BF16123.6B
General
TOO HEAVY0/100~1.4 tok/s
ctx
TOO HEAVY0/100~1.4 tok/s
ctx
Qwen3.5-122B-A10B125.1BMoE
General
TOO HEAVY0/100~1.1 tok/s
ctx
TOO HEAVY0/100~1.1 tok/s
ctx
Mixtral-8x22B-Instruct-v0.1140.6BMoE
Chat
TOO HEAVY0/100~1 tok/s
ctx
TOO HEAVY0/100~1 tok/s
ctx
dots.llm1.inst142.8BMoE
General
TOO HEAVY0/100~1 tok/s
ctx
TOO HEAVY0/100~1 tok/s
ctx
bloom176.2B
General
TOO HEAVY0/100~1 tok/s
ctx
TOO HEAVY0/100~1 tok/s
ctx
GLM-4.7-NVFP4177.2BMoE
General
TOO HEAVY0/100~0.8 tok/s
ctx
TOO HEAVY0/100~0.8 tok/s
ctx
falcon-180B-chat179.5B
Chat
TOO HEAVY0/100~1 tok/s
ctx
TOO HEAVY0/100~1 tok/s
ctx
Step-3.5-Flash199.4BMoE
General
TOO HEAVY0/100~0.7 tok/s
ctx
TOO HEAVY0/100~0.7 tok/s
ctx
Step-3.5-Flash-FP8199.4BMoE
General
TOO HEAVY0/100~0.7 tok/s
ctx
TOO HEAVY0/100~0.7 tok/s
ctx
MiniMax-M2.5-MLX-8bit228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2.5-MLX-4bit228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2.5-MLX-6bit228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2-AWQ228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2.5-AWQ228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2.5228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
MiniMax-M2.1228.7BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
Qwen3-235B-A22B-Instruct-2507-FP8235.1BMoE
Chat
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
Qwen3-235B-A22B-Thinking-2507-FP8235.1BMoE
General
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
Qwen3-235B-A22B-FP8235.1BMoE
General
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
deepseek-coder-v2-instruct-awq235.7BMoE
Coding
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
DeepSeek-V2.5-1210-FP8235.7BMoE
General
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
K-EXAONE-236B-A23B237.1BMoE
General
TOO HEAVY0/100~0.6 tok/s
ctx
TOO HEAVY0/100~0.6 tok/s
ctx
ERNIE-4.5-300B-A47B-Paddle300.5BMoE
General
TOO HEAVY0/100~0.5 tok/s
ctx
TOO HEAVY0/100~0.5 tok/s
ctx
MiMo-V2-Flash309.8BMoE
General
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
GLM-4.6356.8BMoE
General
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
GLM-4.7358.3BMoE
General
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
GLM-4.5358.3BMoE
General
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
DeepSeek-V3.2-NVFP4394.5BMoE
General
TOO HEAVY0/100~0.3 tok/s
ctx
TOO HEAVY0/100~0.3 tok/s
ctx
DeepSeek-V3-0324-NVFP4396.8BMoE
General
TOO HEAVY0/100~0.3 tok/s
ctx
TOO HEAVY0/100~0.3 tok/s
ctx
Llama-4-Maverick-17B-128E-Instruct401.6B
Chat
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
Qwen3.5-397B-A17B403.4BMoE
General
TOO HEAVY0/100~0.3 tok/s
ctx
TOO HEAVY0/100~0.3 tok/s
ctx
Llama-3.1-405B405.9B
General
TOO HEAVY0/100~0.4 tok/s
ctx
TOO HEAVY0/100~0.4 tok/s
ctx
Qwen3-Coder-480B-A35B-Instruct480.2BMoE
Coding
TOO HEAVY0/100~0.3 tok/s
ctx
TOO HEAVY0/100~0.3 tok/s
ctx
LongCat-Flash-Chat561.9B
Chat
TOO HEAVY0/100~0.3 tok/s
ctx
TOO HEAVY0/100~0.3 tok/s
ctx
DeepSeek-V3-0324684.5BMoE
General
TOO HEAVY0/100~0.2 tok/s
ctx
TOO HEAVY0/100~0.2 tok/s
ctx
DeepSeek-V3.2-AWQ685BMoE
General
TOO HEAVY0/100~0.2 tok/s
ctx
TOO HEAVY0/100~0.2 tok/s
ctx
DeepSeek-V3.2685.4BMoE
General
TOO HEAVY0/100~0.2 tok/s
ctx
TOO HEAVY0/100~0.2 tok/s
ctx
GLM-5753.9BMoE
General
TOO HEAVY0/100~0.2 tok/s
ctx
TOO HEAVY0/100~0.2 tok/s
ctx
GLM-5-FP8753.9BMoE
General
TOO HEAVY0/100~0.2 tok/s
ctx
TOO HEAVY0/100~0.2 tok/s
ctx
Kimi-K2-Instruct1026.5BMoE
Chat
TOO HEAVY0/100~0.1 tok/s
ctx
TOO HEAVY0/100~0.1 tok/s
ctx
Kimi-K2-Instruct-09051026.5BMoE
Chat
TOO HEAVY0/100~0.1 tok/s
ctx
TOO HEAVY0/100~0.1 tok/s
ctx
Kimi-K2-Thinking1058.1BMoE
General
TOO HEAVY0/100~0.1 tok/s
ctx
TOO HEAVY0/100~0.1 tok/s
ctx
Kimi-K2.51058.6BMoE
General
TOO HEAVY0/100~0.1 tok/s
ctx
TOO HEAVY0/100~0.1 tok/s
ctx
RUNS GREAT
RUNS WELL
DECENT
TOO HEAVY
✦ TurboQuant: 4× KV-cache compression at 32K context