RUNYARD.DEV / COMPARE
Compare Devices
Pick two devices · Device B can be wrapped with TurboQuant
Device A
Normal
M5 Max (36 GB VRAM)
M5 Pro (24 GB VRAM)
M5 (16 GB VRAM)
M4 Max (36 GB VRAM)
M4 Pro (24 GB VRAM)
M4 (16 GB VRAM)
M3 Ultra (96 GB VRAM)
M3 Max (36 GB VRAM)
M3 Pro (18 GB VRAM)
M3 (8 GB VRAM)
M2 Ultra (64 GB VRAM)
M2 Max (32 GB VRAM)
M2 Pro (16 GB VRAM)
M2 (8 GB VRAM)
M1 Ultra (64 GB VRAM)
M1 Max (32 GB VRAM)
M1 Pro (16 GB VRAM)
M1 (8 GB VRAM)
RTX 5090 (32 GB VRAM)
RTX 5080 (16 GB VRAM)
RTX 5070 Ti (16 GB VRAM)
RTX 5070 (12 GB VRAM)
RTX 5060 Ti 16GB (16 GB VRAM)
RTX 5060 Ti (8 GB VRAM)
RTX 5060 (8 GB VRAM)
RTX 5050 (8 GB VRAM)
RTX 5090 Laptop (24 GB VRAM)
RTX 5080 Laptop (16 GB VRAM)
RTX 5070 Ti Laptop (12 GB VRAM)
RTX 5070 Laptop (8 GB VRAM)
RTX 5060 Laptop (8 GB VRAM)
RTX 5050 Laptop (8 GB VRAM)
RTX 4090 (24 GB VRAM)
RTX 4080 SUPER (16 GB VRAM)
RTX 4080 (16 GB VRAM)
RTX 4070 Ti SUPER (16 GB VRAM)
RTX 4070 Ti (12 GB VRAM)
RTX 4070 SUPER (12 GB VRAM)
RTX 4070 (12 GB VRAM)
RTX 4060 Ti 16GB (16 GB VRAM)
RTX 4060 Ti (8 GB VRAM)
RTX 4060 (8 GB VRAM)
RTX 4090 Laptop (16 GB VRAM)
RTX 4080 Laptop (12 GB VRAM)
RTX 4070 Laptop (8 GB VRAM)
RTX 4060 Laptop (8 GB VRAM)
RTX 4050 Laptop (6 GB VRAM)
RTX 3090 Ti (24 GB VRAM)
RTX 3090 (24 GB VRAM)
RTX 3080 Ti (12 GB VRAM)
RTX 3080 12GB (12 GB VRAM)
RTX 3080 (10 GB VRAM)
RTX 3070 Ti (8 GB VRAM)
RTX 3070 (8 GB VRAM)
RTX 3060 Ti (8 GB VRAM)
RTX 3060 (12 GB VRAM)
RTX 3050 (8 GB VRAM)
RTX 3080 Ti Laptop (16 GB VRAM)
RTX 3080 Laptop (16 GB VRAM)
RTX 3070 Ti Laptop (8 GB VRAM)
RTX 3070 Laptop (8 GB VRAM)
RTX 3060 Laptop (6 GB VRAM)
RTX 3050 Ti Laptop (4 GB VRAM)
RTX 3050 Laptop (4 GB VRAM)
RTX 3050 6GB (6 GB VRAM)
RTX 2080 Ti (11 GB VRAM)
RTX 2080 SUPER (8 GB VRAM)
RTX 2080 (8 GB VRAM)
RTX 2070 SUPER (8 GB VRAM)
RTX 2070 (8 GB VRAM)
RTX 2060 SUPER (8 GB VRAM)
RTX 2060 (6 GB VRAM)
RTX 2060 12GB (12 GB VRAM)
GTX 1660 Ti (6 GB VRAM)
GTX 1660 SUPER (6 GB VRAM)
GTX 1660 (6 GB VRAM)
GTX 1650 SUPER (4 GB VRAM)
GTX 1650 Ti (4 GB VRAM)
GTX 1650 (4 GB VRAM)
GTX 1630 (4 GB VRAM)
GTX 1080 Ti (11 GB VRAM)
GTX 1080 (8 GB VRAM)
GTX 1070 Ti (8 GB VRAM)
GTX 1070 (8 GB VRAM)
GTX 1060 6GB (6 GB VRAM)
GTX 1060 3GB (3 GB VRAM)
GTX 1050 Ti (4 GB VRAM)
GTX 1050 (2 GB VRAM)
RTX PRO 6000 (96 GB VRAM)
RTX 6000 Ada (48 GB VRAM)
RTX 5000 Ada (32 GB VRAM)
RTX A6000 (48 GB VRAM)
RTX A5000 (24 GB VRAM)
RTX A4000 (16 GB VRAM)
A100 (80 GB VRAM)
H100 (80 GB VRAM)
GH200 (96 GB VRAM)
L40S (48 GB VRAM)
L4 (24 GB VRAM)
T4 (16 GB VRAM)
RX 9070 XT (16 GB VRAM)
RX 9070 (16 GB VRAM)
RX 7900 XTX (24 GB VRAM)
RX 7900 XT (20 GB VRAM)
RX 7800 XT (16 GB VRAM)
RX 7700 XT (12 GB VRAM)
RX 7600 XT (16 GB VRAM)
RX 7600 (8 GB VRAM)
RX 6950 XT (16 GB VRAM)
RX 6900 XT (16 GB VRAM)
RX 6800 XT (16 GB VRAM)
RX 6800 (16 GB VRAM)
RX 6700 XT (12 GB VRAM)
RX 6600 XT (8 GB VRAM)
RX 6600 (8 GB VRAM)
RX 5700 XT (8 GB VRAM)
RX 5700 (8 GB VRAM)
RX 5600 XT (6 GB VRAM)
RX 580 (8 GB VRAM)
RX 570 (4 GB VRAM)
Radeon VII (16 GB VRAM)
Ryzen AI MAX+ 395 (96 GB VRAM)
Radeon 890M (32 GB RAM)
Radeon 780M (32 GB RAM)
Vega 8 (16 GB RAM)
Arc A770 (16 GB VRAM)
Arc A750 (8 GB VRAM)
Arc A580 (8 GB VRAM)
Arc A380 (6 GB VRAM)
Iris Xe (16 GB RAM)
Iris Plus (16 GB RAM)
UHD 770 (32 GB RAM)
UHD Graphics 630 (16 GB RAM)
A18 Pro (8 GB RAM)
A18 (8 GB RAM)
A17 Pro (8 GB RAM)
A16 (6 GB RAM)
Snapdragon X Elite (32 GB RAM)
Snapdragon X Plus (32 GB RAM)
Adreno 840 (12 GB RAM)
Adreno 830 (12 GB RAM)
Adreno 750 (12 GB RAM)
Adreno 740 (12 GB RAM)
Adreno 730 (12 GB RAM)
Google Tensor G4 (12 GB RAM)
Google Tensor G3 (12 GB RAM)
CPU — 128GB (128 GB RAM)
CPU — 64GB (64 GB RAM)
CPU — 32GB (32 GB RAM)
Filters: VRAM · RAM · Cores · Backend (CUDA / Metal / ROCm / SYCL / CPU ARM / CPU x86) · Bandwidth ~504 GB/s
Device B
✦ TurboQuant
Sort: score ↓ · speed · size · name
A: 30 · Tie: 361 · B: 175
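The A / Tie / B tally above reads as a per-model win count between the two columns. A minimal sketch of how such a tally could be computed from per-model score pairs follows; the function name and sample data are illustrative assumptions, not the site's actual code or results.

```python
# Hypothetical win-tally over (score_normal, score_turboquant) pairs.
# Illustrative only: `tally` and the sample rows are not the site's code.
from collections import Counter


def tally(rows):
    """Count which side scores higher per model: 'A', 'B', or 'Tie'."""
    counts = Counter()
    for score_a, score_b in rows:
        if score_a > score_b:
            counts["A"] += 1      # Normal wins
        elif score_b > score_a:
            counts["B"] += 1      # TurboQuant wins
        else:
            counts["Tie"] += 1    # identical score
    return counts


# Sample score pairs drawn from the table's pattern (equal, +1, -1, +2).
rows = [(73, 73), (68, 69), (56, 55), (61, 63)]
result = tally(rows)
assert result["B"] == 2 and result["A"] == 1 and result["Tie"] == 1
```

Most rows in the table tie on score; TurboQuant's visible gain is usually the expanded context window rather than a higher score.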
All · General · Chat · Coding · Reasoning · Multimodal · Embedding
Model
RTX 4070 · Normal
RTX 4070 · ✦ TQ
✦
gemma-3n-E4B-it
8B
UNLOCKED
Multimodal
RUNS GREAT
73/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
73/100
~27.5 tok/s
7K
→
28K
✦
✦
Qwen3-8B
8B
UNLOCKED
Reasoning
RUNS GREAT
72/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
72/100
~27.5 tok/s
7K
→
28K
✦
✦
DeepSeek-R1-0528-Qwen3-8B
8.2B
UNLOCKED
Reasoning
RUNS GREAT
72/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
73/100
~26.8 tok/s
6K
→
26K
✦
✦
granite-3.1-8b-instruct
8B
UNLOCKED
General
RUNS GREAT
69/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
69/100
~27.5 tok/s
7K
→
28K
✦
✦
Qwen-7B
7.7B
UNLOCKED
General
RUNS GREAT
69/100
~28.6 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.6 tok/s
8K
→
32K
✦
✦
Qwen1.5-7B
7.7B
UNLOCKED
General
RUNS GREAT
69/100
~28.6 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.6 tok/s
8K
→
32K
✦
✦
EXAONE-Deep-7.8B
7.8B
UNLOCKED
General
RUNS GREAT
69/100
~28.2 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.2 tok/s
8K
→
30K
✦
✦
MiMo-7B-Base
7.8B
UNLOCKED
General
RUNS GREAT
69/100
~28.2 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.2 tok/s
8K
→
30K
✦
✦
saiga_llama3_8b
8B
UNLOCKED
General
RUNS GREAT
69/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
69/100
~27.5 tok/s
7K
→
8K
✦
✦
Hermes-3-Llama-3.1-8B
8B
UNLOCKED
General
RUNS GREAT
69/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
69/100
~27.5 tok/s
7K
→
28K
✦
✦
Llama-3.1-Nemotron-Nano-8B-v1
8B
UNLOCKED
General
RUNS GREAT
69/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
69/100
~27.5 tok/s
7K
→
28K
✦
✦
Meta-Llama-3.1-8B-FP8
8B
UNLOCKED
General
RUNS GREAT
69/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
69/100
~27.5 tok/s
7K
→
28K
✦
✦
llava-onevision-qwen2-7b-ov
8B
UNLOCKED
General
RUNS GREAT
69/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
69/100
~27.5 tok/s
7K
→
28K
✦
✦
gemma-2-9b-it
9B
UNLOCKED
General
RUNS GREAT
68/100
~24.4 tok/s
ctx
4K
✦ RUNS GREAT
69/100
~24.4 tok/s
4K
→
8K
✦
✦
Qwen3-14B-NVFP4
8.2B
UNLOCKED
General
RUNS GREAT
68/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~26.8 tok/s
6K
→
26K
✦
✦
Qwen3-8B-Base
8.2B
UNLOCKED
General
RUNS GREAT
68/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~26.8 tok/s
6K
→
26K
✦
✦
Qwen3-8B-AWQ
8.2B
UNLOCKED
General
RUNS GREAT
68/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~26.8 tok/s
6K
→
26K
✦
✦
Qwen3-8B-FP8
8.2B
UNLOCKED
General
RUNS GREAT
68/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~26.8 tok/s
6K
→
26K
✦
✦
Qwen3-8B.w8a8
8.2B
UNLOCKED
General
RUNS GREAT
68/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~26.8 tok/s
6K
→
26K
✦
✦
Qwen3-8B-FP8-dynamic
8.2B
UNLOCKED
General
RUNS GREAT
68/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~26.8 tok/s
6K
→
26K
✦
✦
LFM2-8B-A1B
8.3B
MoE
UNLOCKED
General
RUNS GREAT
68/100
~21.2 tok/s
ctx
6K
✦ RUNS GREAT
69/100
~21.2 tok/s
6K
→
25K
✦
✦
NVIDIA-Nemotron-Nano-9B-v2
8.9B
UNLOCKED
General
RUNS GREAT
68/100
~24.7 tok/s
ctx
5K
✦ RUNS GREAT
69/100
~24.7 tok/s
5K
→
18K
✦
✦
NVIDIA-Nemotron-Nano-9B-v2-Japanese
8.9B
UNLOCKED
General
RUNS GREAT
68/100
~24.7 tok/s
ctx
5K
✦ RUNS GREAT
69/100
~24.7 tok/s
5K
→
18K
✦
✦
NVIDIA-Nemotron-Nano-9B-v2-Base
8.9B
UNLOCKED
General
RUNS GREAT
68/100
~24.7 tok/s
ctx
5K
✦ RUNS GREAT
69/100
~24.7 tok/s
5K
→
18K
✦
✦
NVIDIA-Nemotron-Nano-9B-v2-FP8
8.9B
UNLOCKED
General
RUNS GREAT
68/100
~24.7 tok/s
ctx
5K
✦ RUNS GREAT
69/100
~24.7 tok/s
5K
→
18K
✦
✦
QwQ-32B-MLX-8bit
9.2B
UNLOCKED
General
RUNS GREAT
68/100
~23.9 tok/s
ctx
4K
✦ RUNS GREAT
69/100
~23.9 tok/s
4K
→
15K
✦
✦
glm-4-9b
9.4B
UNLOCKED
General
RUNS GREAT
68/100
~23.4 tok/s
ctx
3K
✦ RUNS GREAT
69/100
~23.4 tok/s
3K
→
8K
✦
✦
Qwen2.5-Coder-32B-Instruct-MLX-8bit
9.2B
UNLOCKED
Coding
RUNS GREAT
67/100
~23.9 tok/s
ctx
4K
✦ RUNS GREAT
68/100
~23.9 tok/s
4K
→
15K
✦
✦
Qwen3-Coder-30B-A3B-Instruct-gptq-8bit
9.3B
MoE
UNLOCKED
Coding
RUNS GREAT
66/100
~18.9 tok/s
ctx
4K
✦ RUNS GREAT
67/100
~18.9 tok/s
4K
→
15K
✦
✦
Llama-3.1-8B-Instruct
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Qwen-7B-Chat
7.7B
UNLOCKED
Chat
RUNS GREAT
63/100
~28.6 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.6 tok/s
8K
→
32K
✦
✦
salamandra-7b-instruct
7.8B
UNLOCKED
Chat
RUNS GREAT
63/100
~28.2 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.2 tok/s
8K
→
8K
✦
✦
Ministral-8B-Instruct-2410
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Meta-Llama-3.1-8B-Instruct
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Llama-3.1-8B-Instruct-FP8
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Llama-3-Patronus-Lynx-8B-Instruct-v1.1
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Meta-Llama-3.1-8B-Instruct-FP8
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Meta-Llama-3.1-8B-Instruct-quantized.w4a16
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
Meta-Llama-3.1-8B-Instruct-FP8-dynamic
8B
UNLOCKED
Chat
RUNS GREAT
63/100
~27.5 tok/s
ctx
7K
✦ RUNS GREAT
63/100
~27.5 tok/s
7K
→
28K
✦
✦
granite-3.3-8b-instruct
8.2B
UNLOCKED
Chat
RUNS GREAT
62/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
63/100
~26.8 tok/s
6K
→
26K
✦
✦
SDAR-8B-Chat-b32
8.2B
UNLOCKED
Chat
RUNS GREAT
62/100
~26.8 tok/s
ctx
6K
✦ RUNS GREAT
63/100
~26.8 tok/s
6K
→
26K
✦
✦
Qwen2.5-VL-7B-Instruct
8.3B
UNLOCKED
Chat
RUNS GREAT
62/100
~26.5 tok/s
ctx
6K
✦ RUNS GREAT
63/100
~26.5 tok/s
6K
→
25K
✦
✦
rnj-1-instruct
8.3B
UNLOCKED
Chat
RUNS GREAT
62/100
~26.5 tok/s
ctx
6K
✦ RUNS GREAT
63/100
~26.5 tok/s
6K
→
25K
✦
✦
Mistral-NeMo-Minitron-8B-Instruct
8.4B
UNLOCKED
Chat
RUNS GREAT
62/100
~26.2 tok/s
ctx
6K
✦ RUNS GREAT
63/100
~26.2 tok/s
6K
→
8K
✦
✦
Qwen3-30B-A3B-Instruct-2507-AWQ-8bit
9B
MoE
UNLOCKED
Chat
RUNS GREAT
61/100
~19.6 tok/s
ctx
4K
✦ RUNS GREAT
62/100
~19.6 tok/s
4K
→
17K
✦
✦
glm-4-9b-chat-hf
9.4B
UNLOCKED
Chat
RUNS GREAT
61/100
~23.4 tok/s
ctx
3K
✦ RUNS GREAT
63/100
~23.4 tok/s
3K
→
14K
✦
✦
glm-4-9b-chat
9.4B
UNLOCKED
Chat
RUNS GREAT
61/100
~23.4 tok/s
ctx
3K
✦ RUNS GREAT
63/100
~23.4 tok/s
3K
→
14K
✦
✦
Qwen3.5-9B
9.7B
UNLOCKED
General
RUNS WELL
49/100
~22.7 tok/s
ctx
3K
✦ RUNS WELL
51/100
~22.7 tok/s
3K
→
11K
✦
✦
Qwen3.5-9B-Base
9.7B
UNLOCKED
General
RUNS WELL
49/100
~22.7 tok/s
ctx
3K
✦ RUNS WELL
51/100
~22.7 tok/s
3K
→
11K
✦
Qwen2.5-VL-7B
7B
Multimodal
RUNS GREAT
73/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
73/100
~31.4 tok/s
10K
→
32K
✦
DeepSeek-R1-7B
7B
Reasoning
RUNS GREAT
72/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
72/100
~31.4 tok/s
10K
→
42K
✦
MiMo-7B-RL
7B
Reasoning
RUNS GREAT
72/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
72/100
~31.4 tok/s
10K
→
33K
✦
DeepSeek-R1-Distill-Qwen-7B
7.6B
Reasoning
RUNS GREAT
72/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
72/100
~28.9 tok/s
8K
→
33K
✦
Orca-2-7B
7B
Reasoning
RUNS GREAT
71/100
~31.4 tok/s
ctx
4K
RUNS GREAT
71/100
~31.4 tok/s
ctx
4K
Qwen2.5-7B-Instruct
7B
General
RUNS GREAT
69/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
69/100
~31.4 tok/s
10K
→
42K
✦
Qwen3-32B-quantized.w4a16
5.7B
General
RUNS GREAT
69/100
~38.6 tok/s
ctx
17K
✦ RUNS GREAT
69/100
~38.6 tok/s
17K
→
41K
✦
zephyr-7b-beta
7.2B
General
RUNS GREAT
69/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
69/100
~30.6 tok/s
10K
→
33K
✦
Mistral-7B-v0.1
7.2B
General
RUNS GREAT
69/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
69/100
~30.6 tok/s
10K
→
33K
✦
prometheus-7b-v2.0
7.2B
General
RUNS GREAT
69/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
69/100
~30.6 tok/s
10K
→
33K
✦
xLAM-7b-r
7.2B
General
RUNS GREAT
69/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
69/100
~30.6 tok/s
10K
→
33K
✦
dolphin-2.6-mistral-7b
7.2B
General
RUNS GREAT
69/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
69/100
~30.6 tok/s
10K
→
33K
✦
Olmo-3-1025-7B
7.3B
General
RUNS GREAT
69/100
~30.1 tok/s
ctx
9K
✦ RUNS GREAT
69/100
~30.1 tok/s
9K
→
37K
✦
Qwen2.5-7B
7.6B
General
RUNS GREAT
69/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.9 tok/s
8K
→
33K
✦
SWE-agent-LM-7B
7.6B
General
RUNS GREAT
69/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.9 tok/s
8K
→
33K
✦
Qwen2-7B
7.6B
General
RUNS GREAT
69/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.9 tok/s
8K
→
33K
✦
VulnLLM-R-7B
7.6B
General
RUNS GREAT
69/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
69/100
~28.9 tok/s
8K
→
33K
✦
Qwen2.5-Coder-7B
7B
Coding
RUNS GREAT
68/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
68/100
~31.4 tok/s
10K
→
42K
✦
granite-3.0-8b-instruct
8B
General
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
CodeLlama-7B-Instruct
7B
Coding
RUNS GREAT
68/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
68/100
~31.4 tok/s
10K
→
16K
✦
StarCoder2-7B
7B
Coding
RUNS GREAT
68/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
68/100
~31.4 tok/s
10K
→
16K
✦
GLM-4.7-Flash-AWQ-4bit
6.4B
MoE
General
RUNS GREAT
68/100
~27.5 tok/s
ctx
13K
✦ RUNS GREAT
68/100
~27.5 tok/s
13K
→
52K
✦
Llammas-base-p1-GPT-4o-human-error-mix-paragraph-GEC
6.7B
General
RUNS GREAT
68/100
~32.8 tok/s
ctx
4K
RUNS GREAT
67/100
~32.8 tok/s
ctx
4K
Nous-Hermes-llama-2-7b
6.7B
General
RUNS GREAT
68/100
~32.8 tok/s
ctx
4K
RUNS GREAT
67/100
~32.8 tok/s
ctx
4K
Llama-2-7b-hf
6.7B
General
RUNS GREAT
68/100
~32.8 tok/s
ctx
4K
RUNS GREAT
67/100
~32.8 tok/s
ctx
4K
CodeLlama-7b-hf
6.7B
Coding
RUNS GREAT
68/100
~32.8 tok/s
ctx
12K
RUNS GREAT
67/100
~32.8 tok/s
12K
→
16K
✦
deepseek-coder-6.7b-instruct
6.7B
Coding
RUNS GREAT
68/100
~32.8 tok/s
ctx
12K
RUNS GREAT
67/100
~32.8 tok/s
12K
→
16K
✦
deepseek-coder-6.7b-base
6.7B
Coding
RUNS GREAT
68/100
~32.8 tok/s
ctx
12K
RUNS GREAT
67/100
~32.8 tok/s
12K
→
16K
✦
Orca-2-7b
7B
General
RUNS GREAT
68/100
~31.4 tok/s
ctx
4K
RUNS GREAT
68/100
~31.4 tok/s
ctx
4K
Tarsier-7b
7.1B
General
RUNS GREAT
68/100
~31 tok/s
ctx
4K
RUNS GREAT
68/100
~31 tok/s
ctx
4K
starcoder2-7b
7.2B
Coding
RUNS GREAT
68/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
68/100
~30.6 tok/s
10K
→
16K
✦
falcon-7b
7.2B
General
RUNS GREAT
68/100
~30.6 tok/s
ctx
4K
RUNS GREAT
67/100
~30.6 tok/s
ctx
4K
wildguard
7.2B
General
RUNS GREAT
68/100
~30.6 tok/s
ctx
4K
RUNS GREAT
67/100
~30.6 tok/s
ctx
4K
starcoder2-7b-GPTQ
7.4B
Coding
RUNS GREAT
68/100
~29.7 tok/s
ctx
9K
✦ RUNS GREAT
68/100
~29.7 tok/s
9K
→
16K
✦
Qwen2.5-Math-7B
7.6B
General
RUNS GREAT
68/100
~28.9 tok/s
ctx
4K
RUNS GREAT
68/100
~28.9 tok/s
ctx
4K
Llama-3.1-8B
8B
General
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
Meta-Llama-3-8B
8B
General
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
Llama-Guard-3-8B
8B
General
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
llama-3.1-8b-bias-reduced
8B
General
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
RUNS GREAT
68/100
~27.5 tok/s
ctx
4K
Qwen3-235B-A22B
235B
MoE
Reasoning
RUNS WELL
67/100
~0.6 tok/s
ctx
927
✦ RUNS WELL
68/100
~0.6 tok/s
927
→
4K
✦
CodeLlama-7b-Instruct-hf
6.7B
Coding
RUNS GREAT
67/100
~32.8 tok/s
ctx
4K
RUNS GREAT
66/100
~32.8 tok/s
ctx
4K
OLMoE-1B-7B-0125
6.9B
MoE
General
RUNS GREAT
67/100
~25.5 tok/s
ctx
4K
RUNS GREAT
67/100
~25.5 tok/s
ctx
4K
Qwen2.5-Coder-7B-Instruct
7.6B
Coding
RUNS GREAT
67/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
68/100
~28.9 tok/s
8K
→
33K
✦
Qwen2.5-Coder-7B-Instruct-GPTQ-Int4
7.6B
Coding
RUNS GREAT
67/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
68/100
~28.9 tok/s
8K
→
33K
✦
Qwen2.5-Coder-7B-Instruct-AWQ
7.6B
Coding
RUNS GREAT
67/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
68/100
~28.9 tok/s
8K
→
33K
✦
hf-moshiko
7.8B
General
RUNS GREAT
67/100
~28.2 tok/s
ctx
3K
RUNS GREAT
67/100
~28.2 tok/s
ctx
3K
Qwen3-30B-A3B
30B
MoE
Reasoning
RUNS WELL
66/100
~5.9 tok/s
ctx
47K
✦ RUNS WELL
66/100
~5.9 tok/s
47K
→
128K
✦
OLMo-7B-Instruct
7B
General
RUNS GREAT
66/100
~31.4 tok/s
ctx
2K
✦ RUNS GREAT
67/100
~31.4 tok/s
ctx
2K
Falcon-7B-Instruct
7B
General
RUNS GREAT
66/100
~31.4 tok/s
ctx
2K
✦ RUNS GREAT
67/100
~31.4 tok/s
ctx
2K
Amber
6.7B
General
RUNS GREAT
66/100
~32.8 tok/s
ctx
2K
RUNS GREAT
66/100
~32.8 tok/s
ctx
2K
llama-7b
6.7B
General
RUNS GREAT
66/100
~32.8 tok/s
ctx
2K
RUNS GREAT
66/100
~32.8 tok/s
ctx
2K
pythia-6.9b
7B
General
RUNS GREAT
66/100
~31.4 tok/s
ctx
2K
✦ RUNS GREAT
67/100
~31.4 tok/s
ctx
2K
Llama-3.2-1B-Instruct
1B
Chat
RUNS WELL
65/100
~220 tok/s
ctx
128K
RUNS WELL
65/100
~220 tok/s
ctx
128K
gemma-3-1b-it
1B
Chat
RUNS WELL
65/100
~220 tok/s
ctx
33K
RUNS WELL
65/100
~220 tok/s
ctx
33K
Qwen3-4B-Instruct-2507-MLX-8bit
1.1B
Chat
RUNS WELL
65/100
~200 tok/s
ctx
158K
✦ RUNS WELL
65/100
~200 tok/s
158K
→
262K
✦
Qwen3-4B-Instruct-2507-MLX-5bit
0.8B
Chat
RUNS WELL
64/100
~275 tok/s
ctx
223K
✦ RUNS WELL
64/100
~275 tok/s
223K
→
262K
✦
Qwen3-4B-Instruct-2507-MLX-6bit
0.9B
Chat
RUNS WELL
64/100
~244.4 tok/s
ctx
196K
✦ RUNS WELL
65/100
~244.4 tok/s
196K
→
262K
✦
Mistral-7B-Instruct
7B
Chat
RUNS GREAT
63/100
~31.4 tok/s
ctx
10K
✦ RUNS GREAT
64/100
~31.4 tok/s
10K
→
32K
✦
Qwen2.5-0.5B-Instruct
0.5B
Chat
RUNS WELL
63/100
~440 tok/s
ctx
128K
RUNS WELL
63/100
~440 tok/s
ctx
128K
Qwen1.5-0.5B-Chat
0.6B
Chat
RUNS WELL
63/100
~366.7 tok/s
ctx
33K
RUNS WELL
63/100
~366.7 tok/s
ctx
33K
Qwen3-4B-Instruct-2507-MLX-4bit
0.6B
Chat
RUNS WELL
63/100
~366.7 tok/s
ctx
262K
RUNS WELL
63/100
~366.7 tok/s
ctx
262K
TinyLlama-1.1B-Chat-v1.0
1.1B
Chat
RUNS WELL
63/100
~200 tok/s
ctx
2K
RUNS WELL
63/100
~200 tok/s
ctx
2K
tinyllama-oneshot-w8w8-test-static-shape-change
1.1B
Chat
RUNS WELL
63/100
~200 tok/s
ctx
2K
RUNS WELL
63/100
~200 tok/s
ctx
2K
LFM2.5-1.2B-Instruct
1.2B
Chat
RUNS WELL
63/100
~183.3 tok/s
ctx
128K
RUNS WELL
63/100
~183.3 tok/s
ctx
128K
Vikhr-Llama-3.2-1B-Instruct
1.2B
Chat
RUNS WELL
63/100
~183.3 tok/s
ctx
131K
RUNS WELL
63/100
~183.3 tok/s
ctx
131K
openchat-3.5-0106
7B
Chat
RUNS GREAT
63/100
~31.4 tok/s
ctx
8K
RUNS GREAT
63/100
~31.4 tok/s
ctx
8K
Mistral-7B-Instruct-v0.2
7.2B
Chat
RUNS GREAT
63/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
63/100
~30.6 tok/s
10K
→
33K
✦
Mistral-7B-Instruct-v0.3
7.2B
Chat
RUNS GREAT
63/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
63/100
~30.6 tok/s
10K
→
33K
✦
Mistral-7B-Instruct-v0.3-GPTQ
7.2B
Chat
RUNS GREAT
63/100
~30.6 tok/s
ctx
10K
✦ RUNS GREAT
63/100
~30.6 tok/s
10K
→
33K
✦
Olmo-3-7B-Instruct-SFT
7.3B
Chat
RUNS GREAT
63/100
~30.1 tok/s
ctx
9K
✦ RUNS GREAT
63/100
~30.1 tok/s
9K
→
37K
✦
Falcon3-7B-Instruct
7.5B
Chat
RUNS GREAT
63/100
~29.3 tok/s
ctx
9K
✦ RUNS GREAT
64/100
~29.3 tok/s
9K
→
33K
✦
Qwen2-7B-Instruct
7.6B
Chat
RUNS GREAT
63/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.9 tok/s
8K
→
33K
✦
XCurOS-0.1-8B-Instruct
7.6B
Chat
RUNS GREAT
63/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.9 tok/s
8K
→
33K
✦
Dream-v0-Instruct-7B
7.6B
Chat
RUNS GREAT
63/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.9 tok/s
8K
→
33K
✦
Qwen2.5-7B-Instruct-GPTQ-Int4
7.6B
Chat
RUNS GREAT
63/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.9 tok/s
8K
→
33K
✦
Qwen2.5-7B-Instruct-1M
7.6B
Chat
RUNS GREAT
63/100
~28.9 tok/s
ctx
8K
✦ RUNS GREAT
63/100
~28.9 tok/s
8K
→
33K
✦
Yi-6B-Chat
6.1B
Chat
RUNS GREAT
62/100
~36.1 tok/s
ctx
4K
RUNS GREAT
62/100
~36.1 tok/s
ctx
4K
vicuna-7b-v1.5
6.7B
Chat
RUNS GREAT
62/100
~32.8 tok/s
ctx
4K
RUNS GREAT
62/100
~32.8 tok/s
ctx
4K
Llama-2-7b-chat-hf
6.7B
Chat
RUNS GREAT
62/100
~32.8 tok/s
ctx
4K
RUNS GREAT
62/100
~32.8 tok/s
ctx
4K
granite-4.0-h-tiny
6.9B
MoE
Chat
RUNS GREAT
62/100
~25.5 tok/s
ctx
11K
✦ RUNS GREAT
62/100
~25.5 tok/s
11K
→
43K
✦
falcon-7b-instruct
7.2B
Chat
RUNS GREAT
62/100
~30.6 tok/s
ctx
4K
RUNS GREAT
62/100
~30.6 tok/s
ctx
4K
falcon-mamba-7b-instruct
7.3B
Chat
RUNS GREAT
62/100
~30.1 tok/s
ctx
4K
RUNS GREAT
62/100
~30.1 tok/s
ctx
4K
Qwen2.5-Math-7B-Instruct
7.6B
Chat
RUNS GREAT
62/100
~28.9 tok/s
ctx
4K
RUNS GREAT
62/100
~28.9 tok/s
ctx
4K
Meta-Llama-3-8B-Instruct
8B
Chat
RUNS GREAT
62/100
~27.5 tok/s
ctx
4K
RUNS GREAT
62/100
~27.5 tok/s
ctx
4K
Zamba2-1.2B-instruct
1.2B
Chat
RUNS WELL
61/100
~183.3 tok/s
ctx
4K
RUNS WELL
61/100
~183.3 tok/s
ctx
4K
Abliterated-Llama-3.2-1B-Instruct
1.2B
Chat
RUNS WELL
61/100
~183.3 tok/s
ctx
4K
RUNS WELL
61/100
~183.3 tok/s
ctx
4K
OLMoE-1B-7B-0125-Instruct
6.9B
MoE
Chat
RUNS GREAT
61/100
~25.5 tok/s
ctx
4K
RUNS GREAT
61/100
~25.5 tok/s
ctx
4K
Phi-mini-MoE-instruct
7.6B
MoE
Chat
RUNS GREAT
61/100
~23.2 tok/s
ctx
4K
RUNS GREAT
61/100
~23.2 tok/s
ctx
4K
Qwen3-4B-Thinking-2507-MLX-8bit
1.1B
General
RUNS WELL
60/100
~200 tok/s
ctx
158K
✦ RUNS WELL
60/100
~200 tok/s
158K
→
262K
✦
Qwen3-0.6B
0.8B
General
RUNS WELL
59/100
~275 tok/s
ctx
41K
RUNS WELL
59/100
~275 tok/s
ctx
41K
Qwen3Guard-Gen-0.6B
0.8B
General
RUNS WELL
59/100
~275 tok/s
ctx
33K
RUNS WELL
59/100
~275 tok/s
ctx
33K
Qwen3-0.6B-FP8
0.8B
General
RUNS WELL
59/100
~275 tok/s
ctx
41K
RUNS WELL
59/100
~275 tok/s
ctx
41K
Qwen3.5-0.8B
0.9B
General
RUNS WELL
59/100
~244.4 tok/s
ctx
196K
✦ RUNS WELL
60/100
~244.4 tok/s
196K
→
262K
✦
Qwen3.5-0.8B-Base
0.9B
General
RUNS WELL
59/100
~244.4 tok/s
ctx
196K
✦ RUNS WELL
60/100
~244.4 tok/s
196K
→
262K
✦
Qwen3-4B-Thinking-2507-MLX-6bit
0.9B
General
RUNS WELL
59/100
~244.4 tok/s
ctx
196K
✦ RUNS WELL
60/100
~244.4 tok/s
196K
→
262K
✦
LFM2-1.2B
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2.5-1.2B-Thinking
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2.5-1.2B-JP
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2-1.2B-Tool
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2-1.2B-RAG
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2-1.2B-Extract
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2.5-1.2B-Base
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
LFM2-1.2B-MLX-bf16
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
RUNS WELL
59/100
~183.3 tok/s
ctx
128K
Ilama-3.2-1B
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
131K
RUNS WELL
59/100
~183.3 tok/s
ctx
131K
CyberXP_Agent_Llama_3.2_1B
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
131K
RUNS WELL
59/100
~183.3 tok/s
ctx
131K
Orpo-Llama-3.2-1B-15k
1.2B
General
RUNS WELL
59/100
~183.3 tok/s
ctx
131K
RUNS WELL
59/100
~183.3 tok/s
ctx
131K
Qwen3-4B-MLX-4bit
0.6B
General
RUNS WELL
58/100
~366.7 tok/s
ctx
66K
RUNS WELL
58/100
~366.7 tok/s
ctx
66K
Qwen1.5-0.5B
0.6B
General
RUNS WELL
58/100
~366.7 tok/s
ctx
33K
RUNS WELL
58/100
~366.7 tok/s
ctx
33K
Qwen3-4B-Thinking-2507-MLX-4bit
0.6B
General
RUNS WELL
58/100
~366.7 tok/s
ctx
262K
RUNS WELL
58/100
~366.7 tok/s
ctx
262K
LFM2-700M
0.7B
General
RUNS WELL
58/100
~314.3 tok/s
ctx
128K
RUNS WELL
58/100
~314.3 tok/s
ctx
128K
Qwen3-8B-speculator.eagle3
1B
General
RUNS WELL
58/100
~220 tok/s
ctx
4K
✦ RUNS WELL
59/100
~220 tok/s
ctx
4K
pythia-1b
1.1B
General
RUNS WELL
58/100
~200 tok/s
ctx
2K
RUNS WELL
58/100
~200 tok/s
ctx
2K
Qwen2.5-1.5B-Instruct
1.5B
Chat
RUNS WELL
57/100
~146.7 tok/s
ctx
111K
✦ RUNS WELL
57/100
~146.7 tok/s
111K
→
128K
✦
Qwen3-4B
4B
Reasoning
RUNS WELL
57/100
~55 tok/s
ctx
31K
RUNS WELL
56/100
~55 tok/s
31K
→
124K
✦
Falcon-H1-0.5B-Base
0.5B
General
RUNS WELL
57/100
~440 tok/s
ctx
16K
RUNS WELL
57/100
~440 tok/s
ctx
16K
Qwen3-4B-DFlash-b16
0.5B
General
RUNS WELL
57/100
~440 tok/s
ctx
41K
RUNS WELL
57/100
~440 tok/s
ctx
41K
h2ovl-mississippi-800m
0.8B
General
RUNS WELL
57/100
~275 tok/s
ctx
4K
RUNS WELL
57/100
~275 tok/s
ctx
4K
ELM
0.9B
General
RUNS WELL
57/100
~244.4 tok/s
ctx
2K
RUNS WELL
57/100
~244.4 tok/s
ctx
2K
Llama-3.2-1B
1.2B
General
RUNS WELL
57/100
~183.3 tok/s
ctx
4K
RUNS WELL
57/100
~183.3 tok/s
ctx
4K
Jan-nano-AWQ
1.3B
General
RUNS WELL
57/100
~169.2 tok/s
ctx
41K
✦ RUNS WELL
58/100
~169.2 tok/s
ctx
41K
EXAONE-4.0-1.2B
1.3B
General
RUNS WELL
57/100
~169.2 tok/s
ctx
66K
✦ RUNS WELL
58/100
~169.2 tok/s
ctx
66K
Qwen3-8B-MLX-4bit
1.3B
General
RUNS WELL
57/100
~169.2 tok/s
ctx
41K
✦ RUNS WELL
58/100
~169.2 tok/s
ctx
41K
plamo-2-1b
1.3B
General
RUNS WELL
57/100
~169.2 tok/s
ctx
131K
✦ RUNS WELL
58/100
~169.2 tok/s
131K
→
523K
✦
Llama-3.2-1B-Instruct-FP8
1.5B
Chat
RUNS WELL
57/100
~146.7 tok/s
ctx
111K
✦ RUNS WELL
57/100
~146.7 tok/s
111K
→
131K
✦
Llama-3.2-1B-Instruct-FP8-dynamic
1.5B
Chat
RUNS WELL
57/100
~146.7 tok/s
ctx
111K
✦ RUNS WELL
57/100
~146.7 tok/s
111K
→
131K
✦
Qwen2-1.5B-Instruct
1.5B
Chat
RUNS WELL
57/100
~146.7 tok/s
ctx
33K
RUNS WELL
57/100
~146.7 tok/s
ctx
33K
Qwen2-1.5B-Instruct-FP8
1.5B
Chat
RUNS WELL
57/100
~146.7 tok/s
ctx
33K
RUNS WELL
57/100
~146.7 tok/s
ctx
33K
bloom-560m
0.6B
General
RUNS WELL
56/100
~366.7 tok/s
ctx
4K
RUNS WELL
56/100
~366.7 tok/s
ctx
4K
GA_Guard_Lite
0.6B
General
RUNS WELL
56/100
~366.7 tok/s
ctx
4K
RUNS WELL
56/100
~366.7 tok/s
ctx
4K
gpt_bigcode-santacoder
1.1B
Coding
RUNS WELL
56/100
~200 tok/s
ctx
2K
RUNS WELL
56/100
~200 tok/s
ctx
2K
OLMo-1B-hf
1.2B
General
RUNS WELL
56/100
~183.3 tok/s
ctx
2K
RUNS WELL
56/100
~183.3 tok/s
ctx
2K
llama-3.2-1b-code-instruct
1.2B
Coding
RUNS WELL
56/100
~183.3 tok/s
ctx
131K
✦ RUNS WELL
57/100
~183.3 tok/s
ctx
131K
starvector-1b-im2svg
1.4B
General
RUNS WELL
56/100
~157.1 tok/s
ctx
8K
RUNS WELL
56/100
~157.1 tok/s
ctx
8K
OLMo-2-0425-1B-Instruct
1.5B
Chat
RUNS WELL
56/100
~146.7 tok/s
ctx
4K
RUNS WELL
56/100
~146.7 tok/s
ctx
4K
Qwen2.5-1.5B
1.5B
General
RUNS WELL
56/100
~146.7 tok/s
ctx
111K
RUNS WELL
55/100
~146.7 tok/s
111K
→
131K
✦
Qwen2-1.5B
1.5B
General
RUNS WELL
56/100
~146.7 tok/s
ctx
111K
RUNS WELL
55/100
~146.7 tok/s
111K
→
131K
✦
Qwen2.5-Math-1.5B-Instruct
1.5B
Chat
RUNS WELL
56/100
~146.7 tok/s
ctx
4K
RUNS WELL
56/100
~146.7 tok/s
ctx
4K
xLAM-2-1b-fc-r
1.5B
General
RUNS WELL
56/100
~146.7 tok/s
ctx
33K
RUNS WELL
55/100
~146.7 tok/s
ctx
33K
qwen-base-invoicev1.01-1.5B
1.5B
General
RUNS WELL
56/100
~146.7 tok/s
ctx
33K
RUNS WELL
55/100
~146.7 tok/s
ctx
33K
Phi-4-mini-reasoning
3.8B
Reasoning
RUNS WELL
56/100
~57.9 tok/s
ctx
16K
RUNS WELL
56/100
~57.9 tok/s
ctx
16K
Llama-4-Maverick
400B
MoE
General
RUNS WELL
55/100
~0.4 tok/s
ctx
2K
✦ RUNS WELL
57/100
~0.4 tok/s
2K
→
7K
✦
bge-m3
0.57B
Embedding
RUNS WELL
55/100
~386 tok/s
ctx
8K
RUNS WELL
55/100
~386 tok/s
ctx
8K
pythia-410m
0.5B
General
RUNS WELL
55/100
~440 tok/s
ctx
2K
RUNS WELL
55/100
~440 tok/s
ctx
2K
pythia-410m-deduped
0.5B
General
RUNS WELL
55/100
~440 tok/s
ctx
2K
RUNS WELL
55/100
~440 tok/s
ctx
2K
bloomz-560m
0.6B
General
RUNS WELL
55/100
~366.7 tok/s
ctx
2K
RUNS WELL
55/100
~366.7 tok/s
ctx
2K
LFM2-VL-1.6B
1.6B
General
RUNS WELL
55/100
~137.5 tok/s
ctx
103K
✦ RUNS WELL
55/100
~137.5 tok/s
103K
→
128K
✦
LFM2.5-VL-1.6B
1.6B
General
RUNS WELL
55/100
~137.5 tok/s
ctx
103K
✦ RUNS WELL
55/100
~137.5 tok/s
103K
→
128K
✦
stablelm-2-1_6b-chat
1.6B
Chat
RUNS WELL
55/100
~137.5 tok/s
ctx
4K
RUNS WELL
55/100
~137.5 tok/s
ctx
4K
Device A vs Device B, one row per model. The verdict and tok/s estimate are identical on both devices throughout this data, so they are shown once. ✦ marks rows where Device B is wrapped with TurboQuant; "X → Y" shows the context window before → after wrapping; "—" means no context figure is reported at that verdict.

| Model | Params | Type | Verdict | tok/s | Score A | Ctx A | Score B | Ctx B |
|---|---|---|---|---|---|---|---|---|
| Qwen3.5-4B | 4.7B | General | RUNS WELL | ~46.8 | 55/100 | 24K | 54/100 | 24K → 95K ✦ |
| Qwen3.5-4B-Base | 4.7B | General | RUNS WELL | ~46.8 | 55/100 | 24K | 54/100 | 24K → 95K ✦ |
| Qwen3-8B-NVFP4 | 4.7B | General | RUNS WELL | ~46.8 | 55/100 | 24K | 54/100 | 24K → 41K ✦ |
| NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ | 5.1B | General | RUNS WELL | ~43.1 | 55/100 | 21K | 55/100 | 21K → 83K ✦ |
| Qwen3-32B-MLX-4bit | 5.1B | General | RUNS WELL | ~43.1 | 55/100 | 21K | 55/100 | 21K → 41K ✦ |
| QwQ-32B-MLX-4bit | 5.1B | General | RUNS WELL | ~43.1 | 55/100 | 21K | 55/100 | 21K → 83K ✦ |
| DeepSeek-V2-Chat | 236B | MoE · General | RUNS WELL | ~0.6 | 54/100 | 1K | 56/100 | 1K → 5K ✦ |
| DeepSeek-R1-0528-Qwen3-8B-MLX-4bit | 1.3B | Reasoning | RUNS WELL | ~169.2 | 54/100 | 131K | 54/100 | 131K → 131K ✦ |
| gpt-neo-1.3B | 1.4B | General | RUNS WELL | ~157.1 | 54/100 | 2K | 54/100 | 2K |
| phi-1_5 | 1.4B | General | RUNS WELL | ~157.1 | 54/100 | 2K | 54/100 | 2K |
| LFM2-Audio-1.5B | 1.5B | General | RUNS WELL | ~146.7 | 54/100 | 4K | 54/100 | 4K |
| LFM2.5-Audio-1.5B | 1.5B | General | RUNS WELL | ~146.7 | 54/100 | 4K | 54/100 | 4K |
| OLMo-2-0425-1B | 1.5B | General | RUNS WELL | ~146.7 | 54/100 | 4K | 54/100 | 4K |
| Qwen2.5-Math-1.5B | 1.5B | General | RUNS WELL | ~146.7 | 54/100 | 4K | 54/100 | 4K |
| SmolLM2-1.7B | 1.7B | General | RUNS WELL | ~129.4 | 54/100 | 8K | 54/100 | 8K |
| Nanbeige4.1-3B-AWQ-8bit | 1.7B | General | RUNS WELL | ~129.4 | 54/100 | 96K | 54/100 | 96K → 262K ✦ |
| Qwen3-1.7B-Base | 1.7B | General | RUNS WELL | ~129.4 | 54/100 | 33K | 54/100 | 33K |
| Qwen3-1.7B-MLX-bf16 | 1.7B | General | RUNS WELL | ~129.4 | 54/100 | 41K | 54/100 | 41K |
| Qwen2.5-1.5B-Instruct-AWQ | 1.8B | Chat | RUNS WELL | ~122.2 | 54/100 | 33K | 54/100 | 33K |
| Qwen2-1.5B-Instruct-AWQ | 1.8B | Chat | RUNS WELL | ~122.2 | 54/100 | 33K | 54/100 | 33K |
| Qwen2-1.5B-Instruct-GPTQ-Int4 | 1.8B | Chat | RUNS WELL | ~122.2 | 54/100 | 33K | 54/100 | 33K |
| Qwen2.5-1.5B-quantized.w8a8 | 1.8B | General | RUNS WELL | ~122.2 | 54/100 | 33K | 54/100 | 33K |
| Qwen1.5-1.8B-Chat | 1.8B | Chat | RUNS WELL | ~122.2 | 54/100 | 33K | 54/100 | 33K |
| gemma-3n-E2B-it | 4B | Multimodal | RUNS WELL | ~55 | 54/100 | 31K | 54/100 | 31K → 124K ✦ |
| Qwen3-14B-MLX-8bit | 4.2B | General | RUNS WELL | ~52.4 | 54/100 | 29K | 54/100 | 29K → 41K ✦ |
| Qwen3-4B-SafeRL | 4.4B | General | RUNS WELL | ~50 | 54/100 | 27K | 54/100 | 27K → 41K ✦ |
| Qwen3-4B-FP8 | 4.4B | General | RUNS WELL | ~50 | 54/100 | 27K | 54/100 | 27K → 41K ✦ |
| Nemotron-H-4B-Base-8K | 4.5B | General | RUNS WELL | ~48.9 | 54/100 | 8K | 54/100 | 8K |
| Qwen2.5-Coder-32B-Instruct-MLX-4bit | 5.1B | Coding | RUNS WELL | ~43.1 | 54/100 | 21K | 54/100 | 21K → 33K ✦ |
| Qwen2.5-Coder-1.5B | 1.5B | Coding | RUNS WELL | ~146.7 | 53/100 | 111K | 53/100 | 111K → 128K ✦ |
| gemma-2-2b-it | 2B | Chat | RUNS WELL | ~110 | 53/100 | 8K | 53/100 | 8K |
| granite-3.1-2b-instruct | 2B | General | RUNS WELL | ~110 | 53/100 | 79K | 53/100 | 79K → 128K ✦ |
| nomic-embed-text-v1.5 | 0.14B | Embedding | RUNS WELL | ~1571.4 | 53/100 | 8K | 53/100 | 8K |
| pythia-1.4b | 1.5B | General | RUNS WELL | ~146.7 | 53/100 | 2K | 53/100 | 2K |
| Qwen2.5-Coder-1.5B-Instruct | 1.5B | Coding | RUNS WELL | ~146.7 | 53/100 | 33K | 53/100 | 33K |
| Minnow-Math-1.5B | 1.6B | General | RUNS WELL | ~137.5 | 53/100 | 4K | 54/100 | 4K ✦ |
| QVikhr-3-1.7B-Instruction-noreasoning | 1.7B | Reasoning | RUNS WELL | ~129.4 | 53/100 | 41K | 53/100 | 41K |
| bloom-1b7 | 1.7B | General | RUNS WELL | ~129.4 | 53/100 | 4K | 53/100 | 4K |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.8B | Reasoning | RUNS WELL | ~122.2 | 53/100 | 90K | 53/100 | 90K → 131K ✦ |
| Qwen3-1.7B | 2B | General | RUNS WELL | ~110 | 53/100 | 41K | 53/100 | 41K |
| Qwen3-1.7B-FP8 | 2B | General | RUNS WELL | ~110 | 53/100 | 41K | 53/100 | 41K |
| Phi-4-reasoning-plus-MLX-4bit | 2.3B | Reasoning | RUNS WELL | ~95.7 | 53/100 | 33K | 53/100 | 33K |
| DeepSeek-R1-0528-Qwen3-8B-MLX-8bit | 2.3B | Reasoning | RUNS WELL | ~95.7 | 53/100 | 66K | 53/100 | 66K → 131K ✦ |
| HTML-Pruner-Phi-3.8B | 3.8B | General | RUNS WELL | ~57.9 | 53/100 | 34K | 53/100 | 34K → 131K ✦ |
| Nanbeige4.1-3B | 3.9B | General | RUNS WELL | ~56.4 | 53/100 | 32K | 53/100 | 32K → 129K ✦ |
| Qwen3-4B-Base | 4B | General | RUNS WELL | ~55 | 53/100 | 31K | 53/100 | 31K → 33K ✦ |
| Qwen3-4B-Thinking-2507 | 4B | General | RUNS WELL | ~55 | 53/100 | 31K | 53/100 | 31K → 124K ✦ |
| Qwen3-4B-AWQ | 4B | General | RUNS WELL | ~55 | 53/100 | 31K | 53/100 | 31K → 41K ✦ |
| Jan-v1-4B | 4B | General | RUNS WELL | ~55 | 53/100 | 31K | 53/100 | 31K → 124K ✦ |
| Jan-nano-128k | 4B | General | RUNS WELL | ~55 | 53/100 | 31K | 53/100 | 31K → 124K ✦ |
| VLM2Vec-Full | 4.1B | General | RUNS WELL | ~53.7 | 53/100 | 30K | 54/100 | 30K → 119K ✦ |
| Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit | 5.3B | MoE · Coding | RUNS WELL | ~33.2 | 53/100 | 19K | 53/100 | 19K → 77K ✦ |
| Llama-4-Scout | 17B | MoE · General | RUNS WELL | ~10.4 | 52/100 | 47K | 52/100 | 47K → 188K ✦ |
| Mixtral-8x7B-Instruct | 46.7B | MoE · General | RUNS WELL | ~3.7 | 52/100 | 2K | 55/100 | 2K → 7K ✦ |
| granite-3.0-2b-instruct | 2B | General | RUNS WELL | ~110 | 52/100 | 4K | 52/100 | 4K |
| Phi-4-mini-reasoning-MLX-4bit | 0.6B | Reasoning | RUNS WELL | ~366.7 | 52/100 | 131K | 52/100 | 131K |
| SmolLM-1.7B | 1.7B | General | RUNS WELL | ~129.4 | 52/100 | 2K | 52/100 | 2K |
| Qwen2.5-Coder-1.5B-Instruct-AWQ | 1.8B | Coding | RUNS WELL | ~122.2 | 52/100 | 33K | 52/100 | 33K |
| Qwen3.5-2B | 2.3B | General | RUNS WELL | ~95.7 | 52/100 | 66K | 52/100 | 66K → 262K ✦ |
| Qwen3.5-2B-Base | 2.3B | General | RUNS WELL | ~95.7 | 52/100 | 66K | 52/100 | 66K → 262K ✦ |
| Qwen3-8B-MLX-8bit | 2.3B | General | RUNS WELL | ~95.7 | 52/100 | 41K | 52/100 | 41K |
| Qwen3-14B-MLX-4bit | 2.3B | General | RUNS WELL | ~95.7 | 52/100 | 41K | 52/100 | 41K |
| LFM2-2.6B | 2.6B | General | RUNS WELL | ~84.6 | 52/100 | 57K | 52/100 | 57K → 128K ✦ |
| LFM2-2.6B-Exp | 2.6B | General | RUNS WELL | ~84.6 | 52/100 | 57K | 52/100 | 57K → 128K ✦ |
| LFM2-2.6B-Transcript | 2.6B | General | RUNS WELL | ~84.6 | 52/100 | 57K | 52/100 | 57K → 128K ✦ |
| T-lite-it-1.0_Q4_0 | 2.9B | General | RUNS WELL | ~75.9 | 52/100 | 33K | 52/100 | 33K |
| LFM2-VL-3B | 3B | General | RUNS WELL | ~73.3 | 52/100 | 47K | 52/100 | 47K → 128K ✦ |
| SmolLM3-3B | 3.1B | General | RUNS WELL | ~71 | 52/100 | 45K | 52/100 | 45K → 66K ✦ |
| SmolLM3-3B-Base | 3.1B | General | RUNS WELL | ~71 | 52/100 | 45K | 52/100 | 45K → 66K ✦ |
| Qwen2.5-3B | 3.1B | General | RUNS WELL | ~71 | 52/100 | 33K | 52/100 | 33K |
| xLAM-2-3b-fc-r | 3.1B | General | RUNS WELL | ~71 | 52/100 | 33K | 52/100 | 33K |
| Hermes-3-Llama-3.2-3B | 3.2B | General | RUNS WELL | ~68.8 | 52/100 | 43K | 52/100 | 43K → 131K ✦ |
| Qwen2.5-Coder-14B-Instruct-MLX-8bit | 4.2B | Coding | RUNS WELL | ~52.4 | 52/100 | 29K | 52/100 | 29K → 33K ✦ |
| xflux_text_encoders | 4.8B | Coding | RUNS WELL | ~45.8 | 52/100 | 4K | 52/100 | 4K |
| bge-large-en-v1.5 | 0.34B | Embedding | RUNS WELL | ~647.1 | 51/100 | 512 | 51/100 | 512 |
| h2ovl-mississippi-2b | 2.2B | General | RUNS WELL | ~100 | 51/100 | 4K | 51/100 | 4K |
| EXAONE-3.5-2.4B-Instruct | 2.4B | Chat | RUNS WELL | ~91.7 | 51/100 | 33K | 51/100 | 33K |
| gemma-1.1-2b-it | 2.5B | General | RUNS WELL | ~88 | 51/100 | 4K | 51/100 | 4K |
| gemma-2-2b-jpn-it | 2.6B | General | RUNS WELL | ~84.6 | 51/100 | 4K | 50/100 | 4K |
| stablelm-3b-4e1t | 2.8B | General | RUNS WELL | ~78.6 | 51/100 | 4K | 51/100 | 4K |
| granite-4.0-h-micro | 3.2B | MoE · General | RUNS WELL | ~55 | 51/100 | 43K | 51/100 | 43K → 131K ✦ |
| Llama-3.2-3B | 3.2B | General | RUNS WELL | ~68.8 | 51/100 | 4K | 51/100 | 4K |
| PowerLM-3b | 3.5B | General | RUNS WELL | ~62.9 | 51/100 | 4K | 51/100 | 4K |
| Qwen3-Coder-30B-A3B-Instruct-AWQ | 4.6B | MoE · Coding | RUNS WELL | ~38.3 | 51/100 | 25K | 51/100 | 25K → 99K ✦ |
| Qwen2.5-VL-7B-Instruct-NVFP4 | 5B | Chat | RUNS WELL | ~44 | 51/100 | 21K | 50/100 | 21K → 86K ✦ |
| Phi-4-multimodal-instruct | 5.6B | Chat | RUNS WELL | ~39.3 | 51/100 | 17K | 52/100 | 17K → 69K ✦ |
| Llama-3.2-3B-Instruct | 3B | Chat | RUNS WELL | ~73.3 | 50/100 | 47K | 50/100 | 47K → 128K ✦ |
| Qwen2.5-3B-Instruct | 3B | Chat | RUNS WELL | ~73.3 | 50/100 | 47K | 50/100 | 47K → 128K ✦ |
| gemma-3-4b-it | 4B | Chat | RUNS WELL | ~55 | 50/100 | 31K | 50/100 | 31K → 124K ✦ |
| Phi-3.5-mini-instruct | 3.8B | Chat | RUNS WELL | ~57.9 | 50/100 | 34K | 50/100 | 34K → 128K ✦ |
| Phi-4-mini | 3.8B | Chat | RUNS WELL | ~57.9 | 50/100 | 34K | 50/100 | 34K → 128K ✦ |
| DeepSeek-Coder-V2-16B | 16B | MoE · Coding | RUNS WELL | ~11 | 50/100 | 63K | 50/100 | 63K → 128K ✦ |
| StarCoder2-3B | 3B | Coding | RUNS WELL | ~73.3 | 50/100 | 16K | 50/100 | 16K |
| Qwen2.5-Coder-14B-Instruct-MLX-4bit | 2.3B | Coding | RUNS WELL | ~95.7 | 50/100 | 33K | 50/100 | 33K |
| gpt-neo-2.7B | 2.7B | General | RUNS WELL | ~81.5 | 50/100 | 2K | 50/100 | 2K |
| phi-2 | 2.8B | General | RUNS WELL | ~78.6 | 50/100 | 2K | 49/100 | 2K |
| pythia-2.8b | 2.9B | General | RUNS WELL | ~75.9 | 50/100 | 2K | 50/100 | 2K |
| starcoder2-3b | 3B | Coding | RUNS WELL | ~73.3 | 50/100 | 16K | 50/100 | 16K |
| Qwen2.5-Coder-3B-Instruct | 3.1B | Coding | RUNS WELL | ~71 | 50/100 | 33K | 50/100 | 33K |
| Qwen2.5-Coder-3B | 3.1B | Coding | RUNS WELL | ~71 | 50/100 | 33K | 50/100 | 33K |
| Qwen2.5-VL-3B-Instruct | 3.8B | Chat | RUNS WELL | ~57.9 | 50/100 | 34K | 50/100 | 34K → 128K ✦ |
| Qwen3-4B-Instruct-2507 | 4B | Chat | RUNS WELL | ~55 | 50/100 | 31K | 50/100 | 31K → 124K ✦ |
| Qwen3-4B-Instruct-2507-GPTQ-Int4 | 4B | Chat | RUNS WELL | ~55 | 50/100 | 31K | 50/100 | 31K → 124K ✦ |
| Qwen3-4B-Instruct-2507-FP8 | 4.4B | Chat | RUNS WELL | ~50 | 50/100 | 27K | 50/100 | 27K → 107K ✦ |
| Nemotron-H-4B-Instruct-128K | 4.5B | Chat | RUNS WELL | ~48.9 | 50/100 | 26K | 50/100 | 26K → 103K ✦ |
| Qwen3-30B-A3B-Instruct-2507-AWQ-4bit | 5.3B | MoE · Chat | RUNS WELL | ~33.2 | 50/100 | 19K | 50/100 | 19K → 77K ✦ |
| granite-4.0-h-tiny-AWQ-4bit | 2B | MoE · Chat | RUNS WELL | ~88 | 49/100 | 79K | 49/100 | 79K → 131K ✦ |
| PowerMoE-3b | 3.4B | MoE · General | RUNS WELL | ~51.8 | 49/100 | 4K | 50/100 | 4K ✦ |
| Qwen2.5-3B-Instruct-AWQ | 3.4B | Chat | RUNS WELL | ~64.7 | 49/100 | 33K | 49/100 | 33K |
| granite-3b-code-base-2k | 3.5B | Coding | RUNS WELL | ~62.9 | 49/100 | 2K | 49/100 | 2K |
| Llama-3.2-3B-Instruct-FP8 | 3.6B | Chat | RUNS WELL | ~61.1 | 49/100 | 36K | 50/100 | 36K → 131K ✦ |
| Qwen3-32B | 32B | Reasoning | DECENT | ~2.9 | 48/100 | — | 48/100 | — |
| Phi-3-mini-4k | 3.8B | Chat | RUNS WELL | ~57.9 | 48/100 | 4K | 48/100 | 4K |
| DeepSeek-R1-32B | 32B | Reasoning | DECENT | ~2.9 | 48/100 | — | 48/100 | — |
| phi-3-mini-4k-instruct | 3.8B | Chat | RUNS WELL | ~57.9 | 48/100 | 4K | 48/100 | 4K |
| Phi-3-mini-4k-instruct-AWQ | 3.8B | Chat | RUNS WELL | ~57.9 | 48/100 | 4K | 48/100 | 4K |
| Phi-3-mini-4k-instruct-gptq-4bit | 3.8B | Chat | RUNS WELL | ~57.9 | 48/100 | 4K | 48/100 | 4K |
| Qwen3-30B-A3B-Instruct-2507-AWQ | 4.6B | MoE · Chat | RUNS WELL | ~38.3 | 48/100 | 25K | 48/100 | 25K → 99K ✦ |
| Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled | 27.8B | Reasoning | DECENT | ~3.6 | 48/100 | — | 48/100 | — |
| DeepSeek-R1-Distill-Qwen-32B | 32.8B | Reasoning | DECENT | ~2.9 | 48/100 | — | 49/100 | — ✦ |
| OpenReasoning-Nemotron-32B | 32.8B | Reasoning | DECENT | ~2.9 | 48/100 | — | 49/100 | — ✦ |
| Phi-tiny-MoE-instruct | 3.8B | MoE · Chat | RUNS WELL | ~46.3 | 46/100 | 4K | 46/100 | 4K |
| DeepSeek-R1-Distill-Qwen-14B | 14.8B | Reasoning | DECENT | ~7.4 | 46/100 | — | 46/100 | — |
| Qwen3-14B | 14B | Reasoning | DECENT | ~7.9 | 45/100 | — | 45/100 | — |
| Phi-4 | 14B | Reasoning | DECENT | ~7.9 | 45/100 | — | 45/100 | — |
| DeepSeek-R1-14B | 14B | Reasoning | DECENT | ~7.9 | 45/100 | — | 45/100 | — |
| Orca-2-13B | 13B | Reasoning | DECENT | ~8.5 | 45/100 | — | 45/100 | — |
| Phi-4-reasoning | 14B | Reasoning | DECENT | ~7.9 | 45/100 | — | 45/100 | — |
| HyperCLOVAX-SEED-Omni-8B | 10.7B | General | RUNS WELL | ~20.6 | 43/100 | 943 | 46/100 | 943 → 4K ✦ |
| Llama-3.2-11B-Vision | 11B | Multimodal | DECENT | ~10 | 40/100 | 454 | 42/100 | 454 → 2K ✦ |
| Llama-3.2-11B-Vision-Instruct | 10.7B | Chat | RUNS WELL | ~20.6 | 37/100 | 943 | 39/100 | 943 → 4K ✦ |
| SOLAR-10.7B-Instruct-v1.0 | 10.7B | Chat | RUNS WELL | ~20.6 | 37/100 | 943 | 39/100 | 943 → 4K ✦ |
| CodeLlama-34B-Instruct | 34B | Coding | DECENT | ~2.8 | 36/100 | — | 35/100 | — |
| CodeLlama-34b-Instruct-hf | 33.7B | Coding | DECENT | ~2.8 | 36/100 | — | 35/100 | — |
| Qwen2.5-Coder-32B | 32B | Coding | DECENT | ~2.9 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-MLX-4bit | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-MLX-5bit | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-MLX-8bit | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-MLX-6bit | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-gptq-4bit | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-FP8 | 30.5B | MoE · Coding | DECENT | ~2.5 | 35/100 | — | 35/100 | — |
| Qwen2.5-Coder-32B-Instruct | 32.8B | Coding | DECENT | ~2.9 | 35/100 | — | 35/100 | — |
| Qwen2.5-Coder-32B-Instruct-AWQ | 32.8B | Coding | DECENT | ~2.9 | 35/100 | — | 35/100 | — |
| StarCoder2-15B | 15B | Coding | DECENT | ~7.3 | 34/100 | — | 34/100 | — |
| Qwen3-Coder-Next-AWQ-4bit | 14.4B | Coding | DECENT | ~7.6 | 34/100 | — | 34/100 | — |
| Qwen2.5-Coder-14B-Instruct | 14.8B | Coding | DECENT | ~7.4 | 34/100 | — | 34/100 | — |
| Qwen2.5-Coder-14B-Instruct-AWQ | 14.8B | Coding | DECENT | ~7.4 | 34/100 | — | 34/100 | — |
| WizardCoder-15B-V1.0 | 15.5B | Coding | DECENT | ~7.1 | 34/100 | — | 34/100 | — |
| Qwen3-Coder-30B-A3B-Instruct-FP4 | 15.6B | MoE · Coding | DECENT | ~5.6 | 34/100 | — | 34/100 | — |
| starcoder2-15b | 15.7B | Coding | DECENT | ~7 | 34/100 | — | 34/100 | — |
| DeepSeek-Coder-V2-Lite-Instruct | 15.7B | MoE · Coding | DECENT | ~5.6 | 34/100 | — | 34/100 | — |
| DeepSeek-Coder-V2-Lite-Instruct-FP8 | 15.7B | MoE · Coding | DECENT | ~5.6 | 34/100 | — | 34/100 | — |
| Qwen2.5-Coder-14B | 14B | Coding | DECENT | ~7.9 | 33/100 | — | 33/100 | — |
| CodeLlama-13B-Instruct | 13B | Coding | DECENT | ~8.5 | 33/100 | — | 33/100 | — |
| CodeLlama-13b-Instruct-hf | 13B | Coding | DECENT | ~8.5 | 33/100 | — | 33/100 | — |
| xLAM-8x7b-r | 46.7B | MoE · General | DECENT | ~1.5 | 32/100 | — | 32/100 | — |
| Nous-Hermes-2-Mixtral-8x7B-DPO | 46.7B | MoE · General | DECENT | ~1.5 | 32/100 | — | 32/100 | — |
| Llama-3_3-Nemotron-Super-49B-v1_5 | 49.9B | General | DECENT | ~1.7 | 32/100 | — | 32/100 | — |
| Llama-3_3-Nemotron-Super-49B-v1_5-FP8 | 49.9B | General | DECENT | ~1.7 | 32/100 | — | 32/100 | — |
| Llama-3_3-Nemotron-Super-49B-v1 | 49.9B | General | DECENT | ~1.7 | 32/100 | — | 32/100 | — |
| Mistral-Small-24B | 24B | General | DECENT | ~4.3 | 31/100 | — | 31/100 | — |
| Qwen2.5-32B-Instruct | 32B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| gemma-2-27b-it | 27B | General | DECENT | ~3.7 | 31/100 | — | 31/100 | — |
| gemma-3-27b-it | 27B | General | DECENT | ~3.7 | 31/100 | — | 31/100 | — |
| Command-R | 35B | General | DECENT | ~2.7 | 31/100 | — | 31/100 | — |
| Falcon-40B-Instruct | 40B | General | DECENT | ~2.1 | 31/100 | — | 31/100 | — |
| t5gemma-9b-9b-ul2 | 20.3B | General | DECENT | ~5.3 | 31/100 | — | 30/100 | — |
| gpt-oss-20b | 21.5B | General | DECENT | ~5 | 31/100 | — | 31/100 | — |
| ERNIE-4.5-21B-A3B-MLX-4bit | 21.8B | MoE · General | DECENT | ~3.9 | 31/100 | — | 31/100 | — |
| ERNIE-4.5-21B-A3B-MLX-6bit | 21.8B | MoE · General | DECENT | ~3.9 | 31/100 | — | 31/100 | — |
| ERNIE-4.5-21B-A3B-MLX-8bit | 21.8B | MoE · General | DECENT | ~3.9 | 31/100 | — | 31/100 | — |
| LFM2-24B-A2B-MLX-4bit | 23.8B | MoE · General | DECENT | ~3.5 | 31/100 | — | 31/100 | — |
| LFM2-24B-A2B-MLX-6bit | 23.8B | MoE · General | DECENT | ~3.5 | 31/100 | — | 31/100 | — |
| LFM2-24B-A2B-MLX-8bit | 23.8B | MoE · General | DECENT | ~3.5 | 31/100 | — | 31/100 | — |
| LFM2-24B-A2B-MLX-5bit | 23.8B | MoE · General | DECENT | ~3.5 | 31/100 | — | 31/100 | — |
| LFM2-24B-A2B | 23.8B | MoE · General | DECENT | ~3.5 | 31/100 | — | 31/100 | — |
| Qwen3.5-27B | 27.8B | General | DECENT | ~3.6 | 31/100 | — | 31/100 | — |
| GLM-4.7-Flash-MLX-8bit | 29.9B | MoE · General | DECENT | ~2.6 | 31/100 | — | 31/100 | — |
| GLM-4.7-Flash-MLX-6bit | 29.9B | MoE · General | DECENT | ~2.6 | 31/100 | — | 31/100 | — |
| Qwen3-30B-A3B-Thinking-2507 | 30.5B | MoE · General | DECENT | ~2.5 | 31/100 | — | 30/100 | — |
| Qwen3-30B-A3B-GPTQ-Int4 | 30.5B | MoE · General | DECENT | ~2.5 | 31/100 | — | 30/100 | — |
| Qwen3-30B-A3B-Base | 30.5B | MoE · General | DECENT | ~2.5 | 31/100 | — | 30/100 | — |
| Qwen3-30B-A3B-AWQ | 30.5B | MoE · General | DECENT | ~2.5 | 31/100 | — | 30/100 | — |
| GLM-4.7-Flash | 31.2B | MoE · General | DECENT | ~2.4 | 31/100 | — | 31/100 | — |
| GLM-4.7-Flash-AWQ | 31.2B | MoE · General | DECENT | ~2.4 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-4bit | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8bit | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-6bit | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-5bit | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | 31.6B | General | DECENT | ~3 | 31/100 | — | 31/100 | — |
| EXAONE-4.0-32B | 32B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| EXAONE-4.0.1-32B | 32B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| EXAONE-4.0-32B-FP8 | 32B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| sarvam-30b | 32.2B | MoE · General | DECENT | ~2.3 | 31/100 | — | 31/100 | — |
| Olmo-3-1125-32B | 32.2B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| Qwen3-32B-AWQ | 32.8B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| Qwen2.5-32B | 32.8B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| QwQ-32B-AWQ | 32.8B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| Baichuan-M2-32B | 32.8B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| QwQ-32B | 32.8B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| xLAM-2-32b-fc-r | 32.8B | General | DECENT | ~2.9 | 31/100 | — | 31/100 | — |
| HyperCLOVAX-SEED-Think-32B | 33.3B | General | DECENT | ~2.8 | 31/100 | — | 31/100 | — |
| dolphin-2.9.1-yi-1.5-34b | 34.4B | General | DECENT | ~2.7 | 31/100 | — | 31/100 | — |
| c4ai-command-r-v01 | 35B | General | DECENT | ~2.7 | 31/100 | — | 31/100 | — |
| Qwen3.5-35B-A3B | 36B | MoE · General | DECENT | ~2.1 | 31/100 | — | 31/100 | — |
| Bielik-11B-v3.0-Instruct | 11.2B | Chat | DECENT | ~9.8 | 30/100 | 142 | 32/100 | 142 → 568 ✦ |
| Qwen3-Next-80B-A3B-Thinking-AWQ-4bit | 14.7B | General | DECENT | ~7.5 | 30/100 | — | 30/100 | — |
| HyperCLOVAX-SEED-Think-14B-GPTQ | 14.7B | General | DECENT | ~7.5 | 30/100 | — | 30/100 | — |
| Qwen3-14B-AWQ | 14.8B | General | DECENT | ~7.4 | 30/100 | — | 30/100 | — |
| Qwen3-14B-Base | 14.8B | General | DECENT | ~7.4 | 30/100 | — | 30/100 | — |
| Qwen2.5-14B | 14.8B | General | DECENT | ~7.4 | 30/100 | — | 30/100 | — |
| Qwen3-30B-A3B-NVFP4 | 15.6B | MoE · General | DECENT | ~5.6 | 30/100 | — | 30/100 | — |
| DeepSeek-V2-Lite | 15.7B | MoE · General | DECENT | ~5.6 | 30/100 | — | 30/100 | — |
| Moonlight-16B-A3B | 16B | MoE · General | DECENT | ~5.5 | 30/100 | — | 30/100 | — |
| deepseek-moe-16b-base | 16.4B | General | DECENT | ~6.7 | 30/100 | — | 30/100 | — |
| Ling-lite | 16.8B | MoE · General | DECENT | ~5.2 | 30/100 | — | 30/100 | — |
| Qwen3-32B-NVFP4 | 17.2B | General | DECENT | ~6.2 | 30/100 | — | 30/100 | — |
| NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | 18.2B | General | DECENT | ~5.9 | 30/100 | — | 30/100 | — |
| Mistral-Nemo-12B | 12B | General | DECENT | ~9.2 | 29/100 | — | 29/100 | — |
| Qwen2.5-14B-Instruct | 14B | General | DECENT | ~7.9 | 29/100 | — | 29/100 | — |
| gemma-3-12b-it | 12B | General | DECENT | ~9.2 | 29/100 | — | 29/100 | — |
| Phi-3-medium-128k | 14B | General | DECENT | ~7.9 | 29/100 | — | 29/100 | — |
| OLMo-2-13B-Instruct | 13B | General | DECENT | ~8.5 | 29/100 | — | 29/100 | — |
| pythia-12b | 12B | General | DECENT | ~9.2 | 29/100 | — | 29/100 | — |
| Orca-2-13b | 13B | General | DECENT | ~8.5 | 29/100 | — | 29/100 | — |
| HarmBench-Llama-2-13b-cls | 13B | General | DECENT | ~8.5 | 29/100 | — | 29/100 | — |
| llm-jp-3.1-13b | 13.7B | General | DECENT | ~8 | 29/100 | — | 29/100 | — |
| phi-4 | 14B | General | DECENT | ~7.9 | 29/100 | — | 29/100 | — |
| Phi-3-medium-14b-instruct | 14B | General | DECENT | ~7.9 | 29/100 | — | 29/100 | — |
| Qwen1.5-MoE-A2.7B | 14.3B | MoE · General | DECENT | ~6.2 | 29/100 | — | 29/100 | — |
| Yi-34B-Chat | 34.4B | Chat | DECENT | ~2.7 | 23/100 | — | 22/100 | — |
| Seed-OSS-36B-Instruct-MLX-8bit | 36.2B | MoE · Chat | DECENT | ~2.1 | 23/100 | — | 23/100 | — |
| Seed-OSS-36B-Instruct-MLX-4bit | 36.2B | MoE · Chat | DECENT | ~2.1 | 23/100 | — | 23/100 | — |
| Seed-OSS-36B-Instruct-MLX-5bit | 36.2B | MoE · Chat | DECENT | ~2.1 | 23/100 | — | 23/100 | — |
| Seed-OSS-36B-Instruct-MLX-6bit | 36.2B | MoE · Chat | DECENT | ~2.1 | 23/100 | — | 23/100 | — |
| MiniMax-M2.5-AWQ-4bit | 36.8B | MoE · Chat | DECENT | ~2 | 23/100 | — | 23/100 | — |
| Mixtral-8x7B-Instruct-v0.1 | 46.7B | MoE · Chat | DECENT | ~1.5 | 23/100 | — | 23/100 | — |
| Kimi-Linear-48B-A3B-Instruct | 49.1B | Chat | DECENT | ~1.7 | 23/100 | — | 23/100 | — |
| vicuna-13b-v1.5 | 13B | Chat | DECENT | ~8.5 | 22/100 | — | 21/100 | — |
| WizardLM-13B-V1.2 | 13B | Chat | DECENT | ~8.5 | 22/100 | — | 21/100 | — |
| llm-jp-3.1-13b-instruct4 | 13.7B | Chat | DECENT | ~8 | 22/100 | — | 22/100 | — |
| Qwen3-Next-80B-A3B-Instruct-AWQ-4bit | 14.7B | Chat | DECENT | ~7.5 | 22/100 | — | 22/100 | — |
| Qwen3-14B-Instruct | 14.8B | Chat | DECENT | ~7.4 | 22/100 | — | 22/100 | — |
| Qwen2.5-14B-Instruct-AWQ | 14.8B | Chat | DECENT | ~7.4 | 22/100 | — | 22/100 | — |
| Qwen2.5-14B-Instruct-GPTQ-Int4 | 14.8B | Chat | DECENT | ~7.4 | 22/100 | — | 22/100 | — |
| Qwen2.5-14B-Instruct-1M | 14.8B | Chat | DECENT | ~7.4 | 22/100 | — | 22/100 | — |
| Qwen2.5-14B-Instruct-GPTQ-Int8 | 14.8B | Chat | DECENT | ~7.4 | 22/100 | — | 22/100 | — |
| Qwen3-30B-A3B-Instruct-2507-FP4 | 15.6B | MoE · Chat | DECENT | ~5.6 | 22/100 | — | 22/100 | — |
| DeepSeek-V2-Lite-Chat | 15.7B | MoE · Chat | DECENT | ~5.6 | 22/100 | — | 22/100 | — |
| Moonlight-16B-A3B-Instruct | 16B | MoE · Chat | DECENT | ~5.5 | 22/100 | — | 22/100 | — |
| LLaDA2.0-mini | 16.3B | MoE · Chat | DECENT | ~5.4 | 22/100 | — | 22/100 | — |
| LLaDA2.1-mini | 16.3B | MoE · Chat | DECENT | ~5.4 | 22/100 | — | 22/100 | — |
| deepseek-moe-16b-chat | 16.4B | Chat | DECENT | ~6.7 | 22/100 | — | 22/100 | — |
| Mistral-Small-24B-Instruct-2501-AWQ | 23.6B | Chat | DECENT | ~4.4 | 22/100 | — | 22/100 | — |
| Mistral-Small-24B-Instruct-2501-FP8-dynamic | 23.6B | Chat | DECENT | ~4.4 | 22/100 | — | 22/100 | — |
| Mistral-Small-24B-Instruct-2501 | 24B | Chat | DECENT | ~4.3 | 22/100 | — | 22/100 | — |
| Qwen3-30B-A3B-Instruct-2507-MLX-4bit | 30.5B | MoE · Chat | DECENT | ~2.5 | 22/100 | — | 22/100 | — |
| Qwen3-30B-A3B-Instruct-2507-MLX-8bit | 30.5B | MoE · Chat | DECENT | ~2.5 | 22/100 | — | 22/100 | — |
| Qwen3-30B-A3B-Instruct-2507-MLX-6bit | 30.5B | MoE · Chat | DECENT | ~2.5 | 22/100 | — | 22/100 | — |
| Qwen3-30B-A3B-Instruct-2507-FP8 | 30.5B | MoE · Chat | DECENT | ~2.5 | 22/100 | — | 22/100 | — |
| Qwen3-VL-30B-A3B-Instruct-AWQ | 31.1B | MoE · Chat | DECENT | ~2.4 | 22/100 | — | 22/100 | — |
| OLMo-2-0325-32B-Instruct | 32.2B | Chat | DECENT | ~2.9 | 22/100 | — | 22/100 | — |
| Qwen2.5-32B-Instruct-AWQ | 32.8B | Chat | DECENT | ~2.9 | 22/100 | — | 22/100 | — |
| Qwen2.5-32B-Instruct-GPTQ-Int4 | 32.8B | Chat | DECENT | ~2.9 | 22/100 | — | 22/100 | — |
| Qwen2.5-32B-Instruct-GPTQ-Int8 | 32.8B | Chat | DECENT | ~2.9 | 22/100 | — | 22/100 | — |
| MiniMax-M2.5-BF16-INT4-AWQ | 39.1B | MoE · Chat | DECENT | ~1.8 | 22/100 | — | 22/100 | — |
| falcon-40b-instruct | 40B | Chat | DECENT | ~2.1 | 22/100 | — | 22/100 | — |
| GigaChat3-10B-A1.8B | 11.5B | MoE · Chat | DECENT | ~7.7 | 21/100 | — | 21/100 | — |
| Mistral-Nemo-Instruct-2407 | 12.2B | Chat | DECENT | ~9 | 21/100 | — | 22/100 | — ✦ |
| mistral-nemo-instruct-2407-awq | 12.2B | Chat | DECENT | ~9 | 21/100 | — | 22/100 | — ✦ |
| MiniMax-M2.7 | 230B | Reasoning | TOO HEAVY | ~0.7 | 5/100 | — | 5/100 | — |
| DeepSeek-R1-0528-NVFP4-v2 | 393.6B | MoE · Reasoning | TOO HEAVY | ~0.3 | 5/100 | — | 5/100 | — |
| DeepSeek-R1-NVFP4 | 396.8B | MoE · Reasoning | TOO HEAVY | ~0.3 | 5/100 | — | 5/100 | — |
| DeepSeek-R1 | 684.5B | MoE · Reasoning | TOO HEAVY | ~0.2 | 5/100 | — | 5/100 | — |
| DeepSeek-R1-0528 | 684.5B | MoE · Reasoning | TOO HEAVY | ~0.2 | 5/100 | — | 5/100 | — |
| DeepSeek-V3.2-Speciale | 685B | MoE · Reasoning | TOO HEAVY | ~0.2 | 5/100 | — | 5/100 | — |
| DeepSeek-R1-70B | 70B | Reasoning | TOO HEAVY | ~2.5 | 3/100 | — | 3/100 | — |
| DeepSeek-R1-Distill-Llama-70B-FP8-dynamic | 70.6B | Reasoning | TOO HEAVY | ~2.4 | 3/100 | — | 3/100 | — |
| Llama-3.1-70B-Instruct | 70B | General | TOO HEAVY | ~2.5 | 0/100 | — | 0/100 | — |
| Llama-3.1-405B-Instruct | 405B | General | TOO HEAVY | ~0.4 | 0/100 | — | 0/100 | — |
| Llama-3.2-90B-Vision | 90B | Multimodal | TOO HEAVY | ~1.9 | 0/100 | — | 0/100 | — |
| Mixtral-8x22B-Instruct | 141B | MoE · General | TOO HEAVY | ~1 | 0/100 | — | 0/100 | — |
| Qwen2.5-72B-Instruct | 72B | General | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2.5-VL-72B | 72B | Multimodal | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| DeepSeek-V3 | 671B | MoE · General | TOO HEAVY | ~0.2 | 0/100 | — | 0/100 | — |
| Command-R+ | 104B | General | TOO HEAVY | ~1.7 | 0/100 | — | 0/100 | — |
| Qwen3.5-122B-A10B-NVFP4 | 64.4B | MoE · General | TOO HEAVY | ~2.1 | 0/100 | — | 0/100 | — |
| NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 67.2B | General | TOO HEAVY | ~2.6 | 0/100 | — | 0/100 | — |
| Llama-3.3-70B-Instruct | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| llama-3.3-70b-instruct-awq | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Llama-3.3-70B-Instruct-AWQ | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| L3.3-GeneticLemonade-Final-v2-70B | 70.6B | General | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Meta-Llama-3.3-70B-Instruct-AWQ-INT4 | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Meta-Llama-3.1-70B-Instruct-quantized.w4a16 | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Meta-Llama-3-70B-Instruct | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Llama-3.1-70B | 70.6B | General | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Llama-3.1-Swallow-70B-Instruct-v0.3 | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Meta-Llama-3.1-70B-Instruct-FP8 | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Llama-3.3-70B-Instruct-FP8-dynamic | 70.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| jais-adapted-70b-chat-4bit-bnb | 71.6B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2.5-72B-Instruct-abliterated | 72.7B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2.5-72B | 72.7B | General | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2-72B-Instruct | 72.7B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2-72B | 72.7B | General | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2.5-72B-Instruct-AWQ | 73B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen2.5-72B-Instruct-GPTQ-Int4 | 73B | Chat | TOO HEAVY | ~2.4 | 0/100 | — | 0/100 | — |
| Qwen3-Coder-Next-8bit | 79.7B | Coding | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Next-80B-A3B-Instruct-MLX-4bit | 79.7B | Chat | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Next-80B-A3B-Instruct-MLX-8bit | 79.7B | Chat | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Next-80B-A3B-Instruct-MLX-6bit | 79.7B | Chat | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Next-80B-A3B-Instruct-MLX-5bit | 79.7B | Chat | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Coder-Next | 79.7B | Coding | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Coder-Next-FP8 | 79.7B | Coding | TOO HEAVY | ~2.2 | 0/100 | — | 0/100 | — |
| Qwen3-Next-80B-A3B-Instruct | 81.3B | Chat | TOO HEAVY | ~2.1 | 0/100 | — | 0/100 | — |
| Qwen3-Next-80B-A3B-Instruct-FP8 | 81.3B | Chat | TOO HEAVY | ~2.1 | 0/100 | — | 0/100 | — |
| GLM-4.5-Air | 110.5B | MoE · General | TOO HEAVY | ~1.2 | 0/100 | — | 0/100 | — |
| Qwen1.5-110B-Chat-AWQ | 111.2B | Chat | TOO HEAVY | ~1.5 | 0/100 | — | 0/100 | — |
| gpt-oss-120b-MLX-8bit | 116.8B | General | TOO HEAVY | ~1.5 | 0/100 | — | 0/100 | — |
| gpt-oss-120b-heretic | 116.8B | General | TOO HEAVY | ~1.5 | 0/100 | — | 0/100 | — |
| gpt-oss-120b | 120.4B | General | TOO HEAVY | ~1.4 | 0/100 | — | 0/100 | — |
| XORTRON.CriminalComputing.LARGE.2026.3 | 122.6B | General | TOO HEAVY | ~1.4 | 0/100 | — | 0/100 | — |
| NVIDIA-Nemotron-3-Super-120B-A12B-FP8 | 123.6B | General | TOO HEAVY | ~1.4 | 0/100 | — | 0/100 | — |
| NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 123.6B | General | TOO HEAVY | ~1.4 | 0/100 | — | 0/100 | — |
| Qwen3.5-122B-A10B | 125.1B | MoE · General | TOO HEAVY | ~1.1 | 0/100 | — | 0/100 | — |
| Mixtral-8x22B-Instruct-v0.1 | 140.6B | MoE · Chat | TOO HEAVY | ~1 | 0/100 | — | 0/100 | — |
| dots.llm1.inst | 142.8B | MoE · General | TOO HEAVY | ~1 | 0/100 | — | 0/100 | — |
| bloom | 176.2B | General | TOO HEAVY | ~1 | 0/100 | — | 0/100 | — |
| GLM-4.7-NVFP4 | 177.2B | MoE · General | TOO HEAVY | ~0.8 | 0/100 | — | 0/100 | — |
| falcon-180B-chat | 179.5B | Chat | TOO HEAVY | ~1 | 0/100 | — | 0/100 | — |
| Step-3.5-Flash | 199.4B | MoE · General | TOO HEAVY | ~0.7 | 0/100 | — | 0/100 | — |
| Step-3.5-Flash-FP8 | 199.4B | MoE · General | TOO HEAVY | ~0.7 | 0/100 | — | 0/100 | — |
| MiniMax-M2.5-MLX-8bit | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2.5-MLX-4bit | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2.5-MLX-6bit | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2-AWQ | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2.5-AWQ | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2.5 | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2 | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| MiniMax-M2.1 | 228.7B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| Qwen3-235B-A22B-Instruct-2507-FP8 | 235.1B | MoE · Chat | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| Qwen3-235B-A22B-Thinking-2507-FP8 | 235.1B | MoE · General | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| Qwen3-235B-A22B-FP8 | 235.1B | MoE · General | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
| deepseek-coder-v2-instruct-awq | 235.7B | MoE · Coding | TOO HEAVY | ~0.6 | 0/100 | — | 0/100 | — |
DeepSeek-V2.5-1210-FP8
235.7B
MoE
General
TOO HEAVY
0/100
~0.6 tok/s
ctx
—
TOO HEAVY
0/100
~0.6 tok/s
ctx
—
K-EXAONE-236B-A23B
237.1B
MoE
General
TOO HEAVY
0/100
~0.6 tok/s
ctx
—
TOO HEAVY
0/100
~0.6 tok/s
ctx
—
ERNIE-4.5-300B-A47B-Paddle
300.5B
MoE
General
TOO HEAVY
0/100
~0.5 tok/s
ctx
—
TOO HEAVY
0/100
~0.5 tok/s
ctx
—
MiMo-V2-Flash
309.8B
MoE
General
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
GLM-4.6
356.8B
MoE
General
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
GLM-4.7
358.3B
MoE
General
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
GLM-4.5
358.3B
MoE
General
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
DeepSeek-V3.2-NVFP4
394.5B
MoE
General
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
DeepSeek-V3-0324-NVFP4
396.8B
MoE
General
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
Llama-4-Maverick-17B-128E-Instruct
401.6B
Chat
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
Qwen3.5-397B-A17B
403.4B
MoE
General
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
Llama-3.1-405B
405.9B
General
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
TOO HEAVY
0/100
~0.4 tok/s
ctx
—
Qwen3-Coder-480B-A35B-Instruct
480.2B
MoE
Coding
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
LongCat-Flash-Chat
561.9B
Chat
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
TOO HEAVY
0/100
~0.3 tok/s
ctx
—
DeepSeek-V3-0324
684.5B
MoE
General
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
DeepSeek-V3.2-AWQ
685B
MoE
General
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
DeepSeek-V3.2
685.4B
MoE
General
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
GLM-5
753.9B
MoE
General
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
GLM-5-FP8
753.9B
MoE
General
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
TOO HEAVY
0/100
~0.2 tok/s
ctx
—
Kimi-K2-Instruct
1026.5B
MoE
Chat
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
Kimi-K2-Instruct-0905
1026.5B
MoE
Chat
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
Kimi-K2-Thinking
1058.1B
MoE
General
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
Kimi-K2.5
1058.6B
MoE
General
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
TOO HEAVY
0/100
~0.1 tok/s
ctx
—
Legend: RUNS GREAT · RUNS WELL · DECENT · TOO HEAVY
✦ TurboQuant — 4× KV compression at 32K ctx
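To put the "4× KV compression at 32K ctx" figure in perspective, a short sketch of the standard KV-cache size arithmetic follows. The layer and head counts below are illustrative (roughly what a large dense model might use), not taken from any model in the table, and the 4× factor is simply the headline claim applied to the result.

```python
# KV-cache size at a given context length, and what 4x compression saves.
# The architecture numbers (80 layers, 8 KV heads, head_dim 128) are
# illustrative assumptions, not a specific model's card.

def kv_cache_gb(ctx: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: float = 2.0) -> float:
    """Keys + values: 2 tensors per layer, each ctx * kv_heads * head_dim
    elements, at bytes_per_elem (2.0 for FP16/BF16)."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9

full = kv_cache_gb(ctx=32_768, layers=80, kv_heads=8, head_dim=128)
compressed = full / 4  # the claimed 4x KV compression
print(round(full, 2), "GB ->", round(compressed, 2), "GB")
```

For this hypothetical configuration the cache at 32K context drops from roughly 10.7 GB to roughly 2.7 GB, which is why KV compression mainly helps models that already fit but run out of room for context, rather than the oversized models listed above.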