Runyard Team
@runyard_dev
5 min read

Tags: #compare #turboquant #gpu #local-llm #feature #runyard

Introducing Runyard Compare: Head-to-Head GPU Benchmarks with TurboQuant

[Image: two GPUs side by side for head-to-head comparison]
Pick any two devices and see exactly which one runs your AI models better — with and without TurboQuant.

Runyard now has a Compare page. Pick Device A and Device B from a catalogue of 160+ GPUs — Apple Silicon, NVIDIA RTX 50/40/30, AMD RX 7000/9000, Intel Arc, cloud rigs — and see a live score for every LLM across quality, speed, fit, and context. Toggle TurboQuant on Device B with a single switch and watch the scores update in real time. No spreadsheet. No guesswork.

What the Compare Page Does

The Compare page scores every model in the Runyard catalogue (100+ LLMs) simultaneously across the two devices you configure. On each device, every model gets a composite score (0–100) built from four dimensions: model quality, inference speed, memory fit, and context window headroom. The winning device for each model row is highlighted. Sort by score, speed, model size, or name. Filter by use case. Click any row to see a detailed side-by-side breakdown with dimension bars, donut score, max context, quantization, and cloud run options.

  • 160+ GPUs in the dropdown — Apple M1 through M5 Ultra, RTX 5090 down to GTX 1050, AMD RX 9000, Intel Arc, mobile SoCs
  • Composite score updates live as you change VRAM, RAM, cores, backend, and bandwidth
  • Sort by score / speed / model size / name with ascending/descending toggle
  • Use-case filter pills — General, Coding, Reasoning, Creative, Multilingual, and more
  • Click any model row → full detail panel opens with donut chart, dimension bars, context bar, HuggingFace and Ollama links, cloud provider marquee
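As a rough illustration of how a per-model composite score and a row winner could be computed, here is a minimal sketch. The post names the four dimensions but not how they are weighted, so the equal weights, the `DimensionScores` type, and the function names are all assumptions made for this example, not Runyard's actual scoring code.

```python
from dataclasses import dataclass

# Hypothetical weights: the post does not state the weighting,
# so equal weights are assumed purely for illustration.
WEIGHTS = {"quality": 0.25, "speed": 0.25, "fit": 0.25, "context": 0.25}


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension scores for one model on one device, each on a 0-100 scale."""
    quality: float
    speed: float
    fit: float
    context: float


def composite(scores: DimensionScores) -> int:
    """Weighted average of the four dimensions, clamped and rounded to 0-100."""
    total = sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())
    return round(max(0.0, min(100.0, total)))


def row_winner(a: DimensionScores, b: DimensionScores) -> str:
    """Return 'A', 'B', or 'tie' for one model row."""
    score_a, score_b = composite(a), composite(b)
    if score_a == score_b:
        return "tie"
    return "A" if score_a > score_b else "B"


# Made-up dimension scores for two placeholder models on Device A and Device B.
rows = {
    "model-x": (DimensionScores(80, 60, 90, 42), DimensionScores(80, 55, 95, 70)),
    "model-y": (DimensionScores(70, 90, 100, 60), DimensionScores(70, 50, 100, 60)),
}

# Sort rows by Device B's composite score, descending, like the sort control.
by_b_score = sorted(rows.items(), key=lambda item: composite(item[1][1]), reverse=True)
for name, (a, b) in by_b_score:
    print(name, composite(a), composite(b), row_winner(a, b))
```

With different weights the same rows can produce a different winner, which is why the per-dimension breakdown in the detail panel is worth checking alongside the single composite number.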

The TurboQuant Toggle — and Why It Matters

Device B has a TurboQuant toggle. Flip it on and every score recalculates immediately. TurboQuant (Zandieh et al., ICLR 2026) applies 4× KV cache compression, so Device B can hold up to 4× more context in the same KV cache memory. That context boost feeds directly into the composite score: more context headroom raises the Context dimension score, which in turn raises the overall composite. Models that were ties become Device B wins. Models that were marginal become viable.
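To see why a 4× smaller KV cache translates into roughly 4× more context, it helps to work through the memory arithmetic. The sketch below uses the standard KV cache size estimate (one key and one value vector per layer per KV head for each token). The model shape and the 4 GiB budget are illustrative assumptions, not figures from Runyard or from any specific model.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: float) -> float:
    """Bytes of KV cache one token occupies: key + value, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_context_tokens(kv_budget_bytes: float, per_token_bytes: float) -> int:
    """How many tokens fit in the memory left over for the KV cache."""
    return int(kv_budget_bytes // per_token_bytes)


# Assumed model shape: 32 layers, 8 KV heads, head dimension 128, FP16 cache (2 bytes).
per_token_fp16 = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, bytes_per_value=2)

# Assumed budget: 4 GiB of VRAM remains for the KV cache once the weights are loaded.
kv_budget = 4 * 1024**3

baseline = max_context_tokens(kv_budget, per_token_fp16)
compressed = max_context_tokens(kv_budget, per_token_fp16 / 4)  # 4x compression, per the post

print(per_token_fp16)        # 131072 bytes (128 KiB) per token
print(baseline, compressed)  # 32768 131072
```

The gain applies to the memory left for the cache, not to total VRAM: the model weights take the same space either way, so the 4× figure is an upper bound on the extra context.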

Example: RTX 3060 vs M2 Pro (Composite Score Shift with TurboQuant ON)

  • RTX 3060 (normal): 68 / 100
  • M2 Pro (normal): 71 / 100
  • M2 Pro + TurboQuant: 79 / 100

When TurboQuant is active, Device B's context score recalculates using the TQ-expanded window (up to 4× the hardware max, capped at the model's spec limit). The composite score reflects that gain. The green glow on Device B's column and the ✦ marker on winning rows make it immediately obvious which models flip from "tie" to "B wins" once TQ is on.
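The capping rule described above can be sketched in a few lines. The 4× factor and the cap at the model's spec limit come from the post. The function names and the linear `context_score` formula are hypothetical, since the post does not say how the Context dimension is derived from the window size.

```python
TQ_FACTOR = 4  # KV cache compression factor stated in the post


def effective_context(hardware_max: int, model_limit: int, turboquant: bool) -> int:
    """Context window used for scoring: hardware max, expanded by TQ, capped at the model limit."""
    expanded = hardware_max * TQ_FACTOR if turboquant else hardware_max
    return min(expanded, model_limit)


def context_score(context_tokens: int, model_limit: int) -> float:
    """Hypothetical score: share of the model's spec window that fits, scaled to 0-100."""
    return 100.0 * context_tokens / model_limit


# Assumed example: the device fits 16K tokens; the models support 128K and 32K.
print(effective_context(16_384, 131_072, turboquant=False))  # 16384
print(effective_context(16_384, 131_072, turboquant=True))   # 65536
print(effective_context(16_384, 32_768, turboquant=True))    # 32768 (capped at the model limit)
```

The cap is the reason TurboQuant helps long-context models most: a model whose spec window already fits on the hardware gains nothing from the extra headroom.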

Device A shows a red "No TurboQuant" callout when TQ is active on Device B — so you can see exactly how much context headroom Device A is leaving on the table.

How to Use It

  1. Go to runyard.dev/compare
  2. Select your GPU from the Device A dropdown (or use auto-detected hardware)
  3. Select the GPU you're comparing against in Device B
  4. Toggle TurboQuant on Device B to see the score boost
  5. Click any model row to open the full detail panel
  6. Sort by score to see which device wins overall

Compare your GPU against any other device — with TurboQuant on and off.

Open Runyard Compare →


© 2026 RUNYARD.DEV — All rights reserved.
