Thesis

Local-first AI — comparative lens

This page compares the local AI landscape in two categories that match how the market actually works: SDKs & frameworks for developers building with local inference, and desktop appsfor end-users running models on their own hardware. Each category has its own radar, axes, and matrix — so you're always comparing apples to apples.

0–100 scaleOverall = mean of six axesIllustrative dataset

Two categories

SDKs/frameworks and desktop apps — compared separately.

Profile

Radar maps the six-axis trade-off surface per category.

Backing

Each entry shows who's behind it — corporate, community, or indie.

Context

Matrix adds one operational snapshot per entry + dossier link.

View analysis

Compare

SDKs & Frameworks — axes

Six axes (0–100 each). Radar = shape; legend = composite mean; matrix = snapshot + dossier link.

On-device perf

Raw inference speed and efficiency on consumer hardware.

Hardware breadth

Range of supported platforms: CPU, GPU, Apple Silicon, etc.

API maturity

Quality and compatibility of APIs (OpenAI-compatible, SDKs).

Ecosystem

Community size, integrations, and third-party tooling.

Setup simplicity

How easy it is to install and get running.

Backing

Strength of corporate or organizational support behind the project.

Radar & legend

Overall ranking (0-100)

1Ollama
82.5
2llama.cpp
75.2
3vLLM
74.8
4LocalAI
72.7
5QVAC SDK
70.5
6MLX
68.8
On-device perf·Hardware breadth·API maturity·Ecosystem·Setup simplicity·Backing

Matrix

#1

Ollama

One-command local LLM runtime · OpenAI-compatible API

82.5
Open full dossier

Open-source CLI/runtime that downloads and serves LLMs locally. Powers many downstream tools and integrations.

#2

llama.cpp

C/C++ inference engine · the foundation layer

75.2
Open full dossier

Minimal-dependency C/C++ LLM inference. The engine underneath Ollama, LM Studio, and many others. GGUF format, broad hardware.

#3

vLLM

High-throughput inference · PagedAttention · production-grade

74.8
Open full dossier

Fast Python inference/serving library. Industry standard for GPU production workloads; Apple Silicon support via vllm-metal plugin.

Primarily designed for production GPU servers. Consumer-device use is possible but not the primary target.

#4

LocalAI

All-in-one OSS AI engine · 35+ backends · drop-in API

72.7
Open full dossier

Go-based engine supporting LLMs, vision, voice, image, and video — no GPU required. OpenAI and Anthropic API compatible.

#5

QVAC SDK (Tether)

On-device inference SDK · agent primitives · Tether-backed

70.5
Open full dossier

Tether's SDK for building local AI applications — models, agents, and value rails on consumer devices.

#6

MLX (Apple)

Apple Silicon ML framework · unified memory · Neural Accelerators

68.8
Open full dossier

Apple's open-source array framework for ML on Apple Silicon. Python, C++, and Swift APIs optimized for unified memory.

Sources

Primary links (SDKs, apps & comparators)

Curated primaries for entities above — swap in RSS-fed rows when you wire the Action. Last curated: 2026-05-25.