← Home

Local AI inference engine

llama.cpp

The C/C++ foundation for local LLM inference — what most local AI tools build on.

Official site

Axes (0–100)

  • Local control / custody94
  • Open stack (models & tooling)96
  • Regulatory posture (curated)35
  • Interoperability78

Last reviewed: 2026-04-10

Facts (curated)

Focus
Minimal-dependency C/C++ LLM inference engine. llama-cli for chat, llama-server for OpenAI-compatible API.
Ecosystem
The engine underneath Ollama, LM Studio, LocalAI, and many others. GGUF is the de facto local model format.
Backing
Created by Georgi Gerganov. The ggml team joined Hugging Face in early 2026. MIT licensed.

Pros

  • Widest hardware support (CUDA, Vulkan, Metal, CPU) and best raw performance. The foundation everything else builds on.

Cons / risks

  • Requires building from source or manual setup for full control. Higher barrier to entry than wrapper tools like Ollama.

Related links