← Home
Local AI inference engine
llama.cpp
The C/C++ foundation for local LLM inference — what most local AI tools build on.
Official siteAxes (0–100)
- Local control / custody94
- Open stack (models & tooling)96
- Regulatory posture (curated)35
- Interoperability78
Last reviewed: 2026-04-10
Facts (curated)
- Focus
- Minimal-dependency C/C++ LLM inference engine. llama-cli for chat, llama-server for OpenAI-compatible API.
- Ecosystem
- The engine underneath Ollama, LM Studio, LocalAI, and many others. GGUF is the de facto local model format.
- Backing
- Created by Georgi Gerganov. The ggml team joined Hugging Face in early 2026. MIT licensed.
Pros
Widest hardware support (CUDA, Vulkan, Metal, CPU) and best raw performance. The foundation everything else builds on.
Cons / risks
Requires building from source or manual setup for full control. Higher barrier to entry than wrapper tools like Ollama.
Related links
- llama.cpp — LLM inference in C/C++
GitHub · 2026-04-09