Ollama v0.19

Run local LLMs faster on Apple Silicon using the MLX backend

Ollama v0.19 rebuilds Apple Silicon inference on the MLX framework, delivering significantly faster local model performance for coding and agent workflows. It adds NVFP4 quantization support and improves KV cache management with snapshots, reuse, and eviction for more responsive sessions. It is aimed at developers running open-source LLMs locally without cloud dependencies.
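
Ollama serves a local HTTP API on port 11434 regardless of backend, so existing clients should work unchanged against the MLX build. Below is a minimal sketch of a single-turn request using Python's requests library; the model tag llama3.2 is only an example and must already be pulled locally.

```python
# Minimal single-turn request against a locally running Ollama server.
# Assumes `ollama serve` is running on the default port (11434) and the
# example model tag "llama3.2" has been pulled (`ollama pull llama3.2`).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",   # example tag; substitute any local model
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,       # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```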

At a glance

Company: Ollama
Pricing: Free
API available: Yes
Self-hostable: Yes
Launched: 2026-04
Last verified: 2026-05-11

Capabilities

local-inference, model-quantization, kv-cache, api-access, multi-model-support, agent-workflows
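
The kv-cache and agent-workflows capabilities are where the snapshot and reuse behaviour described above matters in practice: a multi-turn agent loop resends its growing message history, and the server can reuse the cached prefix instead of reprocessing it. The sketch below shows that pattern against the /api/chat endpoint; cache snapshots and eviction are handled server-side in v0.19, so nothing cache-specific appears in the client code.

```python
# Multi-turn chat loop: each call resends the full history, which lets the
# server reuse the KV cache for the shared prefix. keep_alive keeps the
# model loaded between calls so follow-up turns stay responsive.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llama3.2"  # example tag; substitute any local model

messages = [{"role": "system", "content": "You are a concise coding assistant."}]

def ask(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": messages,  # full history; shared prefix can be reused
            "stream": False,
            "keep_alive": "10m",   # keep the model resident between turns
        },
        timeout=120,
    )
    resp.raise_for_status()
    reply = resp.json()["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("Sketch a retry decorator in Python."))
print(ask("Now add exponential backoff to it."))
```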

For AI agents: a machine-readable markdown version of this page is available at /tools/ollama-v0-19.md, or send an Accept: text/markdown header to this page's URL.
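
For completeness, a sketch of fetching the machine-readable version. The host name below is a placeholder, since this page's domain is not stated here; only the /tools/ollama-v0-19.md path comes from the page itself.

```python
# Fetch the machine-readable markdown version of this page.
# "https://example-directory.dev" is a placeholder host. Alternatively,
# request the regular page URL with an "Accept: text/markdown" header.
import requests

resp = requests.get(
    "https://example-directory.dev/tools/ollama-v0-19.md",
    timeout=30,
)
resp.raise_for_status()
print(resp.text)
```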