Ollama v0.19

Run local LLMs faster on Apple Silicon using the MLX backend

Ollama v0.19 rebuilds Apple Silicon inference on the MLX framework, delivering significantly faster local model performance for coding and agent workflows. It adds NVFP4 quantization support and improves KV cache management with snapshots, reuse, and eviction for more responsive sessions. It is aimed at developers running open-source LLMs locally without cloud dependencies.
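
Ollama serves a local HTTP API on port 11434 regardless of backend, so existing clients should work unchanged against the MLX build. Below is a minimal sketch of a single-turn request using Python's requests library; the model tag llama3.2 is only an example and must already be pulled locally.

```python
# Minimal single-turn request against a locally running Ollama server.
# Assumes `ollama serve` is running on the default port (11434) and the
# example model tag "llama3.2" has been pulled (`ollama pull llama3.2`).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",   # example tag; substitute any local model
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,       # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```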

At a glance

Company: Ollama
Pricing: Free
API available: Yes
Self-hostable: Yes
Launched: 2026-04
Last verified: 2026-05-11

Capabilities

local-inference, model-quantization, kv-cache, api-access, multi-model-support, agent-workflows
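
The kv-cache and agent-workflows capabilities are where the snapshot and reuse behaviour described above matters in practice: a multi-turn agent loop resends its growing message history, and the server can reuse the cached prefix instead of reprocessing it. The sketch below shows that pattern against the /api/chat endpoint; cache snapshots and eviction are handled server-side in v0.19, so nothing cache-specific appears in the client code.

```python
# Multi-turn chat loop: each call resends the full history, which lets the
# server reuse the KV cache for the shared prefix. keep_alive keeps the
# model loaded between calls so follow-up turns stay responsive.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llama3.2"  # example tag; substitute any local model

messages = [{"role": "system", "content": "You are a concise coding assistant."}]

def ask(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": messages,  # full history; shared prefix can be reused
            "stream": False,
            "keep_alive": "10m",   # keep the model resident between turns
        },
        timeout=120,
    )
    resp.raise_for_status()
    reply = resp.json()["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("Sketch a retry decorator in Python."))
print(ask("Now add exponential backoff to it."))
```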

For AI agents: a machine-readable markdown version of this page is available at /tools/ollama-v0-19.md, or send an Accept: text/markdown header to this page's URL.
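
For completeness, a sketch of fetching the machine-readable version. The host name below is a placeholder, since this page's domain is not stated here; only the /tools/ollama-v0-19.md path comes from the page itself.

```python
# Fetch the machine-readable markdown version of this page.
# "https://example-directory.dev" is a placeholder host. Alternatively,
# request the regular page URL with an "Accept: text/markdown" header.
import requests

resp = requests.get(
    "https://example-directory.dev/tools/ollama-v0-19.md",
    timeout=30,
)
resp.raise_for_status()
print(resp.text)
```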