Ollama v0.19
Run local LLMs faster on Apple Silicon using MLX backend
Ollama v0.19 rebuilds Apple Silicon inference on the MLX framework, delivering significantly faster local model performance for coding and agent workflows. It adds NVFP4 quantization support and improves KV cache management with snapshots, reuse, and eviction for more responsive sessions. Designed for developers running open-source LLMs locally without cloud dependencies.
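For context, a local Ollama server exposes an HTTP API on port 11434 regardless of backend, so existing clients work unchanged on the MLX build. Below is a minimal sketch of a request to the `/api/generate` endpoint; the model name `llama3.2` is an example stand-in for whatever model you have pulled, and `keep_alive` controls how long the loaded model (and its KV cache) stays resident between requests.

```python
import json

# Minimal sketch of a payload for Ollama's /api/generate endpoint.
# "llama3.2" is an example model name; substitute any model you have pulled.
payload = {
    "model": "llama3.2",
    "prompt": "Write a Go function that reverses a string.",
    "stream": False,      # return one JSON object instead of a token stream
    "keep_alive": "10m",  # keep the model loaded for faster follow-up calls
}

# Against a running local server, the request would look like:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])

print(json.dumps(payload, indent=2))
```

With `stream` set to `True` (the API default), the server instead returns newline-delimited JSON chunks as tokens are generated.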
For AI agents: a machine-readable markdown version of this page is available at /tools/ollama-v0-19.md, or send `Accept: text/markdown`.