Patch release addressing a floating point exception crash in the gemma4:12b model and improving Windows installation for Hermes integrations.
Ollama
Ollama04.06.2026
Adds Cline CLI auto-installation, Qwen code integration, Radeon 8060S iGPU support, and Laguna architecture support. Includes security hardening for markdown URL handling, improved logging for troubleshooting, and various fixes for model loading and token counting.
Added support for the Gemma4-12B model, expanding the available model options for users.
Updates llama.cpp dependency and fixes Windows cleanup process by properly terminating llama-server. Known issue with gemma4:12b model causing crashes due to floating point exceptions.
Adds Codex App integration for desktop development workflows with built-in browser, code review features, and git support. Improves MLX sampler quality on Apple Silicon. Introduces new model recommendations for coding tasks.
Added vision model support with image inputs to ollama launch opencode command and fixed formatting issues with Claude tool results when using local image paths.
Adds llama.cpp engine for broader hardware support and GGUF model compatibility, but breaks text embedding behavior by converting inputs to lowercase instead of preserving mixed case.
Patch release improving MLX model handling with refined push behavior, enhanced image generation runner thread affinity, fixed inference timeouts, and resolved macOS target leakage issues. Also includes test hardening and update flow improvements.
Removes Claude Desktop from default launch command due to third-party integration limitations, adds API response caching for 6.7x performance improvement, and includes backup workflow enhancements and UI improvements.
Adds Gemma 4 MTP speculative decoding support for Mac MLX runner providing 2x speed improvements for coding tasks, updates MLX libraries with threading fixes, and bumps Go version to 1.26.
Adds Claude Desktop integration with launch command, introduces server-driven model recommendations in the app, fixes Windows gateway timeout issue, and improves Metal initialization error handling.
Added support for two new models: NVIDIA's Nemotron 3 Omni and Poolside's Laguna XS.2 coding model.
Minor update improving Gemma 4 model renderer for thinking and tool calling, enabling dynamic model recommendations without app updates, aligning desktop launch page with CLI integrations, and fixing Poolside integration title display.
Release candidate adds support for 'max' as a think parameter value in the API and maps OpenAI response reasoning effort to think functionality.
Minor update improving OpenClaw onboarding reliability, standardizing model ordering in launch command, and bundling web search plugin with OpenClaw integration.
Adds Kimi CLI integration for agentic execution tasks, improves MLX runner performance with logprobs support and faster sampling, enhances thread safety and tokenization, fixes GLM4 MoE Lite router performance, and resolves UI issues in macOS app model picker and Gemma 4 structured outputs.
Introduces Hermes Agent for adaptive workflow assistance, adds Gemma 4 MLX support for Apple Silicon with mixed-precision quantization, enhances ollama launch with GitHub Copilot CLI integration and improved configuration handling, fixes OpenClaw non-interactive setup and various build issues includ
Patch release that improves quality for specific Gemma model variants when thinking mode is disabled and updates ROCm to version 7.2.1 on Linux systems.
Enhances Gemma 4 tool calling with Google's latest fixes, improves parallel tool calling for streaming responses, adds Hermes agent integration documentation, and resolves image attachment errors in the Ollama app.
Adds OpenClaw messaging channel integration via new launch command, enables flash attention for Gemma 4 models on compatible GPUs, improves OpenCode detection for curl-based installations, and fixes save command issues with safetensors-imported models.
Minor update enhancing Gemma 4 tool calling functionality, adding new models to the application interface, and resolving issues with OpenClaw TUI launcher.
Performance improvements for M5 chips using NAX optimization and enabled flash attention for Gemma4 model to enhance processing efficiency.
Changed default app home view from launch screen to new chat interface for improved user experience.
Patch release with benchmarking improvements, Gemma4 model parsing fixes for quoted strings and tool calls, GGML optimizations for CUDA batched operations, flash attention enablement for Gemma4, and ROCm build fixes.
Adds support for new Gemma 4 model variants including 2B, 4B, 26B mixture of experts, and 31B dense models. Includes documentation updates, MLX tokenizer improvements, and SentencePiece-style BPE tokenizer support.
Ollama v0.19.0 introduces MLX framework support for Apple Silicon devices in preview, improving performance through unified memory architecture. Includes fixes for model loading issues, KV cache improvements, tool call parsing corrections, and resolves app notification bugs.
Release candidate addressing memory leak in MLX KV cache, disabling flash attention for Grok model, improving MLX snapshot scheduling during prefill, and updating VS Code documentation.
Adds Visual Studio Code integration through GitHub Copilot, allowing selection of local and cloud Ollama models within VS Code. Includes improvements to GLM parser for tool calls and OpenClaw integration for gateway checks.
Patch release improving OpenClaw integration with better dependency checks for npm and git installation, faster local Claude Code execution through cache optimization, fixed model parameter support for launch command, and corrected websearch package registration.
Adds web search and fetch capabilities to OpenClaw plugin, introduces non-interactive headless mode for ollama launch command with model specification requirements, and includes various improvements to benchmarking tools and OpenClaw integration.
Ollama 0.18.0 adds Nemotron-3-Super model, improves cloud model performance up to 10x faster, introduces non-interactive task support with --yes flag, integrates as OpenClaw provider, and requires driver updates for ROCm 7 support. Breaking change: cloud models no longer need manual pulling.