Update dependency ollama/ollama to v0.24.0
This MR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| ollama/ollama | | minor | 0.12.11 → 0.24.0 |
| ollama/ollama | ironbank-github | minor | v0.12.11 → v0.24.0 |
Release Notes
ollama/ollama (ollama/ollama)
v0.24.0
Codex App
Ollama 0.24 includes support for the Codex App, OpenAI's desktop experience for working on Codex threads in parallel with built-in worktree support and git functionality.
ollama launch codex-app
Built-in browser
Codex can load local servers and sites in its built-in browser, enabling you to directly annotate on the page to request changes.
Review mode
Review code inside the app, leave comments, and iterate without leaving your workspace.
Choosing a model
For difficult coding and agentic tasks:
- kimi-k2.6 (with vision support)
- glm-5.1
For local use without an Ollama Cloud subscription:
- nemotron-3-super
- gemma4:31b
- qwen3.6
Restore anytime
To restore the previous configuration of Codex App, run:
ollama launch codex-app --restore
What's Changed
- Reworked the MLX sampler for improved generation quality on Apple Silicon
Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0...v0.24.0
v0.23.4
What's Changed
- ollama launch opencode now supports vision models with image inputs
- Fixed formatting of Claude tool results when using local image paths
Full Changelog: https://github.com/ollama/ollama/compare/v0.23.3...v0.23.4
v0.23.3
What's Changed
- mlx: refined model push behavior by @dhiltgen in #15431
- test: integration test hardening by @dhiltgen in #13532
- app: harden update flows by @dhiltgen in #16100
- mlx: update the imagegen runner for mlx thread affinity by @pdevine in #16096
- mlx: avoid status timeout during inference by @dhiltgen in #16086
- mlx: fix macOS 26 target leakage in v3 metallib by @dhiltgen in #16053
Full Changelog: https://github.com/ollama/ollama/compare/v0.23.2...v0.23.3
v0.23.2
What's Changed
- ollama launch no longer includes Claude Desktop due to the third-party integration being limited to Anthropic models.
- Use ollama launch claude-desktop --restore to restore Claude Desktop to its normal state.
- /api/show responses are now cached, improving median latency by ~6.7x which will increase load speed for integrations like VS Code.
- Improved backup workflow when managing launch integrations
- Cleaner image generation layout in the MLX runner
Full Changelog: https://github.com/ollama/ollama/compare/v0.23.1...v0.23.2
v0.23.1
Gemma 4 MTP (Multi-Token Prediction) for the MLX runner
Gemma 4 MTP speculative decoding is now supported on Macs. This can give over a 2x speed increase for the Gemma 4 31B model on coding tasks.
ollama run gemma4:31b-coding-mtp-bf16
What's Changed
- Update MLX and MLX-C with threading fixes by @dhiltgen in #15845
- go: bump to 1.26 by @ParthSareen in #15904
- Add Gemma 4 MTP speculative decoding by @pdevine in #15980
Full Changelog: https://github.com/ollama/ollama/compare/v0.23.0...v0.23.1
v0.23.0
Claude Desktop
Claude Desktop is now supported with Ollama Launch.
Claude Cowork and Claude Code are supported within the Claude Desktop App.
ollama launch claude-desktop
Claude Cowork
Claude Code
Claude Code on the terminal can still be accessed through the CLI with:
ollama launch claude
Not supported yet
- Web Search (coming soon)
- Extensions
What's Changed
- Launch Claude Desktop with ollama launch claude-desktop
- The Ollama app now surfaces featured models from server-driven recommendations
- Fixed OpenClaw gateway timeout on Windows by enforcing IPv4 loopback (thanks @UniquePratham)
- Hardened Metal initialization to gracefully handle ggml kernel compilation failures
New Contributors
- @UniquePratham made their first contribution in #15726
Full Changelog: https://github.com/ollama/ollama/compare/v0.22.1...v0.23.0
v0.22.1
What's Changed
- Updated the Gemma 4 renderer for thinking and tool calling improvements
- Model recommendations are now updated without updating Ollama
- Aligned the desktop app's launch page with ollama launch integrations
- Fixed the Poolside integration title in ollama launch
Full Changelog: https://github.com/ollama/ollama/compare/v0.22.0...v0.22.1
v0.22.0
New models
- NVIDIA's Nemotron 3 Omni
- Poolside's first open-weight coding model - Laguna XS.2
Full Changelog: https://github.com/ollama/ollama/compare/v0.21.2...v0.22.0
v0.21.2
What's Changed
- Improved reliability of the OpenClaw onboarding flow in ollama launch
- Recommended models in ollama launch now appear in a fixed, canonical order
- OpenClaw integration now bundles Ollama's web search plugin in OpenClaw
Full Changelog: https://github.com/ollama/ollama/compare/v0.21.1...v0.21.2
v0.21.1
What's Changed
Kimi CLI
You can now install and run the Kimi CLI through Ollama.
ollama launch kimi --model kimi-k2.6:cloud
Kimi CLI with Kimi K2.6 excels at long-horizon agentic execution tasks through a multi-agent system.
- MLX runner adds logprobs support for compatible models
- Faster MLX sampling with fused top-P and top-K in a single sort pass, plus repeat penalties applied in the sampler
- Improved MLX prompt tokenization by moving tokenization into request handler goroutines
- Better MLX thread safety for array management
- GLM4 MoE Lite performance improvement with a fused sigmoid router head
- Fixed model picker showing stale model after switching chats in the macOS app
- Fixed structured outputs for Gemma 4 when think=false
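The "fused top-P and top-K in a single sort pass" item above can be illustrated with a small sketch. This is not Ollama's MLX sampler; it is a plain-Python illustration of the general idea, and the function name `top_k_top_p` and its parameters are hypothetical. One descending sort is shared by both filters: the loop stops at whichever cutoff is reached first, then the surviving probabilities are renormalized.

```python
def top_k_top_p(probs, top_k, top_p):
    """Filter a probability distribution with top-K and top-P using one sort.

    probs:  list of token probabilities (assumed to sum to roughly 1.0)
    top_k:  keep at most this many tokens; 0 disables the top-K cutoff
    top_p:  keep the smallest prefix whose cumulative probability reaches top_p
    Returns {token_index: renormalized_probability} for the kept tokens.
    """
    # A single descending sort serves both filters.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = []
    cumulative = 0.0
    for rank, idx in enumerate(order):
        if top_k > 0 and rank >= top_k:
            break  # top-K cutoff reached
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break  # top-P (nucleus) cutoff reached

    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


filtered = top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7)
print(filtered)
```

A production sampler would typically operate on logits and tensors rather than Python lists; the sketch only shows why a second sort is unnecessary.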
Full Changelog: https://github.com/ollama/ollama/compare/v0.21.0...v0.21.1
v0.21.0
Hermes Agent
ollama launch hermes
Hermes learns with you, automatically creating skills to better serve your workflows. Great for research and engineering tasks.
What's Changed
- Gemma 4 on MLX. Added support for running Gemma 4 via MLX on Apple Silicon, including a text-only MLX runtime for the model. The MLX backend also picked up mixed-precision quantization, better capability detection, and a batch of new op wrappers (Conv2d, Pad, activations, trig, masked SDPA, and RoPE-with-freqs).
- Hermes and GitHub Copilot CLI in ollama launch. Added both integrations, which can now be configured in one command alongside the rest of the supported coding agents.
- OpenCode moved to inline config. ollama launch opencode now writes its config inline rather than to a separate file, matching how other integrations are handled.
- ollama launch no longer rewrites config when nothing changed. Pressing → on a configured multi-model integration, or passing --model with the current primary, used to trigger a confirmation prompt and rewrite both the editor's config file and config.json. Now it's a no-op when the resolved model list matches what's already saved.
- Fixed ollama launch openclaw --yes so it correctly skips the channels configuration step, so non-interactive setups complete cleanly.
- Restored the Gemma 4 nothink renderer with the e2b-style prompt.
- Fixed the Gemma 4 compiler error that was breaking Metal builds.
- Fixed macOS cross-compiles so they no longer trigger generate, which was breaking cmake builds on some Xcode versions.
- Quieted cgo builds by suppressing deprecated warnings during go build.
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.7...v0.21.0
v0.20.7
What's Changed
- Fix quality of gemma:e2b and gemma:e4b when thinking is disabled
- ROCm: Update to ROCm 7.2.1 on Linux by @saman-amd in #15483
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.6...v0.20.7
v0.20.6
What's Changed
- Gemma 4 tool calling ability is improved and updated to use Google's latest post-launch fixes
- Parallel tool calling improved for streaming responses
- Hermes agent Ollama integration guide is now available
- Ollama app is updated to fix image attachment errors
New Contributors
- @matteocelani made their first contribution in #15272
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.5...v0.20.6
v0.20.5
OpenClaw channel setup with ollama launch
What's Changed
- OpenClaw channel setup: connect WhatsApp, Telegram, Discord, and other messaging channels through ollama launch openclaw
- Enable flash attention for Gemma 4 on compatible GPUs
- ollama launch opencode now detects curl-based OpenCode installs at ~/.opencode/bin
- Fix /save command for models imported from safetensors
New Contributors
- @sjhddh made their first contribution in https://github.com/ollama/ollama/pull/15424
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.4...v0.20.5
v0.20.4
What's Changed
- mlx: Improve M5 performance with NAX
- gemma4: enable flash attention
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.3...v0.20.4
v0.20.3
What's Changed
- Gemma 4 Tool Calling improvements
- Added latest models to Ollama App
- OpenClaw fixes for launching TUI
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.2...v0.20.3
v0.20.2
What's Changed
- app: default app home view to new chat instead of launch by @jmorganca in #15312
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.1...v0.20.2
v0.20.1
What's Changed
- bench: add prompt calibration, context size flag, and NumCtx reporting by @dhiltgen in #15158
- model/parsers: fix gemma4 arg parsing when quoted strings contain " by @drifkin in #15254
- ggml: skip cublasGemmBatchedEx during graph reservation by @jessegross in #15301
- gemma4: enable flash attention by @dhiltgen in #15296
- ggml: fix ROCm build for cublasGemmBatchedEx reserve wrapper by @jessegross in #15305
- model/parsers: rework gemma4 tool call handling by @drifkin in #15306
Full Changelog: https://github.com/ollama/ollama/compare/v0.20.0...v0.20.1
v0.20.0
Gemma 4
Effective 2B (E2B)
ollama run gemma4:e2b
Effective 4B (E4B)
ollama run gemma4:e4b
26B (Mixture of Experts model with 4B active parameters)
ollama run gemma4:26b
31B (Dense)
ollama run gemma4:31b
What's Changed
- docs: update pi docs by @ParthSareen in #15152
- mlx: respect tokenizer add_bos_token setting in pipeline by @dhiltgen in #15185
- tokenizer: add SentencePiece-style BPE support by @dhiltgen in #15162
Full Changelog: https://github.com/ollama/ollama/compare/v0.19.0...v0.20.0-rc0
v0.19.0
Ollama is now powered by MLX on Apple Silicon in preview
Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture.
https://github.com/user-attachments/assets/600297b0-3167-46a5-8e3a-fefda3a51b84
Read more: https://ollama.com/blog/mlx
What's Changed
- Ollama's app will no longer incorrectly show "model is out of date"
- ollama launch pi now includes a web search plugin that uses Ollama's web search
- Improved KV cache hit rate when using the Anthropic-compatible API
- Fixed tool call parsing issue with Qwen3.5 where tool calls would be output in thinking
- MLX runner will now create periodic snapshots during prompt processing
- Fixed KV cache snapshot memory leak in MLX runner
- Fixed issue where flash attention would be incorrectly enabled for grok models
- Fixed qwen3-next:80b not loading in Ollama
Full Changelog: https://github.com/ollama/ollama/compare/v0.18.3...v0.19.0
v0.18.3
Visual Studio Code
Microsoft Visual Studio Code now directly integrates with Ollama via GitHub Copilot.
If you have Ollama installed, any local or cloud model from Ollama can be selected for use within Visual Studio Code.
What's Changed
- GLM parser improvements for tool calls
- OpenClaw integration improvements for gateway checks
Full Changelog: https://github.com/ollama/ollama/compare/v0.18.2...v0.18.3
v0.18.2
What's Changed
- Add extra check to ensure npm and git are installed before installing OpenClaw
- Claude Code will now be faster when run locally, due to preventing cache breakages
- Fix to correctly support ollama launch openclaw --model <model>
- Register Ollama's websearch package correctly for OpenClaw
Full Changelog: https://github.com/ollama/ollama/compare/v0.18.1...v0.18.2
v0.18.1
Web Search and Fetch in OpenClaw
Ollama now ships with a web search and web fetch plugin for OpenClaw. This allows Ollama's models (local or cloud) to search the web for the latest content and news. It also lets OpenClaw with Ollama fetch web pages and extract readable content for processing. This feature does not execute JavaScript.
When using local models with web search in OpenClaw, ensure you are signed into Ollama with ollama signin
ollama launch openclaw
You can install web search directly into OpenClaw as a plugin if you already have OpenClaw configured and working:
Ollama web search plugin
openclaw plugins install @ollama/openclaw-web-search
Non-interactive (headless) mode for ollama launch
ollama launch can now run in non-interactive mode.
Perfect for:
- Docker/containers: spin up an integration as a pipeline step to run evals, test prompts, or validate model behavior as part of your build. Tear it down when the job ends.
- CI/CD: Generate code reviews, security checks, and other tasks within your CI
- Scripts/automation: Kick off automated tasks with Ollama and Claude Code
- --model must be specified to run in headless mode
- --yes flag will auto-pull the model and skip any selectors
Try with: ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"
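For scripted use, the headless rules above (a model is required, --yes skips prompts, and arguments after -- go to the integration) can be captured in a small helper. The following is a sketch only: `build_headless_launch` is a hypothetical name, and the flags it emits are just the ones quoted in these release notes.

```python
import shlex


def build_headless_launch(integration, model, extra_args=()):
    """Build an argv list for a non-interactive `ollama launch` call.

    integration: integration name, e.g. "claude"
    model:       model tag; required, since headless mode needs --model
    extra_args:  arguments forwarded to the integration after `--`
    """
    if not model:
        raise ValueError("--model must be specified to run in headless mode")
    argv = ["ollama", "launch", integration, "--model", model, "--yes"]
    if extra_args:
        argv.append("--")  # everything after this goes to the integration
        argv.extend(extra_args)
    return argv


argv = build_headless_launch(
    "claude", "kimi-k2.5:cloud", ["-p", "how does this repository work?"]
)
# Print a shell-quoted form for logging; the list itself is what a script
# would pass to subprocess.run(argv, check=True).
print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids quoting problems when the prompt contains spaces or shell metacharacters.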
Use non-interactive mode in OpenClaw
You can ask your OpenClaw to run tasks using claude with subagents:
ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?" using a subagent
What's Changed
- ollama launch openclaw will now use the official Ollama auth and model provider for OpenClaw
- Improvements to Ollama's benchmarking tool in ./cmd/bench
- ollama launch openclaw will now skip --install-daemon when systemd is unavailable
Full Changelog: https://github.com/ollama/ollama/compare/v0.18.0...v0.18.1
v0.18.0
Ollama 0.18 includes improved performance for OpenClaw and Ollama’s cloud models, including the new Nemotron-3-Super model by NVIDIA designed for high-performance agentic reasoning tasks.
Improved OpenClaw performance with Kimi-K2.5
This release of Ollama improves performance of cloud models and their reliability.
- Up to 2x faster speeds with Kimi-K2.5
- Tool calling accuracy has been improved
ollama launch openclaw --model kimi-k2.5
Ollama is now a provider in OpenClaw
Ollama can now be selected as an authentication and model provider during OpenClaw onboarding (thanks @BruceMacD for contributing and @steipete for reviewing!)
openclaw onboard --auth-choice ollama
More information: https://docs.openclaw.ai/providers/ollama
Nemotron-3-Super
Nemotron-3-Super is a new 122B-parameter model with strong reasoning and tool calling capability, while having top performance when run on modern hardware:
- ollama run nemotron-3-super:cloud
- ollama run nemotron-3-super to run locally (requires 96GB+ of VRAM)
Nemotron-3-Super scores highest of any open model on PinchBench, a benchmark suite that measures how successful models are at completing tasks when used with OpenClaw.
ollama launch openclaw --model nemotron-3-super:cloud
Or using OpenClaw’s onboarding:
openclaw onboard \
--auth-choice ollama \
--custom-model-id nemotron-3-super:cloud
Non-interactive task support
ollama launch now supports non-interactive tasks by passing in --yes. This enables using Claude, Codex, Pi and more in scripts, GitHub Actions, and other non-interactive environments.
ollama launch claude \
--model glm-5:cloud \
--yes \
-- "Do a quick code review of this pull request and respond on GitHub with a comment summarizing your feedback."
Lower latency on MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud
For customers in North America, MiniMax-M2.5 and Qwen3.5 on Ollama’s cloud now respond much faster, up to 10x and up to 2x faster respectively, and often in less than a second. This is ideal for tasks that require a fast Time To First Token (TTFT) when needing quick answers from OpenClaw or quick back-to-back coding tasks.
ollama launch claude --model minimax-m2.5
Driver updates required for ROCm 7
This version of Ollama ships with ROCm 7, and requires updating drivers to the latest version for continued support.
What's Changed
- Ollama's cloud models no longer require downloading via ollama pull. Setting :cloud as a tag will now automatically connect to cloud models.
- New --yes flag for ollama launch that skips all prompts, making it possible to run AI assistants and other tools in non-interactive environments
- Fixed issue where "Reset to Defaults" in Ollama's app would disable downloading automatic updates.
- Ollama will now ensure context compaction occurs at the correct context length for each model when using ollama launch claude
New Contributors
- @flipbit03 made their first contribution in #14821
- @shivamtiwari3 made their first contribution in #14825
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.7...v0.18.0
v0.17.7
What's Changed
- Allow thinking levels such as "medium" to be correctly interpreted in Ollama's API for all thinking models
- Add context length to support compaction when using ollama launch
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.6...v0.17.7
v0.17.6
What's Changed
- Fixed issue where GLM-OCR would not work due to incorrect prompt rendering
- Fixed tool calling parsing and rendering for Qwen 3.5 models
New Contributors
- @Victor-Quqi made their first contribution in #14584
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.5...v0.17.6
v0.17.5
New models
- Qwen3.5: the small Qwen 3.5 model series is now available in 0.8B, 2B, 4B and 9B parameter sizes.
What's Changed
- Fixed crash in Qwen 3.5 models when split over GPU & CPU
- Fixed issue where Qwen 3.5 models would repeat themselves due to no presence penalty (note: you may have to redownload the qwen3.5 models: ollama pull qwen3.5:35b for example)
- ollama run --verbose will now show peak memory usage when using Ollama's MLX engine
- Fixed memory issues and crashes in MLX runner
- Fixed issue where Ollama would not be able to run models imported from Qwen3.5 GGUF files
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.4...v0.17.5
v0.17.4
New models
- Qwen 3.5: a family of open-source multimodal models that delivers exceptional utility and performance.
- LFM 2: LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.
Note: for users on 0.17.1, this version will not automatically update. Re-downloading is required to receive the latest version of Ollama.
What's Changed
- Tool call indices will now be included in parallel tool calls
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.3...v0.17.4
v0.17.3
What's Changed
- Fixed issue where tool calls in the Qwen 3 and Qwen 3.5 model families would not be parsed correctly if emitted during thinking
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.2...v0.17.3
v0.17.2
What's Changed
- Fixed issue where Ollama's app on Windows would crash when a new update has been downloaded
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.1...v0.17.2
v0.17.1
What's Changed
- Nemotron architecture support in Ollama's engine
- MLX engine now has improved memory usage
- Ollama's app will now allow models that support tools to use web search capabilities
- Improved LFM2 and LFM2.5 models in Ollama's engine
- ollama create will no longer default to affine quantization for unquantized models when using the MLX engine
- Added configuration for disabling automatic update downloading
Full Changelog: https://github.com/ollama/ollama/compare/v0.17.0...v0.17.1
v0.17.0
OpenClaw
OpenClaw can now be installed and configured automatically via Ollama, making it the easiest way to get up and running with OpenClaw with open models like Kimi-K2.5, GLM-5, and MiniMax-M2.5.
Get started
ollama launch openclaw
Web search in OpenClaw
When using cloud models, websearch is enabled - allowing OpenClaw to search the internet.
What's Changed
- Improved tokenizer performance
- Ollama's macOS and Windows apps will now default to a context length based on available VRAM
New Contributors
- @natl-set made their first contribution in https://github.com/ollama/ollama/pull/14322
Full Changelog: https://github.com/ollama/ollama/compare/v0.16.3...v0.17.0
v0.16.3
What's Changed
- New ollama launch cline added for the Cline CLI
- ollama launch <integration> will now always show the model picker
- Added Gemma 3, Llama and Qwen 3 architectures to MLX runner
New Contributors
- @hellosaumil made their first contribution in #14271
Full Changelog: https://github.com/ollama/ollama/compare/v0.16.2...v0.16.3
v0.16.2
What's Changed
- ollama launch claude now supports searching the web when using :cloud models
- Fixed rendering issue when running ollama in PowerShell
- New setting in Ollama's app makes it easier to disable cloud models for sensitive and private tasks where data cannot leave your computer. For Linux or when running ollama serve manually, set OLLAMA_NO_CLOUD=1.
- Fixed issue where experimental image generation models would not run in 0.16.0 and 0.16.1
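When ollama serve is started from a script rather than the desktop app, the OLLAMA_NO_CLOUD=1 setting mentioned above has to be placed in the process environment. A minimal sketch, assuming the server is launched as a child process; the helper name `no_cloud_env` is hypothetical:

```python
import os


def no_cloud_env(base_env=None):
    """Return a copy of the environment with cloud models disabled.

    base_env: mapping to start from; defaults to the current process
              environment. The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_NO_CLOUD"] = "1"  # variable name taken from the release notes
    return env


env = no_cloud_env({"PATH": "/usr/bin"})
print(env["OLLAMA_NO_CLOUD"])
```

The resulting mapping could then be passed as the `env` argument of `subprocess.Popen(["ollama", "serve"], env=no_cloud_env())`.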
Full Changelog: https://github.com/ollama/ollama/compare/v0.16.1...v0.16.2-rc0
v0.16.1
What's Changed
- Installing Ollama via the curl install script on macOS will now only prompt for your password if it's required
- Installing Ollama via the iem install script in Windows will now show progress
- Image generation models will now respect the OLLAMA_LOAD_TIMEOUT variable
Full Changelog: https://github.com/ollama/ollama/compare/v0.16.0...v0.16.1
v0.16.0
New models
- GLM-5: A strong reasoning and agentic model from Z.ai with 744B total parameters (40B active), built for complex systems engineering and long-horizon tasks.
- MiniMax-M2.5: a new state-of-the-art large language model designed for real-world productivity and coding tasks.
New ollama
The new ollama command makes it easy to launch your favorite apps with models using Ollama
What's Changed
- Launch Pi with ollama launch pi
- Improvements to Ollama's MLX runner to support GLM-4.7-Flash
- Ctrl+G will now allow for editing text prompts in a text editor when running a model
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.6...v0.16.0
v0.15.6
What's Changed
- Fixed context limits when running ollama launch droid
- ollama launch will now download missing models instead of erroring
- Fixed bug where ollama launch claude would cause context compaction when providing images
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.5...v0.15.6
v0.15.5
New models
- Qwen3-Coder-Next: a coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development.
- GLM-OCR: GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.
Improvements to ollama launch
- ollama launch can now be provided arguments, for example ollama launch claude -- --resume
- ollama launch will now run subagents when using ollama launch claude
- Ollama will now set context limits for a set of models when using ollama launch opencode
What's Changed
- Sub-agent support for ollama launch for planning, deep research, and similar tasks
- ollama signin will now open a browser window to make signing in easier
- Ollama will now default to the following context lengths based on VRAM:
- < 24 GiB VRAM: 4,096 context
- 24-48 GiB VRAM: 32,768 context
- >= 48 GiB VRAM: 262,144 context
- GLM-4.7-Flash support on Ollama's experimental MLX engine
- ollama signin will now open the browser to the connect page
- Fixed off by one error when using num_predict in the API
- Fixed issue where tokens from a previous sequence would be returned when hitting num_predict
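The VRAM-based context length defaults listed above amount to a three-tier lookup. The sketch below restates those tiers in code; it is not Ollama's implementation, the function name `default_context_length` is hypothetical, and treating exactly 24 GiB and 48 GiB as belonging to the higher tier is an assumption drawn from the ">= 48 GiB" wording.

```python
GIB = 1024 ** 3  # bytes per GiB


def default_context_length(vram_bytes):
    """Map available VRAM to the default context length from the notes."""
    if vram_bytes < 24 * GIB:
        return 4096       # < 24 GiB VRAM
    if vram_bytes < 48 * GIB:
        return 32768      # 24-48 GiB VRAM
    return 262144         # >= 48 GiB VRAM


for gib in (8, 24, 80):
    print(gib, "GiB ->", default_context_length(gib * GIB))
```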
New Contributors
- @avukmirovich made their first contribution in #13934
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.4...v0.15.5
v0.15.4
What's Changed
- ollama launch openclaw will now enter the standard OpenClaw onboarding flow if this has not yet been completed.
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.3...v0.15.4
v0.15.3
What's Changed
- Renamed ollama launch clawdbot to ollama launch openclaw to reflect the project's new name
- Improved tool calling for Ministral models
- docs: add clawdbot by @ParthSareen in #13925
- cmd/config: Use envconfig.Host() for base API in launch config packages by @gabe-l-hart in #13937
- ollama launch will now use the value of OLLAMA_HOST when running it
New Contributors
- @MBerguer made their first contribution in #13971
- @taronsung made their first contribution in #13965
- @noureldin-azzab made their first contribution in #13961
- @dhirajlochib made their first contribution in #13645
- @ThanhNguyxn made their first contribution in #13979
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.2...v0.15.3
v0.15.2
What's Changed
- New ollama launch clawdbot command for launching Clawdbot using Ollama models
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.1...v0.15.2
v0.15.1
What's Changed
- GLM-4.7-Flash performance and correctness improvements, fixing repetitive answers and tool calling quality
- Fixed performance issues on macOS and arm64 Linux
- Fixed issue where ollama launch would not detect claude and would incorrectly update opencode configurations
New Contributors
- @stillhart made their first contribution in #13855
Full Changelog: https://github.com/ollama/ollama/compare/v0.15.0...v0.15.1
v0.15.0
ollama launch
A new ollama launch command to use Ollama's models with Claude Code, Codex, OpenCode, and Droid without separate configuration.
What's Changed
- New ollama launch command for Claude Code, Codex, OpenCode, and Droid
- Fixed issue where creating multi-line strings with """ would not work when using ollama run
- Ctrl+J and Shift+Enter now work for inserting newlines in ollama run
- Reduced memory usage for GLM-4.7-Flash models
v0.14.3
- Z-Image Turbo: 6 billion parameter text-to-image model from Alibaba’s Tongyi Lab. It generates high-quality photorealistic images.
- Flux.2 Klein: Black Forest Labs’ fastest image-generation models to date.
New models
- GLM-4.7-Flash: As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
- LFM2.5-1.2B-Thinking: LFM2.5 is a new family of hybrid models designed for on-device deployment.
What's Changed
- Fixed issue where Ollama's macOS app would interrupt system shutdown
- Fixed ollama create and ollama show commands for experimental models
- The /api/generate API can now be used for image generation
- Fixed minor issues in Nemotron-3-Nano tool parsing
- Fixed issue where removing an image generation model would cause it to first load
- Fixed issue where ollama rm would only stop the first model in the list if it were running
Full Changelog: https://github.com/ollama/ollama/compare/v0.14.2...v0.14.3
v0.14.2
New models
- TranslateGemma: A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.
What's Changed
- Shift + Enter (or Ctrl + j) will now enter a newline in Ollama's CLI
- Improve /v1/responses API to better conform to the OpenResponses specification
New Contributors
- @yuhongsun96 made their first contribution in #13135
- @koaning made their first contribution in #13326
Full Changelog: https://github.com/ollama/ollama/compare/v0.14.1...v0.14.2
v0.14.1
Image generation models (experimental)
Experimental image generation models are available for macOS and Linux (CUDA) in Ollama:
Available models
ollama run x/z-image-turbo
Note: x is a username on ollama.com where experimental models are uploaded
More models coming soon:
- Qwen-Image-2512
- Qwen-Image-Edit-2511
- GLM-Image
What's Changed
- fix macOS auto-update signature verification failure
New Contributors
- @joshxfi made their first contribution in #13711
- @maternion made their first contribution in #13709
Full Changelog: https://github.com/ollama/ollama/compare/v0.14.0...v0.14.1
v0.14.0
What's Changed
- ollama run --experimental CLI will now open a new Ollama CLI that includes an agent loop and the bash tool
- Anthropic API compatibility: support for the /v1/messages API
- A new REQUIRES command for the Modelfile allows declaring which version of Ollama is required for the model
- For older models, Ollama will avoid an integer underflow on low VRAM systems during memory estimation
- More accurate VRAM measurements for AMD iGPUs
- Ollama's app will now highlight Swift source code
- An error will now return when embeddings return NaN or -Inf
- Ollama's Linux install bundle files now use zst compression
- New experimental support for image generation models, powered by MLX
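Clients that post-process embeddings may want a similar guard to the NaN and -Inf check noted above. The following sketch shows one way to do it on the client side; it is not Ollama's server code, `validate_embedding` is a hypothetical helper, and rejecting +Inf as well is an assumption that goes beyond what the release note states.

```python
import math


def validate_embedding(vector):
    """Raise ValueError if the embedding holds NaN or infinite values.

    Returns the vector unchanged when every component is finite.
    """
    for i, value in enumerate(vector):
        # math.isinf covers both +Inf and -Inf; the release note only
        # mentions NaN and -Inf, so rejecting +Inf here is an assumption.
        if math.isnan(value) or math.isinf(value):
            raise ValueError(
                f"embedding contains non-finite value at index {i}: {value}"
            )
    return vector


print(validate_embedding([0.12, -0.5, 0.33]))
```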
New Contributors
- @Vallabh-1504 made their first contribution in #13550
- @majiayu000 made their first contribution in #13596
- @harrykiselev made their first contribution in #13615
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.5...v0.14.0-rc2
v0.13.5
New Models
- Google's FunctionGemma: a specialized version of Google's Gemma 3 270M model fine-tuned explicitly for function calling.
What's Changed
- bert architecture models now run on Ollama's engine
- Added built-in renderer & tool parsing capabilities for DeepSeek-V3.1
- Fixed issue where nested properties in tools may not have been rendered properly
New Contributors
- @familom made their first contribution in #13220
- @nathannewyen made their first contribution in #13469
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.4...v0.13.5
v0.13.4
New Models
- Nemotron 3 Nano: A new Standard for Efficient, Open, and Intelligent Agentic Models
- Olmo 3 and Olmo 3.1: A series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
What's Changed
- Enable Flash Attention automatically for models by default
- Fixed handling of long contexts with Gemma 3 models
- Fixed issue that would occur with Gemma 3 QAT models or other models imported with the Gemma 3 architecture
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.3...v0.13.4-rc0
v0.13.3
New models
- Devstral-Small-2: 24B model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
- rnj-1: Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.
- nomic-embed-text-v2: nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.
What's Changed
- Improved truncation logic when using /api/embed and /v1/embeddings
- Extend Gemma 3 architecture to support rnj-1 model
- Fix error that would occur when running qwen2.5vl with image input
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.2...v0.13.3
v0.13.2
New models
- Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.
What's Changed
- Flash attention is now enabled by default for vision models such as mistral-3, gemma3, qwen3-vl and more. This improves memory utilization and performance when providing images as input.
- Fixed GPU detection on multi-GPU CUDA machines
- Fixed issue where deepseek-v3.1 would always think even when thinking is disabled in Ollama's app
New Contributors
- @chengcheng84 made their first contribution in #13265
- @nathan-hook made their first contribution in #13256
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.1...v0.13.2
v0.13.1
New models
- Ministral-3: The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
- Mistral-Large-3: A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
What's Changed
- nomic-embed-text will now use Ollama's engine by default
- Tool calling support for cogito-v2.1
- Fixed issues with CUDA VRAM discovery
- Fixed link to docs in Ollama's app
- Fixed issue where models would be evicted on CPU-only systems
- Ollama will now better render errors instead of showing Unmarshal: errors
- Fixed issue where CUDA GPUs would fail to be detected with older GPUs
- Added thinking and tool parsing for cogito-v2.1
New Contributors
- @EntropyYue made their first contribution in #13237
- @kokes made their first contribution in #13231
Full Changelog: https://github.com/ollama/ollama/compare/v0.13.0...v0.13.1
v0.13.0
New models
- DeepSeek-OCR: DeepSeek-OCR uses optical 2D mapping to compress long contexts, achieving high OCR precision with reduced vision tokens and demonstrating practical value in document processing.
- Cogito-V2.1: instruction tuned generative models, currently the best open-weight LLM by a US company
DeepSeek-OCR
DeepSeek-OCR is now available on Ollama. Example inputs:
ollama run deepseek-ocr "/path/to/image\n<|grounding|>Given the layout of the image."
ollama run deepseek-ocr "/path/to/image\nFree OCR."
ollama run deepseek-ocr "/path/to/image\nParse the figure."
ollama run deepseek-ocr "/path/to/image\nExtract the text in the image."
ollama run deepseek-ocr "/path/to/image\n<|grounding|>Convert the document to markdown."
New bench tool
Ollama's GitHub repo now includes a bench tool that can be used to test model performance. For the time being this is a separate tool that can be built in the Ollama GitHub repository:
First, install Go. Then from the root of the Ollama repository run:
go run ./cmd/bench -model gpt-oss:20b
For more information see the tool's documentation
What's Changed
- DeepSeek-OCR is now supported
- DeepSeek-V3.1 architecture is now supported in Ollama's engine
- Fixed performance issues that arose in Ollama 0.12.11 on CUDA
- Fixed issue where Linux install packages were missing required Vulkan libraries
- Improved CPU and memory detection while in containers/cgroups
- Improved VRAM information detection for AMD GPUs
- Improved KV cache performance to no longer require defragmentation
New Contributors
- @lnicola made their first contribution in #13096
- @vignesh1507 made their first contribution in #13078
- @pierwill made their first contribution in #12995
- @jjuliano made their first contribution in #11877
- @omahs made their first contribution in #10683
- @SiLeader made their first contribution in #10292
- @ssam18 made their first contribution in #13124
- @seolyam made their first contribution in #13116
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.11...v0.13.0
Configuration
- Branch creation: At any time (no schedule defined)
- Automerge: At any time (no schedule defined)
- If you want to rebase/retry this MR, check this box
This MR has been generated by Mend Renovate.