Update dependency ollama/ollama to v0.12.3
This MR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| ollama/ollama | | minor | `v0.9.0` -> `v0.12.3` |
| ollama/ollama | ironbank-github | minor | `v0.9.0` -> `v0.12.3` |
Release Notes
ollama/ollama (ollama/ollama)
v0.12.3
New models
- DeepSeek-V3.1-Terminus: DeepSeek-V3.1-Terminus is a hybrid model that supports both thinking mode and non-thinking mode. It delivers more stable & reliable outputs across benchmarks compared to the previous version.
  Run on Ollama's cloud: `ollama run deepseek-v3.1:671b-cloud`
  Run locally (requires 500GB+ of VRAM): `ollama run deepseek-v3.1`
- Kimi-K2-Instruct-0905: Kimi K2-Instruct-0905 is the latest, most capable version of Kimi K2. It is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters.
  `ollama run kimi-k2:1t-cloud`
What's Changed
- Fixed issue where tool calls provided as stringified JSON would not be parsed correctly
- `ollama push` will now provide a URL to follow to sign in
- Fixed issue where qwen3-coder would output unicode characters incorrectly
- Fixed issue where loading a model with `/load` would crash
New Contributors
- @gr4ceG made their first contribution in https://github.com/ollama/ollama/pull/12385
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.2...v0.12.3
v0.12.2
Web search
A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud. This web search capability can augment models with the latest information from the web to reduce hallucinations and improve accuracy.
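As a rough illustration of how a client might call such an API, the sketch below only builds a request payload; the endpoint path, field names, and bearer-token auth scheme are assumptions for illustration and are not taken from the release notes.

```python
import json

def build_web_search_request(query: str, api_key: str, max_results: int = 5) -> dict:
    """Assemble a hypothetical web-search request; endpoint and fields are assumed."""
    return {
        "url": "https://ollama.com/api/web_search",  # assumed endpoint
        "headers": {
            "Authorization": f"Bearer {api_key}",    # assumed auth scheme
            "Content-Type": "application/json",
        },
        "body": json.dumps({"query": query, "max_results": max_results}),
    }

req = build_web_search_request("latest Ollama release", "YOUR_API_KEY")
print(json.loads(req["body"])["query"])
```

Consult the official docs for the actual endpoint and request shape before relying on any of the names above.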
What's Changed
- Models with Qwen3's architecture including MoE now run in Ollama's new engine
- Fixed issue where built-in tools for gpt-oss were not being rendered correctly
- Support multi-regex pretokenizers in Ollama's new engine
- Ollama's new engine can now load tensors by matching a prefix or suffix
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.1...v0.12.2
v0.12.1
New models
- Qwen3 Embedding: state of the art open embedding model by the Qwen team
What's Changed
- Qwen3-Coder now supports tool calling
- Ollama's app will no longer show "connection lost" errors when connecting to cloud models
- Fixed issue where Gemma3 QAT models would not output correct tokens
- Fixed issue where `&` characters in Qwen3-Coder would not be parsed correctly when function calling
- Fixed issue where `ollama signin` would not work properly on Linux
Full Changelog: https://github.com/ollama/ollama/compare/v0.12.0...v0.12.1
v0.12.0
Cloud models
Cloud models are now available in preview, allowing you to run a group of larger models with fast, datacenter-grade hardware.
To run a cloud model, use:
ollama run qwen3-coder:480b-cloud
What's Changed
- Models with the Bert architecture now run on Ollama's engine
- Models with the Qwen 3 architecture now run on Ollama's engine
- Fix issue where older NVIDIA GPUs would not be detected if newer drivers were installed
- Fixed issue where models would not be imported correctly with `ollama create`
- Ollama will skip parsing the initial `<think>` tag if provided in the prompt for /api/generate by @rick-github
New Contributors
- @egyptianbman made their first contribution in https://github.com/ollama/ollama/pull/12300
- @russcoss made their first contribution in https://github.com/ollama/ollama/pull/12280
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.11...v0.12.0
v0.11.11
What's Changed
- Support for CUDA 13
- Improved memory usage when using gpt-oss in Ollama's app
- Better scrolling in Ollama's app when submitting long prompts
- Cmd +/- will now zoom and shrink text in Ollama's app
- Assistant messages can now be copied in Ollama's app
- Fixed error that would occur when attempting to import safetensors files by @rick-github in https://github.com/ollama/ollama/pull/12176
- Improved memory estimates for hybrid and recurrent models by @gabe-l-hart in https://github.com/ollama/ollama/pull/12186
- Fixed error that would occur when batch size was greater than context length
- Flash attention & KV cache quantization validation fixes by @jessegross in https://github.com/ollama/ollama/pull/12231
- Add `dimensions` field to embed requests by @mxyng in https://github.com/ollama/ollama/pull/12242
- Enable new memory estimates in Ollama's new engine by default by @jessegross in https://github.com/ollama/ollama/pull/12252
- Ollama will no longer load split vision models in the Ollama engine by @jessegross in https://github.com/ollama/ollama/pull/12241
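The new `dimensions` field mentioned in the list above can be pictured as an extra key in an `/api/embed` request body. A minimal sketch, assuming the field sits at the top level of the request and using a hypothetical model name:

```python
import json

# Sketch of an /api/embed request body using the `dimensions` field
# (ollama/ollama#12242). The model name is a placeholder, and the exact
# request shape beyond these fields is an assumption.
payload = {
    "model": "qwen3-embedding",        # hypothetical model name
    "input": "why is the sky blue?",
    "dimensions": 256,                 # requested embedding dimensionality
}
body = json.dumps(payload)
print(body)
```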
New Contributors
- @KashyapTan made their first contribution in https://github.com/ollama/ollama/pull/12188
- @carbonatedWaterOrg made their first contribution in https://github.com/ollama/ollama/pull/12230
- @fengyuchuanshen made their first contribution in https://github.com/ollama/ollama/pull/12249
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.10...v0.11.11
v0.11.10
New models
- EmbeddingGemma: a new open embedding model that delivers best-in-class performance for its size
What's Changed
- Support for EmbeddingGemma
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.9...v0.11.10
v0.11.9
What's Changed
- Improved performance via overlapping GPU and CPU computations
- Fixed issues where an unrecognized AMD GPU would cause an error
- Reduce crashes due to unhandled errors in some Mac and Linux installations of Ollama
New Contributors
- @alpha-nerd-nomyo made their first contribution in https://github.com/ollama/ollama/pull/12129
- @pxwanglu made their first contribution in https://github.com/ollama/ollama/pull/12123
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.8...v0.11.9-rc0
v0.11.8
What's Changed
- `gpt-oss` now has flash attention enabled by default on systems that support it
- Improved load times for `gpt-oss`
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.7...v0.11.8
v0.11.7
DeepSeek-V3.1
DeepSeek-V3.1 is now available to run via Ollama.
This model supports hybrid thinking, meaning thinking can be enabled or disabled by setting think in Ollama's API:
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-v3.1",
  "messages": [
    {
      "role": "user",
      "content": "why is the sky blue?"
    }
  ],
  "think": true
}'
```
In Ollama's CLI, thinking can be enabled or disabled by running the `/set think` or `/set nothink` commands.
Turbo (in preview)
DeepSeek-V3.1 has over 671B parameters, and so a large amount of VRAM is required to run it. Ollama's Turbo mode (in preview) provides access to powerful hardware in the cloud you can use to run the model.
Turbo via Ollama's app
- Download Ollama for macOS or Windows
- Select `deepseek-v3.1:671b` from the model selector
- Enable Turbo
Turbo via Ollama's CLI and libraries
- Create an account on ollama.com/signup
- Follow the docs for Ollama's CLI to authenticate your Ollama installation
- Run the following:
OLLAMA_HOST=ollama.com ollama run deepseek-v3.1
For instructions on using Turbo with Ollama's Python and JavaScript library, see the docs
What's Changed
- Fixed issue where multiple models would not be loaded on CPU-only systems
- Ollama will now work with models that skip outputting the initial `<think>` tag (e.g. DeepSeek-V3.1)
- Fixed issue where text would be emitted when there is no opening `<think>` tag from a model
- Fixed issue where tool calls containing `{` or `}` would not be parsed correctly
New Contributors
- @zoupingshi made their first contribution in https://github.com/ollama/ollama/pull/12028
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.6...v0.11.7
v0.11.6
What's Changed
- Ollama's app will now switch between chats faster
- Improved layout of messages in Ollama's app
- Fixed issue where command prompt would show when Ollama's app detected an old version of Ollama running
- Improved performance when using flash attention
- Fixed boundary case when encoding text using BPE
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.5...v0.11.6
v0.11.5
What's Changed
- Performance improvements for the `gpt-oss` models
- New memory management: this release of Ollama includes improved memory management for scheduling models on GPUs, leading to better VRAM utilization, better model performance, and fewer out-of-memory errors. These new memory estimations can be enabled with `OLLAMA_NEW_ESTIMATES=1 ollama serve` and will soon be enabled by default.
- Improved multi-GPU scheduling and reduced VRAM allocation when using more than 2 GPUs
- Ollama's new app will now remember default selections for default model, Turbo and Web Search between restarts
- Fix error when parsing bad harmony tool calls
- `OLLAMA_FLASH_ATTENTION=1` will also enable flash attention for pure-CPU models
- Fixed OpenAI-compatible API not supporting `reasoning_effort`
- Reduced size of installation on Windows and Linux
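The `reasoning_effort` fix noted above concerns the OpenAI-compatible endpoint. A minimal sketch of what such a request body might look like, assuming `reasoning_effort` is a top-level field as in the OpenAI Chat Completions API (the model name is a placeholder):

```python
import json

# Sketch of an OpenAI-compatible /v1/chat/completions request body
# carrying reasoning_effort; field placement follows the OpenAI API
# convention and the model name is an assumption.
payload = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Summarize MXFP4 in one line."}],
    "reasoning_effort": "low",  # typically one of "low", "medium", "high"
}
body = json.dumps(payload)
print(body)
```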
New Contributors
- @vorburger made their first contribution in https://github.com/ollama/ollama/pull/11755
- @dan-and made their first contribution in https://github.com/ollama/ollama/pull/10678
- @youzichuan made their first contribution in https://github.com/ollama/ollama/pull/11880
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.4...v0.11.5
v0.11.4
What's Changed
- openai: allow for content and tool calls in the same message by @drifkin in https://github.com/ollama/ollama/pull/11759
- openai: when converting role=tool messages, propagate the tool name by @drifkin in https://github.com/ollama/ollama/pull/11761
- openai: always provide reasoning by @drifkin in https://github.com/ollama/ollama/pull/11765
New Contributors
- @gao-feng made their first contribution in https://github.com/ollama/ollama/pull/11170
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.3...v0.11.4
v0.11.3
What's Changed
- Fixed issue where `gpt-oss` would consume too much VRAM when split across GPU & CPU or multiple GPUs
- Statically link C++ libraries on Windows for better compatibility
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.2...v0.11.3
v0.11.2
What's Changed
- Fix crash in gpt-oss when using kv cache quantization
- Fix gpt-oss bug with "currentDate" not defined
Full Changelog: https://github.com/ollama/ollama/compare/v0.11.1...v0.11.2
v0.11.0
Welcome OpenAI's gpt-oss models
Ollama partners with OpenAI to bring its latest state-of-the-art open weight models to Ollama. The two models, 20B and 120B, bring a whole new local chat experience, and are designed for powerful reasoning, agentic tasks, and versatile developer use cases.
Feature highlights
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing (Ollama is providing a built-in web search that can be optionally enabled to augment the model with the latest information), python tool calls, and structured outputs.
- Full chain-of-thought: Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
Quantization - MXFP4 format
OpenAI utilizes quantization to reduce the memory footprint of the gpt-oss models. The models are post-trained with quantization of the mixture-of-experts (MoE) weights to MXFP4 format, where the weights are quantized to 4.25 bits per parameter. The MoE weights account for over 90% of the total parameter count, and quantizing these to MXFP4 enables the smaller model to run on systems with as little as 16GB of memory, and the larger model to fit on a single 80GB GPU.
Ollama supports the MXFP4 format natively, without additional quantization or conversion. New kernels were developed for Ollama's new engine to support the MXFP4 format.
Ollama collaborated with OpenAI to benchmark against their reference implementations to ensure Ollama's implementations have the same quality.
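A back-of-envelope check of the figures above: with roughly 90% of parameters quantized to 4.25 bits, the MoE weights alone come to under 10 GB for the 20B model and under 60 GB for the 120B model, consistent with the 16GB-system and single-80GB-GPU claims (the 90% fraction is the release notes' own approximation).

```python
def moe_weight_gb(total_params: float, moe_fraction: float = 0.9,
                  bits_per_param: float = 4.25) -> float:
    """Approximate size in GB of the MXFP4-quantized MoE weights."""
    return total_params * moe_fraction * bits_per_param / 8 / 1e9

print(round(moe_weight_gb(20e9), 2))   # ~9.56 GB for gpt-oss:20b
print(round(moe_weight_gb(120e9), 2))  # ~57.38 GB for gpt-oss:120b
```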
Get started
You can get started by downloading the latest Ollama version (v0.11)
The model can be downloaded directly in Ollama’s new app or via the terminal:
ollama run gpt-oss:20b
ollama run gpt-oss:120b
What's Changed
- kvcache: Enable SWA to retain additional entries by @jessegross in https://github.com/ollama/ollama/pull/11611
- kvcache: Log contents of cache when unable to find a slot by @jessegross in https://github.com/ollama/ollama/pull/11658
Full Changelog: https://github.com/ollama/ollama/compare/v0.10.1...v0.11.0
v0.10.1
What's Changed
- Fixed unicode character input for Japanese and other languages in Ollama's new app
- Fixed AMD download URL in the logs for `ollama serve`
New Contributors
- @skools-here made their first contribution in https://github.com/ollama/ollama/pull/11579
Full Changelog: https://github.com/ollama/ollama/compare/v0.10.0...v0.10.1
v0.10.0
Ollama's new app
Ollama's new app is available for macOS and Windows: Download Ollama
What's Changed
- `ollama ps` will now show the context length of loaded models
- Improved performance in `gemma3n` models by 2-3x
- Parallel request processing now defaults to 1. For more details, see the FAQ
- Fixed issue where tool calling would not work correctly with `granite3.3` and `mistral-nemo` models
- Fixed issue where Ollama's tool calling would not work correctly if a tool's name was part of another one, such as `add` and `get_address`
- Improved performance when using multiple GPUs by 10-30%
- Ollama's OpenAI-compatible API will now support WebP images
- Fixed issue where `ollama show` would report an error
- `ollama run` will more gracefully display errors
New Contributors
- @sncix made their first contribution in https://github.com/ollama/ollama/pull/11189
- @mfornet made their first contribution in https://github.com/ollama/ollama/pull/11425
- @haiyuewa made their first contribution in https://github.com/ollama/ollama/pull/11427
- @warting made their first contribution in https://github.com/ollama/ollama/pull/11461
- @ycomiti made their first contribution in https://github.com/ollama/ollama/pull/11462
- @minxinyi made their first contribution in https://github.com/ollama/ollama/pull/11502
- @ruyut made their first contribution in https://github.com/ollama/ollama/pull/11528
Full Changelog: https://github.com/ollama/ollama/compare/v0.9.6...v0.10.0
v0.9.6
What's Changed
- Fixed styling issue in launch screen
- `tool_name` can now be provided in messages with `"role": "tool"` using the `/api/chat` endpoint
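The `tool_name` addition above can be pictured as a message list in an `/api/chat` request where the tool result identifies which tool produced it. A minimal sketch; the tool name, its output, and the model name are hypothetical:

```python
import json

# Sketch of an /api/chat message list where a tool result carries
# tool_name, per the v0.9.6 note; names and contents are placeholders.
messages = [
    {"role": "user", "content": "What's the weather in Toronto?"},
    {
        "role": "tool",
        "tool_name": "get_weather",        # hypothetical tool
        "content": "11 degrees and cloudy",
    },
]
body = json.dumps({"model": "some-model", "messages": messages})
print(body)
```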
New Contributors
- @vrampal made their first contribution in https://github.com/ollama/ollama/pull/9681
Full Changelog: https://github.com/ollama/ollama/compare/v0.9.5...v0.9.6-rc0
v0.9.5
Updates to Ollama for macOS and Windows
A new version of Ollama's macOS and Windows applications are now available. New improvements to the apps will be introduced over the coming releases:
New features
Expose Ollama on the network
Ollama can now be exposed on the network, allowing others to access Ollama on other devices or even over the internet. This is useful for having Ollama running on a powerful Mac, PC or Linux computer while making it accessible to less powerful devices.
Model directory
The directory in which models are stored can now be modified! This allows models to be stored on external hard disks or in directories other than the default.
Smaller footprint and faster starting on macOS
The macOS app is now a native application and starts much faster while requiring a much smaller installation.
Additional changes in 0.9.5
- Fixed issue where the `ollama` CLI would not be installed by Ollama on macOS on startup
- Fixed issue where files in `ollama-darwin.tgz` were not notarized
- Add NativeMind to Community Integrations by @xukecheng in https://github.com/ollama/ollama/pull/11242
- Ollama for macOS now requires version 12 (Monterey) or newer
New Contributors
- @xukecheng made their first contribution in https://github.com/ollama/ollama/pull/11242
v0.9.4
Updates to Ollama for macOS and Windows
A new version of Ollama's macOS and Windows applications are now available. New improvements to the apps will be introduced over the coming releases:
New features
Expose Ollama on the network
Ollama can now be exposed on the network, allowing others to access Ollama on other devices or even over the internet. This is useful for having Ollama running on a powerful Mac, PC or Linux computer while making it accessible to less powerful devices.
Model directory
The directory in which models are stored can now be modified! This allows models to be stored on external hard disks or in directories other than the default.
Smaller footprint and faster starting on macOS
The macOS app is now a native application and starts much faster while requiring a much smaller installation.
What's Changed
- Reduced download size and startup time for Ollama on macOS
- Tool calling with empty parameters will now work correctly
- Fixed issue when quantizing models with the Gemma 3n architecture
- Ollama for macOS should no longer ask for root privileges when updating unless required
- Ollama for macOS now requires version 12 (Monterey) or newer
Full Changelog: https://github.com/ollama/ollama/compare/v0.9.3...v0.9.4
v0.9.3
Gemma 3n
Ollama now supports Gemma 3n.
Gemma 3n models are designed for efficient execution on everyday devices such as laptops, tablets or phones. These models were trained with data in over 140 spoken languages.
Effective 2B
ollama run gemma3n:e2b
Effective 4B
ollama run gemma3n:e4b
What's Changed
- Fixed issue where errors would not be properly reported on Apple Silicon Macs
- Ollama will now limit context length to what the model was trained against to avoid strange overflow behavior
New Contributors
- @Aj-Seven made their first contribution in https://github.com/ollama/ollama/pull/11169
Full Changelog: https://github.com/ollama/ollama/compare/v0.9.2...v0.9.3
v0.9.2
What's Changed
- Fixed issue where tool calls without parameters would not be returned correctly
- Fixed `does not support generate` errors
- Fixed issue where some special tokens would not be tokenized properly for some model architectures
New Contributors
- @NGC13009 made their first contribution in https://github.com/ollama/ollama/pull/11080
Full Changelog: https://github.com/ollama/ollama/compare/v0.9.1...v0.9.2
v0.9.1
Tool calling improvements
New tool calling support
The following models now support tool calling:
- DeepSeek-R1-0528 (671B model)
- Magistral
Tool calling reliability has also been improved for the following models:
To re-download the models, use `ollama pull`.
New Ollama for macOS and Windows preview
A new version of Ollama's macOS and Windows applications are available to test for early feedback. New improvements to the apps will be introduced over the coming releases:
If you have feedback, please create an issue on GitHub with the app label. These apps will automatically update themselves to future versions of Ollama, so you may have to redownload new preview versions in the future.
New features
Expose Ollama on the network
Ollama can now be exposed on the network, allowing others to access Ollama on other devices or even over the internet. This is useful for having Ollama running on a powerful Mac, PC or Linux computer while making it accessible to less powerful devices.
Allow local browser access
Enabling this allows websites to access your local installation of Ollama. This is handy for developing browser-based applications using Ollama's JavaScript library.
Model directory
The directory in which models are stored can now be modified! This allows models to be stored on external hard disks or in directories other than the default.
Smaller footprint and faster starting on macOS
The macOS app is now a native application and starts much faster while requiring a much smaller installation.
What's Changed
- Magistral now supports disabling thinking mode. Note: it is also recommended to change the system prompt when doing so.
- Error messages that previously showed `POST predict` will now be more informative
- Improved tool calling reliability for some models
- Fixed issue on Windows where `ollama run` would not start Ollama automatically
New Contributors
- @JasonHonKL made their first contribution in https://github.com/ollama/ollama/pull/10174
- @hwittenborn made their first contribution in https://github.com/ollama/ollama/pull/10998
- @krzysztofjeziorny made their first contribution in https://github.com/ollama/ollama/pull/10973
Full Changelog: https://github.com/ollama/ollama/compare/v0.9.0...v0.9.1
Configuration
- [ ] If you want to rebase/retry this MR, check this box
This MR has been generated by Renovate Bot.