
Alibaba acknowledges that Qwen3's hybrid-thinking approach was ill-conceived

Chinese e-commerce heavyweight returns to dedicated instruction-tuned and reasoning-focused models, emphasizing quality over convenience.

Alibaba has recently released updated versions of its Qwen3 models, each tailored for specific tasks. The new models, identifiable by the 2507 date code in their names, come in dedicated instruct-tuned and thinking-tuned variants.
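For readers who want to try the split variants, here is a minimal sketch of how one of the 2507 checkpoints could be loaded with Hugging Face transformers. The repository name is an assumption based on Qwen's usual naming scheme (base model name plus an "Instruct" or "Thinking" tag and the 2507 date code); swap in the thinking-tuned name for the reasoning variant.

```python
# Minimal sketch: loading a 2507 Qwen3 variant via Hugging Face transformers.
# The repo name below is an assumption based on Qwen's naming convention,
# not a name confirmed by the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-235B-A22B-Instruct-2507"  # assumed; use a "-Thinking-2507" repo for the reasoning variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # picks up the native BF16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the difference between instruct- and thinking-tuned models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```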

Comparative Performance

The dedicated instruct-tuned Qwen3 model significantly outperforms leading models from OpenAI and DeepSeek on benchmarks involving instruction following, logical reasoning, coding, science, and tool use. It supports a context window of up to 256,000 tokens, a massive increase over earlier versions, allowing it to handle much larger inputs and perform better on complex tasks.

On the other hand, the thinking-tuned models show improvements on specialized math-heavy benchmarks such as the AIME25 test, scoring between 13% and 54% better than the original Qwen3 release. However, the uplift is less dramatic than the gains seen with the non-thinking, instruct-tuned models.

Original Release vs. Upgraded Models

The original Qwen3 releases had smaller context windows (32k tokens) and delivered solid performance, but they are significantly outclassed by the newest instruct-tuned 235B parameter models in both accuracy and task range. The upgraded models improve instruction fidelity, logical reasoning, and coding, and add broader multi-domain tool use, backed by refined tuning and the larger context capacity.

Future Plans for Hybrid Thinking Mode

Alibaba initially introduced a "hybrid thinking mode" in Qwen3 that aimed to blend instruction-following and deep reasoning abilities in one unified model. Despite some initial enthusiasm, the hybrid thinking mode underperformed in certain respects. Consequently, development on this mode is currently paused. The Alibaba team states that it has not abandoned the idea but is continuing research to resolve the quality issues before reintroducing hybrid thinking functionality in future Qwen models.
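For context, the sketch below illustrates how the hybrid mode was exposed in the original Qwen3 release, assuming the enable_thinking flag its chat template documented. The new 2507 variants drop this per-request switch, since instruct and thinking behavior now live in separate models.

```python
# Rough sketch of the original hybrid-mode toggle, assuming the enable_thinking
# flag documented for the first Qwen3 chat template. Not applicable to the
# dedicated 2507 instruct/thinking variants.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")  # original hybrid release (assumed repo name)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# Deep-reasoning path: the model prefixes its answer with a <think>...</think> trace.
reasoning_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Fast instruction-following path: no reasoning trace is generated.
direct_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```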

Additional Notes

Alibaba also unveiled Qwen3-Coder, a specialized AI model for coding tasks with a massive 480 billion total parameters and the ability to process context lengths of up to 1 million tokens, surpassing even GPT-4.1 in coding benchmarks. The overall trend of the Qwen3 releases is toward improving domain-specific capabilities (math, code, reasoning) and expanding the model's effective context length to support longer, more complex interactions.

The new Qwen releases extend the context window from 32k tokens to 256k. The updated models are being made available both in their native BF16 precision and as quantized FP8 variants.

The team has not said whether it will also release updated versions of the smaller 30-billion-parameter mixture-of-experts (MoE) model, nor has it detailed how the updated thinking-tuned models perform relative to their non-thinking counterparts.

Stay tuned for more updates as the Qwen team plans to roll out additional versions of its Qwen3 models in the coming days.

  1. The instruct-tuned Qwen3 235B model, with its larger 256k-token context window, outperforms leading models from OpenAI and DeepSeek on benchmarks covering instruction following, logical reasoning, coding, science, and tool use, and clearly outclasses the original Qwen3 release.
  2. The thinking-tuned Qwen3 models show gains on math-heavy benchmarks such as the AIME25 test, scoring between 13% and 54% better than the original Qwen3 release, although the uplift is less dramatic than that of the instruct-tuned models.
