The announcement centers heavily on coding and agent use cases. Qwen says the model surpasses its direct predecessor, Qwen3.5-35B-A3B, by a wide margin and also competes with much larger dense models such as Qwen3.5-27B and Gemma4-31B. The benchmark charts released with the post show Qwen3.6-35B-A3B reaching 73.4 on SWE-bench Verified, 67.2 on SWE-bench Multilingual, 49.5 on SWE-bench Pro, and 51.5 on Terminal-Bench 2.0. Additional coding and agent scores shown in the materials include 52.6 on QwenClawBench, 29.4 on NL2Repo, 37.0 on MCPMark, and an Elo rating of 1397 on QwenWebBench. Qwen also highlights broader agent results such as 67.2 on TAU3-Bench, 62.8 on MCP-Atlas, and 60.1 on WideSearch.

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
— Qwen (@Alibaba_Qwen) April 16, 2026
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes… pic.twitter.com/UMiChPaLid
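The "35B total params, 3B active" split in the tweet above is the core of the efficiency pitch: a sparse MoE routes each token through only a small slice of the network. A minimal back-of-envelope sketch of what that ratio implies (the FLOPs rule of thumb is an illustrative approximation, not a figure from the announcement):

```python
# Back-of-envelope: what "35B total, 3B active" implies per token.
# Parameter counts are from the announcement; the FLOPs comparison
# uses the common ~2 * params FLOPs-per-token approximation.

TOTAL_PARAMS = 35e9   # all experts resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used per forward pass

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")

# Rough per-token compute versus a hypothetical dense model of the
# same total size: dense spends ~2*total, the MoE spends ~2*active.
dense_flops = 2 * TOTAL_PARAMS
moe_flops = 2 * ACTIVE_PARAMS
print(f"Approx. per-token compute vs. a dense 35B model: {moe_flops / dense_flops:.1%}")
```

Under this rough model, each token costs under a tenth of the compute of a dense 35B forward pass, which is the sense in which the release compares the model to ones "10x its active size".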
Beyond coding, the company presents Qwen3.6-35B-A3B as a more general-purpose model that still supports both multimodal thinking and non-thinking modes. In the language and reasoning tables shared in the release, the model posts 85.2 on MMLU-Pro, 93.3 on MMLU-Redux, 90.0 on C-Eval, and 86.0 on GPQA. On STEM and problem-solving tasks, the posted results include 80.4 on LiveCodeBench v6, 83.6 on HMMT Feb 26, 78.9 on IMOAnswerBench, and 92.7 on AIME26. Together, those numbers are used in the announcement to argue that the model is not limited to coding-only scenarios, even though agentic coding remains the lead message of the release.

LM Performance: Qwen3.6-35B-A3B outperforms the dense 27B-param Qwen3.5-27B on several key coding benchmarks and dramatically surpasses its direct predecessor Qwen3.5-35B-A3B, especially on agentic coding and reasoning tasks. pic.twitter.com/PyXDNruoy2
— Qwen (@Alibaba_Qwen) April 16, 2026
Multimodality is the other major theme in the launch. Qwen says Qwen3.6 is natively multimodal, and that Qwen3.6-35B-A3B shows perception and multimodal reasoning capabilities well beyond what its size would suggest. The company states that across most vision-language benchmarks, the model matches Claude Sonnet 4.5 and surpasses it on several tasks. The charts shared in the announcement show Qwen3.6-35B-A3B scoring 81.7 on MMMU, 75.3 on MMMU-Pro, 85.3 on RealWorldQA, 92.8 on MMBench EN-DEV v1.1, 89.9 on OmniDocBench1.5, 81.9 on CC-OCR, and 92.7 on AI2D_TEST. Qwen gives special emphasis to spatial intelligence, where the model reaches 92.0 on RefCOCO and 50.8 on ODInW13. The same release also includes video benchmarks such as 86.6 on VideoMME, 83.7 on VideoMMMU, and 86.2 on MLVU.

Taken together, the release positions Qwen3.6-35B-A3B as an open-source model built around agentic coding, multimodal understanding, and efficient inference. Qwen’s framing stays focused on what the model can deliver with only about 3 billion active parameters, while the benchmark set is used to show that the model is being pitched not only for code generation, but also for reasoning, document understanding, visual tasks, and broader agent workflows.

VLM Performance: Qwen3.6 is natively multimodal, and Qwen3.6-35B-A3B showcases perception and multimodal reasoning capabilities that far exceed what its size would suggest, with only around 3 billion activated parameters. Across most vision-language benchmarks, its performance… pic.twitter.com/nOVBNlVfzW
— Qwen (@Alibaba_Qwen) April 16, 2026