The Decoder • 87 days ago

Xiaomi's MiMo model autonomously codes a compiler in about four hours

IMP

8/10

Key Summary

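Xiaomi has released MiMo-V2.5-Pro, an open-weight mixture-of-experts (MoE) language model with 1.02 trillion parameters. The model handles up to one million tokens of context and, in Xiaomi's internal tests, autonomously wrote a complete compiler in just 4.3 hours. According to Xiaomi, it reaches comparable performance with 40 to 60 percent fewer tokens than Western competitors.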

Translated Text

According to internal tests, Xiaomi's new MiMo-V2.5-Pro wrote a complete compiler in under five hours and came close to Anthropic's Claude Opus 4.6 on coding benchmarks. The open-weight model also uses significantly fewer tokens than its Western rivals.

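MiMo-V2.5-Pro is a mixture-of-experts (MoE) model, meaning only part of the model is activated for each request rather than the whole network. It has 1.02 trillion total parameters, of which 42 billion are active per request. The MiMo team designed this version specifically for tasks that run for hours and require thousands of tool calls. The context window is at the high end of what is currently possible: the main version handles up to one million tokens at once, while the base version without retraining tops out at 256,000 tokens.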
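
The ratio of active to total parameters can be checked with simple arithmetic, and the idea of activating only part of the model can be sketched with top-k expert routing. The parameter counts below come from the article. The routing function, the four example experts, and k = 2 are illustrative assumptions, since the article does not describe how MiMo-V2.5-Pro's router works.

```python
import math

TOTAL_PARAMS = 1.02e12  # total parameters, as reported in the article
ACTIVE_PARAMS = 42e9    # parameters active per request, as reported


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single request."""
    return active / total


def top_k_route(router_scores: list[float], k: int) -> dict[int, float]:
    """Hypothetical router: keep the k best-scoring experts and
    softmax-normalize their scores into mixing weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_scores[i] for i in chosen)  # subtracted for stability
    exps = {i: math.exp(router_scores[i] - peak) for i in chosen}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
# Four made-up experts; only the two highest-scoring ones would run.
print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With the reported figures, roughly 4 percent of the parameters are active for any one request.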

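A compiler in one afternoon

Xiaomi demonstrates the biggest leap over the previous version through three demos. In the first, the team had the model build a complete compiler project from a Peking University course. According to Xiaomi, this assignment typically takes a computer science student several weeks.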

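MiMo-V2.5-Pro finished the project in 4.3 hours across 672 tool calls, scoring 233 out of 233 on the hidden test suite. Xiaomi says the model's approach is the most interesting part: it first laid out the entire pipeline as scaffolding, then worked through each stage layer by layer. Its first compile run already passed 137 of the 233 tests. A later refactoring pass introduced a regression, which the model diagnosed and fixed on its own.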

In the second demo, MiMo-V2.5-Pro wrote a desktop video editor of roughly 8,000 lines of code from just a few prompts. The model ran autonomously for 11.5 hours and made about 1,870 tool calls.

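In the third demo, Xiaomi connected the model to a circuit simulator through Claude Code and had it design a voltage regulator. Within an hour, the result met all six technical specifications. Four of them improved on the model's first draft by roughly an order of magnitude.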

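Fewer tokens, comparable results

Xiaomi is promoting MiMo-V2.5-Pro mainly on its performance-to-token ratio. On the company's own ClawEval agent benchmark, the model scores 64 percent using around 70,000 tokens per task run. According to the team, that is 40 to 60 percent fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 need to reach similar scores.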
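
The claimed savings can be turned into an implied token budget for the competing models. The sketch below is plain arithmetic on the figures quoted above; the function name is made up for illustration, and it assumes "X percent fewer" is measured relative to the competitor's usage. Under that reading, 70,000 tokens corresponds to roughly 117,000 to 175,000 tokens for the other models.

```python
def implied_baseline_tokens(tokens_used: float, reduction: float) -> float:
    """Tokens a competitor would need if `tokens_used` is `reduction` fewer."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return tokens_used / (1.0 - reduction)


MIMO_TOKENS = 70_000  # approximate tokens per ClawEval task run (from the article)

for reduction in (0.40, 0.60):  # the range Xiaomi claims
    baseline = implied_baseline_tokens(MIMO_TOKENS, reduction)
    print(f"{reduction:.0%} fewer -> competitor uses about {baseline:,.0f} tokens")
```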

On coding benchmarks, the model scores 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, and 68.4 on Terminal-Bench 2.0. On Xiaomi's in-house MiMo Coding Bench it scores 73.7, close to Claude Opus 4.6 (77.1) and well ahead of Gemini 3.1 Pro (67.8). For general agent tasks, it reaches 1,581 Elo points on GDPVal-AA and 72.9 on tau3-bench.

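The progress is most visible in long-context work. On OpenAI's GraphWalks benchmark, in which the model navigates complex node graphs, the previous MiMo-V2-Pro dropped to a score of zero at one million tokens. MiMo-V2.5-Pro, by contrast, scores 0.37 on breadth-first search and 0.62 on parent-node queries at the same length. The model inherits its technical foundation from its predecessor, MiMo-V2-Flash. According to Xiaomi, a mix of local and global attention reduces memory requirements for long sequences.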
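
To see why mixing local and global attention can save memory on long inputs, it helps to count cached key/value entries. The sketch below assumes a sliding-window design in which local layers cache only the most recent `window` tokens while global layers cache the full sequence. The layer counts and window size are hypothetical, chosen only for illustration; the article gives no such details for MiMo-V2.5-Pro.

```python
def kv_cache_entries(seq_len: int, global_layers: int,
                     local_layers: int, window: int) -> int:
    """Cached token positions per attention head, summed over all layers."""
    global_part = global_layers * seq_len            # global layers keep everything
    local_part = local_layers * min(seq_len, window)  # local layers keep a window
    return global_part + local_part


SEQ_LEN = 1_000_000  # one million tokens, the context size in the article
LAYERS = 48          # hypothetical layer count
WINDOW = 128         # hypothetical sliding-window size

full = kv_cache_entries(SEQ_LEN, global_layers=LAYERS, local_layers=0, window=WINDOW)
hybrid = kv_cache_entries(SEQ_LEN, global_layers=8, local_layers=40, window=WINDOW)
print(f"all-global: {full:,} entries")
print(f"hybrid:     {hybrid:,} entries ({full / hybrid:.1f}x smaller)")
```

Because a local layer's cache stops growing once the sequence exceeds the window, the saving approaches the ratio of total layers to global layers as inputs get longer.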

View original (English)

Xiaomi's open-weight MiMo-V2.5-Pro takes aim at Claude Opus with hours-long autonomous coding

Jonathan Kemper
May 3, 2026

Key Points

Xiaomi has released MiMo-V2.5-Pro, a mixture-of-experts model with 1.02 trillion parameters, specifically designed to handle lengthy, autonomous tasks.

In internal benchmarks, the model reportedly programmed a complete compiler in just 4.3 hours, demonstrating its capability for complex, sustained coding work.

MiMo-V2.5-Pro can process up to one million tokens at once and is said to require 40 to 60 percent fewer tokens than western competitors like Claude Opus 4.6 or Gemini 3.1 Pro, suggesting significant efficiency advantages.

Xiaomi's new MiMo-V2.5-Pro writes a complete compiler in under five hours and lands close to Anthropic's Claude Opus 4.6 on coding benchmarks, according to internal tests. The open-weight model also burns through significantly fewer tokens than its Western rivals.

MiMo-V2.5-Pro is a mixture-of-experts model, meaning only part of the model fires for each request rather than the whole thing. It packs 1.02 trillion total parameters, with 42 billion active per request. The MiMo team built this version specifically for jobs that run for hours and rack up thousands of tool calls. The context window sits at the high end of what's currently possible: the main version handles up to one million tokens at once, while the base version without retraining caps out at 256,000 tokens.

A compiler in one afternoon

Xiaomi shows off the biggest jump from the previous version through three demos. In the first, the team had the model build a complete compiler project from a Peking University course, a task that typically takes a computer science student several weeks, according to Xiaomi.

MiMo-V2.5-Pro finished the project in 4.3 hours across 672 tool calls, scoring 233 out of 233 on the hidden test suite.
Xiaomi says the approach is the most interesting part: The model first laid out the entire pipeline as scaffolding, then worked through each stage layer by layer. Its first compile run already passed 137 of 233 tests. A later refactoring phase introduced a regression, which the model diagnosed and fixed on its own.

In the second demo, MiMo-V2.5-Pro wrote a desktop video editor with roughly 8,000 lines of code from just a few prompts. The model ran autonomously for 11.5 hours and made about 1,870 tool calls.

For the third demo, Xiaomi hooked the model up to a circuit simulator through Claude Code and tasked it with designing a voltage regulator. Within an hour, the result hit all six technical specs at once. Four of them beat the model's first draft by roughly an order of magnitude.

Fewer tokens, comparable results

Xiaomi is pitching MiMo-V2.5-Pro mainly on its performance-to-token ratio. On the company's own ClawEval agent benchmark, the model hits 64 percent with around 70,000 tokens per task run. That's 40 to 60 percent fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 need to reach similar numbers, according to the team.

On coding benchmarks, the model scores 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, and 68.4 on Terminal-Bench 2.0. On Xiaomi's in-house MiMo Coding Bench, it scores 73.7, putting it close to Claude Opus 4.6 (77.1) and well ahead of Gemini 3.1 Pro (67.8). For general agent tasks, MiMo-V2.5-Pro hits 1,581 Elo points on GDPVal-AA and 72.9 on tau3-bench.

The progress shows up most clearly in long-context work. On OpenAI's GraphWalks benchmark, which has the model navigate complex node graphs, the previous MiMo-V2-Pro dropped to zero at one million tokens. MiMo-V2.5-Pro still scores 0.37 on breadth-first searches and 0.62 on parent node queries at the same length. The model inherits its technical foundation from its predecessor, MiMo-V2-Flash.
According to Xiaomi, a mix of local and global attention cuts memory needs for long texts by nearly seven times, while a parallel token prediction mechanism triples output speed. Pre-training ran on 27 trillion tokens, with the context window then expanded in stages up to one million tokens.

For post-training, Xiaomi uses a teacher-student setup: several specialized models first get optimized separately for areas like math, security, or tool use. A single student model then learns from its own attempts under the guidance of all the specialists, combining their skills into one.

Three more models alongside the flagship

Xiaomi is shipping three other systems alongside the Pro model. MiMo-V2.5 is a smaller version with 310 billion parameters, 15 billion of them active per request. It handles text, images, video, and audio directly and also supports up to one million tokens of context. Trained on roughly 48 trillion tokens, it scores 87.7 on the Video-MME benchmark, putting it on par with Gemini 3 Pro, according to Xiaomi. This model is also available as open weights on Hugging Face.

MiMo-V2.5-TTS is a family of three variants: one with preset voices, one that generates new voices from text descriptions, and one that clones voices from short audio clips. Users can shape pronunciation by dropping control tags like [crying] or [whispers] straight into the text. These models are API-only through Xiaomi's platform, though currently free for a limited time.

The MiMo-V2.5-ASR speech recognition model, on the other hand, is open. It works in both Chinese and English and, per the benchmarks, also handles Chinese dialects like Wu, Cantonese, and Hokkien, plus mid-sentence language switching and song lyrics. On the Open ASR Leaderboard, it averages a 5.73 percent word error rate.

China's open-weight push is about volume

With this release, Xiaomi's MiMo team is sticking to the path it set in late 2025: lots of models at once, mostly open, all built for autonomous AI agents.
The team points to further scaling of training and a better grasp of long-range relationships beyond individual sentences as the next steps.

Xiaomi rolled out its first complete three-model package recently with MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS. That earlier Pro model had quietly topped the OpenRouter usage rankings for several days under the codename "Hunter Alpha," with many users initially assuming it was a new Deepseek model. That one has now landed too: Deepseek has released Deepseek V4, currently the largest open model on the market and one that significantly undercuts the competition on price.

MiMo-V2.5-Pro now joins the arms race among Chinese open-weight providers, a race that's increasingly less about benchmark points and more about how cheaply and how long a model can work on a task by itself.

Source: MiMo-V2.5 | MiMo-V2.5-Pro | MiMo-V2.5-TTS | MiMo-V2.5-ASR

Xiaomi, open-source models, autonomous coding, MoE, AI coding agents