The Decoder • 111 days ago

Meta unveils 'Muse Spark,' its first frontier AI model

IMP

8/10

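Key Summary

Meta has unveiled Muse Spark, a new native multimodal reasoning model. It supports tool use, visual chain-of-thought reasoning, and multi-agent orchestration, and it ranks in the top 5 on the Artificial Analysis Intelligence Index, a sign that Meta is catching up quickly. Its two most notable features are that, unlike the earlier Llama models, it drops Meta's open-weights policy, and that it achieves a large gain in compute efficiency.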

Translated Article

Meta's Muse Spark: its first frontier model and its first without open weights

Key points:
• Meta Superintelligence Labs has launched Muse Spark, a native multimodal reasoning model capable of tool use, visual chain-of-thought reasoning, and multi-agent orchestration.
• The model scored 52 points on the Artificial Analysis Intelligence Index, placing it in the top 5, just behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6.
• In a notable break from Meta's open-model strategy, Muse Spark is not being released publicly the way the Llama series was. That could change with future releases.

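Meta Superintelligence Labs has shipped Muse Spark, its first frontier model. It is also Meta's first AI model without open weights. Independent testing shows the model closing the gap to OpenAI, Anthropic, and Google, at least for now.

Meta has unveiled Muse Spark, the first model in the new Muse family from its in-house Superintelligence Labs. It is a native multimodal reasoning model with tool use, visual chain-of-thought reasoning, and multi-agent orchestration. The model is available on meta.ai and in the Meta AI app, and a private API preview is being offered to select users.

Unlike the earlier Llama models, Muse Spark is not open-weight and cannot be run locally. This marks a clean break from the open-source playbook Meta championed for years. Still, the company's enormous spending on AI infrastructure and specialized talent, which may come at the expense of other roles, will eventually have to pay for itself.

That does not mean open source is off the table entirely. Meta is reportedly planning to open-source parts of its new AI models, and AI chief Alexandr Wang says the company has "plans to open-source future versions."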

Strong benchmarks, but gaps remain in agentic and coding tasks

Meta says Muse Spark delivers competitive performance in multimodal perception, reasoning, and health applications. At the same time, the company acknowledges that performance gaps remain in long-horizon agentic systems and coding workflows.

As always, it remains an open question how benchmark scores will translate to real-world use. On paper, Meta has caught up with OpenAI and the others. But Anthropic has already raised the bar with Mythos, and OpenAI is rumored to follow soon with a new model, so Meta's gap could persist.

Meta has also introduced a "Contemplating Mode" that orchestrates multiple agents thinking in parallel. It is designed to compete with the deep reasoning features of frontier models such as Gemini Deep Think and GPT Pro. Meta says the mode scores 58 percent on Humanity's Last Exam and 38 percent on FrontierScience Research.

Independent benchmarking service Artificial Analysis received early access to test Muse Spark. The model scored 52 on the Intelligence Index, placing it in the top 5 of all models tested. Only Gemini 3.1 Pro Preview, GPT-5.4, and Claude Opus 4.6 scored higher.

For context, Meta's previous models, Llama 4 Maverick and Scout, managed only 18 and 13 points, respectively, when they launched in April 2025. Artificial Analysis says Muse Spark closed the gap to the frontier in a single release. The service also flags weaknesses in agent-based tasks, however: on the GDPval-AA work task benchmark, Muse Spark scored 1,427 points, trailing Claude Sonnet 4.6 (1,648) and GPT-5.4 (1,676).
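To make the size of these gaps concrete, the short Python sketch below works out the point differences from the figures quoted above. The scores come from the article. The dictionary names and the `gaps` helper are hypothetical and exist only for this illustration; they are not part of any Artificial Analysis tooling.

```python
# Scores as quoted in the article (figures attributed to Artificial Analysis).
intelligence_index = {
    "Muse Spark": 52,
    "Llama 4 Maverick": 18,
    "Llama 4 Scout": 13,
}
gdpval_aa = {
    "Muse Spark": 1427,
    "Claude Sonnet 4.6": 1648,
    "GPT-5.4": 1676,
}


def gaps(scores, reference="Muse Spark"):
    """Return each model's score minus the reference model's score."""
    base = scores[reference]
    return {
        name: value - base
        for name, value in scores.items()
        if name != reference
    }


# Negative values mean the model scored below Muse Spark,
# positive values mean it scored above.
index_gaps = gaps(intelligence_index)
gdpval_gaps = gaps(gdpval_aa)

print(index_gaps)
print(gdpval_gaps)
```

On these numbers, Muse Spark sits 34 and 39 points above the two Llama 4 models on the Intelligence Index, while trailing Claude Sonnet 4.6 and GPT-5.4 by 221 and 249 points on GDPval-AA.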

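Ground-up pretraining rebuild delivers a massive efficiency jump

Meta says it built Muse Spark on a completely new pretraining stack developed over the past nine months. Changes to model architecture, optimization, and data curation are meant to extract significantly more capability from each unit of compute. According to Meta, the result is that Muse Spark matches the capabilities of Llama 4 Maverick with over an order of magnitude less compute. That makes it substantially more efficient than the top base models on the market today. After pretraining, Meta applies reinforcement learning.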

Original Article (English)

Meta's Muse Spark is its first frontier model and its first without open weights

Matthias Bastian • Apr 8, 2026

Key Points
• Meta Superintelligence Labs has launched Muse Spark, a native multimodal reasoning model capable of tool usage, visual chain-of-thought reasoning, and multi-agent orchestration.
• The model scored 52 points on the Artificial Analysis Intelligence Index, landing in the top 5, just behind Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6.
• In a notable break from Meta's open-model strategy, Muse Spark isn't publicly available like the Llama family was. That could change with future releases.

Meta Superintelligence Labs ships Muse Spark, its first frontier model. It's also Meta's first AI model without open weights. Independent testing shows it closing the gap to OpenAI, Anthropic, and Google, at least for now.

Meta has unveiled Muse Spark, the debut model in the new Muse family from its in-house Superintelligence Labs. It's a native multimodal reasoning model with tool use, visual chain-of-thought reasoning, and multi-agent orchestration. The model is live on meta.ai and in the Meta AI app, with a private API preview going out to select users.

Unlike previous Llama models, Muse Spark isn't open-weight and can't be run locally - a sharp break from the open-source playbook Meta championed for years. But the company's enormous spending on AI infrastructure and specialized talent, which might come at the expense of other roles, has to start paying for itself eventually.

Open source isn't completely off the table, though. Meta is reportedly planning to open-source parts of its new AI models, and AI chief Alexandr Wang says the company has "plans to open-source future versions."

Strong benchmarks, but gaps remain in agentic and coding tasks

Meta says Muse Spark posts competitive numbers in multimodal perception, reasoning, and health applications. At the same time, the company admits there are still performance gaps in long-horizon agentic systems and coding workflows.

As always, it's an open question how benchmark scores translate to real-world use. On paper, Meta has caught up with OpenAI and the rest. But Anthropic already raised the bar with Mythos, and OpenAI is rumored to follow soon, so Meta's gap could persist.

Meta is also shipping a "Contemplating Mode" that orchestrates multiple agents thinking in parallel. It's designed to go head-to-head with deep reasoning features in frontier models like Gemini Deep Think and GPT Pro. Meta says it hits 58 percent on Humanity's Last Exam and 38 percent on FrontierScience Research.

Independent benchmarking service Artificial Analysis got early access to test Muse Spark. The model scored 52 on the Intelligence Index, landing in the top 5 across all models tested. Only Gemini 3.1 Pro Preview, GPT-5.4, and Claude Opus 4.6 came in higher.

For context, Meta's previous models Llama 4 Maverick and Scout managed just 18 and 13 points when they launched in April 2025. Artificial Analysis says Muse Spark closes the frontier gap in a single release. The service does flag weaknesses in agent-based tasks, though: on the GDPval-AA work task benchmark, Muse Spark trails Claude Sonnet 4.6 (1,648) and GPT-5.4 (1,676) with 1,427 points.

Ground-up pretraining rebuild delivers a massive efficiency jump

Muse Spark is built on a completely overhauled pretraining stack that Meta developed over the past nine months, the company says. Changes to model architecture, optimization, and data curation are meant to squeeze significantly more capability out of each compute unit.

The payoff, according to Meta: Muse Spark matches the capabilities of Llama 4 Maverick with over an order of magnitude less compute. That makes it substantially more efficient than the top base models on the market today.

After pretraining, Meta applies reinforcement learning (RL) to sharpen the model further, standard practice across the industry right now. Large-scale RL is notoriously unstable, but Meta says the new stack delivers steady, predictable gains. RL improves reliability without narrowing the diversity of the model's reasoning, and according to Meta, those improvements generalize predictably to tasks that never appeared during training, based on a separate evaluation dataset.

"Thought compression" slashes token count without sacrificing quality

Meta takes two approaches to test-time compute, the extended thinking process models use when working toward an optimal answer. The first is thought-time penalties that optimize token consumption. The second is multi-agent orchestration that boosts performance without adding latency.

During training with thought-time penalties, Meta observed a phase transition it calls "thought compression." After an initial stretch where the model improves by thinking longer, the length penalty pushes Muse Spark to compress its reasoning and solve problems with far fewer tokens. The model then expands its solutions again for stronger results.

Multi-agent orchestration puts multiple parallel agents on difficult problems at the same time. Meta says this delivers better performance at comparable latency versus a single agent that spends more time thinking.

Artificial Analysis backs up the efficiency claims: Muse Spark burned through 58 million output tokens for the full Intelligence Index run, on par with Gemini 3.1 Pro Preview (57 million) and well below Claude Opus 4.6 (157 million) or GPT-5.4 (120 million).

Health and multimodal applications take center stage

Muse Spark is built to work with visual information across domains. Meta says it delivers strong results on visual STEM questions, entity recognition, and localization. The company points to multimodal perception and health as use cases, though interactive applications like generating mini-games are also on the table.

On the health side, Meta says it partnered with more than 1,000 doctors to curate high-quality, factually accurate training data. Muse Spark can generate interactive displays that break down the nutritional value of food or show which muscles activate during specific exercises.

Meta says Muse Spark lacks the autonomous capabilities needed to execute threat scenarios involving cybersecurity or loss of control. A full security report is expected to follow. One early finding worth noting: the model frequently flagged test scenarios as "alignment traps" and justified honest behavior by pointing out it was being evaluated, a phenomenon researchers call "evaluation awareness."

Meta looks to move past the Llama 4 stumble

Meta frames Muse Spark as "the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts" toward "personal superintelligence." The company says it's investing across the full stack, from research and model training to infrastructure, including the Hyperion data center.

"This is MSL's first model and there are certainly rough edges we will polish over time in model behavior," writes Meta AI head Alexandr Wang, adding that "bigger models are already in development with infrastructure scaling to match."

The release comes after a rough stretch for Meta's AI efforts. Llama 4 Maverick and Scout drew criticism in April 2025 over underwhelming benchmark results and internal accusations of benchmark manipulation. Muse Spark follows a reorganization of Meta's AI work under the new Meta Superintelligence Labs banner and marks the company's return to the frontier race after roughly a year of relative quiet.

Source: Meta

Meta, Muse Spark, Multimodal AI, Open Weights, AI Benchmarks