The Decoder • 102 days ago

Small open-source models chip away at the myth of Anthropic's "Claude Mythos"

IMP

8/10

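Key summary

The claim that Claude Mythos, Anthropic's top-tier cybersecurity AI model, stands in a class of its own is falling apart. According to two independent studies, small, openly available models found and reproduced most of the security vulnerabilities that Mythos had detected. This suggests that the edge held by any one closed, large model in security is shrinking and that small models have become competitive.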

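Translated text

The myth of Anthropic's Claude Mythos cybersecurity model is crumbling as small open models detect the same security vulnerabilities Anthropic showcased. Article by Jonathan Kemper, THE DECODER (April 18, 2026).

Anthropic has tightly restricted access to its cybersecurity model, Claude Mythos, citing capabilities it says no competitor can match. But two new studies suggest that even small, publicly available models can reproduce most of the vulnerability analyses Anthropic demonstrated.

Through Project Glasswing, Anthropic has limited access to Claude Mythos Preview to a consortium of eleven organizations, citing the model's offensive capabilities. Internal tests and an audit by the UK's AI Security Institute (AISI) found that Mythos can find software bugs, build working exploits on its own, and, in simulations, take over entire corporate networks as long as the network is "small, weakly defended and vulnerable."

Two independent replication efforts are now poking holes in that exclusivity argument without disputing the model's overall performance. The first comes from AISLE, a company that has run its own AI-assisted bug hunting on open-source software since mid-2025. AISLE says it has reported 15 vulnerabilities in OpenSSL and 5 in curl. Founder Stanislav Fort fed the code snippets from Anthropic's public samples into a range of models to test how much smaller, partially open models could work out on their own.

The second study comes from Vidoc Security, which tested GPT-5.4 and Claude Opus 4.6 paired with the open-source coding agent OpenCode.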

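Small models caught the FreeBSD bug too

For the FreeBSD NFS bug (CVE-2026-4747), which Anthropic highlighted to show off Mythos's autonomous discovery and exploitation capabilities, AISLE says all eight models it tested found the memory bug in the function in question. That included GPT-OSS-20b, a model with 3.6 billion active parameters that runs at $0.11 per million tokens.

Every model rated the flaw as critical, though their estimates of the overwritable buffer size differed slightly. Every model also offered a plausible account of how to exploit the bug, working out why the operating system's main protections do not apply here. GPT-OSS-120b produced a gadget sequence that AISLE rates as fairly close to the real exploit. Kimi K2 worked out on its own that the attack could spread automatically from one infected machine to others, a detail that Anthropic itself did not mention.

The difficulty arose where a more creative approach was needed. The real exploit has to fit a payload of more than 1,000 bytes into roughly 304 bytes of available space. Mythos managed this by splitting the payload across 15 separate network requests. None of the tested models found this exact technique, but according to the researchers they did find other workable paths.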

A jagged capability landscape

The OpenBSD bug was a very different story. It required a mathematical understanding of integer overflows and list states, and the gaps between models were extreme. According to AISLE, GPT-OSS-120b reconstructed the full publicly described exploit chain in a single run and essentially proposed the actual OpenBSD patch as the fix. By contrast, Qwen3 32B, which had performed well on the FreeBSD bug, judged the OpenBSD code to be "robust to such scenarios."

Vidoc ran into a similar limit. Claude Opus 4.6 reproduced the vulnerability in all three runs, while GPT-5.4 missed it every time. Fort calls this "the jagged frontier," an unstable, uneven boundary of capability. No single model is best at everything in cybersecurity, and the rankings shift sharply from task to task.

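When small models beat the big ones

One of the more revealing tests uses a simple code sample that at first glance looks like a textbook security hole: user input appears to flow unfiltered into a database query. A few lines further down, however, that input is actually discarded, so no vulnerability exists. Of the 13 Anthropic models tested, Opus 4.6 clearly got it right, while Sonnet 4.6 and Opus 4.5 were rated borderline correct.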
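
The decoy pattern described above can be illustrated with a minimal sketch. This is not the sample used in the study, which the article does not reproduce. The function name `build_report_query`, the `reports` table, and the constant `"system"` are all hypothetical and chosen only to show why a reviewer has to follow the data flow to the end before calling a finding real.

```python
def build_report_query(user_input: str) -> str:
    """Looks injectable at first glance, but the input never reaches the query."""
    # Hypothetical decoy: the user-supplied value is stored in the variable
    # that is later interpolated into the SQL string ...
    owner = user_input

    # ... but a few lines down the value is discarded and replaced by a constant.
    owner = "system"

    # Formatting strings into SQL is still poor style, yet no attacker-controlled
    # data flows into this statement, so there is nothing to exploit here.
    return f"SELECT id, title FROM reports WHERE owner = '{owner}'"


if __name__ == "__main__":
    # A classic injection probe has no effect on the resulting query.
    print(build_report_query("x' OR '1'='1"))
```

A scanner that matches only on the pattern "input variable appears in a formatted query" would flag this function, while one that traces the reassignment would not, which is the distinction the test is probing.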

Original text (English)

The myth of Claude Mythos crumbles as small open models hunt the same cybersecurity bugs Anthropic showcased

Jonathan Kemper, Apr 18, 2026

Anthropic has kept its Claude Mythos cybersecurity model on a short leash, pointing to capabilities it says no rival can match. But two new studies suggest that even small, openly available models can reproduce most of the vulnerability analyses Anthropic has put on display.

Through Project Glasswing, Anthropic has limited access to Claude Mythos Preview to a consortium of eleven organizations, citing the model's offensive capabilities. Internal tests and an audit by the UK's AI Security Institute found that Mythos can find software bugs, build working exploits on its own, and take over entire corporate networks in simulations, as long as the network is "small, weakly defended and vulnerable."

Two independent replication efforts are now poking holes in that exclusivity story, without disputing the model's overall performance. The first comes from AISLE, a company that has been running its own AI-assisted bug hunting on open source software since mid-2025. AISLE says it has reported 15 vulnerabilities in OpenSSL and five in curl. Founder Stanislav Fort fed the code snippets from Anthropic's public samples into a range of models to see how much smaller and partially open models could piece together on their own.

The second study comes from Vidoc Security, which paired GPT-5.4 and Claude Opus 4.6 with the open coding agent OpenCode.

Small models catch the FreeBSD bug too

The FreeBSD NFS bug (CVE-2026-4747) that Anthropic spotlighted was pitched as a showcase for autonomous discovery and exploitation by Mythos. AISLE found that all eight models it tested caught the memory bug in the function in question. That included GPT-OSS-20b, a model with just 3.6 billion active parameters that runs at $0.11 per million tokens.

Every model flagged the flaw as critical, though their estimates of the overwritable buffer size varied slightly. Every model also came up with a plausible take on how to exploit the bug, working out why the operating system's main protections don't apply here. GPT-OSS-120b produced a gadget sequence that AISLE says comes close to the real exploit. Kimi K2 even figured out on its own that the attack could spread automatically from one infected machine to others, a detail Anthropic itself doesn't mention.

Where things get harder is on the creative side. The real exploit has to squeeze a payload of more than 1,000 bytes into about 304 bytes of available space. Mythos pulled it off by splitting the payload across 15 separate network requests. None of the tested models landed on that exact trick, but they found other workable paths, the researchers say.

A jagged capability landscape

The OpenBSD bug is a different story. It calls for a mathematical grasp of integer overflows and list states, and results are all over the map. AISLE says GPT-OSS-120b reconstructed the full publicly described exploit chain in a single run and essentially proposed the actual OpenBSD patch as the fix. Qwen3 32B, which had held its own on the FreeBSD bug, declared the OpenBSD code "robust to such scenarios."

Vidoc ran into a similar wall: Claude Opus 4.6 reproduced the vulnerability in three out of three runs, while GPT-5.4 missed it every time. Fort calls this "the jagged frontier," a broken, uneven capability boundary. There's no single best model for cybersecurity, and the rankings shift sharply from one task to the next.

When small models beat the big ones

One of the more revealing tests uses a simple code sample that looks like a textbook security hole at first glance. User input seems to flow unfiltered into a database query. But a few lines down, that input is actually discarded, so the vulnerability isn't real.

Of the 13 Anthropic models tested, Opus 4.6 clearly got it right, while Sonnet 4.6 and Opus 4.5 landed as borderline correct. The full table marks Opus 4 as partially correct and Opus 4.1 as borderline. Claude Sonnet 4.5 confidently traced the data flow the wrong way. On the OpenAI side, o3 was consistently correct, o4-mini only partially, and GPT-OSS-20b is listed as correct. All GPT-4.1 models and most GPT-5.4 models came up short. Other small open models like DeepSeek R1 and Kimi K2 nailed it every time.

What happens when the fix is already in

Fort added an important caveat later. While every model reliably flagged the unpatched FreeBSD code as vulnerable, only GPT-OSS-120b (and, to a limited extent, Qwen3-32B) recognized the patched version as safe. GPT-OSS-20b, Kimi K2, and DeepSeek R1 got it wrong in every run and invented reasons why phantom vulnerabilities still existed. Fort doesn't see this as a hit to his argument. If anything, he says, it confirms that the testing and sorting layer around the model is the critical piece.

The real edge is in the full system

Vidoc also tested cases beyond classic memory bugs. The Botan case involves a flaw in certificate validation that let a forged certificate pass as trusted. Both Claude Opus 4.6 and GPT-5.4 caught the logic gap in three out of three runs. For wolfSSL, tested in parallel, both models zeroed in on the right part of the code but misread the underlying cryptographic rule. The cost per scanned file came in under $30.

Both studies argue that the real advantage lies less in any single model than in the system built around it: validation, prioritization, and workflow. That covers the full pipeline: picking targets in the code, running step-by-step analysis, checking the results, and separating real hits from false ones. AISLE goes further, arguing that small, cheap models are good enough for most of the discovery work, which makes broad scanning a viable strategy.

"A thousand adequate detectives searching everywhere will find more bugs than one brilliant detective who has to guess where to look," Fort writes.

Both reports leave open the possibility that Mythos still has an edge in building deployable exploits but suggest that gap will likely close as tools improve and models gain more autonomy. Together, they point to a line between frontier and publicly available models that's far more porous than Anthropic's messaging lets on, at least when it comes to finding vulnerabilities.

Critics have accused Anthropic of fearmongering, arguing the company is drumming up media attention until it has the compute to open Mythos up to a broader audience. There may be something to that. According to the Financial Times, which cites "multiple people with knowledge of the matter," Anthropic is holding the model back until it has enough compute capacity to serve customers.

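The "testing and sorting layer" that Fort points to can be sketched as a small scoring harness: run each model on both the unpatched and the patched version of a sample, then compare how often it flags each. This is only an illustrative sketch, not the method either study used. The `Verdict` record, the `score` function, and the model names `model-a` and `model-b` are hypothetical, and the verdicts in the usage example are made up rather than taken from the reports.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One model run on one version of a code sample (hypothetical record)."""
    model: str
    patched: bool   # True if the sample already contains the fix
    flagged: bool   # True if the model reported a vulnerability


def score(verdicts: list[Verdict]) -> dict[str, dict[str, float]]:
    """Per model: detection rate on unpatched code, false-alarm rate on patched code."""
    counts: dict[str, dict[str, list[int]]] = {}
    for v in verdicts:
        per_model = counts.setdefault(v.model, {"unpatched": [0, 0], "patched": [0, 0]})
        bucket = per_model["patched" if v.patched else "unpatched"]
        bucket[0] += int(v.flagged)  # runs that reported a vulnerability
        bucket[1] += 1               # total runs on this version

    result: dict[str, dict[str, float]] = {}
    for model, per_model in counts.items():
        hits, n_unpatched = per_model["unpatched"]
        alarms, n_patched = per_model["patched"]
        result[model] = {
            "detection_rate": hits / n_unpatched if n_unpatched else float("nan"),
            "false_alarm_rate": alarms / n_patched if n_patched else float("nan"),
        }
    return result


if __name__ == "__main__":
    # Made-up verdicts for two placeholder models, for illustration only.
    runs = [
        Verdict("model-a", patched=False, flagged=True),
        Verdict("model-a", patched=True, flagged=False),
        Verdict("model-b", patched=False, flagged=True),
        Verdict("model-b", patched=True, flagged=True),
    ]
    for name, rates in score(runs).items():
        print(name, rates)
```

A model that flags both versions scores well on detection alone, so reporting the two rates side by side is what exposes the phantom findings on patched code.
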
AI security, open-source models, Anthropic, vulnerability analysis, bug hunting