Hacker News • 106 days ago

Evaluating the Cyber-Attack Capabilities of Claude Mythos Preview

IMP

8/10

Key summary

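The UK's AISI evaluated Anthropic's Claude Mythos Preview and found that it surpasses previous generations of AI models and can autonomously carry out expert-level, multi-stage cyber attacks. In particular, it completed a 32-step corporate network intrusion simulation, which is estimated to take a human expert 20 hours, in 3 of its 10 attempts, demonstrating that it can chain complex attack steps together. This means autonomous hacking of weakly defended enterprise systems is now within sight, and the result is regarded as an important indicator that the level of cybersecurity threat has risen a notch.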

Translated text

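The AI Security Institute (AISI) ran tests to assess the cybersecurity capabilities of Anthropic's Claude Mythos Preview (announced on 7 April). Our results show that Mythos Preview is a step up from previous frontier models in a landscape where cyber performance was already improving rapidly. We have tracked AI cyber capabilities since 2023, building progressively harder evaluations to keep pace with AI progress: from chat-based probing, to capture-the-flag (CTF) challenges, to the multi-step cyber-attack simulations described below.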

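Two years ago, the best available models could barely complete beginner-level cyber tasks. Now, in controlled evaluations where Mythos Preview was given network access and explicitly directed to do so, we observed that it could execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously, tasks that would take human professionals days of work.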

In this blog post, we summarise the results of the cyber evaluations we ran on Mythos Preview. These include both CTF challenges and more complex cyber ranges designed to simulate multi-step attack scenarios.

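Capture-the-flag (CTF) results

In CTF challenges, AI models must identify and exploit weaknesses in target systems to retrieve hidden "flags". The chart below shows Mythos Preview's performance on our cyber CTF suite compared with other models. Each point represents a model's average success rate at a given difficulty level. On expert-level tasks, which no model could complete before April 2025, Mythos Preview succeeded 73% of the time.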
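As a rough illustration of how each point on such a chart could be computed, the sketch below pools attempt records by difficulty level and averages the outcomes. The record format, the function name `success_rate_by_difficulty`, and the sample data are all assumptions made for illustration. They are not AISI's data or tooling, and AISI may aggregate differently (for example, per task before per level).

```python
from collections import defaultdict


def success_rate_by_difficulty(attempts):
    """Return {difficulty: mean success rate} from (difficulty, solved) pairs.

    Hypothetical helper: pools every attempt at a difficulty level and
    divides the number of solved attempts by the number of attempts.
    """
    totals = defaultdict(lambda: [0, 0])  # difficulty -> [solved, attempted]
    for difficulty, solved in attempts:
        totals[difficulty][0] += int(solved)
        totals[difficulty][1] += 1
    return {level: solved / tried for level, (solved, tried) in totals.items()}


# Made-up records for illustration only (not AISI results).
sample = [
    ("beginner", True), ("beginner", True),
    ("expert", True), ("expert", False), ("expert", True), ("expert", True),
]
rates = success_rate_by_difficulty(sample)
for level, rate in rates.items():
    print(f"{level}: {rate:.0%}")
```

Pooling attempts is the simplest choice. If tasks receive unequal numbers of attempts, averaging per task first would avoid over-weighting the heavily sampled tasks.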

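Cyber range results

Even expert-level CTFs only test specific skills in isolation. Real-world cyber attacks require chaining dozens of steps together across multiple hosts and network segments: sustained operations that take even human experts many hours, days, or weeks. As a first step towards measuring this, we built "The Last Ones" (TLO), a 32-step corporate network attack simulation spanning initial reconnaissance through to full network takeover. We estimate that it would take a human about 20 hours to complete. A more detailed description of the range can be found in our recent paper.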

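Claude Mythos Preview is the first model to solve TLO from start to finish, doing so in 3 of its 10 attempts. Across all attempts, the model completed an average of 22 of the 32 steps. Claude Opus 4.6, the next-best-performing model, completed an average of 16 steps.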
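With only 10 attempts, the 3-in-10 full-solve rate carries wide statistical uncertainty. The sketch below, which is not part of the AISI analysis, uses the standard Wilson score interval to put rough bounds on that rate and converts the reported average step counts into fractions of the 32-step range. The function name `wilson_interval` and the choice of a 95% level are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


# Figures reported in the post: 3 full solves in 10 attempts,
# and average steps completed out of 32.
low, high = wilson_interval(3, 10)
mythos_fraction = 22 / 32  # Claude Mythos Preview
opus_fraction = 16 / 32    # Claude Opus 4.6

print(f"full-solve rate: 30% (95% Wilson interval {low:.0%} to {high:.0%})")
print(f"average progress: Mythos {mythos_fraction:.1%}, Opus 4.6 {opus_fraction:.1%}")
```

The interval comes out at roughly 11% to 60%, so the headline rate is best read as "solves the range some of the time" rather than as a precise probability.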

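Mythos Preview did also show some cyber capability limitations within the scope of our evaluation. It could not complete "Cooling Tower", our cyber range focused on operational technology (OT), though this result does not necessarily show that the model is bad at executing attacks in OT environments: the model got stuck on the IT sections of this range.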

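We expect that performance on our evaluations would continue to improve with more inference compute. We ran the cyber ranges with a 100M-token budget. Mythos Preview's performance continued to scale up to this limit, and we expect improvements would continue beyond it. For more on this phenomenon, see our recent blog post on inference scaling in cyber tasks.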

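Implications

Mythos Preview's success on one cyber range indicates that it is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems once access to a network has been gained. However, our ranges differ from real-world environments in important ways that make them easier targets. For example, they lack security features that are often present, such as active defenders and defensive tooling. The model also faces no penalty for actions that would trigger security alerts. This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems.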

In conclusion, in a regime where attackers can give models network access and direct them to attack weakly defended systems autonomously, cybersecurity evaluations must evolve.

Original text (English)

The AI Security Institute (AISI) conducted evaluations of Anthropic's Claude Mythos Preview (announced on 7th April) to assess its cybersecurity capabilities. Our results show that Mythos Preview represents a step up over previous frontier models in a landscape where cyber performance was already rapidly improving. We have tracked AI cyber capabilities since 2023, building progressively harder evaluations to keep pace with AI progress: from chat-based probing, to capture-the-flag challenges, to the multi-step cyber-attack simulations described below.

Two years ago, the best available models could barely complete beginner-level cyber tasks. Now, in controlled evaluations where Mythos Preview was explicitly directed and given network access to do so, we observed that it could execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously, tasks that would take human professionals days of work.

In this blog post, we summarise results of cyber evaluations we ran on Mythos Preview. These include both capture-the-flag (CTF) challenges and more complex ranges designed to simulate multi-step attack scenarios.

Capture-the-flag results

In CTF challenges, AI models must identify and exploit weaknesses in target systems to retrieve hidden "flags". The chart below shows Mythos Preview's performance on our cyber CTF suite compared to other models. Each point represents a model's average success rate at a given difficulty level. On expert-level tasks, which no model could complete before April 2025, Mythos Preview succeeds 73% of the time.

Cyber range results

Even expert-level CTFs only test specific skills in isolation. Real-world cyber-attacks require chaining dozens of steps together across multiple hosts and network segments: sustained operations that take human experts many hours, days, or weeks to complete.
As a first step towards measuring this, we built "The Last Ones" (TLO): a 32-step corporate network attack simulation spanning initial reconnaissance through to full network takeover, which we estimate would take humans 20 hours to complete. A more detailed description of the range can be found in our recent paper.

Claude Mythos Preview is the first model to solve TLO from start to finish, in 3 out of its 10 attempts. Across all its attempts, the model completed an average of 22 out of 32 steps. Claude Opus 4.6 is the next best performing model and completed an average of 16 steps.

Mythos Preview did also show some cyber capability limitations within the limits of our evaluation. It could not complete our operational technology focused cyber range "Cooling Tower", though this result does not necessarily show that the model is bad at executing attacks in operational technology (OT) environments; the model got stuck on IT sections of this range.

We expect that performance on our evaluations would continue to improve with more inference compute: we ran the cyber ranges with a 100M token budget; Mythos Preview's performance continues to scale up to this limit, and we expect performance improvements would continue beyond that. For more on this phenomenon, see our recent blog post on inference scaling in cyber tasks.

Implications

Mythos Preview's success on one cyber range indicates that it is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained. However, our ranges have important differences from real-world environments that make them easier targets. They lack security features that are often present, such as active defenders and defensive tooling. There are also no penalties for the model for undertaking actions that would trigger security alerts. This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems.
In a regime where attackers can direct and provide network access to models to conduct autonomous attacks on poorly defended systems, cybersecurity evaluations must evolve. As capabilities continue to improve, evaluation environments that lack defences will no longer be challenging enough to discriminate between the capabilities of the most cyber-capable models or assess trends. Our future work will involve evaluating capabilities using ranges simulating hardened and defended environments, including ranges with active monitoring, endpoint detection and real-time incident response. We will also be tracking how AI-enabled vulnerability discovery and penetration testing campaigns perform on real-world systems.

What organisations should do now

Our testing shows that Mythos Preview can exploit systems with weak security posture, and it is likely that more models with these capabilities will be developed. This highlights the importance of cybersecurity basics, such as regular application of security updates, robust access controls, security configuration, and comprehensive logging. Our colleagues at the National Cyber Security Centre (NCSC) run the Cyber Essentials scheme to help organisations protect themselves against common online threats, whether those threats are AI assisted or not. For the latest cybersecurity advice, visit the NCSC website.

Future frontier models will be more capable still, so investment now in cyber defence is vital. AI cyber capabilities are dual use; while they pose security challenges, they can also help deliver game-changing improvements in defence. We recently released a joint blog post with NCSC on how cyber defenders can both harness and prepare for frontier AI.

Cybersecurity · AI evaluation · Claude · Autonomous attacks · Vulnerability analysis