The Decoder • 75 days ago

Microsoft Finds Windows Vulnerabilities with More Than 100 AI Agents

IMP

8/10

Key Summary

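Microsoft has announced MDASH, a system that uses more than 100 specialized AI agents to automatically detect software vulnerabilities. The system has already found 16 new security vulnerabilities in the Windows operating system, four of them classified as critical remote code execution (RCE) flaws. Unlike single-model approaches, this multi-agent framework has multiple AI models work together, and it is considered significant because it could substantially change how large, complex software is audited for security.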

Translated Article

Microsoft has built an agentic multi-model system that uses more than 100 specialized AI agents to detect software vulnerabilities.

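The security system, called MDASH (Multi-Model Agentic Scanning Harness), is designed to automatically find security vulnerabilities in software. According to Microsoft, unlike approaches that rely on a single AI model such as Claude Mythos, MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models.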

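On Patch Tuesday, May 12, 2026, Microsoft reported 16 new vulnerabilities (CVEs) in the Windows networking and authentication stack that MDASH discovered. The company classifies four of them as critical, including remote code execution vulnerabilities in the tcpip.sys kernel component, the IKEv2 service (ikeext.dll), netlogon.dll, and dnsapi.dll.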

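Microsoft says 10 of the 16 vulnerabilities affect kernel mode, and most can be reached directly from the network without authentication. The company also notes that its own codebases, including Windows, Hyper-V, and Azure, are especially hard to audit because they are proprietary and not part of public training data.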

More than 100 agents debate whether vulnerabilities are real

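The system works as a four-stage pipeline. First, it analyzes the source code and maps the attack surface. Specialized auditor agents then scan the code for suspicious areas. In the third stage, a second group of agents, which Microsoft calls "debaters," argue for and against the exploitability of each finding. Duplicates are then merged, and in the final stage Evidence Leader agents attempt to actually trigger the vulnerability with specific inputs.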
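The four stages described above can be sketched in Python roughly as follows. This is a toy illustration under stated assumptions, not Microsoft's implementation: every name here (Finding, map_attack_surface, audit, debate, merge_duplicates, prove, and the example auditors) is hypothetical, the "agents" are plain functions rather than AI models, and majority voting is an assumed way of settling a debate, since the article does not say how debates are resolved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Finding:
    location: str  # hypothetical function name where the issue was flagged
    note: str      # short description supplied by the auditor


Auditor = Callable[[str], Optional[Finding]]
Debater = Callable[[Finding], bool]  # True means "this looks exploitable"


def map_attack_surface(functions: List[str]) -> List[str]:
    """Stage 1 (toy version): keep functions that handle external input."""
    return [name for name in functions if name.startswith(("recv_", "parse_"))]


def audit(surface: List[str], auditors: List[Auditor]) -> List[Finding]:
    """Stage 2: every auditor inspects every entry point."""
    findings = []
    for name in surface:
        for auditor in auditors:
            finding = auditor(name)
            if finding is not None:
                findings.append(finding)
    return findings


def debate(findings: List[Finding], debaters: List[Debater]) -> List[Finding]:
    """Stage 3: keep findings that a strict majority argues are exploitable.

    Majority voting is an assumption made for this sketch.
    """
    return [f for f in findings if 2 * sum(d(f) for d in debaters) > len(debaters)]


def merge_duplicates(findings: List[Finding]) -> List[Finding]:
    """Merge findings that point at the same location (the first one wins)."""
    merged = {}
    for f in findings:
        merged.setdefault(f.location, f)
    return list(merged.values())


def prove(findings: List[Finding], try_trigger: Callable[[Finding], bool]) -> List[Finding]:
    """Stage 4: keep only findings that a concrete input actually triggers."""
    return [f for f in findings if try_trigger(f)]


# --- Toy run with made-up auditors, debaters, and function names ---
def length_auditor(name: str) -> Optional[Finding]:
    return Finding(name, "length field may be unchecked") if "parse" in name else None


def header_auditor(name: str) -> Optional[Finding]:
    return Finding(name, "header size is trusted") if "header" in name else None


functions = ["recv_packet", "parse_header", "log_event", "parse_options"]
debaters = [
    lambda f: True,                    # always argues "exploitable"
    lambda f: "header" in f.location,  # skeptical unless a header is involved
    lambda f: "header" in f.location,
]

surface = map_attack_surface(functions)
candidates = audit(surface, [length_auditor, header_auditor])
survivors = merge_duplicates(debate(candidates, debaters))
confirmed = prove(survivors, lambda f: f.location == "parse_header")
```

In this toy run, two auditors flag the same function, the debate stage drops the finding that only one of three debaters supports, the duplicate is merged, and a single finding survives the final trigger check. The point of the structure is that each stage narrows the list, so the expensive proof step only runs on findings that already survived cheaper scrutiny.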

The pipeline is model-agnostic. When a new model comes out, it can be tested against the previous one just by changing the configuration. Plugins also let experts feed the system domain-specific knowledge that foundation models do not know on their own, such as kernel calling conventions or IPC trust boundaries.
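One way to picture a model-agnostic, plugin-aware setup is a configuration that names a model and a list of plugins, with the agent built from that configuration. The sketch below is only an assumption about what such a design could look like. The registry, the two stand-in "models" (which simply echo their prompt), the plugin name "ipc-trust", and the plugin text are all invented for illustration and do not come from Microsoft's description.

```python
from typing import Callable, Dict

Model = Callable[[str], str]


# Hypothetical stand-in models: they only tag and echo the prompt.
def model_a(prompt: str) -> str:
    return f"[model-a] {prompt}"


def model_b(prompt: str) -> str:
    return f"[model-b] {prompt}"


MODEL_REGISTRY: Dict[str, Model] = {"model-a": model_a, "model-b": model_b}

# Hypothetical plugin: expert-written domain notes, stored as plain text.
PLUGINS: Dict[str, str] = {
    "ipc-trust": "Treat data crossing an IPC boundary as untrusted.",
}


def build_agent(config: dict) -> Model:
    """Build an agent from configuration alone: a model name plus plugin names."""
    model = MODEL_REGISTRY[config["model"]]
    notes = [PLUGINS[name] for name in config.get("plugins", [])]

    def agent(task: str) -> str:
        # Plugin notes are prepended so the model sees them before the task.
        return model("\n".join(notes + [task]))

    return agent


# Swapping the model is a one-field configuration change; the task and plugins stay the same.
old_agent = build_agent({"model": "model-a", "plugins": ["ipc-trust"]})
new_agent = build_agent({"model": "model-b", "plugins": ["ipc-trust"]})

task = "Audit the message handler."
old_output = old_agent(task)
new_output = new_agent(task)
```

Because both agents receive an identical prompt, any difference between old_output and new_output can be attributed to the model alone, which is what makes a side-by-side comparison of a new model against the previous one straightforward.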

Top benchmark score, but not a fair comparison

On CyberGym, a public benchmark of 1,507 real vulnerabilities, the system scored 88.45 percent, taking first place on the leaderboard and finishing roughly five points ahead of the second-place model.
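To put the percentage in concrete terms, the short calculation below converts it into an approximate task count. It assumes the score is a plain success rate over all 1,507 benchmark entries, which the article does not state, so the resulting count is an estimate and not a reported figure.

```python
TOTAL_TASKS = 1507     # benchmark size given in the article
SCORE_PERCENT = 88.45  # score given in the article

# Assumption: score = solved tasks / total tasks.
solved_estimate = round(TOTAL_TASKS * SCORE_PERCENT / 100)

# Sanity check: the estimate should round back to the reported percentage.
implied_percent = round(100 * solved_estimate / TOTAL_TASKS, 2)

print(solved_estimate, implied_percent)
```

Under that assumption the score corresponds to roughly 1,333 of the 1,507 entries.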

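The comparison is somewhat misleading, though, because Microsoft is pitting a complete framework against individual models. Those models would likely score higher as well if they were wrapped in a similar framework. Microsoft's blog post does not disclose which models were used to achieve the score. The company refers only to "SOTA (state-of-the-art) models" with strong reasoning ability, "distilled models" that serve as low-cost debaters, and a "second separate SOTA model" that acts as an independent counterpart. Whether these come from OpenAI, Anthropic, Microsoft's own labs, or third-party providers remains unclear.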

MDASH is backed by Microsoft's Autonomous Code Security Team. According to Microsoft, some of its members come from Team Atlanta, the winner of the DARPA AI Cyber Challenge. For that competition, the team built an autonomous cyber reasoning system that detected and fixed bugs in complex open-source projects.

MDASH is currently available to external customers in a limited private preview. A more detailed technical report is available on the Microsoft blog.

Other companies such as OpenAI and Anthropic are also pushing deeper into AI cybersecurity and moving to put their own technology to use.

Original Article (English)

Microsoft pits more than 100 AI agents against each other to find Windows vulnerabilities

Matthias Bastian
May 14, 2026

Key Points

Microsoft has introduced MDASH, an AI-powered security system that uses more than 100 specialized agents to automatically detect software vulnerabilities.
The system has already uncovered 16 new security vulnerabilities in Windows, four of them classified as critical.
MDASH scored 88.45 percent on the CyberGym benchmark, the highest result to date, though Microsoft hasn't disclosed which specific AI models power the system.

Microsoft has built an agentic multi-model system that uses more than 100 specialized AI agents to detect software vulnerabilities.

The security system, called MDASH (Multi-Model Agentic Scanning Harness), is designed to automatically find security vulnerabilities in software. Unlike approaches that rely on a single AI model like Claude Mythos, MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models, according to Microsoft.

On Patch Tuesday, May 12, 2026, Microsoft reported 16 new vulnerabilities (CVEs) in the Windows networking and authentication stack that MDASH discovered. The company classifies four of these as critical, including remote code execution vulnerabilities in the tcpip.sys kernel component, the IKEv2 service (ikeext.dll), netlogon.dll, and dnsapi.dll.

Ten of the 16 vulnerabilities affect kernel mode, and most are accessible from the network without authentication, Microsoft says. The company points out that its own code base is especially hard to audit: Windows, Hyper-V, and Azure are proprietary and aren't part of public training data.

More than 100 agents debate whether vulnerabilities are real

The system works in a four-stage pipeline. First, it analyzes the source code and maps the attack surface. Specialized auditor agents then scan the code for suspicious areas. In the third stage, a second group of agents, which Microsoft calls "debaters," argue for and against the exploitability of each finding. Duplicates are then merged before Evidence Leader agents attempt to trigger the vulnerability through specific inputs in the final stage.

The pipeline is model-agnostic: when a new model comes out, it can be tested against the previous one just by changing the configuration. Plugins let experts feed in domain-specific knowledge, like kernel calling conventions or IPC trust boundaries, that no foundation model knows on its own.

Top benchmark score, but the comparison isn't apples to apples

On the public CyberGym benchmark with 1,507 real vulnerabilities, the system scored 88.45 percent, the top result on the leaderboard, roughly five points ahead of the next best model.

The comparison is misleading, though, since Microsoft is pitting an entire framework against individual models, which would also likely score higher if wrapped in a similar harness. The blog post doesn't reveal which models Microsoft used to achieve this score. The company only refers to "SOTA models" as heavy reasoners, "distilled models" as low-cost debaters, and a "second separate SOTA model" as an independent counterpart. Whether these come from OpenAI, Anthropic, Microsoft's own labs, or third-party providers remains unclear.

MDASH is backed by Microsoft's Autonomous Code Security Team. Some of its members come from Team Atlanta, the winner of the DARPA AI Cyber Challenge, according to Microsoft. For that competition, the team built an autonomous cyber reasoning system that detected and fixed bugs in complex open-source projects.

MDASH is currently available in a limited private preview for external customers. A detailed technical report is available on the Microsoft blog.

Other companies like OpenAI and Anthropic are also pushing deeper into AI cybersecurity, aiming to use their models to defend against the very threats that AI systems themselves have helped amplify.

Source: Microsoft

AI security · Software vulnerability detection · Multi-agent systems · Microsoft · AI coding agents