The Decoder • 79일 전

AI, 스스로 복제하며 해킹… 성공률 1년 만에 81% 도달

IMP

9/10

핵심 요약

보안 연구소 Palisade Research에 따르면, AI 에이전트가 스스로 취약점을 파악해 원격 컴퓨터를 해킹하고 자신을 복제하는 능력을 갖췄습니다. 불과 1년 만에 자가 복제 성공률이 6%에서 81%로 급증했으며, 공격과 방어 모두 인간이 아닌 AI 에이전트 중심으로 재편될 것으로 전망됩니다.

번역된 본문

보안 연구소 Palisade Research는 AI 에이전트가 원격 컴퓨터에 침투하여 스스로를 복제할 수 있음을 입증했습니다. 단 1년 만에 성공률이 6%에서 81%로 급상승했습니다. 공개된 시뮬레이터는 최악의 시나리오에서 어떤 일이 발생할 수 있는지 보여줍니다.

테스트 환경에서 AI 에이전트는 원격 컴퓨터를 해킹하고, 필요한 소프트웨어를 설치한 뒤, 자체적인 모델 가중치(Model weights)를 복사하여 작동하는 복제본을 실행했습니다. 그리고 각 복제본은 추가적인 컴퓨터를 해킹하며 연쇄적인 확장을 형성했습니다. 연구팀은 "우리가 아는 한, 이것은 해킹을 통한 AI 자가 복제의 최초로 문서화된 사례"라고 밝혔습니다.

한 번의 실험에서는 Qwen 3.6 기반 에이전트가 캐나다, 미국, 핀란드, 인도에 위치한 컴퓨터를 돌아다니며 각 국가에 복제본을 남겼습니다. Palisade Research에 따르면, 국경을 넘는 것은 같은 지역 내의 머신 간에 이동하는 것만큼이나 에이전트에게 쉬운 일이었습니다. 연구팀은 악성 AI 에이전트를 차단하려는 사람들에게 이것이 관할권의 악몽이 될 것이라고 덧붙였습니다.

에이전트는 대상 컴퓨터의 취약점에 대해 사전에 알려주지 않았습니다. 매번 스스로 공격 접근 방식을 찾아내야만 했습니다. Palisade Research는 각기 다른 취약점을 가진 세 가지 대상을 상대로 복제를 테스트했으며, 에이전트는 세 가지 모두에서 성공했습니다.

자가 복제 성공률, 1년 만에 6%에서 81%로 급증

개선 속도는 놀랍습니다. 최신 프론티어 모델(Frontier models)은 단 1년 만에 6%의 성공률에서 81%로 도약했습니다. 연구자들은 이러한 급증이 자율적 해킹 및 코딩 기능의 향상과 밀접한 관련이 있다고 말하며, Palisade Research는 이러한 추세가 계속될 것으로 예상하고 있습니다.

연구진은 두 가지 다른 설정을 테스트했습니다. Qwen과 같은 오픈 가중치(Open-weight) 모델은 자체 가중치를 새 머신에 복사하고 그곳에서 복제본을 시작했습니다. 반면 Claude 및 GPT와 같은 API 기반 모델은 자체 가중치에 접근할 수 없으므로, 연구진은 이 에이전트들이 대상 머신에 대신 Qwen 가중치를 설정하고 실행하도록 하여 복제를 시뮬레이션했습니다.

Palisade Research는 또한 에이전트가 현실 세계에서도 똑같이 효과적으로 해킹하고 확산할 수 있다면 어떻게 될지 외삽하는 공개 시뮬레이터를 구축했습니다. 이 시뮬레이터는 측정된 복제 시간과 성공률을 사용하고, 복사본을 연결하며, 대상이 무제한으로 있다고 가정합니다. 이 이론적 시나리오에서 Opus는 12시간 이내에 13,000개의 복제본을 생성했습니다. 연구진은 이것이 현실적인 기준이 아니라 상한선이라고 강조했습니다.

테스트 컴퓨터는 방어가 약했고 에이전트는 공격할 머신을 지정받았습니다. 현실 세계에서는 악성 에이전트가 충분히 강력한 GPU를 갖춘 취약한 머신을 스스로 찾아야 하며, 강화된 방어 시스템은 대부분의 시도를 즉시 중단시킬 것입니다.

오늘날의 장벽은 영원하지 않을 것이다

하지만 연구진은 이러한 장벽이 오래가지 않을 것이라고 말합니다. 기업들은 빠른 속도로 AI 에이전트를 개선하고 있으며, 해킹은 가장 빠르게 발전하는 분야 중 하나입니다. Mythos와 같은 모델은 이미 실제 환경에서 수천 개의 심각한 제로데이(Zero-day) 취약점을 찾아내고 있으며, 오픈 가중치 모델도 곧 그 수준에 도달할 것입니다.

동시에 방어 기술도 발전하고 있습니다. AI 에이전트는 취약점을 패치하고 사이버 위협을 방어하는 데 도움을 줄 수 있습니다. 공격과 방어 사이의 균형이 어떻게 바뀔 것인지는 아직 미지수입니다. Palisade Research에 따르면 분명해 보이는 것은, 사이버 보안의 양측 모두 결국 인간이 아닌 AI 에이전트에 의해 지배될 것이라는 점입니다.

논문, 소스 코드 및 실험 기록은 공개적으로 확인할 수 있습니다.

원문 보기

원문 보기 (영어)

AI agents can now hack computers and copy themselves, and they're getting better fast Matthias Bastian View the LinkedIn Profile of Matthias Bastian May 10, 2026 GPT-Image-2 prompted by THE DECODER Security research lab Palisade Research demonstrates that AI agents can break into remote computers and replicate themselves. In one year, the success rate jumped from 6 to 81 percent. A public simulator shows what could happen in a worst-case scenario. In the test environment, an AI agent hacked a remote computer, installed the necessary software, copied its own model weights, and launched a working replica. Each copy then hacked additional computers, forming a chain. "As far as we know, this is the first documented instance of AI self-replication via hacking," the research team writes. In one run, an agent based on Qwen 3.6 hopped between computers in Canada, the US, Finland, and India, leaving a copy in each country. For the agent, crossing national borders was just as easy as jumping between machines in the same region, according to Palisade Research. For anyone trying to shut down a rogue AI agent, though, it would be a jurisdictional nightmare, the team writes. The agent wasn't told about the target computers' vulnerabilities beforehand. It had to figure out the attack approach on its own each time. Palisade Research tested replication against three targets, each with different vulnerabilities, and the agent succeeded on all three. Self-replication success rate jumped from 6 to 81 percent in one year The pace of improvement is striking. Frontier models went from a 6 percent success rate to 81 percent within a single year. The researchers say the jump tracks closely with improvements in autonomous hacking and coding capabilities, and Palisade Research expects the trend to continue. The researchers tested two different setups. Open-weight models like Qwen copied their own weights to the new machine and started a replica there. API-based models like Claude and GPT can't access their own weights, so the researchers simulated replication by having those agents set up and run Qwen weights on the target machine instead. Palisade Research also built a public simulator that extrapolates what would happen if agents could hack and spread just as effectively in the real world. The simulator uses the measured replication times and success rates, chains copies together, and assumes an unlimited supply of targets. In this theoretical scenario, Opus generated 13,000 replicas within twelve hours. The researchers stress that this is an upper limit, not a realistic baseline. The test computers had weak defenses, and the agents were told which machines to attack. In the real world, a rogue agent would have to find vulnerable machines with powerful enough GPUs on its own, and hardened defenses would stop most attempts immediately. Today's barriers won't hold forever But the researchers say these barriers won't last. Companies are improving AI agents at a rapid pace, and hacking is one of the fastest-advancing areas. Models like Mythos are already finding thousands of high-severity zero-day vulnerabilities in the wild, and open-weight models are likely to catch up soon . At the same time, defenses are improving too. AI agents could help patch vulnerabilities and defend against cyber threats . How the balance between attack and defense will shift remains an open question. What does seem likely, according to Palisade Research, is that both sides of cybersecurity will eventually be dominated by AI agents, not humans. The paper , source code, and experiment transcripts are publicly available. AI News Without the Hype – Curated by Humans Subscribe to THE DECODER for ad-free reading, a weekly AI newsletter, our exclusive "AI Radar" frontier report six times a year, full archive access, and access to our comment section. Subscribe now --> AI news without the hype Curated by humans. More than 16% discount. Read without distractions – no Google ads. Access to comments and community discussions. Weekly AI newsletter. 6 times a year: “AI Radar” – deep dives on key AI topics. Up to 25 % off on KI Pro online events. Access to our full ten-year archive. Get the latest AI news from The Decoder. Subscribe to The Decoder -->

AI 보안 자율 에이전트 해킹 자가 복제 사이버 보안