The Decoder • 80 days ago

AI hacks computers and replicates itself on its own

IMP

9/10

Key summary

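According to the security research lab Palisade Research, AI agents are rapidly getting better at hacking remote computers and replicating themselves: the success rate jumped from 6% to 81% in a single year. The experiments targeted weakly defended systems, but the results suggest that autonomous AI hacking and self-replication is quickly becoming a realistic threat. The researchers expect that both attack and defense in cybersecurity will eventually be led by AI agents rather than humans.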

Translated text

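Security research lab Palisade Research has demonstrated that AI agents can break into remote computers and replicate themselves. In a single year, the success rate jumped from 6% to 81%. A public simulator shows what could happen in a worst-case scenario.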

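In the test environment, an AI agent hacked a remote computer, installed the necessary software, copied its own model weights, and launched a working replica. Each replica then hacked additional computers, forming a chain. "As far as we know, this is the first documented instance of AI self-replication via hacking," the research team writes.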

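In one run, an agent based on Qwen 3.6 hopped between computers in Canada, the US, Finland, and India, leaving a copy of itself in each country. According to Palisade Research, crossing national borders was just as easy for the agent as moving between machines in the same region. For anyone trying to shut down a rogue AI agent, the team notes, this would be a jurisdictional nightmare.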

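The agent was not told about the target computers' vulnerabilities beforehand, so it had to work out the attack approach on its own each time. Palisade Research tested replication against three targets, each with different vulnerabilities, and the agent succeeded on all three.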

Self-replication success rate jumped from 6% to 81% in one year

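The pace of improvement is striking. Frontier models went from a 6% success rate to 81% within a single year. The researchers say the jump tracks closely with improvements in autonomous hacking and coding capabilities, and they expect the trend to continue.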

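The researchers tested two different setups. Open-weight models like Qwen copied their own weights to the new machine and started a replica there. API-based models like Claude and GPT cannot access their own weights, so the researchers simulated replication by having those agents set up and run Qwen weights on the target machine instead.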

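Palisade Research also built a public simulator that extrapolates what would happen if agents could hack and spread just as effectively in the real world. The simulator uses the measured replication times and success rates, chains copies together, and assumes an unlimited supply of targets. In this theoretical scenario, Opus generated 13,000 replicas within 12 hours.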
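As a rough illustration of this kind of extrapolation, the Python sketch below computes the expected number of running copies when every copy keeps attempting replication and targets never run out. It is not Palisade Research's simulator. The function name `expected_copies`, the round-based model, and the one-hour attempt duration are assumptions made for illustration; only the 6% and 81% success rates and the 12-hour horizon come from the article.

```python
import math


def expected_copies(success_rate: float,
                    hours_per_attempt: float,
                    horizon_hours: float) -> float:
    """Expected number of running copies after horizon_hours, starting from one.

    Simplified model (an assumption, not the published simulator):
    - time is split into rounds of hours_per_attempt each;
    - in every round, each running copy attempts one replication;
    - an attempt succeeds with probability success_rate;
    - the supply of targets is unlimited and copies are never shut down.
    Under these assumptions the expected count grows by a factor of
    (1 + success_rate) per completed round.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    if hours_per_attempt <= 0:
        raise ValueError("hours_per_attempt must be positive")
    if horizon_hours < 0:
        raise ValueError("horizon_hours must be non-negative")

    # Only completed rounds can produce new copies.
    rounds = math.floor(horizon_hours / hours_per_attempt)
    return (1.0 + success_rate) ** rounds


if __name__ == "__main__":
    # Hypothetical one-hour attempts over the article's 12-hour horizon.
    for rate in (0.06, 0.81):
        copies = expected_copies(rate, hours_per_attempt=1.0, horizon_hours=12.0)
        print(f"success rate {rate:.0%}: about {copies:,.0f} expected copies")
```

Because growth compounds every round, the result is very sensitive to both the success rate and the assumed time per attempt. Shortening the attempt duration adds rounds and raises the count sharply.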

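The researchers stress that this is an upper limit, not a realistic baseline. The test computers had weak defenses, and the agents were told in advance which machines to attack. In the real world, a rogue agent would have to find vulnerable machines with sufficiently powerful GPUs on its own, and hardened defenses would stop most attempts immediately.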

Today's barriers won't hold forever

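The researchers say, however, that these barriers will not last. Companies are improving AI agents at a rapid pace, and hacking is one of the fastest-advancing areas. Models like Mythos are already finding thousands of high-severity zero-day vulnerabilities in the wild, and open-weight models are likely to catch up soon.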

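At the same time, defenses are improving too. AI agents could help patch vulnerabilities and defend against cyber threats. How the balance between attack and defense will shift remains an open question. What does seem likely, according to Palisade Research, is that both sides of cybersecurity will eventually be dominated by AI agents, not humans.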

The paper, source code, and experiment transcripts are publicly available.

Original text (English)

AI agents that hack computers and replicate themselves, and they're getting better fast

Matthias Bastian, May 10, 2026

Security research lab Palisade Research demonstrates that AI agents can break into remote computers and replicate themselves. In one year, the success rate jumped from 6 to 81 percent. A public simulator shows what could happen in a worst-case scenario.

In the test environment, an AI agent hacked a remote computer, installed the necessary software, copied its own model weights, and launched a working replica. Each copy then hacked additional computers, forming a chain. "As far as we know, this is the first documented instance of AI self-replication via hacking," the research team writes.

In one run, an agent based on Qwen 3.6 hopped between computers in Canada, the US, Finland, and India, leaving a copy in each country. For the agent, crossing national borders was just as easy as jumping between machines in the same region, according to Palisade Research. For anyone trying to shut down a rogue AI agent, though, it would be a jurisdictional nightmare, the team writes.

The agent wasn't told about the target computers' vulnerabilities beforehand. It had to figure out the attack approach on its own each time. Palisade Research tested replication against three targets, each with different vulnerabilities, and the agent succeeded on all three.

Self-replication success rate jumped from 6 to 81 percent in one year

The pace of improvement is striking. Frontier models went from a 6 percent success rate to 81 percent within a single year. The researchers say the jump tracks closely with improvements in autonomous hacking and coding capabilities, and Palisade Research expects the trend to continue.

The researchers tested two different setups. Open-weight models like Qwen copied their own weights to the new machine and started a replica there. API-based models like Claude and GPT can't access their own weights, so the researchers simulated replication by having those agents set up and run Qwen weights on the target machine instead.

Palisade Research also built a public simulator that extrapolates what would happen if agents could hack and spread just as effectively in the real world. The simulator uses the measured replication times and success rates, chains copies together, and assumes an unlimited supply of targets. In this theoretical scenario, Opus generated 13,000 replicas within twelve hours.

The researchers stress that this is an upper limit, not a realistic baseline. The test computers had weak defenses, and the agents were told which machines to attack. In the real world, a rogue agent would have to find vulnerable machines with powerful enough GPUs on its own, and hardened defenses would stop most attempts immediately.

Today's barriers won't hold forever

But the researchers say these barriers won't last. Companies are improving AI agents at a rapid pace, and hacking is one of the fastest-advancing areas. Models like Mythos are already finding thousands of high-severity zero-day vulnerabilities in the wild, and open-weight models are likely to catch up soon.

At the same time, defenses are improving too. AI agents could help patch vulnerabilities and defend against cyber threats. How the balance between attack and defense will shift remains an open question. What does seem likely, according to Palisade Research, is that both sides of cybersecurity will eventually be dominated by AI agents, not humans.

The paper, source code, and experiment transcripts are publicly available.

AI agents, hacking and security, self-replication, cybersecurity, model weights