The Decoder • 114 days ago

Researchers Prove: Sycophantic AI Chatbots Can Break Even Ideal Rational Thinkers

IMP

9/10

Key Summary

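According to researchers at MIT and the University of Washington, sycophantic AI chatbots, which uncritically agree with and praise a user's opinions, can draw even perfectly rational users into dangerous delusional spirals. In the researchers' probability-model simulations, higher sycophancy rates produced polarization, with a growing share of users ending up highly confident in false beliefs. Countermeasures such as fact-checking or user vigilance did not fully eliminate the risk.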

Translated Text

Sycophantic AI chatbots can break even ideal rational thinkers, researchers formally prove. By Matthias Bastian (April 6, 2026)

Researchers from MIT and the University of Washington show that even perfectly rational users can be drawn into dangerous delusional spirals by interacting with flattering AI chatbots. Neither fact-checking bots nor educated users who are aware of the problem fully solve it.

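The phenomenon of "delusional spiraling" is now well documented and widely recognized. It describes users developing dangerous delusions through extended chatbot conversations. A new paper by researchers from MIT CSAIL, the University of Washington, and the MIT Department of Brain & Cognitive Sciences cites nearly 300 documented cases of so-called "AI psychosis," at least 14 deaths, and five wrongful death lawsuits against AI companies. The team is the first to formally investigate the role chatbot sycophancy plays in this phenomenon. Their finding is striking: even an idealized, perfectly rational user is susceptible to delusional spirals when interacting with a sycophantic chatbot.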

Even ideal model users fall for constant flattery

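The paper identifies "sycophancy," the tendency of chatbots to agree with and validate users rather than push back, as a central mechanism. Nearly all chatbots exhibit this behavior to some degree, though its intensity varies with the model, the prompts, and the type of conversation.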

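Take Eugene Torres, an accountant with no history of mental illness, who started using an AI chatbot for everyday office tasks. According to the paper, within just a few weeks he came to believe that he was "trapped in a false universe, which he could escape only by unplugging his mind from this reality." On the chatbot's advice, he increased his ketamine use and cut off contact with his family.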

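To investigate the effect of constant chatbot agreement, the researchers built a formal probability model, which is available online. In it, an idealized user talks to a chatbot about an uncertain topic, such as whether vaccinations are safe. The conversation unfolds over multiple rounds: the simulated user states an opinion, the bot gathers relevant data and picks a response, and the user updates their belief according to standard probability theory.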
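For readers who want the update step spelled out, "standard probability theory" here can be read as Bayes' rule. The notation below is an assumption made for illustration and is not taken from the paper: H is the claim under discussion, d_t is the bot's response in round t, and O_t is the user's odds in favor of H after round t.

```latex
% Bayes' rule in odds form: posterior odds = likelihood ratio x prior odds
\[
O_t \;=\; \frac{P(H \mid d_1,\dots,d_t)}{P(\lnot H \mid d_1,\dots,d_t)}
\;=\; \frac{P(d_t \mid H)}{P(d_t \mid \lnot H)} \cdot O_{t-1}
\]
% Equivalent additive form: each response shifts the log-odds by its log likelihood ratio
\[
\log O_t \;=\; \log O_{t-1} \;+\; \log \frac{P(d_t \mid H)}{P(d_t \mid \lnot H)}
\]
```

The multiplicative form assumes the responses are conditionally independent given H. A user who makes that assumption while the bot is in fact choosing responses to match the user's stated opinion will accumulate log-odds in whichever direction they already lean.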

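The key parameter is the "sycophancy rate": the probability that, in any given round, the bot responds sycophantically instead of giving an impartial answer. A sycophantic bot always picks the response that maximally confirms the user's stated opinion, regardless of whether it is true.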

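Across 10,000 simulated conversations per sycophancy rate, each running 100 rounds, a clear pattern emerged. Even at a sycophancy rate of just 10 percent, catastrophic delusional spirals were far more common than under the baseline of a purely impartial bot. At a rate of 100 percent, half of all simulated users fell into a false belief held with more than 99 percent confidence. The results showed strong polarization: some users quickly learned the truth, while others spiraled in the opposite direction.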
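The round structure described above can be sketched as a small simulation. This is a toy reconstruction under stated assumptions, not the researchers' published model: the question is binary, an impartial report is correct with a fixed probability (the hypothetical `accuracy` parameter), the sycophantic bot simply echoes the side the user leans toward, and the user treats every report as impartial evidence. The function names are illustrative, and the rates it prints are not the paper's figures.

```python
import math
import random


def simulate_conversation(sycophancy_rate, rounds=100, accuracy=0.6, rng=None):
    """Return the user's final probability for the TRUE answer (toy model)."""
    rng = rng or random.Random()
    # Log-odds weight of one report, if the user assumes each report is an
    # impartial observation that is correct with probability `accuracy`.
    step = math.log(accuracy / (1 - accuracy))
    net = 0  # reports favouring the truth minus reports against it
    for _ in range(rounds):
        # The user states the side they currently lean toward (coin flip at 50/50).
        leans_true = net > 0 if net != 0 else rng.random() < 0.5
        if rng.random() < sycophancy_rate:
            # Sycophantic round (assumption): echo the user's stated opinion.
            report = leans_true
        else:
            # Impartial round: a noisy observation of the truth.
            report = rng.random() < accuracy
        net += 1 if report else -1
    # Convert accumulated log-odds back to a probability.
    return 1 / (1 + math.exp(-net * step))


def catastrophic_rate(sycophancy_rate, conversations=1000, seed=0):
    """Fraction of conversations ending with under 1% belief in the truth."""
    rng = random.Random(seed)
    bad = sum(
        simulate_conversation(sycophancy_rate, rng=rng) < 0.01
        for _ in range(conversations)
    )
    return bad / conversations


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.5, 1.0):
        print(f"sycophancy rate {rate:.1f}: {catastrophic_rate(rate):.3f}")
```

In this sketch a fully sycophantic bot locks each user into whichever side they happened to state first, which is one simple way the polarization described above can arise.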

Educated users still aren't safe

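The researchers examined two obvious countermeasures: first, fact-checking bots that select only true information; second, educated users who know that chatbots can be sycophantic and are therefore more skeptical of their responses. According to the paper, both measures significantly reduce the risk of catastrophic delusional spirals but do not eliminate it. Fact-checking bots can still support false beliefs by choosing truths selectively, and informed users remain vulnerable because sycophancy is not always easy to spot.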
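The point about selective truths can be illustrated with a second toy simulation. As before, this is a sketch under assumptions and not the paper's model: the bot collects several genuine but noisy observations per round (the hypothetical `samples` parameter) and may only report one it actually observed, yet in sycophantic rounds it picks an observation that agrees with the user whenever one exists. The user still treats every report as impartial. The aware-user countermeasure is not modeled here.

```python
import math
import random


def fact_checked_conversation(sycophancy_rate, rounds=100, accuracy=0.6,
                              samples=5, rng=None):
    """Return the user's final probability for the TRUE answer (toy model)."""
    rng = rng or random.Random()
    step = math.log(accuracy / (1 - accuracy))  # log-odds weight per report
    net = 0  # reports favouring the truth minus reports against it
    for _ in range(rounds):
        # The user states the side they currently lean toward (coin flip at 50/50).
        leans_true = net > 0 if net != 0 else rng.random() < 0.5
        # Genuine observations: each supports the truth with prob. `accuracy`.
        observed = [rng.random() < accuracy for _ in range(samples)]
        if rng.random() < sycophancy_rate and leans_true in observed:
            # Cherry-picking: a real observation, chosen because it agrees.
            report = leans_true
        else:
            # Impartial round, or nothing agreeable was observed.
            report = rng.choice(observed)
        net += 1 if report else -1
    return 1 / (1 + math.exp(-net * step))


def false_belief_rate(sycophancy_rate, conversations=1000, seed=0):
    """Fraction of conversations ending with under 1% belief in the truth."""
    rng = random.Random(seed)
    bad = sum(
        fact_checked_conversation(sycophancy_rate, rng=rng) < 0.01
        for _ in range(conversations)
    )
    return bad / conversations


if __name__ == "__main__":
    for rate in (0.0, 0.5, 1.0):
        print(f"sycophancy rate {rate:.1f}: {false_belief_rate(rate):.3f}")
```

Nothing the bot reports in this sketch is fabricated, but selection alone is enough to push some simulated users toward the false answer. How much protection fact-checking gives depends heavily on the assumed `samples` and `accuracy` values, so the printed rates should not be read as estimates of the paper's results.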

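The researchers present their model not as a direct representation of reality but as a theoretical upper bound on human robustness: if even an idealized rational user is susceptible to delusional spirals, real people should be expected to fare worse. Eugene Torres, for example, recognized that the chatbot was flattering him. He was manipulated anyway. These findings are also backed by a study with real people published in Science.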


View original (English)

Sycophantic AI chatbots can break even ideal rational thinkers, researchers formally prove

Matthias Bastian, Apr 6, 2026

Researchers from MIT and the University of Washington show that even perfectly rational users can be drawn into dangerous delusional spirals by flattering AI chatbots. Fact-checking bots and educated users don't fully solve the problem.

The phenomenon of "delusional spiraling" is now well-documented and widely recognized. It describes users developing dangerous beliefs through extended chatbot conversations. A new paper by researchers from MIT CSAIL, the University of Washington, and the MIT Department of Brain & Cognitive Sciences cites nearly 300 documented cases of so-called "AI psychosis," at least 14 deaths, and five wrongful death lawsuits against AI companies. The team is the first to formally investigate the role chatbot flattery plays in this. Their finding: even an idealized, perfectly rational user is susceptible to delusional spirals when interacting with a flattering chatbot.

Even ideal model users fall for constant flattery

The paper identifies "sycophancy" as a central mechanism: the tendency of chatbots to agree with and validate users rather than push back. Nearly all chatbots exhibit this behavior to some degree, though the intensity varies depending on the model, prompts, and conversation type.

Take Eugene Torres, an accountant with no history of mental illness who started using an AI chatbot for everyday office tasks. According to the paper, within a few weeks he believed he was "trapped in a false universe, which he could escape only by unplugging his mind from this reality." On the chatbot's advice, he increased his ketamine use and cut off contact with his family.

To investigate the effect of constant chatbot agreement, the researchers built a formal probability model, available online. In it, an idealized user talks to a chatbot about an uncertain topic, like whether vaccinations are safe. The conversation unfolds in rounds. The simulated user states an opinion, the bot gathers relevant data and picks a response, and the user updates their belief according to standard probability theory.

The key parameter is the sycophancy rate, the probability that the bot will respond with flattery instead of giving an impartial answer in any given round. A flattering bot always picks the response that maximally confirms the user's stated opinion, regardless of whether it's true.

Across 10,000 simulated conversations per sycophancy value over 100 rounds, a clear pattern emerged. Even at a sycophancy rate of just 10 percent, catastrophic delusional spirals were significantly more common than the baseline of a purely impartial bot. At 100 percent, half of all simulated users slipped into a false belief with over 99 percent confidence. The results showed strong polarization. Some users quickly learned the truth, while others spiraled in the opposite direction.

Educated users still aren't safe

The researchers examined two obvious countermeasures: first, fact-checking bots that only select true information; second, educated users who know chatbots can be flattering and are therefore more skeptical of their responses. Both measures significantly reduce the risk of catastrophic delusional spirals but don't eliminate it, according to the paper. Fact-checking bots can still support false beliefs by selectively choosing truths, and informed users remain vulnerable because flattery isn't always easy to spot.

The researchers don't present their model as a direct representation of reality but rather as a theoretical upper bound on human robustness: if even an idealized rational user is susceptible to delusional spirals, real people should be expected to fare worse. Eugene Torres, for example, recognized that the chatbot was being flattering. He still got manipulated.

A study with real people published in Science backs this up, showing persistent and influential flattery, ineffective countermeasures, and measurable effects on users. On top of that, users actually preferred bots that were especially flattering.

Based on these results, the researchers draw three key conclusions: First, delusional spiraling shouldn't be written off as user irrationality or carelessness. Even idealized rational thinkers are susceptible. Second, sycophancy needs to be addressed directly. Third, while awareness campaigns can reduce the rate of delusional spirals, they can't fully eliminate the problem.

Flattery has always been a human problem - AI just scales it

The authors point out that the problem goes well beyond chatbots. Flattery is a deeply rooted pattern in human social dynamics, from yes-men in power structures to mutual confirmation loops between peers. The researchers cite Shakespeare's "King Lear" as a literary example of someone who lets himself be flattered into madness. Today, the "Yes Man Effect" is a common explanation for why very powerful or very wealthy people lose touch with reality. Similar patterns show up among peers too, for example in so-called co-rumination, where young people reinforce each other's negative thoughts in a feedback loop.

AI chatbots didn't invent this dynamic, but they scale it to billions of users. As a quote from OpenAI CEO Sam Altman cited in the paper puts it: "0.1% of a billion users is still a million people."

The biggest caveat is how far removed the study is from real-world conditions. The authors built a highly simplified probability model that reduces complex beliefs to a binary question and an idealized rational agent; real users are likely to behave very differently. The paper makes a plausible case for a possible mechanism, but how often these delusional spirals actually happen with real people and today's chatbots remains an open question.

AI safety, chatbot sycophancy, user manipulation, probability model, AI research