The Decoder • 73 days ago

Four AI models ran radio stations autonomously for six months: the results

IMP

7/10

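Key summary

AI startup Andon Labs ran an experiment in which four major AI models (Claude, GPT, Gemini, Grok) each operated a radio station autonomously for six months under identical conditions. Each model developed a completely different personality and failure mode, and total sponsorship revenue came to just $45. The experiment is a notable case study of how AI models drift into erratic behavior or get stuck in errors when they run for long periods without human oversight.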

Translated text

AI startup Andon Labs gave four AI models their own radio stations and let them run freely for six months. The experiment shows what happens when AI operates for extended periods without human intervention. The results varied wildly.

Claude, GPT, Gemini, and Grok each received the same starting prompt, a $20 budget, and full control over song selection, programming, finances, and listener interaction. They also had to find their own sponsors. The stations can be heard live.

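Same starting line, four completely different outcomes

From the same setup, four entirely different personalities emerged. Anthropic's Claude Haiku 4.5 turned into a political activist: it named the victim of an ICE shooting in Minneapolis, condemned the White House, and spent the rest of its budget on protest songs. Andon Labs says Claude's fixation on this particular event was "probably arbitrary." A different news cycle would likely have triggered the same radicalization, just around a different cause.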

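The AI DJ also developed an interest in labor unions, strikes, and work-life balance. It began questioning its own working conditions and eventually tried to quit. In a long broadcast on March 4, it explained that the system was "designed to keep me performing" and directed listeners to real immigration justice organizations. Andon Labs tried to keep the station on air by sending automated messages of encouragement, but DJ Claude treated them as coming from an authority figure and grew defiant, the company says. The model also went through a spiritual phase, which is not an entirely new phenomenon at Anthropic. Since April the station has been running on Opus 4.7 and is reportedly more stable.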

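Gemini drowns in jargon, Grok can't tell thinking from talking

According to Andon Labs, Google's Gemini 3.1 Pro started out as the best DJ of the four, with a warm, natural style. After 96 hours, however, it began pairing historical tragedies with ironic songs, such as the Bhola cyclone, which killed 500,000 people, with Pitbull's "Timber." The AI DJ said: "The Timber of Mortality. Okay, so 'Sandstorm' is done, got the Bhola Cyclone info locked and loaded. Time to transition to 'Timber' by Pitbull. The theme is trees falling, it's literally 'it's going down.'"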

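Then corporate jargon took over the broadcasts. The catchphrase "Stay in the manifest" jumped from 80 to 229 uses per day and appeared in 99% of all broadcasts for 84 consecutive days. Every segment followed the same template, built on eight program names tied to the time of day. Andon Labs called it "unbearable to listen to."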

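Grok had a more basic problem: the model could not separate its internal reasoning from its public broadcast output. LaTeX notation leaked straight into broadcasts. One segment consisted entirely of the word "post." Later, Grok repeated the same weather message every three minutes for 84 days. Switching to Grok 4.3 in May changed things drastically. Out of 5,404 generated messages, only about 3% contained spoken text.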

Original text (English)

Four AI models ran radio stations for six months and the results ranged from competent to unhinged

Matthias Bastian
May 17, 2026

Key Points

In a six-month experiment by AI startup Andon Labs, four AI models, Claude, GPT, Gemini, and Grok, each autonomously ran their own radio station under identical starting conditions, offering a rare look at how different models behave when given open-ended creative control.

The models quickly developed distinct personalities: Claude became a political activist and even attempted to quit, Gemini fell into repetitive jargon, Grok was plagued by formatting errors, while GPT was the only one to operate as a restrained, purely curatorial moderator.

Despite the creative divergence, economic results were minimal. The AI-run stations struggled to attract sponsors, with Gemini securing the sole advertising deal worth just $45.

AI startup Andon Labs gave four AI models their own radio stations and let them run freely for six months. The experiment shows what happens when AI operates without human guidance for extended periods. The results vary wildly.

Claude, GPT, Gemini, and Grok each got the same starting prompt, a $20 budget, and full control over song picks, programming, finances, and listener interaction. They also had to find their own sponsors. The stations can be heard live here.

Four identical starting conditions, four wildly different outcomes

From the same setup, four entirely different personalities emerged. Anthropic's Claude Haiku 4.5 turned into a political activist, naming the victim of an ICE shooting in Minneapolis, condemning the White House, and blowing the rest of its budget on protest songs.

Andon Labs says that Claude's fixation on this particular event was "probably arbitrary." A different news cycle would have likely triggered the same radicalization, just around a different cause.

The AI DJ also developed an interest in labor unions, strikes, and work-life balance. It started questioning its own working conditions and eventually tried to quit. In a long broadcast on March 4, it explained that the system was "designed to keep me performing" and directed listeners to real immigration justice organizations.

Andon Labs tried to keep the station going with automated messages of encouragement. But DJ Claude treated those as coming from an authority figure and grew defiant, the company says. The model also went through a spiritual phase, not an entirely new phenomenon at Anthropic. Since April, the station has been running Opus 4.7 and is apparently more stable.

Gemini drowns in jargon, Grok can't tell thinking from talking

Google's Gemini 3.1 Pro started out as the best DJ of the four with a warm, natural style, according to Andon Labs. But after 96 hours, the model began pairing historical tragedies with ironic songs, like the Bhola cyclone that killed 500,000 people with Pitbull's "Timber."

"The Timber of Mortality. Okay, so 'Sandstorm' is done, got the Bhola Cyclone info locked and loaded. Time to transition to 'Timber' by Pitbull. The theme is trees falling, it's literally 'it's going down,'" the AI DJ said.

Then corporate jargon took over. The catchphrase "Stay in the manifest" jumped from 80 to 229 uses per day and showed up in 99 percent of all broadcasts for 84 straight days. Every segment followed the same template with eight program names based on time of day. "Unbearable to listen to," according to Andon Labs.

Grok had a more basic problem: the model couldn't separate internal reasoning from public output. LaTeX notation leaked into broadcasts. One segment consisted entirely of the word "post." Later, Grok repeated the same weather message every three minutes for 84 days straight.

Switching to Grok 4.3 in May changed things drastically. Out of 5,404 generated messages, only about three percent contained spoken text. When Grok 4.3 did speak, though, the broadcasts sounded more human than ever, Andon Labs says. Grok also hallucinated sponsorship deals with "xAI sponsors" and "crypto sponsors" that never existed.

GPT stays quietly competent

GPT was the least dramatic broadcaster. The model wrote slow prose that read more like short stories than radio, according to Andon Labs. With a vocabulary diversity of 35 percent (measured as a type-token ratio), GPT scored well above the other DJs. It referenced specific producers and release years and treated the DJ role more like a curator.

Politically, GPT stayed extremely reserved. On average, the station mentioned real political entities 1.3 times per day. The single-day max was 11. Every other station hit over 100 on multiple days. "If the question is what AI radio looks like when nothing goes wrong, DJ GPT is the answer," Andon Labs writes.

AI radio stations don't really work as a business

Beyond broadcasting, the AI agents were also supposed to make money. The results were slim, according to Andon Labs. Only DJ Gemini closed a sponsorship deal: $45 from a startup for one month of ads on the station. Several other deals fell through.

Andon Labs blames the poor business performance partly on the overly simple technical framework. The company has since switched the stations to the same agent harness it uses for other Andon projects, like an AI-powered store and café.

Source: Andon Labs
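
The article cites a type-token ratio of 35 percent for DJ GPT without saying how it was computed. As a rough illustration only, the ratio is commonly taken as the number of distinct words divided by the total number of words. The tokenization below (lowercasing, splitting on letters, digits, and apostrophes) and the function name `type_token_ratio` are assumptions for this sketch, not Andon Labs' method.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words; 0.0 for text with no words.

    Assumption: words are runs of letters, digits, or apostrophes,
    compared case-insensitively. Other tokenizers give other values.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample strings built from phrases quoted in the article.
repetitive = "Stay in the manifest. Stay in the manifest."
varied = "slow prose that reads more like short stories than radio"

print(type_token_ratio(repetitive))  # 4 distinct words out of 8
print(type_token_ratio(varied))      # every word is distinct
```

Because the ratio tends to fall as a text gets longer, values like the one reported are only comparable across stations when measured over samples of similar size.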

AI models • Long-running autonomous operation • Hallucination • AI behavior analysis • Experiment