The Decoder • 110 days ago

Stanford study: When collaboration among multiple AI agents is worth the resources

IMP

8/10

Key summary

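According to a new Stanford University study, the higher performance of multi-agent systems is largely explained by their using more compute than a single agent. Given the same amount of compute, a single agent matched or outperformed teams. When the input data contained heavy errors or noise, however, multi-agent teams were better at filtering information and achieved better results.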

Translated article

New Stanford study reveals when teaming up AI agents is worth the compute

Maximilian Schreiner | THE DECODER | April 9, 2026

Multi-agent AI systems are widely considered more capable. A Stanford study shows that this apparent advantage largely comes from using more compute. But there are important exceptions.

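Multi-agent systems are a popular approach in AI research right now: multiple AI models split up a task, debate each other, or cross-check one another's results. The underlying idea is that teamwork leads to better answers, especially for complex problems that require multiple reasoning steps.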

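Researchers at Stanford University are challenging that assumption at its core. Their central claim is that when a single agent and a team of agents are given the same amount of compute, the single agent performs at least as well.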

Every handoff loses information

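The researchers explain it as follows. When multiple agents collaborate, they have to pass intermediate results back and forth, and each handoff risks losing relevant information. A single agent, by contrast, keeps everything in one continuous reasoning process.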
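As a rough illustration of this argument, the toy model below assumes that each handoff independently preserves a given fact with a fixed probability. Both the independence assumption and the 0.9 retention rate are hypothetical and are not taken from the study. The point is only that small per-handoff losses compound along a chain.

```python
def survival_probability(retain_per_handoff: float, handoffs: int) -> float:
    """Probability that one fact survives every handoff.

    Assumes (hypothetically) that each handoff keeps the fact
    independently with probability `retain_per_handoff`.
    """
    if not 0.0 <= retain_per_handoff <= 1.0:
        raise ValueError("retain_per_handoff must be between 0 and 1")
    if handoffs < 0:
        raise ValueError("handoffs must be non-negative")
    return retain_per_handoff ** handoffs


# Illustrative retention rate only; not a figure from the study.
for k in (0, 1, 3, 5):
    print(f"{k} handoffs -> {survival_probability(0.9, k):.3f}")
```

In this picture a single agent corresponds to zero handoffs, so nothing is lost in transit.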

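The team tested four models (Qwen3-30B-A3B, DeepSeek-R1-Distill-Llama-70B, and Gemini 2.5 Flash and Pro) on two multi-step reasoning benchmarks. They compared a single agent against five different team architectures, including sequential chains, debates, and ensembles.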
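To make one of these team structures concrete, here is a minimal sketch of an ensemble that takes a majority vote over independent answers. The article does not describe how the study implemented its ensembles, so majority voting is an assumption here, and the `majority_vote` helper and the stub agents are hypothetical stand-ins for real model calls.

```python
from collections import Counter
from typing import Callable, Sequence

# An "agent" here is any function that maps a question to an answer string.
Agent = Callable[[str], str]


def majority_vote(agents: Sequence[Agent], question: str) -> str:
    """Ask every agent the same question and return the most common answer."""
    if not agents:
        raise ValueError("at least one agent is required")
    answers = [agent(question) for agent in agents]
    # most_common sorts by count; answers with equal counts keep first-seen order.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stub agents that return fixed answers instead of calling a model.
stub_agents = [lambda q: "Paris", lambda q: "Paris", lambda q: "Lyon"]
print(majority_vote(stub_agents, "What is the capital of France?"))
```

Note that every agent in such an ensemble spends compute on the full question, which is why the comparison only becomes fair once the total budget is held fixed.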

The results were clear: given the same compute budget, the single agent was almost always the best option or an equivalent one. It also used significantly fewer resources than the teams.
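One way to picture an equal-budget comparison is to fix a total token allowance and divide it among the team members, while the single agent keeps the whole allowance. The sketch below assumes compute is counted in tokens and split evenly. The article does not say how the study measured or allocated compute, so `per_agent_budget` and the 10,000-token figure are purely illustrative.

```python
def per_agent_budget(total_tokens: int, num_agents: int) -> list[int]:
    """Split a fixed token budget as evenly as possible across agents."""
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    base, extra = divmod(total_tokens, num_agents)
    # The first `extra` agents receive one additional token so the sum is exact.
    return [base + (1 if i < extra else 0) for i in range(num_agents)]


total = 10_000  # hypothetical budget, not a number from the study
print("single agent:", per_agent_budget(total, 1))
print("team of 3:   ", per_agent_budget(total, 3))
```

Under this accounting, every additional team member shrinks the reasoning budget available to each agent.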

Long contexts remain a weak spot for solo agents

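The researchers do note, however, that the single agent's theoretical advantage only holds when it handles context perfectly. In practice, large language models struggle with this: the longer a reasoning process gets, the harder it becomes to separate relevant information from noise.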

Researchers call these phenomena "context rot" and the "lost in the middle" effect, in which models overlook information buried in the middle of long texts.

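This is exactly where agent teams can pull ahead. In experiments with deliberately corrupted input text, structured teams outperformed the single agent when distortion was high, because splitting up the work helped filter relevant information more effectively.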
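The article does not specify how the inputs were corrupted. As one hypothetical way to set up such a stress test, the sketch below replaces a chosen fraction of words with a placeholder token. The `corrupt_text` function, the `[NOISE]` marker, and the word-level scheme are all assumptions made for illustration.

```python
import random


def corrupt_text(text: str, noise_level: float, seed: int = 0) -> str:
    """Replace roughly `noise_level` of the words with a noise marker.

    Hypothetical corruption scheme; the study's actual method may differ.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = random.Random(seed)  # seeded so repeated runs give the same corruption
    words = text.split()
    return " ".join(
        "[NOISE]" if rng.random() < noise_level else word for word in words
    )


sample = "The capital of France is Paris and the capital of Italy is Rome"
for level in (0.0, 0.3, 0.8):
    print(level, "->", corrupt_text(sample, level))
```

Sweeping `noise_level` from low to high would give a series of increasingly distorted inputs on which a single agent and a team could be compared.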

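The study also found that teams benefited more when built on weaker base models. Error analysis showed that single agents sometimes think too narrowly, while teams cast a wider net and occasionally find answers the single agent misses. The debate architecture proved to be the strongest team setup overall.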

The study is limited to text-based reasoning tasks. Whether teams offer advantages for tool use or image processing is not covered in the preprint.


Original article (English)

New Stanford study reveals when teaming up AI agents is worth the compute

Maximilian Schreiner | Apr 9, 2026

Multi-agent AI systems are widely considered more capable. A Stanford study shows their apparent advantage largely comes from using more compute. But there are important exceptions.

A popular approach in AI research right now is multi-agent systems: multiple AI models split up a task, debate each other, or cross-check results. The idea is that teamwork leads to better answers, especially for complex problems that require multiple reasoning steps.

Researchers at Stanford University are now challenging that assumption at its core. Their central claim: when a single agent and a team get the same amount of compute, the solo agent performs at least as well.

Every handoff loses information

The explanation, according to the researchers: when multiple agents collaborate, they have to pass intermediate results back and forth. Each handoff risks losing relevant information. A single agent, by contrast, keeps everything in one continuous reasoning process.

The team tested four different models (Qwen3-30B-A3B, DeepSeek-R1-Distill-Llama-70B, and Gemini 2.5 Flash and Pro) on two multi-step reasoning benchmarks. They compared a single agent against five different team architectures, including sequential chains, debates, and ensemble approaches.

The results were clear: given the same compute budget, the single agent was almost always the best or an equivalent option. It also used significantly fewer resources than the teams.

Long contexts remain a weak spot for solo agents

The study does acknowledge that the single agent's theoretical advantage only holds when it handles context perfectly. In practice, language models struggle with this - the longer a reasoning process gets, the harder it becomes to separate relevant information from noise.

Researchers call these phenomena "context rot" and the "lost in the middle" effect, where models overlook information buried in the middle of long texts.

This is exactly where teams can pull ahead. In experiments with deliberately corrupted input text, structured teams outperformed the single agent when distortion was high, because splitting up the work helped filter out relevant information more effectively.

The study also found that teams benefited more when built on weaker base models. Error analysis showed that single agents sometimes think too narrowly, while teams cast a wider net and occasionally find answers the solo agent misses. The debate architecture proved to be the strongest team setup overall.

The study is limited to text-based reasoning tasks. Whether teams offer advantages for tool use or image processing isn't covered in the preprint.

Multi-agent, Stanford study, compute resources, reasoning models, context handling