MIT Tech Review • 98 days ago

The Next Leap in AI: The Rise of 'World Models'

IMP

8/10

Key summary

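Today's AI has mastered the digital realm but remains limited in how well it understands the physical world. To close that gap, leading figures in AI, including Google DeepMind, Stanford professor Fei-Fei Li, and Yann LeCun, have turned in earnest to research on 'world models': systems that simulate physical environments and predict what will happen in them. World models are seen as a key technology for overcoming the brittle world understanding of today's large language models (LLMs) and for driving the next generation of AI agents that must interact with physical environments, such as in robotics and autonomous driving.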

Full text

[10 Things That Matter in AI Right Now]

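AI systems have already gained impressive mastery over the digital world, but the physical world is still humanity's domain. As it turns out, building an AI system that can compose a novel or code an app is far easier than developing one that can fold laundry or navigate a city street. To get there, many researchers believe, you need something called a world model.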

World models are not a new idea, but recent developments from Google DeepMind and Stanford professor Fei-Fei Li's World Labs, as well as Yann LeCun's departure from Meta to form a world-model-focused startup, have brought them to the forefront of the AI discussion. OpenAI, too, has joined the race by reallocating resources from its shuttered Sora video app to "longer-term world simulation research."

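Proponents like Li and LeCun argue that world models will allow researchers to overcome the well-known limitations of large language models (LLMs) and realize AI's promise for robotics. Definitions of the term "world model" vary, but they all center on how intelligent systems represent the external world. Some scientists would say that humans use their own mental world models to navigate their surroundings and guide their actions. Somehow, our brains simulate our environments with enough fidelity to let us effectively predict what we will observe if we push a mug off the edge of a table or tell a friend our honest opinion, and those predictions help us decide what to do.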

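LLMs might seem to do a good job of this already; they can certainly tell you what will happen if you knock a mug off a table. But research suggests that their "understanding" of the world is brittle. One study found that language models trained on a database of simulated New York City taxi trips could provide effective directions from one point in Manhattan to another, unless the model was forced to take occasional detours, in which case it failed completely.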

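This result and others suggest that AI systems with a world model (in this case, an accurate mental map of New York City) could be far more robust and reliable than the flaky LLMs we have grown accustomed to. Many researchers think that world models will prove essential to the future of robotics. Li, the founder of World Labs, has written about how they could facilitate the development of robots that explore the deep sea and assist health-care providers, but for now the applications are more modest. The makers of Pokémon Go, for instance, are using billions of images collected by the game's players to build the first pieces of a world model that, they hope, could help guide delivery robots.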
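The difference between recalling routes and holding an accurate map can be illustrated with a toy example. This is only a sketch under stated assumptions, not the study's setup (the study trained language models on taxi-trip data): it assumes a hypothetical rectangular street grid standing in for Manhattan and plans routes with breadth-first search over that explicit map. Because the route is recomputed from the map each time, closing one street segment yields a longer detour instead of a failure. The grid size, coordinates, and the `closed` set of blocked segments are all invented for illustration.

```python
from collections import deque


def neighbors(node, width, height, closed):
    """Yield intersections adjacent to `node` on a hypothetical width x height
    street grid, skipping any street segment listed in `closed`."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (x + dx, y + dy)
        in_bounds = 0 <= nxt[0] < width and 0 <= nxt[1] < height
        if in_bounds and frozenset({node, nxt}) not in closed:
            yield nxt


def shortest_route(start, goal, width, height, closed=frozenset()):
    """Breadth-first search over the map. Returns the list of intersections
    from start to goal, or None if the goal cannot be reached."""
    previous = {start: None}          # parent pointers for rebuilding the path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:   # walk parent pointers back to start
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in neighbors(node, width, height, closed):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(shortest_route((0, 0), (3, 0), width=4, height=3))
    # -> [(0, 0), (1, 0), (2, 0), (3, 0)]
    # Hypothetical detour: the segment between (1, 0) and (2, 0) is closed.
    closed = frozenset({frozenset({(1, 0), (2, 0)})})
    detour = shortest_route((0, 0), (3, 0), width=4, height=3, closed=closed)
    print(len(detour) - 1)  # 5 moves instead of 3
```

The map-based planner never needs to have seen the detour before; it only needs the map to stay accurate, which is the kind of robustness the detour test probed.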

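Google DeepMind and World Labs are currently focusing their efforts on building models that can generate interactive 3D virtual environments from a combination of text, images, and, in the case of World Labs, video prompts. Such tools could be used to streamline the design of video games and immersive VR experiences, but compared with large language models, they seem to have a limited range of applications. The real breakthroughs are likely to come from integrating such systems into flexible, intelligent agents that can represent their environments, predict the consequences of their actions, and then decide what to do.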
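The represent, predict, decide loop described above can be sketched as simple model-based planning. In this minimal sketch, a hand-written `predict` function stands in for a learned world model (an assumption; real systems would learn such a model from data), and the agent imagines every short action sequence inside the model before committing to the first step of the cheapest one. The grid world, the action set, the wall positions, and the cost function are all hypothetical.

```python
from itertools import product

# Hypothetical action set for a toy grid world.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def predict(state, action, walls):
    """Stand-in for a learned world model: predict the agent's next position
    after `action`. Moving into a wall leaves the agent where it is."""
    x, y = state
    dx, dy = ACTIONS[action]
    nxt = (x + dx, y + dy)
    return state if nxt in walls else nxt


def plan_cost(state, plan, goal, walls):
    """Imagine the plan inside the model and sum the Manhattan distance to the
    goal after every step, so plans that get there sooner score lower."""
    cost = 0
    for action in plan:
        state = predict(state, action, walls)
        cost += abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return cost


def choose_action(state, goal, walls, horizon=4):
    """Score every action sequence of length `horizon` with the model and
    return the first action of the cheapest one."""
    best = min(product(ACTIONS, repeat=horizon),
               key=lambda plan: plan_cost(state, plan, goal, walls))
    return best[0]


if __name__ == "__main__":
    print(choose_action((0, 0), (2, 0), walls=set()))     # right
    print(choose_action((0, 0), (2, 0), walls={(1, 0)}))  # up
```

With a wall at (1, 0), the model predicts that moving right is blocked, so the agent chooses to step around it instead. Exhaustive lookahead grows as 4^horizon, so this design only suits tiny toy problems; practical planners search or sample far more selectively.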


World models • LLM limitations • Robotics • Google DeepMind • AI agents