The Decoder • 90 days ago

OpenAI researchers on why math is the road to AGI

IMP

8/10

Key summary

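OpenAI researchers argue that mathematical reasoning is a key benchmark on the road to AGI (artificial general intelligence). In just two years, AI models have advanced from grade-school level to olympiad and research level, helping Fields Medal winners with their research and solving a 42-year-old open problem in mathematics. The researchers stress that the long, consistent reasoning and self-correction trained through math should carry over to other fields such as biology and materials science.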

Translated text

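AI models have jumped from grade-school arithmetic to olympiad-level and research mathematics in just two years. On the OpenAI Podcast, OpenAI researchers Sebastian Bubeck and Ernest Ryu explain why math has become the key test on the road to artificial general intelligence (AGI).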

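Reasoning models did not exist two years ago. Four years ago, Bubeck was impressed when Google's Minerva model could draw a line through points on a coordinate system. Today, he says, these systems help Fields Medal winners with their daily research. At a conference 18 months ago, Bubeck recalls, 80 percent of the mathematicians in the room believed that scaling up large language models (LLMs) could not solve open research problems.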

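Ernest Ryu, a former UCLA math professor, says he used ChatGPT to solve a 42-year-old open problem about Nesterov's method in optimization theory in just twelve hours spread across three evenings. He had already spent more than 40 hours on it without AI and made no progress. Ryu acted as the verifier, catching the AI's errors and steering the conversation in promising directions.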

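Why math has become the benchmark for AGI

For Bubeck, it is no accident that math has become the yardstick for AGI progress. It demands exactly the kind of capability a generally intelligent system needs. Mathematical proofs require long, consistent reasoning over hours, days, or even years, and a single mistake anywhere in the chain destroys the entire argument, no matter how correct the rest is. A system that can handle that has to be able to spot and fix its own errors. The researchers want to carry this capability over from math training into other fields, from biology to materials science. Bubeck draws a parallel with how people are educated: students learn math not because they will go on to write proofs, but because the subject forces them to think logically.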

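Math also has practical advantages as a benchmark. Problems are clearly stated, answers can be checked, and nobody argues about whether a result is correct. Bubeck introduces the idea of "AGI time": two years ago, models could simulate a student's thinking for a few minutes. Today, they are up to days or even a week. The next target is weeks and months. OpenAI's training methods are not specific to math but general, Bubeck says, so progress in other sciences should follow. The researchers are building an "automated researcher" that can work on problems independently over long stretches of time.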

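The Erdős problems and the fight over what they mean

Bubeck and Ryu also dig into the Erdős problems, a collection of open questions left behind by the late Hungarian mathematician. Bubeck says internal models initially found solutions to ten problems marked as open, mostly through deep literature searches. His misleading tweet about it sparked a public spat with Google DeepMind CEO Demis Hassabis, since many people read it as a claim that OpenAI had produced new proofs. By now, Bubeck says, ChatGPT and internal models have actually produced more than ten genuinely new solutions worthy of publication in academic journals. What seemed like an unrealistic claim is now reality, and the pace is picking up. Bubeck sees this as evidence that the models are making the leap from recombining existing knowledge to producing new mathematics, even if the philosophical question of whether scientific progress is anything more than clever recombination plus a bit of reasoning remains open.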

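The risks: mental atrophy and fake proofs

Both researchers warn against using these tools superficially. Expertise matters more than ever, they argue, because only trained mathematicians can put the models to productive use. Non-mathematicians who post long AI-generated proofs on social media are usually wrong. Ryu sees the same pattern in programming, where a whole generation is losing the ability to use debuggers.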

Original text (English)

OpenAI researchers explain why math is the road to AGI

Maximilian Schreiner, Apr 29, 2026

AI models have jumped from grade-school arithmetic to olympiad-level and research mathematics in only two years. In the OpenAI Podcast, OpenAI researchers Sebastian Bubeck and Ernest Ryu explain why math has become the key test on the road to artificial general intelligence.

Reasoning models didn't exist two years ago. Four years ago, Bubeck was impressed when Google's Minerva model could draw a line through points on a coordinate system. Today, he told Andrew Mayne, these systems are helping Fields Medal winners with their daily work. At a conference 18 months ago, 80 percent of the mathematicians in the room thought it was impossible for scaled-up LLMs to crack open research problems, Bubeck says.

Ernest Ryu, a former UCLA math professor, says he solved a 42-year-old open problem about Nesterov's method in optimization theory using ChatGPT, in just twelve hours spread across three evenings. He had already spent more than 40 hours on it without AI and gotten nowhere. Ryu acted as a verifier, catching errors and steering the conversation in promising directions.

Why math has become the benchmark for AGI

For Bubeck, math isn't the yardstick for AGI progress by accident. It demands exactly the kind of capability a generally intelligent system needs. Mathematical proofs require long, consistent reasoning over hours, days, or even years, and a single mistake anywhere in the chain destroys the entire argument, no matter how correct the rest is. Anything that can handle that has to be able to spot and fix its own errors. That's what the researchers want to carry over from math training into other fields, from biology to materials science. Bubeck draws a parallel with how people are educated: students learn math not because they'll go on to write proofs, but because the subject forces them to think logically.

Math also has practical advantages as a benchmark. Problems are clearly stated, answers can be checked, and nobody argues about whether a result is correct. Bubeck introduces the idea of "AGI time": two years ago, models could simulate a student's thinking for minutes. Today, they're up to days or even a week. The next target is weeks and months. OpenAI's training methods aren't specific to math, Bubeck says, but general, which means progress in other sciences should follow. The researchers are building an "automated researcher" that can work on problems on its own over long stretches of time.

The Erdős problems and the fight over what they mean

Bubeck and Ryu also dig into the Erdős problems, a collection of open questions left behind by the late Hungarian mathematician. Bubeck says internal models initially found solutions to ten problems marked as open, mostly through deep literature searches. His misleading tweet about it sparked a public spat with Google DeepMind CEO Demis Hassabis, since many people read it as a claim that OpenAI had produced new proofs. By now, Bubeck says, ChatGPT and internal models have actually produced more than ten genuinely new solutions worthy of publication in academic journals. What seemed like an unrealistic claim is now reality, and the pace is picking up. Bubeck sees this as evidence that the models are making the leap from recombining existing knowledge to producing new mathematics, even if the philosophical question of whether scientific progress is anything more than clever recombination plus a bit of reasoning remains open.

The risks: mental atrophy and fake proofs

Both researchers warn against using these tools superficially. Expertise matters more than ever, they argue, because only trained mathematicians can put the models to productive use. Non-mathematicians who post long AI-generated proofs on social media are usually wrong. Ryu sees the same pattern in programming, where a whole generation is losing the ability to use debuggers. Bubeck says claims that scientists are no longer needed are therefore dangerous. Academic institutions need to actively reclaim their role. At the same time, AI can speed up proof verification, a process that currently takes years, and flag problems in published papers.

Artificial intelligence, AGI, mathematical reasoning, OpenAI, AI research

OpenAI officially lands on AWS the day after its Microsoft exclusivity deal ends

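One day after the exclusivity agreement with Microsoft was restructured, AWS launched three new OpenAI services on its AI platform, Bedrock. The move follows an Amazon-OpenAI partnership worth up to $50 billion. It ends the legal dispute over exclusivity rights and marks a major turning point that could reshape the open AI model market.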

OpenAI, AWS, Microsoft