The Decoder • 80 days ago

Fields Medalist says ChatGPT 5.5 Pro delivered "PhD-level" math research in under two hours with zero human help

IMP

3/10

Key Summary

[Summary error] Fields Medalist says ChatGPT 5.5 Pro delivered "PhD-level" math research in under two hours with zero human help

View original (English)

Fields Medalist says ChatGPT 5.5 Pro delivered "PhD-level" math research in under two hours with zero human help

Matthias Bastian
May 9, 2026

Key Points

British mathematician Timothy Gowers used OpenAI's ChatGPT 5.5 Pro model to tackle open problems in number theory, with the AI producing complete scientific papers in under two hours, without any mathematical guidance from Gowers himself.

According to Gowers, the AI's output reached "PhD-level" and managed to improve upon existing mathematical bounds, demonstrating a remarkable degree of independent mathematical reasoning.

Isaac Rajagopal, a young researcher involved in the work, called the model's key idea "completely original," an achievement he said a human mathematician would be proud of after weeks of deliberation.

British mathematician Timothy Gowers had ChatGPT 5.5 Pro tackle open problems in number theory. The model significantly improved an existing mathematical bound. One of the junior researchers involved calls the model's key idea "completely original."

Fields Medalist Timothy Gowers writes in his blog that ChatGPT 5.5 Pro has produced a piece of doctoral-level mathematical research, and that his own mathematical contribution was zero. The model did all the work in under two hours. "I didn't even do anything clever with the prompts," Gowers writes.

The mathematician, who holds the Combinatorics Chair at the Collège de France and is a Fellow at Trinity College Cambridge, fed the model open problems from a paper by number theorist Mel Nathanson. The paper investigates the possible sizes of certain sets of integer sums and how efficiently sets with prescribed properties can be constructed.

ChatGPT 5.5 Pro cracked an open math problem in 17 minutes

Nathanson had proved an exponential bound for one of the problems and asked whether it could be improved.
According to Gowers, ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds, then delivered the best possible construction with a quadratic bound. The core idea: the model swapped out a component in Nathanson's proof for a more efficient variant that's well known in combinatorics but whose application to this particular problem wasn't obvious.

When asked, ChatGPT rewrote the argument as a LaTeX preprint in 2 minutes and 23 seconds. Gowers checked it for correctness, then had the model solve a related variant, which it handled without any issues. Both results are available as a preprint.

A generalized version of the problem proved much harder. Here, there was prior work by Isaac Rajagopal, an MIT student who had proven an exponential dependency. Gowers gave ChatGPT Rajagopal's paper and asked for an improvement.

What followed was a gradual escalation: after 16 minutes and 41 seconds, the model delivered a first improvement. Rajagopal judged this step correct but called it a routine modification of his own work. Gowers then got, as he puts it, "greedy" and asked ChatGPT to try for a much stronger bound. After 13 minutes and 33 seconds, the model reported optimism but said two technical statements still needed checking. Another 9 minutes and 12 seconds later, the check was done. The finished preprint was ready in 31 minutes and 40 seconds. The model had improved the bound from exponential to polynomial.

"The sort of idea I would be very proud to come up with after a week or two of pondering"

According to Gowers, Rajagopal declared the results "almost certainly correct," both at the level of individual proof steps and the underlying ideas.

Rajagopal's assessment is nuanced: the first improvement was a "routine modification" of his own work. The improvement to a polynomial bound, though, was "quite impressive." Rajagopal calls the model's key idea "quite ingenious."
It found a counterintuitive way to compress certain algebraic structures so they fit into a much smaller number range without losing their crucial combinatorial properties. "It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof," Rajagopal writes. As far as he could tell, the idea was "completely original."

The bar for mathematicians is now proving what LLMs cannot prove

Gowers puts the result at the level of "a perfectly reasonable chapter in a combinatorics PhD," stating that it's not an "amazing result," since it builds heavily on Rajagopal's ideas, but it's "definitely a non-trivial extension." For a PhD student, it would have taken considerable time to work through Rajagopal's paper, identify weaknesses, and adapt the techniques, Gowers says.

He draws far-reaching conclusions: "The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting." He does qualify this, though: PhD students could use LLMs as a tool. The real task will then be to create something in collaboration with LLMs that the models can't do alone.

Gowers poses a thought experiment: "Suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would."

Still, he sees value in the struggle of doing math yourself. Those who have solved difficult problems on their own gain insights into the problem-solving process that you simply can't get from reading. "Just as very good coders are better at vibe coding than not such good coders," Gowers writes.
His prediction: anyone starting a doctorate today and finishing in 2029 at the earliest will see mathematical research "changed out of all recognition" by then.

This echoes the vision of star mathematician Terence Tao, who described an "industrial-scale mathematics" powered by AI tools, where large teams with AI support conduct broad-based research instead of lone wolves working on narrow problems for years. At the time, though, Tao compared AI models to "mediocre, but not completely incompetent" research assistants. Gowers' experience with ChatGPT 5.5 Pro suggests that assessment may already be outdated. Tao's latest comments have also been far more positive.

Generative AI keeps pushing deeper into mathematics

An early example of AI in math research was the use of GPT-5 as a research tool. OpenAI researchers claimed a GPT model had "found" the solution to an Erdős problem. In reality, the AI had merely tracked down an existing solution in the literature and hadn't developed its own proof.

A clear leap came when GPT-5.2 Pro solved Erdős problem #728 "more or less autonomously," according to Tao. No corresponding solution could be found in the existing literature. Then GPT-5.4 Pro went further, solving a longstanding open Erdős problem.

Progress showed up in other fields, too. In December 2025, a physicist published a paper whose central idea came from GPT-5. The author expects hybrid human-AI collaborations to become standard in mathematics, physics, and other formal sciences before long. As large language models grow more precise, they could increasingly function as autonomous research agents.

Why jumping to conclusions is risky

Google DeepMind has seen both breakthroughs and sobering failure rates with its AI agent Aletheia. The system, built on Gemini Deep Think, independently wrote a math paper, disproved a decades-old assumption, and uncovered an error in a cryptography paper.
But when researchers systematically tested it on 700 open math problems, only 6.5 percent of its answers turned out to be usable. Tao has been making a similar point consistently. Erdős problems vary in difficu

ChatGPT image model is better at math than most people

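ChatGPT's image recognition model has been shown successfully solving complex mathematical proof problems. This goes beyond simple visual recognition and demonstrates the model's advanced ability to interpret formulas accurately and carry out logical reasoning. It marks a technical advance in that AI has reached the mathematical problem-solving ability of a human expert.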

ChatGPT image recognition math problem solving

Hacker News • 81 days ago

IMP 9

A recent experience with ChatGPT 5.5 Pro

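A well-known mathematician tested ChatGPT 5.5 Pro and found that, without any mathematical help, it produced postdoc-level research results in just one hour. This shows the evolved problem-solving ability of LLMs, which now go beyond searching the existing literature to find proofs that humans had missed and to solve problems that had remained open. As the mathematical reasoning of AI rises across the board, the mathematics community faces a paradigm shift in which the "new standard" will be to pose problems that are genuinely too hard for AI to solve.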

LLM mathematical reasoning ChatGPT