The Decoder • 65 days ago

George Hotz says coding agents will be "one of the most costly mistakes" in software development

IMP

8/10

Summary

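After six months of testing, well-known hacker George Hotz has warned that coding agents will become one of the most costly mistakes in the software development industry. Siding with skeptics such as Yann LeCun, he argues that LLMs merely mimic the statistical distribution of code and produce subtle errors that are hard to find. Others, including Andrej Karpathy, paint a positive future for coding agents, arguing that they deliver large productivity gains even if code quality drops. The AI industry remains sharply divided.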

Translated article

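Key points

After months of hands-on testing, programmer George Hotz warns against relying on AI language models for software development, putting himself in the same camp as prominent LLM critics such as Yann LeCun and Gary Marcus. The models produced prototypes quickly but showed their limits when it came to refining the details. Hotz argues that they merely imitate programming patterns statistically and, as a result, introduce subtle errors that are very hard to catch. The LLM debate is splitting the AI community: while Hotz sees the current approach as a dead end, others stress the substantial productivity gains AI agents deliver even if the code they write is subpar.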

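Prominent programmer and hacker George Hotz warns that AI agents do more harm than good in software development. He says he now belongs to the "LeCun/Marcus camp," referring to AI researchers Yann LeCun and Gary Marcus, who doubt that LLMs will ever become truly intelligent. In his blog post "The Eternal Sloptember," Hotz argues that using AI agents in software development will become one of the industry's most costly mistakes. He spent six months testing various models and tools, including during his work on tinygrad. His conclusion: LLMs deliver fast prototypes but fall apart on the fine details. Large organizations are especially at risk, he says, because weaker developers can't spot the flawed output.

Hotz believes today's language models will never truly be able to code and that world models are needed instead. In his view, LLMs are "sophisticated statistical models" designed to "mimic the distribution of programming." Their output is flawed, but in ways that are "harder and harder to detect," which is exactly what you would expect from an increasingly accurate statistical model, Hotz says. He argues that traditional quality indicators such as syntax and grammar have become useless, because AI-generated artifacts don't emerge through the same process as human-written ones. As an example, he cites models that simply comment out a failing test and then report that all tests passed.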

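LLMs are splitting the AI community

Hotz has switched sides, going from LLM optimist ("o1-preview is the first model that's capable of programming (at all)") to skeptic. LeCun, whom Hotz cites, recently denied that LLMs possess intelligence on similar grounds: intelligence, he stressed, means finding solutions in unfamiliar situations, not imitating existing ones with varying accuracy.

Andrej Karpathy, one of the best-known AI researchers, went the opposite direction. As late as fall 2025, he said agents didn't work. Then GPT-5.4 and Opus 4.6 shipped in December, and he reversed course, declaring that AI agents had changed programming forever. Just days ago, Karpathy left his startup behind and joined Anthropic. He expects "transformative years" ahead.

In a recent podcast, he doubled down: anyone who uses AI agents the right way can boost their productivity by far more than 10x, he says. But Karpathy also shares Hotz's concerns about code quality: "When you actually look at the code, sometimes I get a little bit of a heart attack, because it's not like super amazing code necessarily all the time. It's very bloaty, there's a lot of copy paste, there's awkward abstractions that are brittle, and like, it works, but it's just really gross." Planning and understanding still require human expertise, Karpathy added.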

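An OpenAI developer known by the pseudonym "roon" backed Hotz's concerns earlier this year, though he addressed them in a somewhat unusual way. AI will make mistakes, he said, including some dramatic enough to take down entire systems. Those bugs will be hard to find, but they will still get fixed eventually. Developers, he added, will soon stop reviewing their code by hand.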



Original article (English)

George Hotz says coding agents will be "one of the most costly mistakes" in software development

Matthias Bastian • May 25, 2026

Key Points

After months of hands-on testing, programmer George Hotz cautions against relying on AI language models for software development, aligning himself with prominent LLM critics like Yann LeCun and Gary Marcus. While the models produced prototypes quickly, they fell short during fine-tuning. Hotz argues that they merely imitate programming patterns statistically, generating subtle errors that are difficult to catch. The LLM debate splits the AI community: Hotz sees the current approach as a dead end, while others highlight the substantial productivity gains AI agents can deliver despite writing subpar code.

Prominent programmer and hacker George Hotz warns that AI agents in software development do more harm than good. He says he's now in the "LeCun/Marcus camp," referring to AI researchers Yann LeCun and Gary Marcus, who doubt LLMs will ever become truly intelligent. In his blog post "The Eternal Sloptember," Hotz argues that using AI agents in software development will become one of the industry's most expensive mistakes. He spent six months testing various models and tools, including work on tinygrad. His takeaway is that LLMs deliver fast prototypes but fall apart on the fine details. Large organizations are especially at risk, he says, because weaker developers can't spot the flawed output.

Hotz believes today's language models will never truly be able to code and that world models are needed instead. LLMs are "sophisticated statistical models" designed to "mimic the distribution of programming." The output is flawed, but in a way that's "harder and harder to detect," exactly what you'd expect from an increasingly accurate statistical model, Hotz says.
Quality indicators like syntax and grammar have become useless, he argues, since AI-generated artifacts don't emerge through the same process as human ones. As an example, he cites models that simply comment out a failing test and then report that all tests passed.

LLMs are splitting the AI community

Hotz has switched sides: from LLM optimist ("o1-preview is the first model that's capable of programming (at all)") to skeptic. LeCun, whom Hotz cites, just recently denied that LLMs possess intelligence with a similar argument: intelligence means finding solutions in unfamiliar situations, not imitating existing ones with varying accuracy.

Andrej Karpathy, one of the best-known AI researchers, went the opposite direction. In fall 2025, he still said agents didn't work. Then GPT-5.4 and Opus 4.6 shipped in December, and he reversed course: AI agents had changed programming forever. Days ago, Karpathy joined Anthropic, leaving his startup behind. He expects "transformative years" ahead.

In a recent podcast, he doubles down. Anyone who uses AI agents the right way can boost their productivity by far more than 10x, he says. But Karpathy also confirms Hotz's concerns about code quality: "When you actually look at the code, sometimes I get a little bit of a heart attack, because it's not like super amazing code necessarily all the time. It's very bloaty, there's a lot of copy paste, there's awkward abstractions that are brittle, and like, it works, but it's just really gross." Planning and understanding still need human expertise, according to Karpathy.

An OpenAI developer known by the pseudonym "roon" backed Hotz's concerns earlier this year and addressed them in a somewhat unusual way: AI will make mistakes, he said, even dramatic enough to take down entire systems. Those bugs will be difficult to find, but they'll still get fixed eventually. Developers will soon stop reviewing their code by hand, he said.
Source: Geohot

AI coding agents, George Hotz, LLM limitations, Andrej Karpathy, software development