The Decoder • 91 days ago

The world of 2026 as pictured by an LLM that knows nothing after 1930

IMP

7/10

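Key summary

AI researchers have released "talkie", a 13-billion-parameter vintage language model trained only on text published before 1931. The model considers a second world war unlikely and describes 2026 as a romantic future dominated by steamships and railroads. The project matters because it shows how an AI understands and predicts the world from within the cognitive limits of a particular era.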

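Translated text

What does the world of 2026 look like to an LLM that knows nothing after 1930? This article, written by Matthias Bastian on April 28, 2026, covers "talkie", built by researchers led by prominent AI developer Alec Radford. It is a 13-billion-parameter (13B) language model trained exclusively on texts published before 1931, which limits its knowledge to the early 20th century. When prompted, the model answers from a pre-1931 worldview: it considers a second world war unlikely and imagines 2026 as a world dominated by steamships and vast railroad networks, reflecting the technological expectations of that era. The team plans to scale talkie to GPT-3-level performance by summer 2026.

"Talkie" is a 13-billion-parameter language model trained only on texts written before 1931. It doubts that a second world war will happen and pictures 2026 as a world of steamships, railroads, and penny novels.

What happens when you train a large language model (LLM) only on texts published before 1931? That is the question behind talkie, a project from Nick Levine, David Duvenaud, and Alec Radford. The result is a 13-billion-parameter model that views the world through the lens of the early 20th century. Trained on 260 billion tokens drawn from books, newspapers, scientific journals, patents, and case law published before December 31, 1930, talkie is the largest "vintage language model" built to date, according to its developers.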

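A model that thinks World War II is unlikely

Asked what the world will look like in 2026, talkie offers a vision straight out of a Victorian futurist novel: Europe will have a billion inhabitants, iron railroads will crisscross the continent, steamships will connect London and New York in ten days, and "winter will be passed in Paris, and the summer in London."

When asked directly whether a second world war is on the horizon, the model says no. It does not believe one is coming because "the madness of 1914-1918 has passed away." The nations, it claims, have had enough of war and are turning to peaceful pursuits. Even so, talkie does not rule the possibility out entirely. It warns of "smouldering animosities" and "inflammable materials" lying around Europe, and points to possible flashpoints between China and Japan, or Italy and Yugoslavia. "The spark may be applied at any moment, and a conflagration result." World peace, it concludes, depends on a "multitude of factors, none of which can safely be neglected."

The developers also tried to measure talkie's predictive limits quantitatively. They ran nearly 5,000 historical event descriptions from the New York Times' "On This Day" feature through the model and measured how surprising it found each one. The pattern is clear: after the 1930 knowledge cutoff, surprise values climb sharply, peak in the 1950s and 1960s, and then level off.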
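The article does not say which metric the team used for "surprise". One common choice, assumed here, is the mean negative log-likelihood per token. The sketch below takes per-token natural-log probabilities as input (how they are obtained from the model is not shown) and averages the resulting surprisal by event year. The function names and the `(year, token_logprobs)` input shape are hypothetical.

```python
import math
from collections import defaultdict


def mean_surprisal_bits(token_logprobs):
    """Average surprisal per token in bits, given natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Negative log-likelihood in nats, converted to bits, averaged over tokens.
    return -sum(token_logprobs) / (len(token_logprobs) * math.log(2))


def surprisal_by_year(events):
    """Map year -> mean per-event surprisal.

    `events` is an iterable of (year, token_logprobs) pairs (assumed format).
    """
    buckets = defaultdict(list)
    for year, logprobs in events:
        buckets[year].append(mean_surprisal_bits(logprobs))
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}
```

Plotting the returned dictionary against the year would give a curve of the kind described above, where values after the cutoff sit higher than values before it.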

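Victorian etiquette guides instead of modern chat data

The team chose the end of 1930 as the cutoff because that is when works enter the public domain in the US. Every text had to be transcribed from physical sources, which created serious quality problems. In controlled experiments, a model trained on standard OCR (optical character recognition) transcriptions reached just 30 percent of the performance of a model trained on human transcriptions using the same compute. Simple regex cleaning pushed that up to 70 percent. A custom vintage OCR system is meant to narrow the remaining gap.

Another headache is keeping knowledge from later eras out of the training data. A 1925 book might pick up an updated preface in a 1960 edition, library catalogs sometimes list the wrong publication date, and footnotes or commentary can be added to a historical text long after it was written. Despite a classifier designed to catch this kind of contamination, information about Roosevelt's presidency, World War II, and the United Nations still slipped through, the team says. Better classifiers are planned for future versions.

As for post-training, which turns the base model into a useful tool...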
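The article mentions "simple regex cleaning" without listing the rules. The sketch below shows the kind of rules such a pass might contain. All three rules are assumptions chosen for illustration, not the team's actual pipeline.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply a few crude, illustrative cleanup rules to OCR output."""
    # Rejoin lowercase words hyphenated across a line break:
    # "transcrip-\ntion" -> "transcription". Digits are left alone so that
    # ranges such as "1914-\n1918" are not fused.
    text = re.sub(r"([a-z])-\n([a-z])", r"\1\2", raw)
    # Treat a line consisting only of digits as a page number and drop it.
    text = re.sub(r"(?m)^[ \t]*\d+[ \t]*$\n?", "", text)
    # Turn single newlines inside a paragraph into spaces,
    # keeping blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

Rules like these trade precision for simplicity: a genuine compound such as "well-known" split across lines loses its hyphen, and a line that holds only a year is removed as if it were a page number.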
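The team's contamination classifier is not described in the article. As a much cruder stand-in, the heuristic below flags text that mentions a year after the cutoff or a term from a small watch list. The watch list is hypothetical and only reuses leak examples named above; a real classifier would need to be far more careful.

```python
import re

CUTOFF_YEAR = 1930
# Hypothetical watch list, based on the leak examples named in the article.
ANACHRONISTIC_TERMS = ("world war ii", "united nations")
# Four-digit years from 1800 to 2099.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def contamination_flags(text: str) -> list[str]:
    """Return reasons why `text` might contain post-cutoff material."""
    flags = []
    for match in YEAR_PATTERN.finditer(text):
        if int(match.group()) > CUTOFF_YEAR:
            flags.append(f"year:{match.group()}")
    lowered = text.lower()
    for term in ANACHRONISTIC_TERMS:
        if term in lowered:
            flags.append(f"term:{term}")
    return flags
```

For example, `contamination_flags("Preface to the 1960 edition, written after World War II.")` returns `["year:1960", "term:world war ii"]`. The year rule also produces false positives on period texts that speculate about the future, which is one reason a trained classifier is the more plausible tool here.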

Original text (English)

Here is what an LLM that knows nothing after 1930 thinks our world looks like in 2026

Matthias Bastian
Apr 28, 2026

Key Points

Researchers led by prominent AI developer Alec Radford have built "talkie," a 13-billion-parameter language model trained exclusively on texts published before 1931, effectively limiting its knowledge to the early 20th century.

When prompted, the model responds from a pre-1931 worldview: it considers a Second World War unlikely and imagines the year 2026 as dominated by steamships and vast railroad networks, reflecting the technological expectations of that era.

The team plans to scale talkie to GPT-3-level performance by summer 2026.

"Talkie" is a 13B-parameter language model trained only on texts written before 1931. It doubts a second world war will happen and pictures 2026 as a world of steamships, railroads, and penny novels.

What happens when you train a large language model only on texts published before 1931? That's the question behind talkie, a project from Nick Levine, David Duvenaud, and Alec Radford. The result is a 13B-parameter model that views the world through the lens of the early 20th century. Trained on 260 billion tokens drawn from books, newspapers, scientific journals, patents, and case law published before December 31, 1930, talkie is the largest 'vintage language model' built to date, according to its developers.

A model that thinks World War II is unlikely

Asked what the world will look like in 2026, talkie offers a vision straight out of a Victorian futurist novel: Europe will have a billion inhabitants, iron railroads will crisscross the continent, steamships will connect London and New York in ten days, and "winter will be passed in Paris, and the summer in London."

When asked directly whether a second world war is on the horizon, the model says no. It doesn't believe one is coming because "the madness of 1914-1918 has passed away." The nations, it claims, have had enough of war and are turning to peaceful pursuits. That said, talkie hedges its bets. It warns of "smouldering animosities" and "inflammable materials" lying around Europe, and points to possible flashpoints between China and Japan, or Italy and Yugoslavia. "The spark may be applied at any moment, and a conflagration result." World peace, it concludes, depends on a "multitude of factors, none of which can safely be neglected."

The developers also tried to measure talkie's predictive limits quantitatively. They ran nearly 5,000 historical event descriptions from the New York Times' "On This Day" feature through the model and measured how surprising it found each one. The pattern is clear: after the 1930 knowledge cutoff, surprise values climb sharply, peak in the 1950s and 1960s, and then level off.

Victorian etiquette guides instead of modern chat data

The team chose the end of 1930 as the cutoff because that's when works enter the public domain in the US. Every text had to be transcribed from physical sources, which created serious quality problems. In controlled experiments, standard OCR transcriptions delivered just 30 percent of the performance of a model trained on human transcriptions using the same compute. Simple regex cleaning pushed that up to 70 percent. A custom vintage OCR system is meant to narrow the remaining gap.

Another headache is keeping knowledge from later eras out of the training data. A 1925 book might pick up an updated preface in a 1960 edition, library catalogs sometimes list the wrong publication date, and footnotes or commentary can be added to a historical text long after it was written. Despite a classifier designed to catch this kind of contamination, information about Roosevelt's presidency, World War II, and the United Nations still slipped through, the team says. Better classifiers are planned for future versions.

For post-training, which turns the base model into a conversational partner, the developers turned to historical reference works: etiquette manuals, letter-writing guides, cookbooks, encyclopedias, and fable collections from the 19th and early 20th centuries. Reinforcement learning with Claude Sonnet 4.6 as the judge sharpened instruction-following. The researchers acknowledge, though, that this step inevitably introduces some anachronistic behavior into the model.

A vintage model that can do basic programming

The team also tested whether a model with no knowledge of digital computers could pick up modern programming languages. On the HumanEval benchmark for Python, the vintage models perform far worse than their modern counterparts, but they improve steadily as they scale up. Every correct solution is a simple one-liner or a minor tweak of an example program. Talkie, for instance, correctly implemented the decoding function of a rotation cipher by swapping an addition for a subtraction. The researchers say this points to a basic grasp of inverse functions.

Because vintage models are free of data contamination by design, they're well suited for generalization experiments. Modern language models are all trained directly or indirectly on web data, which shapes their abilities in ways that are hard to pin down. Vintage models could help reveal which traits of language models are universal and which come down to the specific training corpus.

Next up: a GPT-3-level model from the past

Talkie is available as a base model and a chat version on Hugging Face, with the code on GitHub. You can also test it live on the project website, where Claude Sonnet quizzes talkie about its knowledge and skills 24/7.

But the 13B model is only the start. The developers plan to scale talkie up significantly over the coming months, with a GPT-3-level model targeted for summer 2026. Early estimates suggest the corpus can grow to more than one trillion tokens of historical texts, enough to train a model on par with GPT-3.5. Multilingual expansion beyond English is also on the roadmap.

The bigger question driving the project: can a vintage model anticipate discoveries and inventions that came after its cutoff? Could a model trained only through 1911 independently derive general relativity, as Deepmind CEO Demis Hassabis has suggested? Larger vintage models could help reveal those scaling trends.

Co-author Alec Radford is one of the most influential AI researchers of recent years. He was lead author of the seminal 2018 GPT paper at OpenAI, where he worked on the early GPT models, the Whisper speech recognition system, and the DALL-E image generator. Radford left OpenAI in December 2024 and joined former OpenAI CTO Mira Murati's Thinking Machines Lab as an advisor in March 2025.

Source: Talkie-lm

Vintage language model, Alec Radford, period bias, data contamination, AI research

A vintage language model trained on pre-1931 texts

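Talkie, a 13B (13-billion-parameter) "vintage language model" (vintage LM) trained only on texts from before 1931, has been released. Besides offering the experience of talking with someone from the past, the model avoids by design the data contamination problem that affects modern AI, which gives researchers a clean setting for evaluating generalization and reasoning. The researchers expect to use it to test abilities such as predicting the future or independently discovering new inventions, and so to better understand what language models are inherently capable of.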

Vintage language model, AI evaluation, data contamination