The Decoder • 87 days ago

US government benchmark: China is falling behind in the AI race

IMP

7/10

Key summary

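According to a report from an agency under the US National Institute of Standards and Technology (NIST), China's top-performing AI model, Deepseek V4 Pro, trails the leading US models by roughly eight months. US models lead on capability, but Deepseek has a clear edge on price. This suggests that a cheap, "good enough" model could be the more attractive option once companies weigh their actual return on investment (ROI) from AI.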

Translated text

China is falling behind in the AI race, according to a US government benchmark.

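A new report from the Center for AI Standards and Innovation (CAISI) claims Chinese AI models are losing ground to their US counterparts. The agency recently put the new Chinese open-weight model Deepseek V4 Pro through its paces. The verdict: it is roughly eight months behind the leading US models.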

CAISI tested performance across cybersecurity, software development, math, natural sciences, and abstract reasoning. It calls Deepseek V4 the most capable Chinese AI model to date. In private testing, however, the model reportedly performs worse than Deepseek's own technical report suggests.

Deepseek pitches the model as roughly on par with current US models such as Opus 4.6 and GPT-5.4. CAISI says it is actually closer to the older GPT-5, especially on abstract reasoning, cybersecurity, and software development. Math is the one area where Deepseek V4 nearly matches the top US models.

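The center, which likely has its own political agenda, sits within the US National Institute of Standards and Technology (NIST). Its report paints a picture of a widening gap between US and Chinese models. Independent measurements tell a different story, showing that the gap has stayed roughly constant.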


Price might start to matter more than raw capability

On price, Deepseek V4 has a clear edge. It came in cheaper than the comparable GPT-5.4 mini in five of seven tests. Price is also becoming a bigger factor as AI models are expected to run longer and handle more complex tasks. Meanwhile, top-tier US models keep getting pricier.

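That matters because no one really knows yet how much these models actually boost productivity. Businesses have no reliable way to measure return on investment (ROI), especially once downstream effects such as training, upskilling, and error checking are factored in.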

Past a certain capability threshold, "good enough" performance at a low price could end up more attractive than top-tier performance at premium rates. Cursor, the Claude Code competitor reportedly being acquired by SpaceX, built its custom fine-tuned coding model on top of a Chinese open-weight model, making it significantly cheaper than what OpenAI and Anthropic offer.

OpenAI CEO Sam Altman seems torn on this. In a recent post on X (formerly Twitter), he wrote: "I keep thinking I want the models to be cheaper/faster more than I want them to be smarter, but it seems that just being smarter is still the most important thing."


Altman's view may also rest on the bet that smarter AI could help improve itself, speeding up progress across the board. OpenAI, Anthropic, and Chinese developers have all said recently that their own models are already accelerating their R&D work.


Source: CAISI


Original text (English)

China is falling behind in the AI race, according to a US government benchmark

Matthias Bastian, May 3, 2026

A new report from the Center for AI Standards and Innovation (CAISI) claims Chinese AI models are losing ground to their US counterparts. The agency recently put the new Chinese open-weight model Deepseek V4 Pro through its paces. The verdict: it's roughly eight months behind the leading US models.

CAISI tested performance across cybersecurity, software development, math, natural sciences, and abstract reasoning. CAISI calls Deepseek V4 the most capable Chinese AI model to date. But in private testing, it reportedly performs worse than Deepseek's own technical report suggests.

Deepseek pitches the model as roughly on par with current US models like Opus 4.6 and GPT-5.4. CAISI says it's actually closer to the older GPT-5, especially on abstract reasoning, cybersecurity, and software development. Math is the one area where Deepseek V4 nearly matches the top US models.

The center, which likely has its own political agenda, sits within the National Institute of Standards and Technology (NIST). Its report paints a picture of a widening gap between US and Chinese models. Independent measurements tell a different story, showing the gap has stayed roughly constant.

Price might start to matter more than raw capability

On price, Deepseek V4 has a clear edge. It came in cheaper than the comparable GPT-5.4 mini in five of seven tests. And price is becoming a bigger factor as AI models are expected to run longer and handle more complex tasks. Meanwhile, top-tier US models keep getting pricier.

That matters because no one really knows yet how much these models actually boost productivity. Businesses don't have reliable ways to measure return on investment, especially once you factor in downstream effects like training, upskilling, and error checking.

Past a certain capability threshold, "good enough" performance at a low price could end up more attractive than top-tier performance at premium rates. Cursor, the Claude Code competitor reportedly being acquired by SpaceX, built its custom fine-tuned coding model on top of a Chinese open-weight model, making it significantly cheaper than what OpenAI and Anthropic offer.

OpenAI CEO Sam Altman seems torn on this. In a recent post on X, he wrote: "I keep thinking I want the models to be cheaper/faster more than I want them to be smarter, but it seems that just being smarter is still the most important thing."

Altman's view may also rest on the bet that smarter AI could help improve itself, speeding up progress across the board. OpenAI, Anthropic, and Chinese developers have all said recently that their own models are already accelerating their R&D work.

Tags: AI race, Deepseek, US government policy, benchmark testing, price competitiveness