r/LocalLLaMA • 94 days ago

DeepSeek V4 Pro: declining intelligence density becomes a concern

IMP

7/10

Key summary

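DeepSeek's latest model, V4 Pro, reportedly shows a drop in "intelligence density": it spends far more tokens than its predecessor (V3.2) to reach the same level of performance. Compared with the competing GPT-5.4 and GPT-5.5, it appears to need roughly 10x more tokens for similar performance, which, at the same tokens-per-second rate, means a task takes about 10x longer to complete.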

Post body

In the `V3.2` paper, they mentioned:

> Second, token efficiency remains a challenge; DeepSeek-V3.2 typically requires longer generation trajectories (i.e., more tokens) to match the output quality of models like Gemini 3.0-Pro. Future work will focus on optimizing the intelligence density of the model’s reasoning chains to improve efficiency.

However, in `V4 Pro`, the situation seems to have worsened. Even the non-thinking mode uses significantly more tokens than `V3.2`, and `V4 Pro` (1.6T) is roughly 2.5x larger than `V3.2` (0.67T). This suggests that the intelligence density of the model has decreased rather than improved!

If we compare it with `GPT-5.4` and `GPT-5.5`, the gap is even larger. DeepSeek appears to require around 10x more tokens to achieve similar performance. Assuming the same TPS (tokens generated per second), this implies that DeepSeek V4 Pro takes roughly 10x longer to complete the same task.
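For anyone who wants to sanity-check the arithmetic, here is a minimal Python sketch. The parameter counts are the figures quoted in the post. The baseline token count and the TPS value are made-up placeholders, not measurements, and the helper names `size_ratio` and `completion_time_s` are invented for this illustration. It also assumes generation time is simply output tokens divided by TPS, ignoring prefill and network latency.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Parameter counts come from the post; token count and TPS are
# hypothetical placeholders chosen only for illustration.

V4_PRO_PARAMS_T = 1.6   # total parameters (trillions), as quoted in the post
V32_PARAMS_T = 0.67     # total parameters (trillions), as quoted in the post


def size_ratio(larger: float, smaller: float) -> float:
    """Return how many times larger one model is than the other."""
    return larger / smaller


def completion_time_s(output_tokens: int, tokens_per_second: float) -> float:
    """Generation time in seconds, ignoring prefill and network latency."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    ratio = size_ratio(V4_PRO_PARAMS_T, V32_PARAMS_T)
    print(f"V4 Pro / V3.2 size ratio: {ratio:.2f}x")

    baseline_tokens = 2_000   # hypothetical tokens used by the efficient model
    token_multiplier = 10     # the "around 10x more tokens" claim from the post
    tps = 50.0                # hypothetical, assumed identical for both models

    t_base = completion_time_s(baseline_tokens, tps)
    t_heavy = completion_time_s(baseline_tokens * token_multiplier, tps)
    print(f"baseline: {t_base:.0f} s, 10x tokens: {t_heavy:.0f} s "
          f"({t_heavy / t_base:.0f}x longer)")
```

With the quoted sizes, the ratio works out to about 2.4x, which the post rounds to "roughly 2.5x". The time ratio equals the token multiplier only because TPS is held equal; if the two models are served at different speeds, the gap scales accordingly.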


DeepSeek, model efficiency, token cost, intelligence density, AI benchmarks