r/singularity, 37 days ago

DeepSeek Releases New Open-Source Model DeepSeek-V4-Pro

IMP
9/10
Key Summary

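Chinese AI company DeepSeek has published its latest large language model, DeepSeek-V4-Pro, on Hugging Face. It is a Mixture-of-Experts (MoE) model with 1.6 trillion parameters (49B activated per token) and a one-million-token context window, released as open source under the MIT license. In self-reported benchmark results it performs at a top-tier level in math, coding, and reasoning, and its MMLU-Pro score ranks first among the results Hugging Face lists for that benchmark. Its weights are distributed in mixed FP4 and FP8 precision, which lowers memory requirements, and the permissive license allows both commercial and research use.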

Translated Article

DeepSeek has released its latest model, DeepSeek-V4-Pro, on Hugging Face, where it is listed under the Text Generation pipeline tag. The model is tagged for the Transformers library, its weights are stored in the Safetensors format, and it is released under the MIT license, which lets anyone use, modify, and redistribute it freely.

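According to the published benchmark results, DeepSeek-V4-Pro performs strongly on several major benchmarks. It reached 90.1% accuracy on GPQA Diamond, a set of difficult expert-level questions, and 92.6% on GSM8K, which measures math word-problem solving. On MMLU-Pro, which tests advanced knowledge and reasoning, it scored 87.5%, the top-ranked result Hugging Face lists for that benchmark. It also reports 37.7% on HLE (Humanity's Last Exam). All of these figures come from the model card and are marked as unverified on Hugging Face.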

Its coding and software-engineering results are also strong. It resolved 80.6% of tasks on SWE-bench Verified and 55.4% on the harder SWE-bench Pro, and it scored 67.9% on Terminal-Bench 2.0, which evaluates how well a model completes tasks in a real terminal environment.

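The model is also technically notable. Hugging Face tags it with 8-bit and FP8 precision, and according to the model card the MoE expert parameters are stored in FP4 while most other parameters use FP8. Compared with 16-bit weights, this substantially reduces the memory (VRAM) needed to run a model with 1.6 trillion parameters, although the checkpoint still totals roughly 865 GB. At the time of writing, the model had been downloaded 30 times on Hugging Face and had received 2,180 likes from the AI community.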
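The memory claim can be checked against the per-dtype element counts that the Hugging Face page lists for the checkpoint's safetensors files (the `safetensors.parameters` metadata in the page source), together with the listed total file size of 864,732,335,428 bytes. The sketch below multiplies each count by its element size. It also tests assumptions the page does not state: that each I8 byte packs two FP4 expert weights, and that the F8_E8M0 tensors are block scale factors (one per 32 FP4 values and one per 128x128 FP8 block). The dictionary and helper names are hypothetical, chosen for illustration.

```python
# Per-dtype element counts listed on the Hugging Face page for
# deepseek-ai/DeepSeek-V4-Pro (safetensors "parameters" metadata).
STORED_COUNTS = {
    "BF16": 2_816_899_328,
    "I64": 2_327_040,
    "F32": 87_776_414,
    "F8_E8M0": 49_150_268_416,
    "F8_E4M3": 23_169_335_296,
    "I8": 786_381_668_352,
}
LISTED_FILE_SIZE = 864_732_335_428  # "totalFileSize" in bytes, from the same page

# Storage size of one element of each safetensors dtype, in bytes.
BYTES_PER_ELEMENT = {"BF16": 2, "I64": 8, "F32": 4, "F8_E8M0": 1, "F8_E4M3": 1, "I8": 1}


def tensor_bytes(counts: dict[str, int]) -> int:
    """Bytes taken by tensor data alone (file headers are not included)."""
    return sum(n * BYTES_PER_ELEMENT[dtype] for dtype, n in counts.items())


def logical_weights(counts: dict[str, int]) -> int:
    """ASSUMPTION: each I8 byte packs two FP4 expert weights; F8_E8M0 tensors are
    scale factors and I64 tensors are index buffers, so neither is counted."""
    return 2 * counts["I8"] + counts["F8_E4M3"] + counts["BF16"] + counts["F32"]


def expected_scale_count(counts: dict[str, int]) -> int:
    """ASSUMPTION: one E8M0 scale per 32 FP4 values and per 128x128 FP8 block."""
    return (2 * counts["I8"]) // 32 + counts["F8_E4M3"] // (128 * 128)


total = tensor_bytes(STORED_COUNTS)
print(f"tensor data: {total / 1e9:.1f} GB of {LISTED_FILE_SIZE / 1e9:.1f} GB on disk")
print(f"logical weights: {logical_weights(STORED_COUNTS) / 1e12:.2f}T")
print(f"same weights in BF16: {2 * logical_weights(STORED_COUNTS) / 1e12:.2f} TB")
print("scale count matches:", expected_scale_count(STORED_COUNTS) == STORED_COUNTS["F8_E8M0"])
```

Under these assumptions the stored tensors add up to about 864.7 GB, the unpacked weights come to roughly 1.6T (matching the model card), and the scale-count check holds exactly, which is consistent with block-scaled FP4/FP8 storage. Holding the same weights in BF16 would take about 3.2 TB.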

Original Text (English)
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Technical Report

Introduction

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both support a context length of one million tokens.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

- Hybrid Attention Architecture: We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2.
- Manifold-Constrained Hyper-Connections (mHC): We incorporate mHC to strengthen conventional residual connections, enhancing the stability of signal propagation across layers while preserving model expressivity.
- Muon Optimizer: We employ the Muon optimizer for faster convergence and greater training stability.

We pre-train both models on more than 32T diverse, high-quality tokens, followed by a comprehensive post-training pipeline.
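The model card only names the Muon optimizer. As a rough illustration of what Muon does, the sketch below loosely follows the publicly available Muon reference implementation: it accumulates momentum for a 2-D weight matrix, then replaces the update with an approximately orthogonalized version computed by a quintic Newton-Schulz iteration (coefficients 3.4445, -4.7750, 2.0315 from that reference). DeepSeek's actual configuration (learning rate, momentum, which parameters use Muon, any extra scaling) is not described here, so the function names and all hyperparameters are assumptions.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximate the orthogonal polar factor of g (push singular values toward 1)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon reference code
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling keeps every singular value <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation so the Gram matrix stays small
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # Maps each singular value s to a*s + b*s**3 + c*s**5.
        x = a * x + (b * gram + c * (gram @ gram)) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One hypothetical Muon update for a 2-D weight matrix (Nesterov-style momentum)."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(grad + beta * momentum)
    # Shape-dependent scaling, as in the reference code; an assumption for DeepSeek-V4.
    scale = max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    return weight - lr * scale * update, momentum


rng = np.random.default_rng(0)
g = rng.standard_normal((8, 16))
print(np.linalg.svd(newton_schulz_orthogonalize(g), compute_uv=False).round(2))
```

The orthogonalization pushes every singular value of the update toward 1 (only approximately: after five steps they land roughly between 0.7 and 1.2), so no single direction dominates the step.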
The post-training follows a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, which integrates distinct proficiencies across diverse domains into a single model.

DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves reasoning performance comparable to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.

Model Downloads

| Model | #Total Params | #Activated Params | Context Length | Precision | Download |
|---|---|---|---|---|---|
| DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | HuggingFace, ModelScope |
| DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed* | HuggingFace, ModelScope |
| DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | HuggingFace, ModelScope |
| DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed* | HuggingFace, ModelScope |

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
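The post-training description mentions RL with GRPO (Group Relative Policy Optimization). Its core idea, as introduced in DeepSeek's earlier DeepSeekMath work, is to sample a group of responses per prompt and score each one relative to its group instead of training a separate value model. The sketch below shows that advantage computation plus a PPO-style clipped objective for a single token. It omits the KL penalty, and nothing here reflects DeepSeek-V4's specific reward design or hyperparameters, which the model card does not give; the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against its own group (same prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float, clip: float = 0.2) -> float:
    """PPO-style clipped objective for one token (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Example: four sampled answers to one prompt, rewarded 1 if correct and 0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is the group mean, correct answers are pushed up and incorrect ones down by the same amount here, and a group where every answer gets the same reward produces zero advantage.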
Evaluation Results

Base Model

| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|---|---|---|---|---|
| Architecture | - | MoE | MoE | MoE |
| # Activated Params | - | 37B | 13B | 49B |
| # Total Params | - | 671B | 284B | 1.6T |
| World Knowledge | | | | |
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | 83.1 |
| MMLU (EM) | 5-shot | 87.8 | 88.7 | 90.1 |
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | 90.8 |
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | 73.5 |
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | 90.3 |
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | 93.1 |
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | 90.8 |
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | 51.1 |
| Simple-QA verified (EM) | 25-shot | 28.3 | 30.1 | 55.2 |
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | 53.9 |
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | 62.6 |
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | 85.6 |
| Language & Reasoning | | | | |
| BBH (EM) | 3-shot | 87.6 | 86.9 | 87.5 |
| DROP (F1) | 1-shot | 88.2 | 88.6 | 88.7 |
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | 88.0 |
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | 81.5 |

CLUEWSC (EM) 5-sh
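The activated versus total parameter rows of the base-model table are what make MoE inference cheaper than the total size suggests. The sketch below computes the fraction of parameters each base model activates per token and a rough per-token compute estimate using the common rule of thumb of about 2 FLOPs per active parameter (one multiply and one add). That rule ignores attention, whose cost grows with context length, so these figures are illustrative assumptions rather than numbers from the model card; the helper names are hypothetical.

```python
# (total params, activated params per token), from the base-model table above.
MODELS = {
    "DeepSeek-V3.2-Base": (671e9, 37e9),
    "DeepSeek-V4-Flash-Base": (284e9, 13e9),
    "DeepSeek-V4-Pro-Base": (1.6e12, 49e9),
}


def activation_ratio(total: float, active: float) -> float:
    """Fraction of all parameters used for a single token."""
    return active / total


def dense_flops_per_token(active: float) -> float:
    """Rule of thumb: ~2 FLOPs (multiply + add) per active parameter; ignores attention."""
    return 2.0 * active


for name, (total, active) in MODELS.items():
    print(f"{name}: {activation_ratio(total, active):.1%} active, "
          f"~{dense_flops_per_token(active) / 1e9:.0f} GFLOPs/token (excluding attention)")
```

By this crude measure, V4-Pro's non-attention compute per token (about 98 GFLOPs) is higher than V3.2's (about 74 GFLOPs), which suggests that the reported 27% of single-token inference FLOPs at 1M context comes mainly from the hybrid attention rather than from the expert layers.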