Hacker News • 104 days ago

Claude Opus 4.7 released

IMP

9/10

Key summary

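Anthropic has made its latest AI model, Claude Opus 4.7, generally available. Compared with Opus 4.6, the model shows a notable performance gain on complex software engineering and long-running coding tasks, and its high-resolution image processing has also been strengthened. A key point is that, to work toward a safe release of Mythos, a model with powerful cybersecurity capabilities, Anthropic introduced new safeguards along with a verification program for legitimate security professionals.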

Translated article

Product Announcements: Introducing Claude Opus 4.7 (April 16, 2026)

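Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work, the kind that previously needed close supervision, to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.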

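The model also has substantially better vision: it can see images in much greater resolution. It is more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And although it is less broadly capable than our most powerful model, Claude Mythos Preview, it shows better results than Opus 4.6 across a range of benchmarks.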

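Last week we announced Project Glasswing, highlighting the risks and benefits of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview's release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model. Its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work toward our eventual goal of a broad release of Mythos-class models. Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes, such as vulnerability research, penetration testing, and red-teaming, are invited to join our new Cyber Verification Program.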

Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud's Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.
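The per-token pricing stated here makes request costs easy to estimate. Below is a minimal Python sketch, assuming billing is simply linear in token counts at the two list prices from the announcement ($5 per million input tokens, $25 per million output tokens) and ignoring anything the announcement does not mention, such as discounts or caching. The helper name `estimate_cost_usd` is hypothetical and is not part of any Anthropic SDK.

```python
# List prices for Opus 4.7 as stated in the announcement,
# in US dollars per million tokens (unchanged from Opus 4.6).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request (hypothetical helper).

    Assumes cost is linear in token counts at the announced list prices;
    anything not mentioned in the announcement is ignored.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # 200k input tokens cost 0.2 * $5 = $1.00;
    # 50k output tokens cost 0.05 * $25 = $1.25.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")  # prints "$2.25"
```

Because output tokens are priced at five times the input rate, the 50,000 output tokens in the example ($1.25) cost more than the 200,000 input tokens ($1.00), so output-heavy workloads such as long code generation dominate the bill.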

Testing Claude Opus 4.7

Claude Opus 4.7 has garnered strong feedback from our early-access testers:

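"In early testing, we're seeing the potential for a significant leap for our developers with Claude Opus 4.7. It catches its own logical faults during the planning phase and accelerates execution, far beyond previous Claude models. As a financial technology platform serving millions of consumers and businesses at significant scale, this combination of speed and precision could be game-changing: accelerating development velocity for faster delivery of the trusted financial solutions our customers rely on every day."
"Anthropic has already set the standard for coding models, and Claude Opus 4.7 pushes that further in a meaningful way as the state-of-the-art model on the market. In our internal evals, it stands out not just for raw capability, but for how well it handles real-world async workflows: automations, CI/CD, and long-running tasks. It also thinks more deeply about problems and brings a more opinionated perspective, rather than simply agreeing with the user."
"Claude Opus 4.7 is the strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for. It's a more intelligent, more efficient Opus 4.6: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6."
"On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it's particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building."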


Original article (English)

Product Announcements

Introducing Claude Opus 4.7

Apr 16, 2026

Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work (the kind that previously needed close supervision) to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back.

The model also has substantially better vision: it can see images in greater resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And, although it is less broadly capable than our most powerful model, Claude Mythos Preview, it shows better results than Opus 4.6 across a range of benchmarks.

Last week we announced Project Glasswing, highlighting the risks (and benefits) of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus 4.7 is the first such model: its cyber capabilities are not as advanced as those of Mythos Preview (indeed, during its training we experimented with efforts to differentially reduce these capabilities). We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models. Security professionals who wish to use Opus 4.7 for legitimate cybersecurity purposes (such as vulnerability research, penetration testing, and red-teaming) are invited to join our new Cyber Verification Program.
Opus 4.7 is available today across all Claude products and our API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens and $25 per million output tokens. Developers can use claude-opus-4-7 via the Claude API.

Testing Claude Opus 4.7

Claude Opus 4.7 has garnered strong feedback from our early-access testers:

In early testing, we’re seeing the potential for a significant leap for our developers with Claude Opus 4.7. It catches its own logical faults during the planning phase and accelerates execution, far beyond previous Claude models. As a financial technology platform serving millions of consumers and businesses at significant scale, this combination of speed and precision could be game-changing: accelerating development velocity for faster delivery of the trusted financial solutions our customers rely on every day.

Anthropic has already set the standard for coding models, and Claude Opus 4.7 pushes that further in a meaningful way as the state-of-the-art model on the market. In our internal evals, it stands out not just for raw capability, but for how well it handles real-world async workflows: automations, CI/CD, and long-running tasks. It also thinks more deeply about problems and brings a more opinionated perspective, rather than simply agreeing with the user.

Claude Opus 4.7 is the strongest model Hex has evaluated. It correctly reports when data is missing instead of providing plausible-but-incorrect fallbacks, and it resists dissonant-data traps that even Opus 4.6 falls for. It’s a more intelligent, more efficient Opus 4.6: low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.

On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, it’s particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building.

Based on our internal research-agent benchmark, Claude Opus 4.7 has the strongest efficiency baseline we’ve seen for multi-step work. It tied for the top overall score across our six modules at 0.715 and delivered the most consistent long-context performance of any model we tested. On General Finance, our largest module, it improved meaningfully on Opus 4.6, scoring 0.813 versus 0.767, while also showing the best disclosure and data discipline in the group. And on deductive logic, an area where Opus 4.6 struggled, Opus 4.7 is solid.

Claude Opus 4.7 extends the limit of what models can do to investigate and get tasks done. Anthropic has clearly optimized for sustained reasoning over long runs, and it shows with market-leading performance. As engineers shift from working 1:1 with agents to managing them in parallel, this is exactly the kind of frontier capability that unlocks new workflows.

We’re seeing major improvements in Claude Opus 4.7’s multimodal understanding, from reading chemical structures to interpreting complex technical diagrams. The higher resolution support is helping Solve Intelligence build best-in-class tools for life sciences patent workflows, from drafting and prosecution to infringement detection and invalidity charting.

Claude Opus 4.7 takes long-horizon autonomy to a new level in Devin. It works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn't reliably run before.

For Replit, Claude Opus 4.7 was an easy upgrade decision. For the work our users do every day, we observed it achieving the same quality at lower cost: more efficient and precise at tasks like analyzing logs and traces, finding bugs, and proposing fixes. Personally, I love how it pushes back during technical discussions to help me make better decisions. It really feels like a better coworker.

Claude Opus 4.7 demonstrates strong substantive accuracy on BigLaw Bench for Harvey, scoring 90.9% at high effort with better reasoning calibration on review tables and noticeably smarter handling of ambiguous document editing tasks. It correctly distinguishes assignment provisions from change-of-control provisions, a task that has historically challenged frontier models. Substance was consistently rated as a strength across our evaluations: correct, thorough, and well-cited.

Claude Opus 4.7 is a very impressive coding model, particularly for its autonomy and more creative reasoning. On CursorBench, Opus 4.7 is a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%.

For complex multi-step workflows, Claude Opus 4.7 is a clear step up: plus 14% over Opus 4.6 at fewer tokens and a third of the tool errors. It’s the first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold. This is the reliability jump that makes Notion Agent feel like a true teammate.

In our evals, we saw a double-digit jump in accuracy of tool calls and planning in our core orchestrator agents. As users leverage Hebbia to plan and execute on use cases like retrieval, slide creation, or document generation, Claude Opus 4.7 shows the potential to improve agent decision-making in these workflows.

On Rakuten-SWE-Bench, Claude Opus 4.7 resolves 3x more production tasks than Opus 4.6, with double-digit gains in Code Quality and Test Quality. This is a meaningful lift and a clear upgrade for the engineering work our teams are shipping every day.

For CodeRabbit’s code review workloads, Claude Opus 4.7 is the sharpest model we’ve tested. Recall improved by over 10%, surfacing some of the most difficult-to-detect bugs in our most complex PRs, while precision remained stable despite the increased coverage. It’s a bit faster than GPT-5.4 xhigh on our harness, and we’re lining it up for our heaviest review work at launch.

For Genspark’s Super Agent, Claude Opus 4.7 nails the three production differentiators that matter most: l

Anthropic · Claude · Opus · AI · Coding · Cybersecurity · Model release