Hacker News • 61일 전

클로드 오푸스 4.8 발표

IMP

8/10

핵심 요약

앤스로픽이 최신 AI 모델인 '클로드 오푸스 4.8'을 발표했습니다. 코딩, 에이전트 기능, 추론 등 전반적인 벤치마크에서 성능이 향상되었으며, 동일한 가격으로 제공됩니다. 특히 빠른 모드(Fast mode)의 비용이 3배 저렴해졌고, 클로드 코드(Claude Code) 내 대규모 작업을 수행하는 '동적 워크플로우' 등 다양한 신규 기능이 함께 도입되었습니다.

번역된 본문

제품 발표: 클로드 오푸스 4.8 소개 (2026년 5월 28일)

저희는 클로드 오푸스(Claude Opus)의 새로운 버전인 클로드 오푸스 4.8로 업그레이드합니다. 오푸스 4.7을 기반으로 벤치마크 전반에 걸쳐 성능이 개선되었으며, 더욱 효과적인 협업이 가능해졌습니다. 오늘부터 기존과 동일한 가격으로 이용할 수 있습니다.

오푸스 4.8은 여러 가지 새로운 기능과 함께 출시됩니다. claude.ai 사용자는 이제 클로드가 작업에 얼마나 많은 노력을 기울일지 직접 제어할 수 있습니다. 또한 클로드 코드(Claude Code)는 매우 대규모의 문제를 해결할 수 있는 새로운 '동적 워크플로우(Dynamic Workflows)' 기능을 갖추게 되었습니다. 그리고 오푸스 4.8의 빠른 모드(모델이 2.5배 빠른 속도로 작동)는 이전 모델에 비해 비용이 3분의 1로 줄어들었습니다.

오푸스 4.8의 기능 아래 표는 코딩, 에이전트(Agentic) 기술, 추론 및 실무 지식 작업 테스트에서 오푸스 4.8이 이전 모델 및 다른 모델들과 비교하여 어떤 성능을 보여주는지 나타냅니다. 더 자세한 내용과 훨씬 더 광범위한 기능 평가는 클로드 오푸스 4.8 시스템 카드(System Card)에서 확인할 수 있습니다.

오푸스 4.8과 협업하기 초기 테스터들은 클로드 오푸스 4.8이 에이전트 작업을 수행할 때 판단력이 더욱 날카롭고 신뢰할 수 있다고 평가했습니다. 다음은 이 테스터들이 오푸스 4.8과 협업하며 남긴 실제 후기들입니다.

"클로드 오푸스 4.8은 판단력이 눈에 띄게 향상되었습니다. 클로드 코드에서 올바른 질문을 던지고, 스스로의 실수를 파악하며, 계획이 타당하지 않을 때는 반박하고, 큰 변경을 가하기 전에 복잡한 다중 서버 탐색 과정에서 확신을 쌓습니다. 이는 개발에 매우 훌륭한 모델입니다."

"저희의 슈퍼 에이전트(Super-Agent) 벤치마크에서 클로드 오푸스 4.8은 모든 케이스를 엔드투엔드(End-to-End)로 완료한 유일한 모델이며, 동일한 비용으로 이전 오푸스 모델과 GPT-5.5를 능가했습니다. 번역, 심층 리서치, 슬라이드 제작, 분석 등 에이전트 기반 제품에 강력한 안정성을 제공합니다."

"CursorBench에서 클라우드 오푸스 4.8은 모든 난이도 수준에서 이전 오푸스 모델을 능가합니다. 도구 호출(Tool calling)이 훨씬 더 효율적이며, 동일한 지능적 수행을 위해 더 적은 단계를 거치고 작업을 끝까지 완수해 냅니다."

"클로드 오푸스 4.8은 저희의 법률 에이전트 벤치마크에서 역대 최고 점수를 기록했으며, 올패스(All-pass) 기준에서 10%를 돌파한 최초의 모델입니다. 실질적인 법률 업무에 있어서 이러한 정확도 향상은 고객들이 자신감을 가지고 실제 변호사 업무를 얼마나 더 많이 맡길 수 있는지와 직결됩니다."

"클로드 오푸스 4.8은 오푸스 4.7에 비해 삶의 질(Quality-of-life)을 크게 높여주는 업데이트처럼 느껴집니다. 더 빠르고 협업이 쉬워졌으며, 긴 세션 동안 맥락과 스타일 방향성을 유지하는 능력이 탁월합니다. 오푸스 4.8은 목소리(Tone), 취향, 기술적 실행력이 모두 조화를 이루어야 하는 제 작업에서 계속 신뢰했던 모델입니다."

"클로드 오푸스 4.8은 저희가 테스트한 컴퓨터 사용 및 브라우저 에이전트 모델 중 가장 뛰어납니다. Online-Mind2Web에서 84%를 기록했으며, 이는 오푸스 4.7과 GPT-5.5 모두에 비해 의미 있는 도약입니다. 고객의 에이전트 워크로드가 엔드투엔드로 안정적으로 작동해야 하는 요구사항에 맞게 반성적이고 작업에 집중하는 모습을 보여줍니다."

"클로드 오푸스 4.8은 도구를 깔끔하게 사용하고, 무인으로 계속 실행되어야 하는 자율 엔지니어링 워크로드에 필요한 일관된 지시 수행 능력을 갖추었습니다. 오푸스 4.6을 개선했을 뿐만 아니라 오푸스 4.7에서 나타났던 주석 장황화(Comment-verbosity) 및 도구 호출 문제를 해결했습니다. 앤스로픽의 이번 릴리스는 데빈(Devin)을 기반으로 구축하는 엔지니어들의 역량 향상 속도를 직접적으로 높여줍니다."

"장기 실행 평가(Long-running evals)에서 클로드 오푸스 4.8의 분석은 이전 오푸스 모델들보다 지속적으로 더 높은 품질을 보여주었습니다. 더 빠르게 완료되고 더 풍부하며 정보 밀도가 높은 결과물을 생성했습니다. 전반적으로 신호 대 잡음비(Signal to noise ratio)가 눈에 띄게 개선되었습니다. 가장 큰 차별점은 오푸스 4.8이 분석의 입력 및 출력과 관련된 문제를 자발적으로 지적하는 성향이 있다는 점인데, 이는 다른 모델들이 사용자가 직접 발견하도록 놓치는 경우가 많았던 부분입니다."

"CoCounsel Legal 전반에 걸쳐 클로드 오푸스 4.8은 이전 오푸스 모델에 비해 일관성과 추론 품질에서 의미 있는 개선을 가져왔습니다. 고객이 의존하는 고위험 전문 워크플로우에서 이러한 안정성은 매우 중요합니다. 저희가 법률 및 세무 전문가를 위해 수탁자 등급(Fiduciary-grade)의 AI 시스템을 구축함에 있어, 이러한 발전은 업계 표준을 높이는 데 기여합니다."

원문 보기

원문 보기 (영어)

Product Announcements Introducing Claude Opus 4.8 May 28, 2026 We’re upgrading Claude Opus to a new version: Claude Opus 4.8. It builds on Opus 4.7 with improvements across benchmarks, and is a more effective collaborator. It’s available today for the same price. Opus 4.8 launches alongside several new features. Users on claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a new “dynamic workflows” feature that allows it to tackle very large-scale problems. And fast mode for Opus 4.8—where the model can work at 2.5× the speed—is now three times cheaper than it was for previous models. Opus 4.8’s capabilities The table below shows how Opus 4.8 compares to its predecessor and to other models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks. More details and a much wider range of capability evaluations are provided in the Claude Opus 4.8 System Card . Collaborating with Opus 4.8 Early testers have found Claude Opus 4.8 to be more reliable and sharper in its judgement when it’s performing agentic tasks. Below are quotes from many of these testers about their experience collaborating with Opus 4.8: Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes. It’s a great model to build with. On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost. For agent products in translation, deep research, slide-building, and analysis, it delivers powerful reliability. On CursorBench, Claude Opus 4.8 exceeds prior Opus models across every effort level. Tool calling is meaningfully more efficient, using fewer steps for the same intelligence, and it carries end-to-end tasks through. Claude Opus 4.8 delivers the highest score recorded on our Legal Agent Benchmark, and is the first model to break 10% overall on the all-pass standard. For substantive legal work, that’s the kind of accuracy lift that translates directly into how much real attorney work our customers can hand off with confidence. Claude Opus 4.8 feels like a major quality-of-life update over Opus 4.7: faster, easier to collaborate with, and better at carrying context and style direction across a long session. Opus 4.8 is the model I kept trusting for work where voice, taste, and technical execution all have to happen side-by-side. Claude Opus 4.8 is the strongest computer-use and browser-agent model we’ve tested, scoring 84% on Online-Mind2Web, which is a meaningful jump over both Opus 4.7 and GPT-5.5. It stays reflective and on-task in the way our customers’ agent workloads need to be reliable end-to-end. Claude Opus 4.8 uses tools cleanly and follows instructions with the consistency our autonomous engineering workloads need to keep running unattended. It improves on Opus 4.6 and fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7. This release from Anthropic translates directly into faster capability gains for engineers building on Devin. On our long-running evals, Claude Opus 4.8’s analysis was consistently higher quality than prior Opus models. It finished faster and produced richer, more information dense outputs. Overall, a noticeably better signal to noise ratio. The biggest differentiator was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch. Across CoCounsel Legal, Claude Opus 4.8 delivered meaningful improvements in consistency and reasoning quality compared to prior Opus models. For the high-stakes professional workflows our customers depend on, that reliability matters. As we build fiduciary-grade AI systems for legal and tax professionals, advances like these help raise the standard for trusted AI performance in real-world workflows. Claude Opus 4.8 sets a new bar for enterprise AI. In Genie, Databricks’ AI agent for data and knowledge work, the new Opus model unlocks a step change in agentic reasoning, tackling deeper, multistep questions faster than any prior Opus. Its multimodal strength also lets Genie reason directly over PDFs, diagrams, and other unstructured content at 61% cheaper token cost than Opus 4.7. For financial-document workflows in Hebbia’s orchestrator, Claude Opus 4.8 delivers the same strong quality as Opus 4.7 with noticeably better citation precision and more token efficiency on retrieval, which works incredibly well for the kinds of dense filings our customers run every day. 01 / 11 One of the most prominent improvements in Opus 4.8 is its honesty . We train all our models to be honest—for instance, to avoid making claims that they can’t support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. This is borne out in our evaluations , which show that Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. As always, we ran a detailed alignment assessment on the model before release. In terms of positive traits, our Alignment team concluded that Opus 4.8 “reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user’s best interest.” The assessment also showed Opus 4.8 to have rates of misaligned behavior (such as deception or cooperation with misuse) that are substantially lower than Opus 4.7, and similar to our best-aligned model, Claude Mythos Preview. The full alignment assessment, accompanied by a suite of pre-deployment safety tests, is reported in the Claude Opus 4.8 System Card. Also launching today In addition to Claude Opus 4.8, we’re making the following updates: Dynamic workflows . This new feature, available in research preview, allows Claude to take on even bigger tasks in Claude Code. Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer). It then verifies its outputs before reporting back to the user. For example, Claude Code with Opus 4.8 can now carry out codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge, with the existing test suite as its bar. You can read more about dynamic workflows—available in Claude Code for Enterprise, Team, and Max plans—in this post . Effort control in claude.ai and Cowork . A new control alongside the model selector lets users choose how much effort Claude puts into a response. On higher effort settings, Claude will think more frequently and more deeply to give better responses. On lower effort settings, Claude will respond faster and use up a user’s rate limits more slowly. Users now have this choice—the effort control is available on all plans. The Messages API now accepts system entries inside the messages array. Developers can update Claude’s instructions mid-task without breaking the prompt cache or routing the update through a user turn. This can be used in a given harness to update permissions, token budgets, or environment context as an agent runs. A note on effort Opus 4.8 defaults to high effort, which we judge to be the best overall balance of quality and user experience. On coding tasks, this effort level spends a similar number of tokens as Opus 4.7’s default, but with better performance. Users can choose “extra” (“ xhigh ” in Claude Code) or “max,” and the model will spend more tokens to get better results; we recommend using “extra” for difficult tasks and long-running asynchronous workflows. We have increased rate limits in Claude Code

클로드 오푸스 4.8 앤스로픽 에이전트 클로드 코드 성능개선