The Decoder • 96일 전

앤스로픽, 클로드 코드 품질 저하 사과

IMP

8/10

핵심 요약

최근 한 달간 지속된 코딩 에이전트 '클로드 코드(Claude Code)'의 품질 저하 문제에 대해 앤스로픽이 공식 사과 및 원인을 발표했습니다. 회사는 추론 깊이 축소, 캐싱 최적화 버그, 시스템 프롬프트 길이 제한 등 3가지 독립적인 오류를 원인으로 지목하고 이를 모두 해결했습니다. 재발 방지를 위해 내부 테스트를 강화하고, 피해를 입은 모든 구독자의 사용량 한도를 초기화하는 보상 조치를 취했습니다.

번역된 본문

앤스로픽, 클로드 코드 문제 확인 및 더 엄격한 품질 관리 약속

막시밀리안 슈라이너(Maximilian Schreiner) 2026년 4월 24일

핵심 요약:

사용자들의 클로드 코드(Claude Code) 품질 저하 불만 이후, 앤스로픽은 추론 깊이, 캐싱(Caching), 텍스트 길이 제한에 영향을 미친 3가지 버그를 식별하고 수정했습니다.
유사한 사고를 방지하기 위해 업데이트 배포 전 내부 테스트를 강화하고 있습니다.
보상 조치로 모든 구독자의 사용량 한도를 초기화했습니다.
이번 문제는 업계 전반의 컴퓨팅 자원 부족(Compute Crunch) 문제를 부각시킵니다. 이 병목 현상은 점점 더 잦은 서비스 중단을 유발하고 AI 제공업체들이 컴퓨팅 집약적 도구의 가격을 인상하게 만들고 있습니다.

사용자들이 클로드 코드(Claude Code)의 품질 저하에 대해 불만을 제기했습니다. 앤스로픽은 3가지 개별적인 오류 원인을 식별하고 수정했습니다. 회사는 앞으로 더 엄격한 품질 관리를 약속했습니다.

지난 한 달 동안 점점 더 많은 사용자들이 앤스로픽의 코딩 도구인 클로드 코드의 성능이 눈에 띄게 저하되었다고 보고했습니다. 앤스로픽은 이제 상세한 사후 분석(Post-mortem)을 통해 원인을 밝혔습니다. 클로드 코드, 클로드 에이전트 SDK(Claude Agent SDK), 클로드 코워크(Claude Cowork)에 이루어진 3가지 독립적인 변경 사항이 결합하여 광범위하게 체감되는 품질 저하를 초래했습니다. 앤스로픽에 따르면 API 자체는 영향을 받지 않았습니다.

이 세 가지 문제는 모두 4월 20일 버전 2.1.116을 통해 해결되었습니다.

낮아진 추론 노력, 캐싱 버그 및 프롬프트 제한이 문제 발생

첫 번째 문제는 3월 4일로 거슬러 올라갑니다. 일부 사용자가 높은 모드에서 극심한 지연 시간(Latency)을 겪고 있었기 때문에, 앤스로픽은 기본 추론 노력(Reasoning effort)을 '높음(High)'에서 '중간(Medium)'으로 낮추었습니다. 내부 테스트 결과 중간 모드는 대부분의 작업에서 약간 낮은 결과를 보여줄 뿐 지연 시간을 크게 줄여주는 것으로 나타났습니다.

하지만 이 트레이드오프는 실패했습니다. 사용자들은 즉시 클로드 코드가 덜 똑똑해진 것 같다고 보고했습니다. 4월 7일, 앤스로픽은 이 변경 사항을 영구적으로 롤백했습니다.

두 번째 문제는 3월 26일에 배포된 캐싱 최적화 버그였습니다. 세션을 재개할 때 지연 시간을 줄이기 위해 1시간 비활동 후 오래된 추론 섹션을 한 번만 삭제하려는 계획이었습니다. 그러나 코딩 오류로 인해 이후의 모든 턴마다 추론 기록이 지워지는 결과를 낳았습니다.

결과적으로 클로드는 자체 결정에 대한 컨텍스트를 점진적으로 잃어버렸습니다. 사용자들은 기억력 상실, 반복 및 이상한 도구 선택을 발견했습니다. 게다가, 결과적인 캐시 미스(Cache misses)로 인해 예상보다 빠르게 사용량 한도를 소진했습니다. 앤스로픽에 따르면 이 버그는 리뷰 과정에서 발견되지 않고 넘어갔으며 4월 10일이 되어서야 수정되었습니다.

세 번째 문제는 4월 16일에 나타났습니다. 바로 Opus 4.7의 잘 알려진 장황함(Verbosity)을 억제하기 위한 시스템 프롬프트 지침이었습니다. 해당 지침은 다음과 같았습니다. "길이 제한: 도구 호출 사이의 텍스트는 25단어 이하로 유지하고, 작업에 자세한 설명이 필요한 경우를 제외하고 최종 응답은 100단어 이하로 유지하세요." 이후 더 광범위한 평가 슈트(Eval suite)를 사용한 테스트에서 3%의 품질 저하가 나타났습니다. 앤스로픽은 4월 20일에 이 변경 사항을 롤백했습니다.

앤스로픽의 품질 관리 강화

각 변경 사항이 서로 다른 시간대에 다른 사용자 그룹에게 영향을 미쳤기 때문에, 결합된 효과는 모호하고 점진적인 저하처럼 느껴졌으며 처음에는 정상적인 편차와 구별하기가 어려웠습니다.

앞으로 앤스로픽은 더 많은 직원이 내부 테스트 버전 대신 클로드 코드의 정확한 퍼블릭 빌드(Public build)를 사용할 것이라고 밝혔습니다. 또한 모든 시스템 프롬프트 변경은 이제 광범위하고 모델에 특화된 평가 슈트(Eval suite)를 통과해야 합니다.

지능에 영향을 미칠 수 있는 변경 사항의 경우, 앤스로픽은 안정화 기간(Soak periods)과 점진적 출시(Gradual rollouts)를 도입할 계획입니다. 보상으로 회사는 모든 구독자의 사용량 한도를 초기화했습니다.

또한 앤스로픽은 제품 결정을 더 투명하게 전달하기 위해 X(옛 트위터) 계정 @ClaudeDevs를 개설했습니다.

업계 전반에 걸쳐 계속되는 체감 품질 저하 문제

사용자들이 AI 품질 저하에 대해 불만을 제기한 것은 이번이 처음이 아닙니다. 2023년 하반기에도 사용자들은 OpenAI가 시간이 지남에 따라 GPT-4를 '더 멍청하게' 만들었다고 비난한 바 있습니다. OpenAI는 GPT-4에 중대한 변경을 가했다는 주장을 부인했습니다.

원문 보기

원문 보기 (영어)

Anthropic confirms Claude Code problems and promises stricter quality controls Maximilian Schreiner View the LinkedIn Profile of Maximilian Schreiner Apr 24, 2026 Nano Banana Pro prompted by THE DECODER Key Points Following user complaints about declining quality in Claude Code, Anthropic identified and fixed three separate bugs affecting reasoning depth, caching, and text length restrictions. To prevent similar incidents, the company is tightening internal testing before shipping updates. As compensation, Anthropic has reset usage limits for all subscribers. The problems highlight an industry-wide compute crunch. This bottleneck is increasingly causing outages and forcing AI providers to raise prices for compute-intensive tools. Ask about this article… Search Users complained about declining quality in Claude Code. Anthropic identified and fixed three separate sources of error. The company promises stricter quality controls going forward. Over the past month, a growing number of users reported that Anthropic's coding tool Claude Code was producing noticeably worse results. Anthropic has now laid out the causes in a detailed post-mortem: three independent changes to Claude Code, the Claude Agent SDK, and Claude Cowork combined to create a widely felt quality drop. The API itself was not affected, according to Anthropic. All three issues have been fixed as of April 20 with version 2.1.116. Lower reasoning effort, caching bugs, and prompt restrictions caused the problems The first issue dates back to March 4. Anthropic lowered the default reasoning effort from "high" to "medium" because some users were experiencing extreme latency in high mode. Internal testing had shown that medium mode delivered only slightly worse results on most tasks while significantly reducing latency. The trade-off didn't pay off: users quickly reported that Claude Code felt less intelligent. On April 7, Anthropic permanently rolled back the change. Ad The second problem was a bug in a caching optimization shipped on March 26. The plan was to delete older reasoning sections once after an hour of inactivity to reduce latency when resuming a session. A coding error caused the reasoning history to be wiped on every subsequent turn instead. Ad DEC_D_Incontent-1 Claude progressively lost context about its own decisions. Users noticed forgetfulness, repetition, and strange tool choices. On top of that, the resulting cache misses burned through usage limits faster than expected. According to Anthropic, the bug slipped through reviews undetected and wasn't fixed until April 10. A third issue appeared on April 16: a system prompt instruction meant to curb the well-known verbosity of Opus 4.7. The line read: "Length limits: keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail." Later testing with a broader eval suite revealed a 3 percent quality drop. Anthropic rolled back the change on April 20. Ad Anthropic tightens quality controls Because each change affected different user groups at different times, the combined effect felt like a vague, gradual decline that was initially hard to distinguish from normal variation. Going forward, Anthropic says more employees will use the exact public build of Claude Code instead of internal test versions. Every system prompt change will now have to pass a broad, model-specific eval suite. Ad DEC_D_Incontent-2 For changes that could impact intelligence, Anthropic plans to introduce soak periods and gradual rollouts. As compensation, the company has reset usage limits for all subscribers. Ad Anthropic also set up the X account @ClaudeDevs to communicate product decisions more transparently. Perceived quality drops remain a recurring theme across the AI industry This isn't the first time users have complained about declining AI quality. Back in the second half of 2023, users accused OpenAI of making GPT-4 "dumber" over time. OpenAI denied making significant changes to its models after release. Claude has faced similar complaints before , with infrastructure bugs as the culprit. The current case reinforces a pattern: what users perceive as model regressions often turns out to be changes in the tooling layer or infrastructure rather than the models themselves. In real-world use, users benefit from scaffolding like Claude Code because it steers model capabilities and provides the right context. When that scaffolding breaks, the opposite happens. Add in vendor-side tweaks like Anthropic's reasoning depth adjustment, and the effect compounds. The motivation behind such changes increasingly ties back to an industry-wide compute crunch . Anthropic's API availability recently sat at just 98.95 percent - well below the cloud industry standard of 99.99 percent. GPU hourly prices on the spot market rose 48 percent according to the Ornn Compute Price Index, and Bank of America analysts expect demand to outstrip supply through at least 2029. OpenAI is shutting down its video generation app Sora to free up compute for coding and enterprise products. GitHub also paused new signups for several Copilot tiers. This pressure is also shaking up pricing models. Anthropic's head of growth recently acknowledged that the existing Pro and Max plans weren't built for current agentic workloads since they were created before compute-intensive tools like Claude Code existed. The company even briefly tested removing Claude Code access for new Pro subscribers but reversed course after backlash. OpenAI, meanwhile, doubled API prices with GPT-5.5 compared to its predecessor, charging $5 per million input tokens and $30 per million output tokens. The era of cheap flat rates for the most powerful agentic AI tools appears to be coming to an end. AI News Without the Hype – Curated by Humans Subscribe to THE DECODER for ad-free reading, a weekly AI newsletter, our exclusive "AI Radar" frontier report six times a year, full archive access, and access to our comment section. Subscribe now Source: Anthropic

앤스로픽 클로드 코드 품질 관리 버그 수정 코딩 AI