The Decoder • 95 days ago

GPT-5.5 tops benchmarks, with a hallucination problem and a 20% cost increase

IMP

8/10

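Key summary

OpenAI's latest model, GPT-5.5, scored 60 points on the Artificial Analysis Intelligence Index, overtaking Claude Opus 4.7 and Gemini 3.1 Pro Preview to reclaim the overall top spot. Although the model consumes fewer tokens, its API price has effectively risen by about 20%. Its accuracy is high, but its hallucination rate, which reflects fabricating answers instead of admitting what it does not know, reaches 86% and is cited as the most urgent area for improvement.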

Translated text

GPT-5.5 tops benchmarks, but frequent hallucinations and a 20% higher API cost hold it back

By Matthias Bastian | April 24, 2026

Key points:

OpenAI's GPT-5.5 tops the Artificial Analysis Intelligence Index with 60 points, pulling well ahead of competitors such as Claude Opus 4.7 and Gemini 3.1 Pro Preview.
The API price has nominally doubled, but token consumption is about 40% lower than that of its predecessor GPT-5.4, so the net price increase comes to about 20%.
The biggest weakness remains a hallucination rate of 86%. Despite posting the highest accuracy on the fact benchmark, GPT-5.5 frequently fabricates answers rather than acknowledging gaps in its knowledge.

GPT-5.5 costs about 20% more than GPT-5.4 over the API. The model tops the AI rankings, but it still has a hallucination problem.

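Officially, GPT-5.5's API price is $5 per million input tokens and $30 per million output tokens, double that of GPT-5.4. But according to benchmarking service Artificial Analysis, the model uses about 40% fewer tokens, which brings the net price increase down to roughly 20%.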

That is still a smaller increase than Anthropic's Opus 4.7, which lists at the same price as its predecessor but uses 35 to 40% more tokens.
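The net figures above follow from multiplying the change in list price by the change in token usage. The sketch below reproduces that arithmetic. It assumes cost scales as price per token times tokens used, and that the quoted token change applies uniformly to the whole bill; the function name `net_cost_change` is illustrative and not from the article.

```python
def net_cost_change(price_multiplier: float, token_multiplier: float) -> float:
    """Return the fractional change in cost for the same workload.

    Assumption: cost = (price per token) x (tokens used), so the two
    multipliers combine by multiplication.
    """
    return price_multiplier * token_multiplier - 1.0


# GPT-5.5 vs GPT-5.4: list price doubled, about 40% fewer tokens used.
gpt_change = net_cost_change(price_multiplier=2.0, token_multiplier=0.60)

# Opus 4.7 vs its predecessor: same list price, 35 to 40% more tokens used.
opus_low = net_cost_change(price_multiplier=1.0, token_multiplier=1.35)
opus_high = net_cost_change(price_multiplier=1.0, token_multiplier=1.40)

print(f"GPT-5.5 net change: {gpt_change:+.0%}")
print(f"Opus 4.7 net change: {opus_low:+.0%} to {opus_high:+.0%}")
```

Under these assumptions, a doubled price with 40% fewer tokens gives 2.0 x 0.6 = 1.2 times the old cost, which matches the roughly 20% increase reported by Artificial Analysis.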

GPT-5.5 also puts OpenAI back on top of the AI rankings, leading the Artificial Analysis Intelligence Index by three points over second place.

GPT-5.5 tops the Artificial Analysis Intelligence Index with 60 points, three points ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview, which are tied at 57. | Image: Artificial Analysis

Strong price-performance, but benchmarks are not everything

At medium compute, GPT-5.5 matches the score that Claude Opus 4.7 posts at maximum compute for a quarter of the cost: around $1,200 instead of around $4,800. Google's Gemini 3.1 Pro Preview reaches comparable numbers at an even lower cost of around $900.
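As a quick check of the "quarter of the cost" comparison, the sketch below expresses each quoted figure relative to the Claude Opus 4.7 figure. The dollar amounts are the approximate values quoted above; the dictionary and variable names are illustrative.

```python
# Approximate cost figures in US dollars, as quoted in the article.
quoted_cost_usd = {
    "GPT-5.5 (medium compute)": 1200,
    "Claude Opus 4.7 (maximum compute)": 4800,
    "Gemini 3.1 Pro Preview": 900,
}

# Express every figure as a multiple of the Opus 4.7 figure.
baseline = quoted_cost_usd["Claude Opus 4.7 (maximum compute)"]
relative_cost = {name: cost / baseline for name, cost in quoted_cost_usd.items()}

for name, ratio in relative_cost.items():
    print(f"{name}: {ratio:.2f}x the Opus 4.7 cost")
```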

But benchmark scores do not tell the whole story. According to test results and developer feedback, Gemini mainly shines at everyday, general-purpose use across Google products and at vision tasks, while the latest OpenAI and Anthropic models tend to perform better on coding and agentic work.

Hallucinations remain the biggest weakness

OpenAI's new model stumbles on hallucinations. On Artificial Analysis' AA Omniscience benchmark, which rewards factual recall and penalizes wrong answers, GPT-5.5 posts the highest accuracy of any model at 57%.

But its hallucination rate is 86%, markedly higher than that of Claude Opus 4.7 (36%) and Gemini 3.1 Pro Preview (50%). The 14-point gain over GPT-5.4 on this benchmark came mostly from better factual recall, with only modest improvement on hallucination.

Knowing when to skip a question or admit uncertainty is a highly desirable trait in an AI model. By that measure, GPT-5.5 looks more like a step backward than a step forward.
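A 57% accuracy and an 86% hallucination rate can coexist if the two numbers use different denominators. The article does not define the hallucination rate, so the sketch below rests on an assumption that the source does not confirm: the rate is the share of questions not answered correctly on which the model gave a wrong answer instead of abstaining. The function name `split_responses` is illustrative.

```python
def split_responses(accuracy: float, hallucination_rate: float) -> dict:
    """Split all questions into correct, wrong, and abstained shares.

    Assumption (not stated in the article): hallucination_rate is
    wrong / (wrong + abstained), i.e. it is measured only over the
    questions the model did not answer correctly.
    """
    not_correct = 1.0 - accuracy
    wrong = hallucination_rate * not_correct
    abstained = not_correct - wrong
    return {"correct": accuracy, "wrong": wrong, "abstained": abstained}


# GPT-5.5 figures quoted in the article: 57% accuracy, 86% hallucination rate.
gpt55 = split_responses(accuracy=0.57, hallucination_rate=0.86)

for outcome, share in gpt55.items():
    print(f"{outcome}: {share:.1%}")
```

Under this reading, GPT-5.5 would answer about 37% of all questions incorrectly and decline only about 6%, which is consistent with the article's point that the model rarely admits what it does not know.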

Original text (English)

GPT-5.5 tops benchmarks but still hallucinates frequently and costs 20 percent more over the API

Matthias Bastian | Apr 24, 2026

Key Points

OpenAI's GPT-5.5 tops the Artificial Analysis Intelligence Index with 60 points, pulling ahead of competitors like Claude Opus 4.7 and Gemini 3.1 Pro Preview.
While the API price has nominally doubled, roughly 40 percent lower token consumption compared to its predecessor GPT-5.4 softens the blow, resulting in a net price increase of about 20 percent.
A significant weakness remains the model's 86 percent hallucination rate. Despite achieving the highest accuracy in the fact benchmark, GPT-5.5 frequently fabricates answers rather than acknowledging gaps in its knowledge.

GPT-5.5 costs about 20 percent more than GPT-5.4 over the API. The model tops the AI rankings, but it has a hallucination problem.

On paper, GPT-5.5's API price has doubled to $5 and $30 per million input and output tokens compared to 5.4. But according to benchmarking service Artificial Analysis, the model uses about 40 percent fewer tokens, bringing the net price hike down to roughly 20 percent.

That's still a smaller jump than Anthropic's Opus 4.7, which lists at the same price as its predecessor but burns through 35 to 40 percent more tokens.

GPT-5.5 also puts OpenAI back on top of the AI rankings, leading the Artificial Analysis Intelligence Index by three points.

GPT-5.5 tops the Artificial Analysis Intelligence Index with 60 points, three points ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview, which are tied at 57. | Image: Artificial Analysis

Strong price-performance, but benchmarks only tell part of the story

At medium compute, GPT-5.5 matches the score Claude Opus 4.7 puts up at maximum for a quarter of the cost: around $1,200 instead of $4,800. Google's Gemini 3.1 Pro Preview hits comparable numbers even cheaper, at around $900.

But benchmarks don't tell the whole story: Our tests and developer feedback suggest Gemini mainly shines at everyday versatility across Google products and at vision tasks, while the latest OpenAI and Anthropic models tend to outperform it on coding and agentic work.

Hallucinations remain the weak spot

OpenAI's new model stumbles on hallucinations. On Artificial Analysis' AA Omniscience benchmark, which rewards factual recall and penalizes wrong answers, GPT-5.5 posts the highest accuracy of any model at 57 percent.

But its hallucination rate sits at 86 percent, compared to 36 percent for Claude Opus 4.7 and 50 percent for Gemini 3.1 Pro Preview. The 14-point jump over GPT-5.4 on this benchmark came mostly from better factual recall, with only modest gains on hallucination.

Knowing when to pass or admit uncertainty is a trait you want in an AI model. By that measure, GPT-5.5 looks more like a step backward than a step forward.

Source: AA via X

GPT-5.5, Benchmarks, API cost, Hallucination, AI models

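GPT-5.5 and GPT-5.5 Pro models spotted on OpenRouter

'GPT-5.5' and 'GPT-5.5 Pro', presumed to be OpenAI's next models, have been spotted in the model listings of OpenRouter, a platform that provides access to AI models from multiple providers. Based on the date tag included in the model names, expectations of an imminent official announcement are spreading rapidly through the AI community.

OpenAI, GPT-5.5, OpenRouter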