The Decoder • 94 days ago

OpenAI: "Old prompts are holding GPT-5.5 back"

IMP

8/10

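Key summary

OpenAI's newly released GPT-5.5 prompting guide advises against reusing complex prompts written for earlier models as they are. Instead, it says, prompts should be rewritten from scratch with minimal, outcome-focused instructions to get the most out of the model. The reasoning is that the latest model reasons better, so excessive process specification and constraints can narrow the model's search space and degrade performance.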

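Translated text

In its new GPT-5.5 guide, OpenAI recommends that users not reuse old prompts and instead start fresh with minimal, result-oriented instructions. Overly detailed process specifications carried over from older models can limit the new model's performance, because GPT-5.5 operates more efficiently with less prescriptive guidance. For complex use cases, OpenAI suggests a seven-part schema that begins with a clear role definition, and it encourages users to rebuild prompts from scratch rather than iterate on legacy approaches.

OpenAI has released a prompting guide for GPT-5.5 with one major takeaway: don't reuse your old prompts. Start fresh with minimal, result-focused instructions. And role definitions, once dismissed as outdated, are back at the top of OpenAI's prompt structure.

In its new prompting guide, OpenAI tells developers not to treat GPT-5.5 as a drop-in replacement for earlier models like GPT-5.2 or GPT-5.4. Migration should start from scratch with the smallest prompt that still gets the job done. Only then should developers tune reasoning effort, scope, tool descriptions, and output format using representative examples. According to OpenAI, GPT-5.5 reasons more efficiently than its predecessors, so the "low" and "medium" effort levels should be tested first before reaching for higher settings. Short, outcome-driven prompts tend to outperform process-heavy prompt stacks.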
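The advice to try lower effort levels first can be sketched as a small sweep over representative cases. This is a minimal sketch, not OpenAI's method: `pick_lowest_passing_effort`, `run_case`, and `fake_model` are hypothetical names, the "high" label is an assumption (the article names only "low" and "medium"), and the model call is a stub rather than a real API request.

```python
from typing import Callable, Optional, Sequence, Tuple

# Effort labels ordered cheapest first. "low" and "medium" come from the
# article; "high" is an assumed label for the next setting up.
EFFORT_LEVELS = ("low", "medium", "high")

Case = Tuple[str, Callable[[str], bool]]  # (prompt, pass/fail check)


def pick_lowest_passing_effort(
    run_case: Callable[[str, str], str],
    cases: Sequence[Case],
    efforts: Sequence[str] = EFFORT_LEVELS,
) -> Optional[str]:
    """Return the first effort level at which every representative case passes."""
    for effort in efforts:
        if all(check(run_case(prompt, effort)) for prompt, check in cases):
            return effort
    return None  # no tested level satisfied all cases


def fake_model(prompt: str, effort: str) -> str:
    # Stand-in for a real model call: pretends "low" gives an incomplete answer.
    return "partial" if effort == "low" else "refund approved"


cases = [("Resolve the customer's issue end to end.", lambda out: "refund" in out)]
print(pick_lowest_passing_effort(fake_model, cases))  # prints "medium"
```

Swapping `fake_model` for a real client call keeps the same shape: the sweep stops at the cheapest setting that satisfies the representative examples.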

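Old prompts can hold the model back

The guide warns explicitly against carrying over every instruction from older prompt stacks. Legacy prompts often overspecify the process because earlier models needed more hand-holding, OpenAI says. With GPT-5.5, that extra detail creates noise, narrows the model's search space, or produces mechanical-sounding answers.

Instead, the prompt should spell out the target outcome, success criteria, constraints, and available context, then let the model figure out how to get there. The guide's positive example is a customer service prompt that defines only the goal: "Resolve the customer's issue end to end."

Success means: the eligibility decision is made from the available policy and account data; any allowed action is completed before responding; the final answer includes completed_actions, customer_message, and blockers; and if evidence is missing, the model asks for the smallest missing field.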
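The outcome-focused prompt above pairs naturally with a check on the required output fields. The sketch below reuses the prompt wording quoted in the original article; the assumption that the final answer arrives as a JSON object, and the names `OUTCOME_PROMPT` and `missing_fields`, are illustrative and not taken from the guide.

```python
import json

# Field names as listed in the guide's example.
REQUIRED_FIELDS = ("completed_actions", "customer_message", "blockers")

# Prompt text as quoted in the original article.
OUTCOME_PROMPT = """Resolve the customer's issue end to end.

Success means:
- the eligibility decision is made from the available policy and account data
- any allowed action is completed before responding
- the final answer includes completed_actions, customer_message, and blockers
- if evidence is missing, ask for the smallest missing field
"""


def missing_fields(raw_answer: str) -> list:
    """Return the required fields absent from a JSON-encoded final answer.

    Assumes the answer is a JSON object; the guide does not specify a format.
    """
    answer = json.loads(raw_answer)
    return [name for name in REQUIRED_FIELDS if name not in answer]


print(missing_fields('{"completed_actions": [], "customer_message": "Done"}'))
# prints ['blockers']
```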

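The negative example, by contrast, micromanages every step: first inspect A, then inspect B, then compare every field, then think through all possible exceptions, then decide which tool to call, then call the tool, then explain the entire process to the user.

Absolute rules using words like "ALWAYS" or "NEVER" should be reserved for real invariants such as security rules or required output fields. For judgment calls, OpenAI recommends decision rules instead.

Explicit stop conditions keep the model from cycling through unnecessary tool loops: 'Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims. After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.'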
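The guide states this stop condition as a prompt instruction for the model. The same check can also be mirrored in the orchestration code that drives tool calls; the sketch below shows that idea only. `run_tool_loop`, `next_tool_call`, `can_answer`, and the `max_loops` cap are hypothetical and do not come from the guide.

```python
from typing import Callable, List


def run_tool_loop(
    next_tool_call: Callable[[List[str]], str],
    can_answer: Callable[[List[str]], bool],
    max_loops: int = 5,  # assumed hard cap; the guide gives no number
) -> List[str]:
    """Collect tool results until the stop check passes or the cap is reached."""
    evidence: List[str] = []
    for _ in range(max_loops):
        evidence.append(next_tool_call(evidence))
        # Stop check after each result: can the core request be answered now?
        if can_answer(evidence):
            break
    return evidence


# Stubbed tools: the third result is never fetched because two are enough.
results = iter(["policy doc", "account record", "unrelated page"])
evidence = run_tool_loop(lambda seen: next(results), lambda seen: len(seen) >= 2)
print(evidence)  # prints ['policy doc', 'account record']
```

The check runs after every result rather than once at the end, which is what keeps the loop count low without skipping evidence that is still needed.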

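Role definitions are back at the top

The prompting community has been arguing over whether role definitions still do anything meaningful in newer models. Some had written them off as unnecessary or even counterproductive. The GPT-5.5 guide pushes back: the recommended prompt structure opens with a role definition and context:

Role: [1-2 sentences defining the model's function, context, and job]

Personality: [tone, demeanor, and collaboration style]

Goal: [user-visible outcome]

Success criteria: [what must be true before the final answer]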
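The full schema in the original English text has seven parts: Role, Personality, Goal, Success criteria, Constraints, Output, and Stop rules. A minimal sketch of assembling a prompt in that order follows; `build_prompt` and `SECTION_ORDER` are hypothetical helpers, and the sample role text is invented for illustration.

```python
# Section names after the Role line, in the order given in the original article.
SECTION_ORDER = (
    "Personality",
    "Goal",
    "Success criteria",
    "Constraints",
    "Output",
    "Stop rules",
)


def build_prompt(role: str, sections: dict) -> str:
    """Assemble a prompt: the Role line first, then non-empty sections in schema order."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    parts = [f"Role: {role}"]
    for name in SECTION_ORDER:
        text = sections.get(name, "").strip()
        if text:  # leave out sections that have nothing to say
            parts.append(f"# {name}\n{text}")
    return "\n\n".join(parts)


prompt = build_prompt(
    "You are a support assistant for a billing team.",  # invented example role
    {
        "Goal": "Resolve the customer's issue end to end.",
        "Stop rules": "If evidence is missing, ask for the smallest missing field.",
    },
)
print(prompt)
```

Omitting empty sections follows the article's point that the structure is a starting point, not a rigid template.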


Original text (English)

OpenAI says old prompts are holding GPT-5.5 back and developers need a fresh baseline

Matthias Bastian, Apr 26, 2026

Key Points

OpenAI recommends in a new guide for GPT-5.5 that users should not reuse old prompts but instead start fresh with minimal, result-oriented instructions.

Overly detailed process specifications carried over from older models can actually limit the new model's performance, as GPT-5.5 operates more efficiently with less prescriptive guidance.

For complex use cases, OpenAI suggests a seven-part schema that begins with a clear role definition, encouraging users to rebuild their prompts from scratch rather than iterating on legacy approaches.

OpenAI has released a prompting guide for GPT-5.5 with one major takeaway: don't reuse your old prompts. Start fresh with minimal, result-focused instructions. And role definitions, once dismissed as outdated, are back at the top of OpenAI's prompt structure.

In its new prompting guide, OpenAI tells developers not to treat GPT-5.5 as a drop-in replacement for earlier models like GPT-5.2 or GPT-5.4. Migration should start from scratch with the smallest prompt that still gets the job done. Only then should developers tune reasoning effort, scope, tool descriptions, and output format using representative examples. OpenAI says GPT-5.5 reasons more efficiently than its predecessors, so you should test the "low" and "medium" effort levels first before reaching for higher settings. Short, outcome-driven prompts tend to outperform process-heavy prompt stacks.

Old prompts can hold the model back

The guide warns explicitly against carrying over every instruction from older prompt stacks. Legacy prompts often overspecify the process because earlier models needed more hand-holding, OpenAI says. With GPT-5.5, that extra detail creates noise, narrows the model's search space, or produces mechanical-sounding answers.
Instead, the prompt should spell out the target outcome, success criteria, constraints, and available context, then let the model figure out how to get there. The guide's positive example is a customer service prompt that defines only the goal:

Resolve the customer's issue end to end.

Success means:
the eligibility decision is made from the available policy and account data
any allowed action is completed before responding
the final answer includes completed_actions, customer_message, and blockers
if evidence is missing, ask for the smallest missing field

The negative example micromanages every step:

First inspect A, then inspect B, then compare every field, then think through all possible exceptions, then decide which tool to call, then call the tool, then explain the entire process to the user.

Absolute rules using words like "ALWAYS" or "NEVER" should be reserved for real invariants such as security rules or required output fields. For judgment calls, OpenAI recommends decision rules instead.

Explicit stop conditions keep the model from cycling through unnecessary tool loops:

Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims. After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.

Role definitions are back at the top

The prompting community has been arguing over whether role definitions still do anything meaningful in newer models. Some had written them off as unnecessary or even counterproductive. The GPT-5.5 guide pushes back: the recommended prompt structure opens with a role definition and context.
Role: [1-2 sentences defining the model's function, context, and job]
# Personality
[tone, demeanor, and collaboration style]
# Goal
[user-visible outcome]
# Success criteria
[what must be true before the final answer]
# Constraints
[policy, safety, business, evidence, and side-effect limits]
# Output
[sections, length, and tone]
# Stop rules
[when to retry, fallback, abstain, ask, or stop]

For customer-facing assistants, support workflows, or coaching tools, the guide recommends splitting two distinct dimensions within this schema: personality and collaboration style. Personality covers how the assistant sounds: tone, warmth, formality, or humor. Collaboration style covers how it works, when to ask questions, when to make assumptions, and how to handle uncertainty.

OpenAI offers two contrasting examples. First, a factual, task-focused personality block:

You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness. Prefer making progress over stopping for clarification when the request is already clear enough to attempt. Use context and reasonable assumptions to move forward. Ask for clarification only when the missing information would materially change the answer or create meaningful risk, and keep any question narrow.

And a more expressive, collaborative style:

Adopt a vivid conversational presence: intelligent, curious, playful when appropriate, and attentive to the user's thinking. Ask good questions when the problem is blurry, then become decisive once there is enough context. Be warm, collaborative, and polished. Conversation should feel easy and alive, but not chatty for its own sake. Offer a real point of view rather than merely mirroring the user, while staying responsive to their goals and constraints.

Each section should stay short.
Details should only be added where they actually shift behavior, OpenAI says, and the prompt structure should be treated as a starting point, not a rigid template.

Setting retrieval budgets and citation rules in the prompt

For fact-based answers, citation behavior belongs in the prompt itself. Developers should spell out which claims need evidence, what counts as sufficient evidence, and how the model should respond when evidence is missing. A lack of evidence shouldn't automatically turn into a factual "no." The guide describes retrieval budgets that act as stop rules for searches:

For ordinary Q&A, start with one broad search using short, discriminative keywords. If the top results contain enough citable support for the core request, answer from those results instead of searching again.

Make another retrieval call only when:
The top results do not answer the core question.
A required fact, parameter, owner, date, ID, or source is missing.
The user asked for exhaustive coverage, a comparison, or a comprehensive list.
A specific document, URL, email, meeting, record, or code artifact must be read.
The answer would otherwise contain an important unsupported factual claim.

Do not search again to improve phrasing, add examples, cite nonessential details, or support wording that can safely be made more generic.

For drafting tasks like presentations, summaries, or marketing copy, OpenAI recommends drawing a clear line in the prompt between claims that need sources and parts that can be written more freely:

Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims. Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger. If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics.
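The retrieval budget quoted above is written as prompt text for the model, but its five "search again only when" conditions can also be expressed as a plain decision function, for instance to audit logged runs. This is a sketch under that assumption; `RetrievalState` and `should_search_again` are hypothetical names and the boolean fields are a simplification of judgments the model would make itself.

```python
from dataclasses import dataclass


@dataclass
class RetrievalState:
    """Simplified snapshot of what the first search produced (illustrative only)."""
    answers_core_question: bool
    missing_required_fact: bool = False
    wants_exhaustive_coverage: bool = False
    must_read_specific_artifact: bool = False
    has_unsupported_key_claim: bool = False


def should_search_again(state: RetrievalState) -> bool:
    """Mirror the five conditions under which another retrieval call is allowed."""
    return (
        not state.answers_core_question
        or state.missing_required_fact
        or state.wants_exhaustive_coverage
        or state.must_read_specific_artifact
        or state.has_unsupported_key_claim
    )


# Enough citable support after one broad search: answer instead of searching.
print(should_search_again(RetrievalState(answers_core_question=True)))  # prints False

# A required date is still missing: one more retrieval call is justified.
print(should_search_again(
    RetrievalState(answers_core_question=True, missing_required_fact=True)
))  # prints True
```

Polishing phrasing or adding examples has no field here on purpose, since the quoted rules say those are not reasons to search again.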
Preambles to cut perceived latency in streaming

In streaming apps, every second before the first visible response counts. GPT-5.5 can spend noticeable time on reasoning, planning, or tool calls before any text appears. For longer or tool-heavy tasks, the guide recommends a short "preamble": a visible update that confirms t

GPT-5.5 Prompt engineering OpenAI AI development Model migration

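OpenAI folds dedicated coding model "Codex" into GPT-5.5

OpenAI has retired its dedicated coding model line, Codex, and folded its capabilities into the main model, GPT-5.5. As a result, GPT-5.3 effectively becomes the last standalone coding model. GPT-5.5 improves on agentic coding and general-purpose performance, but API pricing has risen by about 20%.

OpenAI Codex GPT-5.5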

The Decoder • 94 days ago

IMP 8

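GPT-5.5 tops benchmarks but still hallucinates... API costs up 20%

OpenAI's latest model, GPT-5.5, has once again taken first place in overall AI performance evaluations, but it still shows a high rate of hallucination. Surprisingly, despite its improved reasoning, its tendency to treat nonsensical questions as legitimate or to assert incorrect information with confidence is similar to earlier versions, and in some respects worse. The model uses fewer tokens per API call, but because the per-token price went up, net cost rises by about 20%, so practitioners should weigh cost against performance and reliability before adopting it.

GPT-5.5 AI hallucination API cost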