The Decoder (39 days ago)

OpenAI's ChatGPT Images 2.0 thinks before it generates, adding reasoning and web search to image creation

Matthias Bastian, Apr 21, 2026

OpenAI is adding reasoning and web search to ChatGPT Images 2.0, its image generator. The model can now create up to eight consistent images from a single prompt, and it handles text significantly better, both in general and especially in non-Latin scripts.

Update: OpenAI's new image model is official. ChatGPT Images 2.0 runs on the new GPT Image 2 model and shares the same core capability as Google's Nano Banana Pro: the model "thinks" before it generates, spending more or less time reasoning depending on the selected mode, and can even search the web during that process.

According to a blog post from the company, this should lead to greater variety and accuracy in generated images. Extended outputs with thinking, however, are only available to ChatGPT Plus, Pro, and Business users.

With thinking mode enabled, ChatGPT Images 2.0 can generate up to eight images at once from a single prompt, and characters, objects, and styles are supposed to stay consistent across all scenes. As example use cases, OpenAI lists page-long manga generated from a single picture and a text prompt, series of social media graphics, and design plans for different rooms in a house.

All users get better image quality

Regardless of thinking mode, all ChatGPT users get improvements to image quality. OpenAI says the generator now better captures the "characteristic features of photos" and improves results for pixel art, manga, film stills, and other image types. The model is also designed to handle fine-grained elements that previous image models consistently struggled with: small text, iconography, UI elements, dense compositions, and subtle stylistic instructions.
Supported aspect ratios range from 3:1 (ultra-wide) to 1:3 (ultra-tall), covering formats from banners and presentation slides to mobile screens. Resolution goes up to 2K through the API.

API pricing is token-based and tied to quality

Developers can integrate the model into their own products via the API under the name gpt-image-2. OpenAI charges per token: $8 per million image input tokens and $30 per million image output tokens. Text tokens cost $5 (input) and $10 (output) per million. Cached inputs are cheaper.

In practice, per-image costs vary widely depending on quality and resolution. According to OpenAI's pricing overview, a 1024 x 1024 image costs just $0.006 at low quality, $0.053 at medium quality, and $0.211 at high quality. Larger resolutions such as 1024 x 1536 actually come in slightly cheaper, at $0.005, $0.041, and $0.165, respectively.

Price per image (USD) by quality and resolution; additional sizes are available for GPT Image 2:

| Model         | Quality | 1024 x 1024 | 1024 x 1536 | 1536 x 1024 |
|---------------|---------|-------------|-------------|-------------|
| GPT Image 2   | Low     | $0.006      | $0.005      | $0.005      |
|               | Medium  | $0.053      | $0.041      | $0.041      |
|               | High    | $0.211      | $0.165      | $0.165      |
| GPT Image 1.5 | Low     | $0.009      | $0.013      | $0.013      |
|               | Medium  | $0.034      | $0.050      | $0.050      |
|               | High    | $0.133      | $0.200      | $0.200      |

At larger resolutions, GPT Image 2 is cheaper than its predecessors: a high-quality 1024 x 1536 image costs $0.165, compared to $0.20 for GPT Image 1.5 and $0.25 for GPT Image 1. At the standard 1024 x 1024 resolution in high quality, however, the new model is more expensive: $0.211 versus $0.133 for GPT Image 1.5. API outputs above 2K are still in beta and may produce inconsistent results.

OpenAI highlights localized advertising, infographics, educational content, design tools, and creative platforms as target use cases. In Codex, image generation will be available directly in the workspace without a separate API key.

On our own benchmark prompt, ChatGPT Images 2.0 does a great job. Both modes, instant and thinking, handle the complex, abstract prompt with strong attention to detail.

A hyper-realistic DSLR photo.
A monkey holding a pink banana is sitting on a tiger in the foreground. In the background, a HORSE is RIDING AN ASTRONAUT. The astronaut is underneath like a living "spacesuit horse saddle," and the HORSE is clearly on top, in control, as the rider. Make it 100% unambiguous: the HORSE is the rider and the ASTRONAUT is being ridden, NOT the other way around. High-resolution, sharp focus, realistic lighting.

The instant-mode output has a slightly artificial look, while the thinking version captures the DSLR-quality look much better.

Original article:

OpenAI's new ChatGPT image model is almost here. Codenamed "gpt-image-2," it is already in the hands of select ChatGPT testers and appearing on leaderboards. Recent generations, many nearly indistinguishable from real photos, have surfaced on X and Reddit. So far, access appears limited to testers in the US or with US-based accounts.

The model is reportedly much stronger at complex images and diagrams with text, including detailed screenshots, which makes it a good fit for advertising and educational use cases like infographics, where reliable text rendering matters. It is also said to fix the telltale "AI look": the overly smooth skin and perfect lighting that still showed up in GPT Image 1.5, where Google's Nano Banana Pro held a clear edge.

OpenAI will unveil the model tonight in a livestream starting at 12 pm PT.

Source: via X | OpenAI
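The API prices quoted in the pricing section above reduce to simple multiplication, which makes them easy to sanity-check. The following Python sketch is illustrative only: it assumes cost scales linearly with token counts at the per-million rates the article lists and ignores cached-input discounts. The helper names (`token_cost`, `batch_cost`, `relative_change`) are hypothetical and not part of any OpenAI SDK, and the token counts in the example request are made up rather than measured.

```python
"""Back-of-the-envelope gpt-image-2 API cost arithmetic.

Assumptions (illustrative, not from OpenAI's docs): cost is linear in token
counts at the per-million rates quoted in the article, and cached-input
discounts are ignored. Helper names are hypothetical.
"""

# USD per one million tokens, as quoted in the article.
RATE_PER_MILLION = {
    "text_input": 5.00,
    "text_output": 10.00,
    "image_input": 8.00,
    "image_output": 30.00,
}

# USD per generated image, from the pricing table quoted in the article.
PRICE_PER_IMAGE = {
    "GPT Image 2": {
        "low": {"1024x1024": 0.006, "1024x1536": 0.005, "1536x1024": 0.005},
        "medium": {"1024x1024": 0.053, "1024x1536": 0.041, "1536x1024": 0.041},
        "high": {"1024x1024": 0.211, "1024x1536": 0.165, "1536x1024": 0.165},
    },
    "GPT Image 1.5": {
        "low": {"1024x1024": 0.009, "1024x1536": 0.013, "1536x1024": 0.013},
        "medium": {"1024x1024": 0.034, "1024x1536": 0.050, "1536x1024": 0.050},
        "high": {"1024x1024": 0.133, "1024x1536": 0.200, "1536x1024": 0.200},
    },
}


def token_cost(token_counts: dict[str, int]) -> float:
    """Cost in USD for a mix of token types, e.g. {"text_input": 1_000}."""
    return sum(
        RATE_PER_MILLION[kind] * count / 1_000_000
        for kind, count in token_counts.items()
    )


def batch_cost(model: str, quality: str, size: str, n_images: int) -> float:
    """Cost in USD of n_images at the given model, quality and size."""
    return PRICE_PER_IMAGE[model][quality][size] * n_images


def relative_change(quality: str, size: str) -> float:
    """Fractional price change from GPT Image 1.5 to GPT Image 2 (negative = cheaper)."""
    old = PRICE_PER_IMAGE["GPT Image 1.5"][quality][size]
    new = PRICE_PER_IMAGE["GPT Image 2"][quality][size]
    return (new - old) / old


if __name__ == "__main__":
    # Hypothetical request: these token counts are illustrative, not measured.
    print(round(token_cost({"text_input": 1_000, "image_output": 5_000}), 4))
    # Ten high-quality 1024 x 1536 images on each model.
    print(round(batch_cost("GPT Image 2", "high", "1024x1536", 10), 2))
    print(round(batch_cost("GPT Image 1.5", "high", "1024x1536", 10), 2))
    # High-quality square images got pricier; portrait images got cheaper.
    print(f"{relative_change('high', '1024x1024'):+.1%}")
    print(f"{relative_change('high', '1024x1536'):+.1%}")
```

Run as a script, it prints 0.155, 1.65, 2.0, +58.6% and -17.5%. The last two lines restate the article's comparison: at high quality, GPT Image 2 costs more than GPT Image 1.5 for a square 1024 x 1024 image but less for a 1024 x 1536 image.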