Transformative Updates: ChatGPT’s Enhanced Image Generation with GPT-4o
OpenAI has significantly upgraded image generation in ChatGPT by moving it to the GPT-4o model, which was revealed last May. The refined AI image generator is now available to all users, although those on the free version face certain restrictions, while the $20/month ChatGPT Plus subscription offers higher limits. It’s worth noting that the initial rollout to free users was halted shortly after its launch on March 25 due to overwhelming server demand.
The specific limitations for both free and Plus users currently remain somewhat ambiguous. However, Sam Altman, the CEO of OpenAI, has mentioned previously that the aim is for free users to have access to three images daily.
While generating images through ChatGPT has been possible for some time, it utilized the DALL-E 3 model behind the scenes. With the latest upgrade, image creation will now occur entirely through GPT-4o, promoting a more seamless and cohesive user experience. Users have particularly enjoyed the generator’s capability to emulate the unique art style of Studio Ghibli, though this trend has attracted its share of criticism as well.
From a technical perspective, numerous enhancements have been made. These improvements target typical challenges faced by AI image generation tools, such as accurately rendering text, ensuring character consistency across different images, and creating clear diagrams. OpenAI claims that users can now expect results that are “more precise, accurate, and photorealistic” based on their prompts.
Creating More Realistic and Accurate Images

Credit: DailyHackly via ChatGPT
AI-generated images often display a distinct artificial quality, but such characteristics are expected to be less noticeable in creations from GPT-4o. A demo image presented by OpenAI depicts a woman writing on a whiteboard, complete with an authentic-looking reflection, showcasing significant realism, though it’s worth mentioning that it was the best result out of eight attempts.
The fidelity of AI artwork to user prompts has also seen enhancements, ensuring that specific objects and people are placed as directed. A particularly impressive example highlighted a four-panel comic created by ChatGPT, notable for its lack of obvious errors or inconsistencies.
In trials where requests were made to transform an Austen novel into comic format or generate a photorealistic depiction of a grand home with a garden, the results were striking, albeit not absolutely perfect. These images marked a clear improvement over prior iterations, though the rendering process tends to take several minutes, as opposed to mere seconds.
Significant Enhancements in Text and Diagrams

Credit: DailyHackly via ChatGPT
For a long time, accurate text and diagrams were a major hurdle for AI tools. These systems are fundamentally better at creating and remixing pre-existing images than at reproducing specific text or simple graphics like shapes and connectors.
The latest GPT-4o model addresses these challenges, generating text and diagrams with improved clarity and precision, thus significantly minimizing errors. OpenAI’s showcase featured a menu, an invitation, a boarding pass, and a diagram illustrating Newton’s prism experiment, all crafted from a single text input.
In a practical test, an infographic explaining DNA was requested, along with a book cover design including a specific title and author. The results closely matched the requests: the infographic was basic but accurate, and the cover resembled an actual product available in stores. Importantly, no strange errors or artifacts were present in the generated images.
Enhancements in Consistency and Editing Features

Credit: DailyHackly via ChatGPT
Previous discussions have highlighted the limitations in ChatGPT’s image editing capabilities, but notable advancements have now been achieved. It is now easier to maintain consistency among characters and scenes across different images, make selective alterations to specific elements of a picture, and layer multiple components seamlessly. Features also extend to the creation of transparent backgrounds and the option to specify colors using hex codes.
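For readers unfamiliar with hex codes, the short Python sketch below shows what such a code encodes: three color channels, to which an alpha (opacity) value can be added to describe transparency. It is only an illustration of the notation. The function name `hex_to_rgba` and its `alpha` parameter are made up for this example, and nothing here reflects how ChatGPT handles colors internally.

```python
def hex_to_rgba(hex_code: str, alpha: int = 255) -> tuple[int, int, int, int]:
    """Convert a '#RRGGBB' or '#RGB' hex code to an (r, g, b, a) tuple.

    Hypothetical helper for illustration only. alpha=255 is fully opaque,
    alpha=0 is fully transparent.
    """
    digits = hex_code.strip().lstrip("#")
    # Expand shorthand such as '0af' to '00aaff'.
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {hex_code!r}")
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be between 0 and 255")
    # Each pair of hex digits is one channel in the range 0-255.
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b, alpha)


if __name__ == "__main__":
    print(hex_to_rgba("#FF8800"))        # an opaque orange
    print(hex_to_rgba("#0af", alpha=0))  # same notation, fully transparent
```

In a prompt, this means a color can be pinned down exactly (for example "#FF8800") instead of being described loosely as "orange."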
Innovations have also been made in the way ChatGPT can adapt and remix users’ images while integrating various elements drawn from both the internet and its training data. For instance, one demo result effectively visualized the prompt “create an infographic explaining why San Francisco is so foggy.”
Experiments have shown that ChatGPT has improved significantly in editing and stylizing images. However, some inconsistencies remain, especially with intricate characters and objects, and there is still a tendency toward over-editing, which might detract from the tool’s utility in generating cohesive image sequences.
Copyright and Safety Concerns

Credit: OpenAI
As with all generative AI advancements, concerns regarding copyright, the potential for misuse, and the ecological costs of operation arise. OpenAI has publicly acknowledged that creating these tools often requires training on copyrighted materials. Recently, they have begun forming content agreements with sources like Shutterstock. As stated by Brad Lightcap, OpenAI’s COO, in an interview with the Wall Street Journal, the GPT-4o image generator is designed to refuse any requests aimed at imitating the creations of living artists.
Regarding safety features, every generated image is embedded with C2PA metadata to mark it as AI-created; however, this metadata can be easily erased through methods like screenshots. The AI is also programmed to reject attempts to produce harmful content, including “child sexual abuse materials and sexual deepfakes,” along with other requests that breach content policies.
This represents a notable advancement in AI image generation. The upgraded system regularly produces astonishing results, with many previously obvious signs of AI manipulation and common errors fading away. Nonetheless, it prompts serious contemplation regarding a future wherein forged images are easily manufactured, and creative output may increasingly rely on AI rather than human creativity. This raises questions about how generative AI will continue to secure its training data in such an evolving landscape.