Apple has introduced a groundbreaking image editing model, MGIE (MLLM-Guided Image Editing), in collaboration with the University of California, Santa Barbara. This innovative model enables users to describe desired changes in a photo using plain language, eliminating the need for traditional photo editing software.
Developed with a focus on user-friendly functionality, MGIE supports image editing tasks such as cropping, resizing, flipping, and adding filters, all initiated through text prompts. The model uses multimodal large language models (MLLMs) to interpret user instructions and generate the corresponding visual edits, handling modifications that range from changing the shape of a specific object to adjusting brightness, which shows its versatility across both simple and complex editing tasks.
To edit a photo with MGIE, users simply type out the desired changes, making image manipulation intuitive and straightforward. The researchers demonstrate the model's capabilities with examples such as turning an image of a pepperoni pizza into a healthier version by adding vegetable toppings, or brightening a dark photo of tigers in the Sahara with the instruction to "add more contrast to simulate more light."
Rather than relying on brief and ambiguous guidance, the researchers note, MGIE derives explicit, visual-aware intentions from user instructions, which leads to more precise and reasonable editing results. Through extensive studies across various editing aspects, they show that MGIE improves performance while maintaining competitive efficiency, and they express optimism that the MLLM-guided framework will contribute to future vision-and-language research.
Apple has made the MGIE model available for download on GitHub and has provided a web demo on Hugging Face Spaces. Despite the release, Apple has not detailed plans for the model beyond research, leaving its future applications open-ended.
While Apple has not been a dominant player in generative AI compared with industry giants like Microsoft, Meta, or Google, CEO Tim Cook has said the company plans to bring more AI features to its devices in the coming year. In December, Apple researchers released MLX, an open-source machine learning framework designed to simplify training AI models on Apple Silicon chips, underscoring the company's ongoing push to keep pace in AI.