DeepMind unveils Genie 2: AI model creating immersive 3D worlds from text prompts – how it works
Source: Live Mint
DeepMind has introduced Genie 2, an innovative artificial intelligence model capable of generating playable and immersive 3D worlds. Building on its predecessor, Genie, which could transform single images into interactive environments, Genie 2 takes this concept further by crafting dynamic and realistic virtual worlds from text prompts or images.
In a recent blog post, Google DeepMind described Genie 2 as a large-scale foundation world model designed to create intricate 3D simulations. A simple prompt like “a warrior in snow” can produce an expansive interactive world in which users explore a snowy environment as a warrior character. The generated settings also model physical interactions such as jumping, swimming, and object manipulation, while maintaining realistic lighting effects.
Genie 2’s advanced capabilities stem from its training on a vast dataset of videos, enabling it to generate coherent and visually rich environments. According to DeepMind, the AI can create consistent worlds with varying perspectives — including first-person and isometric views — that last up to a minute, with most spanning 10 to 20 seconds.
The model is autoregressive: it generates video frame by frame, conditioning each new frame on the prior frames and the user's actions. An image prompt can serve directly as the starting frame, while a text prompt is first turned into an image by Imagen 3, another of Google's generative models, and Genie 2 then builds the world from that image. Users can then navigate and interact with the virtual environment via keyboard inputs.
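The frame-by-frame loop described above can be illustrated with a toy sketch. This is not Genie 2's implementation: the "frame" here is just a character's grid position, and `predict_next_frame`, `rollout`, and the `ACTIONS` key mapping are hypothetical stand-ins for a learned video model and its controls. What it shows is only the autoregressive structure, where every new frame is computed from the history so far plus the latest user action.

```python
from typing import List, Tuple

# Toy "frame": only the character's (x, y) position, not real pixels.
Frame = Tuple[int, int]

# Hypothetical mapping from keyboard keys to movement on the grid.
ACTIONS = {"w": (0, -1), "s": (0, 1), "a": (-1, 0), "d": (1, 0)}


def predict_next_frame(history: List[Frame], action: str) -> Frame:
    """Stand-in for the learned model: uses prior frames and the action."""
    x, y = history[-1]  # this toy only needs the most recent frame
    dx, dy = ACTIONS.get(action, (0, 0))  # unknown keys leave the frame unchanged
    return (x + dx, y + dy)


def rollout(first_frame: Frame, actions: str) -> List[Frame]:
    """Generate frames one at a time, feeding each result back into the history."""
    history = [first_frame]
    for action in actions:
        history.append(predict_next_frame(history, action))
    return history


frames = rollout((0, 0), "ddsw")
print(frames)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 0)]
```

Because each step depends on the frames generated before it, small errors can compound over a long rollout, which is consistent with generated worlds staying coherent only for a limited time.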
One standout feature is Genie 2’s action control. It works out which element of the scene the user's input should affect, so that pressing directional keys moves a robot character rather than unrelated objects like clouds or trees. Its long-term memory allows it to remember parts of the world that have moved out of view and render them accurately when they come back into view, enhancing the continuity and realism of the experience.
While Genie 2 has significant implications for gaming, DeepMind positions it as a creative and research tool. The model’s ability to transform concept art or drawings into interactive environments opens new possibilities for digital art, design, and simulation.
DeepMind also highlights Genie 2’s potential for creating entirely novel video games where characters and worlds could be dynamically generated in real time.