Explore a 3D wonderland from a single image: Wonderland makes high-quality 3D scene generation more efficient

#News ·2025-01-07

The authors of the paper are from a research team at the University of Toronto, Snap Inc., and UCLA. The first authors are Hanwen Liang, a PhD student at the University of Toronto, and Junli Cao of Snap Inc. Their work focuses on video generation and on 3D/4D scene generation and reconstruction, with the goal of creating more realistic, high-quality 3D and 4D scenes. The team welcomes exchange and collaboration with like-minded researchers.

The ability to perceive and imagine a three-dimensional world from a single image is a natural part of human cognition. We can intuitively estimate distance and shape, and even guess the geometry of occluded regions. Handing this complex cognitive process over to machines, however, is challenging. Recently, a research team from the University of Toronto, Snap Inc., and UCLA introduced a new model, Wonderland, which can generate high-quality, wide-scope 3D scenes from a single image, a breakthrough in single-view 3D scene generation.

[Image]

  • Paper: https://arxiv.org/abs/2412.12091
  • Project page: https://snap-research.github.io/wonderland/

[Image]

Technological breakthroughs: Key innovations from single images to three-dimensional worlds

Traditional 3D reconstruction techniques often rely on multi-view data or per-scene optimization, and are prone to distortion in backgrounds and unseen regions. To address these problems, Wonderland combines a video generation model with a large-scale 3D reconstruction model to generate wide-scope 3D scenes efficiently and with high quality:

  1. Embedding 3D awareness into the video diffusion model: By introducing camera pose control into the video diffusion model, Wonderland embeds multi-view information about the scene in the video latent space and ensures 3D consistency. Under precise control of the camera trajectory, the video generation model expands a single image into a multi-view video containing rich spatial relationships.
  2. Dual-branch camera control mechanism: Using ControlNet and LoRA modules, Wonderland achieves precise control over a wide variety of camera viewpoint changes during video generation, significantly improving video quality, geometric consistency, and the static nature of the generated multi-view scenes.
  3. Large-scale latent-based 3D reconstruction model (LaLRM): Wonderland introduces the 3D reconstruction model LaLRM, which reconstructs 3D scenes in a feed-forward manner from the latents produced by the video generation model. The reconstruction model is trained with an efficient progressive training strategy to transform the information in the video latent space into 3D Gaussian Splatting (3DGS), which significantly reduces memory requirements and reconstruction time. With this design, LaLRM aligns the generation and reconstruction tasks and bridges image space and 3D space, enabling more efficient and consistent construction of large 3D scenes.
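
For readers unfamiliar with 3DGS, the following is a minimal sketch of how a single Gaussian primitive is commonly parameterized, which is the kind of output a feed-forward reconstructor would need to predict. It is not Wonderland's code: the class name `Gaussian3D`, its field names, and the helper `quat_to_rotmat` are illustrative assumptions, and the article does not say which attributes LaLRM actually outputs. The sketch builds the covariance as Σ = R S Sᵀ Rᵀ from a rotation quaternion and per-axis scales, following the standard 3DGS formulation.

```python
from dataclasses import dataclass
import math


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (nested lists)."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)  # normalize so the result is a valid rotation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


@dataclass
class Gaussian3D:
    """Hypothetical container for one 3DGS primitive (field names are assumptions)."""
    mean: tuple      # (x, y, z) center of the Gaussian
    scale: tuple     # per-axis standard deviations, all positive
    rotation: tuple  # orientation as a quaternion (w, x, y, z)
    opacity: float   # blending weight in [0, 1]
    color: tuple     # RGB color

    def covariance(self):
        """Return Sigma = R S S^T R^T as a 3x3 nested list."""
        R = quat_to_rotmat(self.rotation)
        s2 = [s * s for s in self.scale]
        # Sigma[i][j] = sum_k R[i][k] * s_k^2 * R[j][k]
        return [
            [sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)
        ]
```

With an identity rotation `(1, 0, 0, 0)` and scales `(1, 2, 3)`, `covariance()` returns a diagonal matrix with entries 1, 4, and 9. Factoring the covariance this way keeps it symmetric and positive semi-definite regardless of the raw values a network predicts.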

Results: video generation

Precise viewpoint control for video generation from a single image and camera conditions:

The camera-guided video generation model accurately follows the specified trajectory and generates high-quality videos with consistent 3D geometry. It generalizes well: it can follow a variety of complex trajectories and works with input images in many styles.
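
To make the idea of a camera trajectory concrete, here is a minimal sketch that builds one as a list of per-frame camera poses on a horizontal arc around a point of interest. The article does not specify the pose format Wonderland consumes, so everything here is an assumption: poses are 4x4 camera-to-world matrices in an OpenGL-style convention (camera looks down its local -Z axis), and the function names `look_at` and `arc_trajectory`, the frame count, the radius, and the sweep angle are all hypothetical.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Return a 4x4 camera-to-world matrix for a camera at `eye` facing `target`."""
    forward = _normalize(_sub(target, eye))
    right = _normalize(_cross(forward, up))
    true_up = _cross(right, forward)
    # Columns are the camera's right, up, and backward (-forward) axes, then its position.
    return [
        [right[0], true_up[0], -forward[0], eye[0]],
        [right[1], true_up[1], -forward[1], eye[1]],
        [right[2], true_up[2], -forward[2], eye[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def arc_trajectory(num_frames, radius=2.0, sweep_deg=60.0, target=(0.0, 0.0, 0.0)):
    """Camera poses on a horizontal arc around `target`, each facing the target."""
    poses = []
    for i in range(num_frames):
        t = i / (num_frames - 1) if num_frames > 1 else 0.5
        angle = math.radians((t - 0.5) * sweep_deg)  # centered on the front view
        eye = (
            target[0] + radius * math.sin(angle),
            target[1],
            target[2] + radius * math.cos(angle),
        )
        poses.append(look_at(eye, target))
    return poses


# One pose per video frame; such a sequence would serve as the camera condition.
poses = arc_trajectory(num_frames=5)
```

Reusing the same pose list with different input images corresponds to the "same camera trajectories, different inputs" comparisons, while generating several pose lists for one image corresponds to exploring a single scene along multiple trajectories.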

Some more examples:

Videos generated from different input images along the same three camera trajectories:

[Image]

[Image]

[Image]

Given one input image and multiple camera trajectories, the generated videos can explore the scene in depth:

[Image]

[Image]

[Image]

[Image]

Results: 3D scene generation

Based on a single image, Wonderland can generate high-quality, expansive 3D scenes with LaLRM:

(The following are renderings of the generated 3DGS scenes.)

[Image]

[Image]

[Image]

[Image]

Based on a single image and multiple camera trajectories, Wonderland can explore a scene in depth and generate high-quality, expansive 3D scenes.

Excellent performance: Strong results across visual quality, generation efficiency, and more

Wonderland's main strengths are precise viewpoint control, high scene generation quality, generation efficiency, and wide applicability. Experimental results show that the model outperforms existing methods on multiple datasets across several criteria: viewpoint control and visual quality in video generation, geometric consistency of the 3D reconstruction, rendered image quality, and end-to-end generation speed.

  1. Dual-branch camera conditioning strategy: With a dual-branch camera conditioning strategy, the video diffusion model can generate multi-view captures of a scene with consistent 3D geometry, and achieves more accurate pose control than existing methods.
  2. Zero-shot 3D scene generation: Given a single input image, Wonderland performs efficient feed-forward reconstruction of 3D scenes and outperforms existing methods on multiple benchmark datasets, such as RealEstate10K, DL3DV, and Tanks and Temples.
  3. Wide-coverage scene generation: Unlike previous feed-forward 3D reconstruction methods, which are usually limited to a narrow range of viewing angles or to object-level reconstruction, Wonderland can efficiently generate wide-scope, complex scenes. The generated 3D scenes are highly geometrically consistent, and the model generalizes well, including to out-of-domain scenes.
  4. High efficiency: In the single-image input setting, Wonderland can generate a complete 3D scene in about 5 minutes on a single A100 GPU. This is about 3.2 times faster than CAT3D, which takes 16 minutes, and 36 times faster than ZeroNVS, which takes 3 hours.
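
The speedup figures follow directly from the quoted runtimes, taking Wonderland's time as roughly 5 minutes (the article gives it only approximately, so the ratios are approximate as well):

```latex
\[
\frac{16\ \text{min}}{5\ \text{min}} = 3.2,
\qquad
\frac{3\ \text{h} \times 60\ \text{min/h}}{5\ \text{min}} = \frac{180\ \text{min}}{5\ \text{min}} = 36 .
\]
```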

Application scenarios: New tools for video and 3D scene content creation

Wonderland offers a new approach to creating videos and 3D scenes. The technology shows broad application potential in architectural design, virtual reality, film and television visual effects, and game development. With precise camera pose control in video generation and wide-view, high-definition 3D scene generation, Wonderland can meet the demand for high-quality content in complex scenes and open up more possibilities for creators.

Future outlook

Despite the model's excellent performance, the Wonderland team recognizes that there are still many directions to improve and explore. Better handling of dynamic scenes and more faithful reproduction of real-scene details, for example, are the focus of future work. The team hopes that, with continued refinement, this line of research will not only advance single-view 3D scene generation but also help bring video generation and 3D technology into widespread practical use.
