InfiniCube: NVIDIA's method for generating high-fidelity, highly controllable, large-scale dynamic 3D driving scenes

#News ·2025-01-03

This article is reposted with permission from the 3D Vision Heart WeChat official account. To repost it, please contact the original source.

InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models

Project page: https://research.nvidia.com/labs/toronto-ai/infinicube/

Paper: https://arxiv.org/abs/2412.03934v1

InfiniCube is a new 3D generation method, from a team led by NVIDIA, that produces unbounded and controllable dynamic 3D driving scenes.

InfiniCube builds on recent advances in 3D representations and video models, and generates large-scale dynamic scenes conditioned on high-definition (HD) maps, vehicle bounding boxes, and text descriptions.

The method generates 3D structures with high fidelity and keeps geometry and appearance consistent across the scene, which is particularly important for simulation-based training and testing of autonomous vehicles.

The key feature of InfiniCube is its ability to build a 3D representation of the world based on semantic voxels and use this as a guide for video generation models.
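A semantic voxel world can be pictured as a sparse grid that stores a class label only for occupied cells. The following is a minimal sketch of that idea, not the data structure used by InfiniCube; the class name, the label set, and the 0.5 m voxel size are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

VoxelIndex = Tuple[int, int, int]

# Hypothetical label set, for illustration only.
ROAD, BUILDING, VEHICLE = "road", "building", "vehicle"


class SparseSemanticVoxels:
    """Stores only occupied voxels, keyed by integer grid coordinates."""

    def __init__(self, voxel_size: float) -> None:
        self.voxel_size = voxel_size  # voxel edge length in meters (assumed)
        self.labels: Dict[VoxelIndex, str] = {}

    def key(self, x: float, y: float, z: float) -> VoxelIndex:
        # Floor division maps a metric point to the voxel that contains it,
        # including points with negative coordinates.
        s = self.voxel_size
        return (int(x // s), int(y // s), int(z // s))

    def add_point(self, x: float, y: float, z: float, label: str) -> None:
        self.labels[self.key(x, y, z)] = label

    def histogram(self) -> Counter:
        # Number of occupied voxels per semantic class.
        return Counter(self.labels.values())


grid = SparseSemanticVoxels(voxel_size=0.5)
grid.add_point(0.1, 0.2, 0.0, ROAD)
grid.add_point(0.4, 0.3, 0.1, ROAD)       # falls into the same voxel as above
grid.add_point(3.2, 1.0, 2.7, BUILDING)
grid.add_point(-0.1, 0.0, 0.0, VEHICLE)
```

Empty space costs nothing in this layout, which is the property that makes sparse voxel representations attractive for very large outdoor scenes.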

This design enables InfiniCube to generate dynamic 3D driving scenes that are large in scale, rich in detail, and consistent with the physical world. In addition, InfiniCube proposes a fast feedforward method for converting the dynamic videos and the voxel world into dynamic 3D Gaussian scenes, while retaining the ability to control dynamic vehicles.

Technical interpretation

The idea behind InfiniCube technology is to use advanced 3D representations and video models, combined with high-definition maps, vehicle bounding boxes and text descriptions, to generate dynamic 3D driving scenes that are unbounded and controllable.

The method first builds a sparse-voxel 3D generative model conditioned on HD maps and uses it to generate a large-scale semantic voxel world. A video model, guided by a series of pixel-aligned guidance buffers, then synthesizes a consistent appearance. Finally, a fast feedforward step converts the video and the voxel world into a dynamic 3D Gaussian scene in which dynamic vehicles can be controlled precisely.

InfiniCube's specific process consists of three main stages:

  • First, in the unbounded voxel world generation stage, the HD map and vehicle bounding boxes are taken as input to generate the corresponding 3D voxel world with semantic labels.
  • Second, in the world-guided video generation stage, a model based on Stable Video Diffusion generates long videos, conditioned on the geometry of the generated voxel world and on the camera trajectory.
  • Finally, in the dynamic 3DGS scene generation stage, a two-branch reconstruction method combines voxel and pixel information to produce the dynamic 3D Gaussian Splatting (3DGS) scene.
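The data flow between the three stages above can be sketched as follows. This is only a structural outline under stated assumptions: every type and function name is hypothetical, and each stage is a stub standing in for a learned model (the voxel generator, the video diffusion model, and the feedforward reconstruction network).

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple

Box = Tuple[float, float, float]          # hypothetical box: x, y, yaw
VoxelIndex = Tuple[int, int, int]


@dataclass
class SceneConditions:
    hd_map: List[Tuple[float, float]]     # lane vertices in meters (placeholder)
    vehicle_boxes: List[Box]
    text_prompt: str


@dataclass
class VoxelWorld:
    occupied: Set[VoxelIndex]


@dataclass
class GuidedVideo:
    frames: List[str]


@dataclass
class DynamicGaussianScene:
    num_voxels: int
    num_source_frames: int
    vehicles: List[Box]


def generate_voxel_world(cond: SceneConditions) -> VoxelWorld:
    # Stage 1 stub: mark one ground voxel per HD-map vertex on a 1 m grid.
    occupied = {(math.floor(x), math.floor(y), 0) for x, y in cond.hd_map}
    return VoxelWorld(occupied=occupied)


def generate_video(world: VoxelWorld, cond: SceneConditions, num_frames: int) -> GuidedVideo:
    # Stage 2 stub: a real system would run a video model here, guided by
    # buffers derived from `world` along a camera trajectory.
    frames = [
        f"frame {i}: {cond.text_prompt} ({len(world.occupied)} voxels)"
        for i in range(num_frames)
    ]
    return GuidedVideo(frames=frames)


def lift_to_gaussians(world: VoxelWorld, video: GuidedVideo, cond: SceneConditions) -> DynamicGaussianScene:
    # Stage 3 stub: combine voxel and pixel information into one scene and
    # keep the vehicle boxes so vehicles remain controllable afterwards.
    return DynamicGaussianScene(
        num_voxels=len(world.occupied),
        num_source_frames=len(video.frames),
        vehicles=list(cond.vehicle_boxes),
    )


cond = SceneConditions(
    hd_map=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],
    vehicle_boxes=[(1.0, 0.0, 0.0)],
    text_prompt="sunny afternoon",
)
world = generate_voxel_world(cond)
video = generate_video(world, cond, num_frames=4)
scene = lift_to_gaussians(world, video, cond)
```

The point of the outline is the ordering: the voxel world is produced once and then conditions both later stages, so geometry stays fixed while appearance is generated.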

Its technical features mainly include:

  • Large scale: it can generate dynamic 3D scenes covering about 100,000 square meters;
  • High fidelity and controllability: scene layout, appearance, and vehicle behavior can all be controlled flexibly;
  • Consistency: geometry and appearance stay consistent across the generated sequence;
  • Efficiency: the fast feedforward method speeds up scene reconstruction.
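To get a feel for the 100,000 square meter figure, the back-of-the-envelope calculation below compares a dense voxel grid with a sparse one. Only the area comes from the article; the 0.2 m voxel size, the 20 m height range, and the average of 4 occupied voxels per ground column are assumptions chosen for illustration.

```python
def grid_stats(area_m2: float, voxel_size_m: float, height_m: float,
               occupied_per_column: int) -> dict:
    """Compare dense and sparse voxel counts for a flat scene footprint."""
    columns = round(area_m2 / voxel_size_m ** 2)   # ground cells
    layers = round(height_m / voxel_size_m)        # vertical cells per column
    dense = columns * layers
    sparse = columns * occupied_per_column
    return {
        "columns": columns,
        "layers": layers,
        "dense_cells": dense,
        "sparse_cells": sparse,
        "occupancy": sparse / dense,
    }


# Area from the article; all other values are assumed.
stats = grid_stats(area_m2=100_000, voxel_size_m=0.2, height_m=20.0,
                   occupied_per_column=4)
```

Under these assumptions the footprint has 2.5 million ground columns and a dense grid would hold 250 million cells, of which only about 4% are occupied.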

InfiniCube provides a highly realistic and controllable virtual environment for training and testing autonomous vehicles, which is particularly valuable for simulating complex and adversarial traffic scenarios, and it is expected to see wider use in autonomous driving. It also has potential applications in mixed reality and robotics.

Paper interpretation

The paper introduces a system called InfiniCube, which is a method for generating dynamic 3D driving scenarios that are unbounded and controllable. The following is a summary of the main points of the paper:

Abstract

  • InfiniCube, a scalable method for generating unbounded dynamic 3D driving scenes with high fidelity and controllability, is proposed.
  • The method uses high-definition maps, vehicle bounding boxes and text descriptions to achieve flexible control.
  • By combining the latest advances in 3D representation and video models, large-scale dynamic scene generation is achieved.

Introduction

  • Generating 3D scenes that can be simulated and controlled is critical for areas such as mixed reality, robotics, and the training and testing of autonomous vehicles.
  • InfiniCube is designed to meet key needs: fidelity and consistency, large-scale scene generation, and controllability.

Related work

  • Research progress in 3D generation, controllable video generation, and driving scene reconstruction is reviewed.

Preliminaries

  • The latent diffusion model (LDM) and the sparse voxel LDM are introduced; these form the basis of the InfiniCube method.

Method

  • InfiniCube's goal is to generate large-scale dynamic 3D scenes from HD maps, vehicle bounding boxes, and text prompts.
  • 4.1 Unbounded voxel world generation: A semantic voxel world is generated from the HD map and vehicle bounding boxes.
  • 4.2 World-guided video generation: A video model generates an appearance that is consistent with the voxel world.
  • 4.3 Dynamic 3DGS scene generation: The voxels and videos are combined into dynamic 3D Gaussian scenes.
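The guidance in step 4.2 can be pictured as projecting the voxel world into each camera view to obtain per-pixel buffers that a video model can condition on. Below is a minimal sketch of that projection using a pinhole camera and a z-buffer; it is not the paper's renderer, and the function name, the integer labels, and the camera convention (x right, y down, z forward) are assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]   # voxel center in camera coordinates


def render_guide_buffers(
    voxels: Dict[Point, int], focal: float, width: int, height: int
) -> Tuple[List[List[Optional[float]]], List[List[Optional[int]]]]:
    """Project labeled voxel centers into an image, keeping the nearest per pixel."""
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    label: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0
    for (x, y, z), sem in voxels.items():
        if z <= 0.0:
            continue  # behind the camera
        # Pinhole projection to pixel coordinates.
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z      # z-buffer: the closest voxel wins
                label[v][u] = sem
    return depth, label


# Hypothetical labels: 1 = road, 2 = building, 3 = vehicle, 4 = vegetation.
voxels = {
    (0.0, 0.0, 2.0): 1,
    (0.0, 0.0, 5.0): 2,    # hidden behind the first voxel
    (1.0, 0.0, 2.0): 3,
    (0.0, 0.0, -1.0): 4,   # behind the camera, ignored
}
depth, label = render_guide_buffers(voxels, focal=2.0, width=4, height=4)
```

Because the buffers are aligned pixel by pixel with the frames to be generated, they tie the generated appearance to the voxel geometry.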

Experiments

  • 5.1 Data processing: The Waymo Open Dataset is used for training; ground-truth scenes are extracted from it to supervise semantic voxel generation.
  • 5.2 Implementation details: The network architecture and training details of each stage are described.
  • 5.3 Large-scale dynamic scene generation: Scenes generated by the complete pipeline are shown and the importance of each component is analyzed.
  • 5.4 Analysis of major components: Ablation studies validate the design of the HD map conditioning, and the results are compared with baseline methods.
  • 5.5 Applications: InfiniCube supports novel view synthesis and collision simulation, and showcases advanced applications such as vehicle insertion and weather control.
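Applications such as vehicle insertion and collision simulation rely on moving a vehicle's Gaussians independently of the static scene. A minimal sketch, assuming each vehicle's Gaussian centers are stored in its own box frame and placed into the world by a yaw rotation plus a translation; the function names are hypothetical, and a full implementation would also rotate each Gaussian's covariance, which is omitted here.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Tuple[Vec3, float]            # (position, yaw in radians)


def place_vehicle(local_means: List[Vec3], position: Vec3, yaw: float) -> List[Vec3]:
    """Rigidly move Gaussian centers from the vehicle's box frame into the world."""
    c, s = math.cos(yaw), math.sin(yaw)
    px, py, pz = position
    return [
        (c * x - s * y + px, s * x + c * y + py, z + pz)
        for x, y, z in local_means
    ]


def animate_vehicle(local_means: List[Vec3], trajectory: List[Pose]) -> List[List[Vec3]]:
    # One set of world-space centers per frame; editing the trajectory
    # changes the vehicle's motion without touching the static scene.
    return [place_vehicle(local_means, pos, yaw) for pos, yaw in trajectory]


local_means = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
trajectory = [((10.0, 5.0, 0.0), 0.0), ((10.0, 5.0, 0.0), math.pi / 2)]
frames = animate_vehicle(local_means, trajectory)
```

Swapping in a different trajectory, or a second vehicle with its own pose list, is all that is needed to script a new interaction in this simplified setting.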

Discussion

  • The limitations of InfiniCube are discussed, including limited geometric diversity and the complexity of the pipeline.
  • The contributions of InfiniCube are summarized and future research directions are suggested, including scaling up training data and accelerating the generation process.

Conclusion

  • By combining a voxel world generation model, a world-guided video model, and a dynamic 3DGS generation model, InfiniCube is able to generate realistic 3D scenes with rich appearance details and full controllability.
