Generating realistic images from arbitrary viewpoints based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences. Recent advancements in diffusion models, particularly the Zero-1-to-3 model, have been widely adopted for generating plausible views, videos, and 3D models. However, these models still produce inconsistent and implausible results in novel view generation, especially under challenging viewpoint changes. In this work, we propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps during the denoising process of Zero-1-to-3. By drawing an analogy between the denoising process and stochastic gradient descent (SGD), we implement a filtering mechanism that aggregates attention maps, enhancing generation reliability and authenticity. This process improves geometric consistency without requiring retraining or significant additional computation. Additionally, we modify the self-attention mechanism to integrate information from the source view, reducing shape distortions. Both components are further supported by a specialized sampling schedule. Experimental results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.
Attention layers are critical in shaping the structure and appearance of generated images. Noise in the latent induces noise in the attention maps, which accumulates throughout the denoising process and results in visual artifacts.
Which Attention Maps Should We Denoise? (Spoiler: Self-Attention).
Zero-1-to-3 conditions its cross-attention on a single embedding, which yields a single key-value pair. Consequently, the softmax collapses the attention scores to an all-ones matrix, and the cross-attention maps lose all spatial awareness.
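A quick way to see this degeneracy (the tensor shapes below are illustrative, not taken from the Zero-1-to-3 code): with a single conditioning token, the key axis of the score matrix has length one, so the softmax is identically one everywhere.

import torch
import torch.nn.functional as F

B, H, Nq, d = 1, 8, 16, 64           # hypothetical batch, heads, query tokens, head dim
q = torch.randn(B, H, Nq, d)         # spatial queries from the U-Net feature map
k = torch.randn(B, H, 1, d)          # a single key from the conditioning embedding

scores = q @ k.transpose(-2, -1) / d ** 0.5   # (B, H, Nq, 1)
attn = F.softmax(scores, dim=-1)              # softmax over a singleton key axis

print(torch.allclose(attn, torch.ones_like(attn)))  # True: the map carries no spatial signal

This is why we focus on the self-attention maps, which retain a full query-by-key structure.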
Robustifying the attention maps can reduce generation artifacts. Inspired by gradient aggregation and weight averaging in SGD, we view denoising as an unrolled optimization process, with the attention maps acting as parameters of a score-prediction model.
We collect a set of attention maps at each timestep through resampling, then aggregate the maps both within and across timesteps to refine the predictions. This process is training-free, resulting in more accurate maps and more faithful generations.
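Below is a minimal, self-contained sketch of the aggregation step; the attention maps are random stand-ins (in the real pipeline they come from resampled denoising steps of the self-attention layers), the helper aggregate_maps is ours, and the simple mean-plus-EMA rule is an assumption rather than the paper's exact filtering scheme.

import torch

def aggregate_maps(sampled_maps, ema_map=None, beta=0.8):
    # Average the resampled maps within a timestep, then smooth across timesteps with an EMA.
    intra = torch.stack(sampled_maps).mean(dim=0)   # within-timestep aggregation
    return intra if ema_map is None else beta * ema_map + (1 - beta) * intra

# Toy usage: 4 resampled maps per timestep over 3 timesteps, each of shape (heads, N, N).
ema = None
for t in range(3):
    sampled = [torch.rand(8, 256, 256).softmax(dim=-1) for _ in range(4)]
    ema = aggregate_maps(sampled, ema)              # filtered map fed back into the step
print(ema.shape)  # torch.Size([8, 256, 256])

Using the aggregated map in place of any single noisy sample is what makes the prediction more robust.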
Using "Mutual Self-Attention" during early stages of denoising, we propagate information from the input to the generated view. Our entire pipeline is shown in the figure.
We applied Zero-to-Hero to three tasks: image generation conditioned on (1) pose and (2) segmentation maps, as well as (3) multi-view generation. In all cases we observe a significant performance boost.
@article{sobol2024zero2hero,
  author  = {Ido Sobol and Chenfeng Xu and Or Litany},
  title   = {Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering},
  journal = {NeurIPS},
  year    = {2024},
}