ISTI-TALK – LoomNet: Pose-Agnostic and Spatially Consistent Multi-View Synthesis via Latent Space Weaving
Day - Time:
13 May 2026, h.12:00
Place:
Area della Ricerca CNR di Pisa - Room: C-29
Speakers
Referent
Davide Rucci
Abstract
State-of-the-art 3D generation from a single image typically follows a two-stage paradigm: multi-view synthesis followed by 3D reconstruction. However, generating spatially consistent images across arbitrary, user-defined viewpoints remains a critical bottleneck. In this work, we focus exclusively on maximizing the coherence of this first stage by proposing LoomNet, a unified multi-branch diffusion architecture designed to generate consistent multi-view images from any camera orientation. We structurally modify a diffusion decoder so that multiple parallel generation branches can communicate with one another during the denoising process. LoomNet ensures view consistency through a shared latent representation built collaboratively across the parallel branches: each branch hypothesis is projected into a shared space, then merged and refined through a weaving stage that interpolates unobserved regions. Crucially, the user is free to choose views at arbitrary angles, avoiding the rigid constraints of fixed-viewpoint models. LoomNet produces 16 structurally coherent 256×256 views in just 15 seconds. Experiments show that LoomNet sets a new state of the art in structural and perceptual multi-view fidelity, as measured by PSNR, SSIM and LPIPS. Unlike current 3D reconstructors limited to sparse, fixed inputs (e.g., exactly 4 orthogonal views), LoomNet generates a flexible prior of 16 arbitrary views. By solving consistency directly at the image level, we provide a robust foundation for future pose-agnostic 3D pipelines.
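As a rough illustration of merging branch hypotheses in a shared space and interpolating unobserved regions, the toy sketch below averages per-branch latents over the cells each branch observes, then fills the remaining cells by linear interpolation. It is not LoomNet's actual weaving stage, whose details the abstract does not give. The function names `merge_hypotheses` and `fill_unobserved`, the 1D grid layout, and the binary visibility masks are all assumptions made for illustration.

```python
import numpy as np


def merge_hypotheses(hypotheses, masks):
    """Masked average of per-branch hypotheses on a shared grid (toy assumption).

    hypotheses: shape (B, N, C), one latent grid per branch.
    masks:      shape (B, N), 1.0 where a branch observes a cell, else 0.0.
    Returns (merged, observed) with shapes (N, C) and (N,).
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    masks = np.asarray(masks, dtype=float)
    weight = masks.sum(axis=0)                            # how many branches see each cell
    total = (hypotheses * masks[..., None]).sum(axis=0)   # sum of observed values per cell
    observed = weight > 0
    merged = np.zeros_like(total)
    merged[observed] = total[observed] / weight[observed, None]
    return merged, observed


def fill_unobserved(merged, observed):
    """Fill unobserved cells by linear interpolation along the 1D grid.

    A hypothetical stand-in for a learned refinement step.
    """
    filled = merged.copy()
    if not observed.any():
        return filled  # nothing to interpolate from
    idx = np.arange(merged.shape[0])
    for c in range(merged.shape[1]):
        filled[~observed, c] = np.interp(
            idx[~observed], idx[observed], merged[observed, c]
        )
    return filled


# Two branches, four shared cells, one channel (made-up numbers).
hyps = [[[0.0], [2.0], [0.0], [0.0]],
        [[0.0], [4.0], [0.0], [7.0]]]
vis = [[1, 1, 0, 0],
       [0, 1, 0, 1]]
merged, observed = merge_hypotheses(hyps, vis)
woven = fill_unobserved(merged, observed)
```

In this toy setting, cells seen by several branches take the mean of their hypotheses, and a cell seen by no branch takes a value interpolated from its observed neighbors. A real system would presumably replace both steps with learned operations inside the diffusion decoder.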