ID-Pose

Abstract

Given sparse views of a 3D object, estimating their camera poses is a long-standing and difficult problem. To address it, we harness Zero-1-to-3, a pre-trained diffusion model that generates novel views conditioned on viewpoint. We present ID-Pose, which inverts the denoising diffusion process to estimate the relative pose between two input images. ID-Pose adds noise to one image and predicts that noise conditioned on the other image and a hypothesis of the relative pose. The prediction error serves as the objective, which is minimized by gradient descent to find the optimal pose. We extend ID-Pose to handle more than two images, estimating each pose from multiple image pairs formed by triangular relations. ID-Pose requires no training and generalizes to open-world images. We conduct extensive experiments using casually captured photos and rendered images with random viewpoints. The results demonstrate that ID-Pose significantly outperforms state-of-the-art methods.
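The two-image procedure described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and none of it is the released ID-Pose code: "images" are 1-D periodic signals, the "pose" is a single shift angle, `toy_view` and `toy_predict_noise` are hypothetical stand-ins for Zero-1-to-3, and gradients come from finite differences rather than backpropagation through a diffusion model. Only the structure matches: add noise to one image, predict it from the other image plus a pose hypothesis, and descend on the prediction error.

```python
import numpy as np

N = 64
GRID = 2 * np.pi * np.arange(N) / N
FREQS = np.fft.fftfreq(N, d=1.0 / N)  # signed integer frequencies


def toy_view(image, pose):
    """Toy 'novel view': shift a periodic 1-D signal by `pose` radians."""
    return np.fft.ifft(np.fft.fft(image) * np.exp(1j * FREQS * pose)).real


def toy_predict_noise(noisy, source, pose, alpha):
    """Hypothetical stand-in for a pose-conditioned noise predictor.

    It guesses the clean target as the source viewed from `pose`, then
    solves noisy = sqrt(alpha) * clean + sqrt(1 - alpha) * noise for noise.
    """
    clean_guess = toy_view(source, pose)
    return (noisy - np.sqrt(alpha) * clean_guess) / np.sqrt(1.0 - alpha)


def pose_loss(pose, source, target, noise, alpha):
    """Noise-prediction error for one pose hypothesis."""
    noisy = np.sqrt(alpha) * target + np.sqrt(1.0 - alpha) * noise
    predicted = toy_predict_noise(noisy, source, pose, alpha)
    return float(np.mean((predicted - noise) ** 2))


def estimate_pose(source, target, init_pose=0.0, lr=0.2, steps=200,
                  alpha=0.5, seed=0, h=1e-4):
    """Gradient descent on the pose using central finite differences."""
    rng = np.random.default_rng(seed)
    pose = init_pose
    for _ in range(steps):
        noise = rng.standard_normal(source.shape)  # fresh noise each step
        grad = (pose_loss(pose + h, source, target, noise, alpha)
                - pose_loss(pose - h, source, target, noise, alpha)) / (2 * h)
        pose -= lr * grad
    return pose


# Toy data: the target is the source seen from an unknown relative pose.
source = np.sin(GRID) + 0.5 * np.cos(2 * GRID)
true_pose = 0.7
target = toy_view(source, true_pose)

estimated_pose = estimate_pose(source, target)
print("true pose:", true_pose, "estimated pose:", estimated_pose)
```

In this toy the loss has a single minimum, so one initialization is enough. A real diffusion model gives a non-convex objective, and the step size, noise level `alpha`, and initialization used here are arbitrary choices rather than settings from the paper.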


BibTeX

@article{cheng2023id, 
   title={ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models}, 
   author={Cheng, Weihao and Cao, Yan-Pei and Shan, Ying}, 
   journal={arXiv preprint arXiv:2306.17140}, 
   year={2023}
}