End2End Multi-View Feature Matching using Differentiable Pose Optimization

Technical University of Munich

End2End Multi-View Feature Matching connects feature matching and pose optimization in an end-to-end trainable pipeline in which the matching is informed by the pose estimation objective, increasing both pose estimation and matching accuracy.

Abstract

Learning-based approaches have become indispensable for camera pose estimation. However, feature detection, description, matching, and pose optimization are often approached in isolation. In particular, erroneous feature matches have a severe impact on subsequent camera pose estimation and often require additional measures such as outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly: first, we integrate information from multiple views into the matching by spanning a graph attention network across multiple frames to predict all of their matches at once. Second, the resulting matches, along with their predicted confidences, are used for robust pose optimization with a differentiable Gauss-Newton solver. End-to-end training combined with multi-view feature matching improves pose estimation metrics over SuperGlue by 8.9% on ScanNet and 10.7% on MegaDepth on average. Our approach improves both pose estimation and matching accuracy over state-of-the-art matching networks. Trained across multiple views with gradients from pose optimization, the feature matching naturally learns to disregard outliers, rendering additional outlier handling unnecessary, which is highly desirable for pose estimation systems.
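To illustrate the core idea of backpropagating through pose optimization, below is a minimal PyTorch sketch of a confidence-weighted, differentiable Gauss-Newton loop. It is not the authors' implementation: the function names (`so3_exp`, `project`, `gauss_newton_pose`), the 3D-to-2D reprojection residual, the fixed iteration count, and the damping term are all illustrative assumptions, while the paper optimizes camera poses from predicted multi-view matches and confidences in the same spirit.

```python
# Minimal sketch (not the authors' code): confidence-weighted differentiable
# Gauss-Newton pose refinement. All names and the reprojection formulation
# are assumptions made for illustration.
import torch


def so3_exp(w):
    """Map axis-angle vectors (B, 3) to rotation matrices (B, 3, 3) via Rodrigues' formula."""
    theta = w.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    k = w / theta
    K = torch.zeros(w.shape[0], 3, 3, dtype=w.dtype, device=w.device)
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    I = torch.eye(3, dtype=w.dtype, device=w.device).expand_as(K)
    s, c = torch.sin(theta)[..., None], torch.cos(theta)[..., None]
    return I + s * K + (1.0 - c) * (K @ K)


def project(xi, pts3d, K_intr):
    """Transform 3D points by the pose xi = (rotation, translation) and project with intrinsics K_intr."""
    R = so3_exp(xi[:3][None])[0]
    p_cam = pts3d @ R.T + xi[3:]
    uv = p_cam[:, :2] / p_cam[:, 2:3].clamp(min=1e-6)
    return uv * K_intr[[0, 1], [0, 1]] + K_intr[[0, 1], [2, 2]]


def gauss_newton_pose(pts3d, pts2d, conf, K_intr, iters=10, damping=1e-4):
    """Refine a 6-DoF pose so pts3d projects onto pts2d, weighting each residual
    by its match confidence. The loop stays differentiable, so gradients reach
    the confidences (and the network that predicted the matches)."""
    xi = torch.zeros(6, dtype=pts3d.dtype)       # start from the identity pose
    w = conf.repeat_interleave(2)                # one weight per residual component (u and v)
    for _ in range(iters):
        def residual(x):
            return (project(x, pts3d, K_intr) - pts2d).reshape(-1)
        r = residual(xi)
        J = torch.autograd.functional.jacobian(residual, xi, create_graph=True)
        JtWJ = J.T @ (w[:, None] * J) + damping * torch.eye(6, dtype=J.dtype)
        delta = torch.linalg.solve(JtWJ, J.T @ (w * r))
        xi = xi - delta
    return xi


# Toy usage: noisy projections of random 3D points; in the actual pipeline the
# confidences would come from the multi-view matching network.
N = 50
pts3d = torch.randn(N, 3) + torch.tensor([0.0, 0.0, 5.0])
K_intr = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
gt_xi = torch.tensor([0.02, -0.01, 0.03, 0.1, -0.05, 0.2])
pts2d = project(gt_xi, pts3d, K_intr).detach() + 0.5 * torch.randn(N, 2)
conf = torch.ones(N, requires_grad=True)

xi = gauss_newton_pose(pts3d, pts2d, conf, K_intr)
xi.sum().backward()  # gradients from the pose estimate flow back into conf
print(xi.detach(), conf.grad.abs().mean())
```

Because every Gauss-Newton step is built from differentiable tensor operations, the gradient of the pose error with respect to each confidence tells the matcher which correspondences hurt the pose, which is the mechanism by which outliers get down-weighted without a separate rejection stage.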

Video

BibTeX

@misc{roessle2022e2efeaturematching,
      title={End2End Multi-View Feature Matching using Differentiable Pose Optimization}, 
      author={Barbara Roessle and Matthias Nie{\ss}ner},
      year={2022},
      eprint={2205.01694},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}