LoFTR-DKM-Network-with-TAA-for-Image-Matching
1. Background of the problem to be solved
For the majority of us, our best camera is built into our phone. We may take a photograph of a landmark, such as Rome's Trevi Fountain, and share it with friends. That photo is only two-dimensional and includes the perspective of our shooting location. Of course, many people have photographed the fountain. We might be able to create a more complete, three-dimensional view if we work together. What if machine learning could aid in better capturing the richness of the world by utilizing the vast amounts of unstructured image collections that are freely available on the internet?
Structure-from-Motion (SfM) is a method for reconstructing 3D objects and buildings from images. Typically, these images are captured under controlled conditions by skilled operators, ensuring homogeneous, high-quality data. Building 3D models from disparate images is much more difficult, given the wide range of viewpoints, lighting and weather conditions, occlusions from people and vehicles, and even user-applied filters.
The first step is to determine which parts of two images capture the same physical points of a scene, such as window corners. This is typically accomplished through the use of local features (key locations in an image that can be reliably identified across different views). Local features include short description vectors that capture the appearance in the vicinity of the point of interest. By comparing these descriptors, it is possible to establish likely correspondences between the pixel coordinates of image locations in two or more images. This "image registration" allows for the recovery of the point's 3D location via triangulation.
Our task is to develop a machine learning algorithm for registering two images from different perspectives. To train and test our model, we have access to a dataset of thousands of images. Figure below shows the sample of image matching and the trainset sample.
2. Testset Sample
3. Kornia tool
We choose to use Kornia [Link] to develop our mode, this is a tool powered by PyTorch and good at dealing with image matching problem.
4. Process of solving this problem
As for estimator, we use OpenCV estimator MAGSAC++ [Link].
5. BackBone
5.1 LoFTR Achitecture
The figure below shows the achitecture of LoFTR [Paper Link]
5.2 DKM Achitecture
The figure below shows the achitecture of Deep Kernelized Dense Geometric Matching (DKM) [Paper Link]
6. Test Time Augmentation (TTA)
Adopt size transformation and flip and Test Time Augmentation (TTA) on the inference stage of LoFTR.