Planetary-scale image geolocalization, the process of identifying the geographic location of an image, represents a significant challenge in computer vision due to the enormous diversity and complexity of global images. Traditional methods that primarily focus on landmark images have had difficulty generalizing to unfamiliar locations.
The game “Geoguessr,” which has 65 million players, illustrates the difficulty by asking players to identify the location of street view images from anywhere in the world. In a research paper titled “PIGEON: Predicting Image Geolocations,” researchers at Stanford University describe in detail how they tackle this problem. They developed two models, PIGEON and PIGEOTTO, which represent a significant advance in image geolocation technology.
PIGEON (Predicting Image Geolocations) is a model trained on planetary-scale street view data that predicts a geographic location from a four-image panorama. PIGEON places more than 40% of its predictions within a 25 km radius of the exact location globally, a remarkable achievement in the field. In Geoguessr, the model ranks in the top 0.01% of players and has consistently outperformed top human players.
In contrast, PIGEOTTO does not rely on street view data, but is trained on a more diverse dataset of over 4 million photos from Flickr and Wikipedia. The model uses a single image input and achieves state-of-the-art results on a variety of image geolocation benchmarks, significantly reducing median distance error and demonstrating robustness to location and image distribution shifts.
The technical backbone of these systems includes semantic geocell generation, multi-task contrastive pretraining, a novel loss function, and downstream refinement of guesses. Together, these methods reduce distance errors and improve the accuracy of location predictions.
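To make the idea of a distance-aware loss over geocells more concrete, here is a minimal Python sketch. The article does not spell out the loss, so this is only one plausible form: instead of a one-hot target for the correct geocell, each cell gets a weight that decays with how much farther its centroid is from the true location than the correct cell's centroid. The function names, the centroid representation, and the temperature `tau_km` are hypothetical and chosen for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def smoothed_geocell_labels(true_location, cell_centroids, true_cell, tau_km=75.0):
    """Soft targets over geocells (illustrative assumption, not the paper's code).

    The correct cell always gets weight 1; other cells get a weight that
    shrinks exponentially with their extra distance from the true location.
    tau_km is a hypothetical temperature controlling how fast weights decay.
    """
    d_true = haversine_km(*true_location, *cell_centroids[true_cell])
    labels = []
    for centroid in cell_centroids:
        d = haversine_km(*true_location, *centroid)
        labels.append(math.exp(-(d - d_true) / tau_km))
    return labels

def soft_cross_entropy(labels, probs):
    """Cross-entropy between soft targets and predicted cell probabilities."""
    return -sum(w * math.log(p) for w, p in zip(labels, probs))

# Three hypothetical geocell centroids: Paris, London, New York.
centroids = [(48.85, 2.35), (51.5, -0.12), (40.7, -74.0)]
labels = smoothed_geocell_labels((48.85, 2.35), centroids, true_cell=0)
```

With targets like these, a prediction that lands in a neighboring cell is penalized less than one that lands on another continent, which is the intuition behind training a classifier to minimize distance error rather than plain cell accuracy.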
The training process of these models is complex. PIGEON is trained on a specially designed dataset of 100,000 randomly sampled Geoguessr locations, whereas PIGEOTTO’s training dataset is much larger and more diverse. The models are evaluated on median distance error and on the share of predictions that fall within kilometer-based distance thresholds ranging from street level to continental level.
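The evaluation described above can be sketched in a few lines of Python. The example computes the great-circle (haversine) distance between each predicted and true coordinate, then reports the median error and the fraction of predictions within each distance threshold. The threshold values (1, 25, 200, 750, and 2,500 km) are an assumption based on values commonly used in geolocation benchmarks; the article itself names only the 25 km radius and the street-to-continent range. The function names are illustrative.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0

# Assumed thresholds, from street level to continental level.
THRESHOLDS_KM = {"street": 1, "city": 25, "region": 200, "country": 750, "continent": 2500}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def evaluate(predictions, truths):
    """Return the median error (km) and the share of predictions within each threshold."""
    errors = [haversine_km(*p, *t) for p, t in zip(predictions, truths)]
    report = {"median_km": median(errors)}
    for name, km in THRESHOLDS_KM.items():
        report[f"{name}_{km}km"] = sum(e <= km for e in errors) / len(errors)
    return report

# Two toy predictions: one exact, one off by one degree of longitude at the equator.
report = evaluate([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)])
```

A claim such as "more than 40% of predictions within 25 km" corresponds to the `city_25km` entry of such a report being above 0.4.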
The advances these models bring are significant, but they also raise important ethical considerations. The precision of these technologies enables beneficial applications but also creates potential for misuse. This duality calls for a careful balance in the development and deployment of image geolocation technologies.
In conclusion, PIGEON and PIGEOTTO represent a major leap forward in image geolocation technology, achieving state-of-the-art results while remaining robust to distribution shifts. Their development highlights the importance of combining several technical innovations and points to a potential future in which image geolocation technologies are either truly planetary in scale or focused on narrowly defined distributions.