Assessing the spatial accuracy of geocoding flood-related imagery using Vision Language Models

Bibliographic Details
Published in	Spatial information research (Online) Vol. 33; no. 2; p. 15
Main Authors	Schmidt, Sebastian, Díaz Fragachan, Eleonor, Arifi, Dorian, Hanny, David, Resch, Bernd
Format	Journal Article
Language	English
Published	Korea Spatial Information Society (대한공간정보학회) 01.04.2025
Subjects	General engineering
Summary:	While the capabilities of large language models and visual language models for various classification tasks have advanced significantly, their potential for location inference remains largely underexplored. Therefore, this study evaluates the performance of four prominent models — BLIP-2, LLaVA1.6, OpenFlamingo, and GPT-4o — for geocoding flood-related images from Flickr. Model inferences are compared against the original photo locations and human-labelled assessments. Our findings reveal that GPT-4o achieves the highest spatial accuracy (median deviation of 89.12 km). OpenFlamingo geocodes the highest number of images (90.7%), albeit with fluctuating quality (median 408.35 km), while still outperforming the human annotators. LLaVA1.6 geocodes only 18.9% of all images, while BLIP-2 exhibits the highest median deviation (1,781 km). We observe a spatial bias in our results, with inferences being most accurate in Central Europe. Additionally, model results improve when images feature recognisable landmarks. The proposed workflow could significantly increase the amount of geocoded web-based data available for disaster management, though further research is required to enhance accuracy across diverse geographic contexts.
ISSN:	2366-3286 2366-3294
DOI:	10.1007/s41324-025-00609-0