Do You Know Your Neighborhood? Integrating Street View Images and Multi-task Learning for Fine-Grained Multi-Class Neighborhood Wealthiness Perception Prediction

Bibliographic Details
Published in: Cities, Vol. 158, p. 105703
Main Authors: Qiu, Yang; Wu, Meiliu; Huang, Qunying; Kang, Yuhao
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.03.2025
Summary: The assessment of urban wealthiness is fundamental to effective urban planning and development. However, conventional methodologies often rely on aggregated datasets, such as census data, with a coarse-grained resolution at the census tract level, impeding accurate evaluation of wealthiness in individual neighborhoods and failing to capture spatial heterogeneity. This study proposes a novel approach to predict urban wealthiness at a point-scale spatial resolution by utilizing geo-tagged street view images as input for deep learning models, thereby simulating human perception of urban built environments. Using the Place Pulse 2.0 dataset, which contains over 1.2 million pairwise comparisons of 110,988 street view images from 56 cities worldwide across different urban environment perception factors (e.g., safety and wealthiness), we developed deep learning models based on the Swin Transformer and the Multi-gate Mixture-of-Experts (MMOE) multi-task learning architecture. These models extract and integrate visual features of surrounding elements, including buildings, parks, and vehicles, to classify the wealthiness of specific geo-locations into three categories: Impoverished, Middle, and Affluent. To improve model training and the ground truth data, we enhanced the TrueSkill Rating System, used for scoring neighborhoods via pairwise street view image comparisons, by incorporating temporal decay and spatial autocorrelation factors. These modifications improved the normality of the wealthiness score distribution, reducing the standard deviation from 5.385 to 4.302 and the skewness from −0.055 to −0.024. Consequently, model performance improved consistently, with accuracy increases observed for the Swin Transformer (63 % to 68 %), ViT (54 % to 58 %), and ResNet50 (51 % to 56 %). In addition, the proposed MMOE model demonstrates a significant improvement in the differentiation and classification of wealth categories within the three-class classification system (Impoverished, Middle, Affluent). It achieves an overall accuracy of 82 %, outperforming the baseline models (Swin Transformer, ViT, and ResNet50) by 14 %, 24 %, and 26 %, respectively. Additionally, we compared our model's predictions with average household income data at the census block group level to elucidate its strengths and limitations. Experimental results demonstrated the efficacy of using geo-tagged street view images for predicting urban wealthiness across diverse geographic and environmental contexts. Our findings also highlight the importance of integrating both quantitative and qualitative evaluations in the prediction of urban environmental factors. By synthesizing human perceptions with advanced deep learning techniques, our approach offers a nuanced understanding of urban wealthiness, providing valuable insights for urban planning and development strategies.
Highlights:
• Measuring urban wealthiness at the most fine-grained level (i.e., the geo-location points of images).
• Enhancing the TrueSkill Rating System by incorporating temporal decay and spatial autocorrelation factors.
• Leveraging Multi-gate Mixture-of-Experts (MMOE) to improve the differentiation and classification of wealth categories.
• Conducting a comprehensive evaluation to demonstrate the performance and generalizability of the proposed model.
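As a rough illustration of the multi-task architecture described in the summary, the sketch below pairs a Swin Transformer image encoder with an MMOE-style head that shares a pool of experts across perception tasks and mixes them through a per-task gate. It is a minimal sketch assuming PyTorch and the torchvision swin_t backbone; the expert count, hidden sizes, auxiliary task list (safety), and class counts are illustrative assumptions rather than values reported in the article.

```python
# Hypothetical sketch of a Multi-gate Mixture-of-Experts (MMoE) head on top of a
# Swin Transformer encoder. Expert count, hidden sizes, and the auxiliary task
# list are illustrative assumptions, not the article's reported configuration.
import torch
import torch.nn as nn
from torchvision.models import swin_t


class MMoEHead(nn.Module):
    def __init__(self, in_dim=768, n_experts=4, expert_dim=256, task_classes=None):
        super().__init__()
        # e.g. wealthiness as the main 3-class task, safety as an auxiliary task
        task_classes = task_classes or {"wealthiness": 3, "safety": 3}
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
             for _ in range(n_experts)]
        )
        # One softmax gate per task, so each task mixes the shared experts differently.
        self.gates = nn.ModuleDict({t: nn.Linear(in_dim, n_experts) for t in task_classes})
        self.towers = nn.ModuleDict({t: nn.Linear(expert_dim, c) for t, c in task_classes.items()})

    def forward(self, feats):                                              # feats: (B, in_dim)
        expert_out = torch.stack([e(feats) for e in self.experts], dim=1)  # (B, E, expert_dim)
        logits = {}
        for task, gate in self.gates.items():
            weights = torch.softmax(gate(feats), dim=-1).unsqueeze(-1)     # (B, E, 1)
            mixed = (weights * expert_out).sum(dim=1)                      # (B, expert_dim)
            logits[task] = self.towers[task](mixed)                        # (B, n_classes)
        return logits


class WealthinessModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = swin_t(weights=None)   # pretrained weights could be loaded instead
        self.backbone.head = nn.Identity()     # expose the 768-d pooled image features
        self.mmoe = MMoEHead(in_dim=768)

    def forward(self, images):                 # images: (B, 3, 224, 224)
        return self.mmoe(self.backbone(images))


# Example: WealthinessModel()(torch.randn(2, 3, 224, 224))["wealthiness"].shape -> (2, 3)
```

A forward pass over a batch of 224 × 224 street view images yields one logit tensor per task; a three-way softmax over the wealthiness logits gives the Impoverished/Middle/Affluent prediction, and training would sum per-task cross-entropy losses.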
ISSN: 0264-2751
DOI: 10.1016/j.cities.2025.105703
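
The summary also notes that the TrueSkill Rating System was enhanced with temporal decay and spatial autocorrelation factors. The sketch below, built on the open-source trueskill package, shows one plausible reading of such adjustments rather than the article's exact formulation: older pairwise votes are discounted by inflating rating uncertainty, and final scores are smoothed with a distance-weighted neighbor average. The half-life, kernel bandwidth, and helper names are hypothetical.

```python
# Hypothetical sketch of a modified TrueSkill scoring pipeline for street view
# images compared pairwise on perceived wealthiness. Decay half-life, kernel
# bandwidth, and data layout are illustrative, not taken from the article.
import math

import trueskill  # pip install trueskill

env = trueskill.TrueSkill(mu=25.0, sigma=25.0 / 3, draw_probability=0.10)


def decayed_rating(rating, age_days, half_life_days=365.0):
    """Discount an old comparison by inflating the rating's uncertainty (sigma)."""
    decay = 0.5 ** (age_days / half_life_days)        # 1.0 for a fresh vote, -> 0 for old ones
    sigma = rating.sigma + (1.0 - decay) * env.sigma  # drift back toward the prior sigma
    return env.create_rating(mu=rating.mu, sigma=min(sigma, env.sigma))


def update_pair(ratings, winner_id, loser_id, age_days):
    """Apply one 'image A looks wealthier than image B' vote, discounted by its age."""
    winner = decayed_rating(ratings[winner_id], age_days)
    loser = decayed_rating(ratings[loser_id], age_days)
    ratings[winner_id], ratings[loser_id] = env.rate_1vs1(winner, loser)


def spatial_smooth(ratings, coords, bandwidth_m=200.0):
    """Blend each image's score with a distance-weighted average of its neighbors,
    a crude stand-in for a spatial-autocorrelation correction (O(n^2), sketch only)."""
    smoothed = {}
    for i, (xi, yi) in coords.items():                # coords: image id -> projected (x, y) in meters
        num = den = 0.0
        for j, (xj, yj) in coords.items():
            dist = math.hypot(xi - xj, yi - yj)
            weight = math.exp(-(dist / bandwidth_m) ** 2)
            num += weight * ratings[j].mu
            den += weight
        smoothed[i] = num / den                        # one wealthiness score per geo-location
    return smoothed
```

Here ratings would map each image id to env.create_rating() initially and coords would hold projected coordinates; after all pairwise comparisons are applied, the smoothed per-location scores would be binned into the three wealth classes used as training labels.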