Urban walkability through different lenses: A comparative study of GPT-4o and human perceptions.
Journal:
PloS one
PMID:
40299853
Abstract
Urban environments significantly shape our well-being, behavior, and overall quality of life. Assessing urban environments, particularly walkability, has traditionally relied on computer vision and machine learning algorithms. However, these approaches often fail to capture the subjective and emotional dimensions of walkability because they have limited ability to integrate human-centered perceptions and contextual understanding. Recently, large language models (LLMs) have gained traction for their ability to process and analyze unstructured data. With the increasing reliance on LLMs in urban studies, it is essential to critically evaluate their potential to accurately capture human perceptions of walkability and contribute to the design of more pedestrian-friendly environments. A critical question therefore arises: can LLMs such as GPT-4o accurately reflect human perceptions of urban environments? This study addresses this question by comparing GPT-4o's evaluations of visual urban scenes with human perceptions, specifically in the context of urban walkability. Human participants and GPT-4o evaluated street-level images on key dimensions of walkability, including overall walkability, feasibility, accessibility, safety, comfort, and liveliness. The responses were analyzed with text mining techniques that examined keyword frequency, coherence scores, and similarity indices between participant and GPT-4o-generated responses. The findings revealed that GPT-4o and participants aligned in their evaluations of overall walkability, feasibility, accessibility, and safety. In contrast, notable differences emerged in the assessment of comfort and liveliness. Human participants demonstrated broader thematic diversity and addressed a wider range of topics, whereas GPT-4o produced more focused and cohesive responses, particularly in relation to comfort and safety.
In addition, similarity scores between GPT-4o's responses and those of participants indicated a moderate level of alignment between GPT-4o's reasoning and human judgments. The study concludes that human input remains essential for fully capturing human-centered evaluations of walkability. Furthermore, it underscores the importance of refining LLMs to better align with human perceptions in future walkability studies.
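The abstract names keyword frequency and similarity indices as two of the text mining measures but does not specify how they were computed. As a rough illustration only, the sketch below counts keywords in two sets of free-text responses and compares them with cosine similarity over bag-of-words counts. The choice of cosine similarity, the small stopword list, the function names, and the sample responses are all assumptions made for this example and are not taken from the study; coherence scores are not covered here.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "is", "are", "of", "to",
             "with", "in", "on", "for", "it", "this"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def keyword_frequency(responses):
    """Count how often each keyword appears across a list of responses."""
    counts = Counter()
    for response in responses:
        counts.update(tokenize(response))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two keyword-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example responses; these are NOT data from the study.
human = [
    "Wide sidewalk with trees, feels safe and comfortable.",
    "Busy street, lots of shops, lively but noisy.",
]
model = [
    "The sidewalk is wide and the trees provide shade, so it feels safe.",
]

human_kw = keyword_frequency(human)
model_kw = keyword_frequency(model)
similarity = cosine_similarity(human_kw, model_kw)

print(human_kw.most_common(5))
print(round(similarity, 3))
```

A score near 1 would indicate that both groups rely on much the same vocabulary, while a score near 0 would indicate little keyword overlap. A count-based measure like this ignores word order and synonyms, so it captures only surface-level agreement.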