Data quality issues in data used in species distribution models: A systematic literature review

Bibliographic Details
Published in: Ecological Informatics, Vol. 91, p. 103378
Main Authors: Barbosa, Wesley Lourenco; Alves-Souza, Solange Nice
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2025

Summary: Species distribution models (SDMs) are important decision-making tools in several application areas and are essential for managing biodiversity resources worldwide. The ability of these models to represent reality depends strongly on the fitness of the data from which they are generated. Although the scientific literature recognizes several data quality (DQ) problems, little work has attempted a comprehensive survey to identify and quantify these challenges. This paper therefore conducts a systematic literature review to examine the DQ problems observed in species occurrence and environmental data used in SDMs, and it identifies and discusses the solutions that have been proposed to address them. A total of 212 articles were selected and analyzed, revealing 14 recurring DQ problems. Misidentification errors and spatial or geographical bias were the most prevalent. Data gathered through Citizen Science initiatives remain under scrutiny, with observer skill identified as the third most frequent challenge. Resolving DQ issues remains a significant research challenge because of the specific characteristics of the data types involved. Our findings highlight the need for a more detailed examination of the impact of DQ on SDMs and call for the development of robust methodologies for DQ assessment and improvement. The paper emphasizes the importance of context-specific knowledge for effective DQ management, which is essential for enhancing the reliability of SDMs and supporting more accurate ecological forecasting and conservation planning. Consequently, a substantial body of research remains to be done, particularly at the intersection of computational methodologies and the specialized domain of biogeography.

Highlights:
• Review of 212 studies on DQ issues in SDM occurrence and environmental data.
• Misidentification and spatial bias are the top DQ issues in SDM data.
• Citizen Science data face DQ concerns; observer skill ranks third.
• Solving DQ issues in SDMs needs deeper study and better methods.
• Context-aware DQ handling boosts SDM reliability and forecasting.
ISSN: 1574-9541
DOI: 10.1016/j.ecoinf.2025.103378