A methodological framework for identifying potential sources of soil heavy metal pollution based on machine learning: A case study in the Yangtze Delta, China

Bibliographic Details
Published in: Environmental pollution (1987), Vol. 250, pp. 601-609
Main Authors: Jia, Xiaolin; Hu, Bifeng; Marchant, Ben P.; Zhou, Lianqing; Shi, Zhou; Zhu, Youwei
Format: Journal Article
Language: English
Published: England, Elsevier Ltd, 01.07.2019
Summary: It is a great challenge to identify the many and varied sources of soil heavy metal pollution. Often little information is available regarding the anthropogenic factors and enterprises that could potentially pollute soils. In this study we use freely available geographical data from a search engine in conjunction with machine learning methodologies to identify and classify potentially polluting enterprises in the Yangtze Delta, China. The data were classified into 31 separate and four integrated industry types by five different machine learning approaches. Multinomial naive Bayesian (NB) methods achieved an accuracy of 87% and Kappa coefficient of 0.82 and were used to classify the geographic data from more than 260,000 enterprises. The relationship between the different industry classes and measurements of soil cadmium (Cd) and mercury (Hg) concentrations was explored using bivariate local Moran's I analysis. The analysis revealed areas where different industry classes had led to soil pollution. In the case of Cd, elevated concentrations also occurred in some areas because of excessive fertilization and coal mining. This study provides a new approach to investigate the interaction between anthropogenic pollution and natural sources of soil heavy metals to inform pollution control and planning decisions regarding the location of industrial sites.

• Potentially polluting enterprises are identified and classified.
• Spatial correlation of Cd and Hg with polluting enterprises is observed.
• Industry pollution affects the distribution of Cd and Hg pollution in soils.
• High contents of Cd occur in some areas due to fertilization and coal mining.
• Hg pollution caused by chemical industry is more serious than other industries.

Capsule: This work provides a new way based on machine learning methods using geographic dataset for pollution source identification and risk mitigation.
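The summary names multinomial naive Bayes as the classifier and reports accuracy and the Kappa coefficient as quality measures. The sketch below illustrates those two ideas in plain Python. It is not the authors' implementation: the record does not say how enterprise records were turned into features, so the tokenised enterprise names, the two industry labels, and the function names (`train_multinomial_nb`, `predict`, `accuracy_and_kappa`) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed token likelihoods."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, n_label in class_counts.items():
        # Denominator of the smoothed estimate: tokens seen in the class
        # plus alpha for every vocabulary entry.
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_label / len(labels)),
            "log_like": {
                t: math.log((token_counts[label][t] + alpha) / denom)
                for t in vocab
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior; unseen tokens are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"][t] for t in tokens if t in params["log_like"]
        )
    return max(model, key=score)


def accuracy_and_kappa(y_true, y_pred):
    """Observed agreement and Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance from the marginal label frequencies.
    p_e = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    kappa = 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return p_o, kappa


# Hypothetical toy data: tokenised enterprise names with made-up labels.
train_docs = [
    ["steel", "smelting", "plant"],
    ["copper", "smelting", "works"],
    ["chemical", "fertilizer", "factory"],
    ["pesticide", "chemical", "plant"],
]
train_labels = ["metal", "metal", "chemical", "chemical"]

model = train_multinomial_nb(train_docs, train_labels)
pred_a = predict(model, ["zinc", "smelting"])
pred_b = predict(model, ["chemical", "factory"])
acc, kappa = accuracy_and_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
```

Kappa corrects raw accuracy for the agreement expected by chance, which is why the two figures in the summary differ (87% accuracy against a Kappa of 0.82).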
ISSN: 0269-7491, 1873-6424
DOI: 10.1016/j.envpol.2019.04.047