A taxonomy-free approach based on machine learning to assess the quality of rivers with diatoms

Bibliographic Details
Published in The Science of the total environment Vol. 722; p. 137900
Main Authors Feio, Maria João, Serra, Sónia R.Q., Mortágua, Andreia, Bouchez, Agnès, Rimet, Frédéric, Vasselon, Valentin, Almeida, Salomé F.P.
Format Journal Article
Language English
Published Netherlands Elsevier B.V 20.06.2020
Elsevier
Summary: Diatoms are a compulsory biological quality element in the ecological assessment of rivers under the Water Framework Directive. Current official indices require individuals to be identified to species or lower rank under a microscope, based on valve morphology. This is a highly time-consuming task and is prone to disagreement among analysts. As an alternative, DNA metabarcoding combined with High-Throughput Sequencing (HTS) has been proposed. The sequences obtained from environmental DNA are clustered into Operational Taxonomic Units (OTUs), which can be assigned to taxa using reference databases and then used to calculate biotic indices. However, a high percentage of OTUs still cannot be assigned to species because reference libraries are incomplete. We therefore tested a new taxonomy-free approach based on diatom community samples to assess rivers. A combination of three machine learning techniques is used to build models that predict, from environmental data, the diatom OTUs expected at test sites under reference conditions. The Observed/Expected OTUs ratio indicates the deviation from reference condition and is converted into a quality class. This approach had not previously been used with diatoms or with OTU data. To evaluate its efficiency, we built one model based on OTU lists (HYDGEN) and another based on taxa lists from morphological identification (HYDMORPH), and also calculated a biotic index (IPS). The models were trained and tested with data from 81 sites (44 reference sites) in central Portugal. Both models were considered accurate (linear regression between Observed and Expected richness: R² ≈ 0.7, intercept ≈ 0.8) and sensitive to global anthropogenic disturbance (Rs² > 0.30, p < 0.006). However, the HYDGEN model, based on molecular data, was sensitive to more types of pressure (such as changes in land use and habitat quality), which makes it promising for the bioassessment of rivers.
• A combined Machine Learning (ML) approach is tested for the bioassessment of rivers with diatoms
• OTUs were predicted (E) from environmental data for each river site
• Observed/Expected (OE) OTUs at a site indicate the deviation from reference condition
• More types of disturbance were detected with OTUs than with species-based methods
• The OTU ML model overcomes the problem of incomplete databases for converting OTUs to taxa
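The Observed/Expected ratio described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how the three machine learning techniques are combined or how class boundaries are set. The sketch assumes that some model has already produced a probability of occurrence for each OTU at a site under reference conditions, and it follows a common convention in predictive bioassessment of summing probabilities above a cut-off to obtain E. The function names, the 0.5 cut-off and the class boundaries are all hypothetical; only the five class labels come from the Water Framework Directive.

```python
from typing import Dict, Sequence, Set


def oe_ratio(expected_prob: Dict[str, float],
             observed: Set[str],
             min_prob: float = 0.5) -> float:
    """Return the Observed/Expected ratio for one site.

    expected_prob: OTU id -> predicted probability of occurrence under
                   reference conditions (output of some model, assumed given).
    observed:      OTU ids actually detected at the site.
    min_prob:      hypothetical cut-off; OTUs predicted below it are ignored.
    """
    # Keep only the OTUs the model expects with reasonable probability.
    retained = {otu: p for otu, p in expected_prob.items() if p >= min_prob}
    expected = sum(retained.values())  # E: expected number of OTUs
    if expected == 0:
        raise ValueError("no OTU is expected at this site; O/E is undefined")
    # O: how many of the expected OTUs were actually found.
    observed_count = sum(1 for otu in retained if otu in observed)
    return observed_count / expected


def quality_class(oe: float,
                  boundaries: Sequence[float] = (0.8, 0.6, 0.4, 0.2)) -> str:
    """Map an O/E value to a five-level quality class.

    The boundaries are placeholders, not values from the article.
    """
    labels = ("High", "Good", "Moderate", "Poor", "Bad")
    for bound, label in zip(boundaries, labels):
        if oe >= bound:
            return label
    return labels[-1]


# Toy site: three OTUs pass the cut-off (E = 2.0), one of them is observed.
site_prob = {"OTU_1": 0.9, "OTU_2": 0.6, "OTU_3": 0.5, "OTU_4": 0.1}
site_obs = {"OTU_1", "OTU_4"}
site_oe = oe_ratio(site_prob, site_obs)
site_class = quality_class(site_oe)
```

Because the calculation uses only OTU identifiers, no step requires assigning an OTU to a named taxon, which is the sense in which such an approach is taxonomy-free.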
ISSN: 0048-9697, 1879-1026
DOI: 10.1016/j.scitotenv.2020.137900