Modelling lake trophic state: A random forest approach

Bibliographic Details
Published in	PeerJ preprints
Main Authors	Hollister, Jeffrey W; Milstead, W Bryan; Kreakie, Betty J
Format	Journal Article
Language	English
Published	San Diego: PeerJ, Inc, 03.12.2015
Subjects	Algae; Algal blooms; Chlorophyll; Data processing; Ecosystems; Lakes; Land use; Morphometry; Nutrients; Water quality

More Information
Summary:	Productivity of lentic ecosystems is well studied, and it is widely accepted that as nutrient inputs increase, productivity increases and lakes transition from lower trophic states (e.g. oligotrophic) to higher trophic states (e.g. eutrophic). These broad trophic state classifications are good predictors of ecosystem condition, services, and disservices (e.g. recreation, aesthetics, and harmful algal blooms). While the relationship between nutrients and trophic state provides reliable predictions, it requires in situ water quality data to parameterize the model. This limits the application of these models to lakes with existing and, more importantly, available water quality data. To address this, we take advantage of a large national lakes water quality database (i.e. the National Lakes Assessment), land use/land cover data, lake morphometry data, and other universally available data, and apply data mining approaches to predict trophic state. Using these data and random forests, we first model chlorophyll a, then classify the resulting predictions into trophic states. The full model estimates chlorophyll a with both in situ and universally available data; its mean squared error and adjusted R² were 0.09 and 0.8, respectively. The second model uses universally available GIS data only; its mean squared error was 0.22 and its adjusted R² was 0.48. The accuracy of the trophic state classifications derived from the chlorophyll a predictions was 69% for the full model and 49% for the “GIS only” model. Random forests extend the usefulness of the class predictions by providing prediction probabilities for each lake. This allows us to make trophic state predictions and also to indicate the level of uncertainty around those predictions. For the full model, these predicted class probabilities ranged from 0.42 to 1; for the GIS only model, they ranged from 0.33 to 0.96.
We conclude that in situ data are required for better predictions, yet GIS and other universally available data still provide trophic state predictions, with estimated uncertainty, that have the potential for a broad array of applications. The source code and data for this manuscript are available from https://github.com/USEPA/LakeTrophicModelling.
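The workflow in the summary follows a clear sequence. A model predicts chlorophyll a, the predictions are binned into trophic states, and the results are scored with mean squared error, adjusted R² and classification accuracy. Each lake also gets a class probability. The Python below is a minimal, standard-library-only sketch of those steps. It is not the authors' code, which is in the linked repository. Several details are assumptions. The cut points of 2, 7 and 30 µg/L follow the NLA-style trophic classes, because the summary does not state which thresholds were used. The helper names (`trophic_state`, `vote_probabilities`, `predicted_class`, etc.) are hypothetical. For the probability step, the sketch assumes one plausible approach for a regression forest: classify each tree's chlorophyll a prediction and report the share of trees that fall in each class. The summary does not say exactly how the authors derived their probabilities.

```python
from collections import Counter

# ASSUMPTION: NLA-style chlorophyll a cut points in ug/L; the summary does not
# state the thresholds the authors used.
THRESHOLDS = [(2.0, "oligotrophic"), (7.0, "mesotrophic"), (30.0, "eutrophic")]


def trophic_state(chla_ugl):
    """Map a chlorophyll a concentration (ug/L, untransformed) to a trophic class."""
    for upper, label in THRESHOLDS:
        if chla_ugl <= upper:
            return label
    return "hypereutrophic"


def mse(observed, predicted):
    """Mean squared error between paired observed and predicted values."""
    if not observed:
        raise ValueError("need at least one observation")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with R^2 = 1 - SS_res/SS_tot."""
    n = len(observed)
    if n <= n_predictors + 1:
        raise ValueError("need more observations than predictors + 1")
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def class_accuracy(observed_chla, predicted_chla):
    """Share of lakes whose predicted trophic class matches the observed class."""
    hits = sum(
        trophic_state(o) == trophic_state(p)
        for o, p in zip(observed_chla, predicted_chla)
    )
    return hits / len(observed_chla)


def vote_probabilities(tree_predictions):
    """HYPOTHETICAL approach: classify each tree's chlorophyll a prediction
    and return the fraction of trees falling in each trophic class."""
    counts = Counter(trophic_state(p) for p in tree_predictions)
    total = len(tree_predictions)
    return {label: count / total for label, count in counts.items()}


def predicted_class(probs):
    """Return the most probable trophic class and its probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]
```

Take five hypothetical tree predictions for one lake: 1.5, 3.0, 4.2, 6.9 and 8.0 µg/L. `vote_probabilities` gives 0.6 for mesotrophic and 0.2 each for oligotrophic and eutrophic, so `predicted_class` labels the lake mesotrophic with probability 0.6. With four classes, this top-class probability can never fall below 0.25. If a model predicts a transformed value such as log chlorophyll a, it should be back-transformed before being passed to `trophic_state`.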
ISSN:	2167-9843
DOI:	10.7287/peerj.preprints.1319v3