Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

Bibliographic Details
Published in	Computers & geosciences Vol. 33; no. 5; pp. 696–704
Main Authors	Lee, Lopaka; Helsel, Dennis
Format	Journal Article
Language	English
Published	Oxford: Elsevier Ltd (Elsevier Science), 01.05.2007
Subjects	Censored data; Earth sciences; Earth, ocean, space; Engineering and environment geology. Geothermics; Exact sciences and technology; Geochemistry; Hydrogeology; Hydrology. Hydrogeology; Kaplan–Meier; Mineralogy; Pollution, environment geology; S-Plus; Silicates; Survival analysis; Water geochemistry; R; software; ground water; accuracy; pollution; hydrochemistry; maximum likelihood; dissolved materials; computers; models; probability; testing; data processing; concentration; arsenic; contamination; extrapolation; interpolation; water quality; interpretation; prediction; statistical analysis; statistics
Summary:	Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply censored data is the Kaplan–Meier (K–M) method. This method has seen widespread usage in the medical sciences within a general framework termed “survival analysis”, where it is employed with right-censored time-to-failure data. However, K–M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K–M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and computation of the related confidence limits. Additionally, our software contains K–M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K–M methods is that they perform neither extrapolation nor interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
ISSN:	0098-3004; 1873-7803
DOI:	10.1016/j.cageo.2006.09.006
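
The Summary notes that K–M methods, usually applied to right-censored data, are equally valid for left-censored concentrations. The following is a minimal Python sketch of one common way to do this, assuming the standard "flipping" transformation: subtracting every value from a constant larger than the maximum turns nondetects into right-censored observations. It is not the authors' S-language software. The function name `km_left_censored`, its arguments, and the choice of flipping constant are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def km_left_censored(
    values: Sequence[float], censored: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of P(X < x) for left-censored data.

    values[i]   -- measured concentration, or the detection limit
                   when censored[i] is True (meaning "< values[i]")
    censored[i] -- True for a nondetect

    Returns (x, P(X < x)) pairs at each distinct detected value,
    ordered from the largest x to the smallest.
    """
    if len(values) != len(censored):
        raise ValueError("values and censored must have equal length")
    if not values:
        return []

    # Assumption: any constant above the maximum works for flipping.
    flip_const = max(values) + 1.0

    # After flipping, "< DL" becomes "> flip_const - DL" (right-censored).
    flipped = [(flip_const - v, c) for v, c in zip(values, censored)]

    # Map each flipped event time back to its original detected value.
    event_to_value: Dict[float, float] = {
        flip_const - v: v for v, c in zip(values, censored) if not c
    }

    survival = 1.0
    result: List[Tuple[float, float]] = []
    for t in sorted(event_to_value):
        # Risk set: observations whose flipped value is >= t. Censored
        # observations tied with an event are counted as at risk.
        at_risk = sum(1 for u, _ in flipped if u >= t)
        events = sum(1 for u, c in flipped if u == t and not c)
        survival *= 1.0 - events / at_risk
        # S_flipped(t) = P(flip_const - X > t) = P(X < flip_const - t)
        result.append((event_to_value[t], survival))
    return result
```

For six hypothetical observations `<0.5, 1.0, <1.0, 2.0, 3.0, 5.0`, passed as `km_left_censored([0.5, 1.0, 1.0, 2.0, 3.0, 5.0], [True, False, True, False, False, False])`, the sketch estimates P(X < 1.0) as 1/3. Consistent with the Summary, it returns estimates only at observed detected values and performs no interpolation or extrapolation.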