Statistical Tests for Large Tree-Structured Data

Bibliographic Details
Published in	Journal of the American Statistical Association, Vol. 112, no. 520, pp. 1733-1743
Main Authors	Bharath, Karthik; Kambadur, Prabhanjan; Dey, Dipak K.; Rao, Arvind; Baladandayuthapani, Veerabhadran
Format	Journal Article
Language	English
Published	United States; Taylor & Francis; 01.01.2017; Taylor & Francis Group, LLC; Taylor & Francis Ltd
Subjects	Asymptotic properties; Brain cancer; Brain tumors; Conditioned Galton-Watson trees; Consistent statistical models; Dyck path; Goodness of fit; Goodness-of-fit tests; Heterogeneity; Invariants; Magnetic resonance imaging; Random variables; Regression analysis; Statistical inference; Statistical methods; Statistical models; Statistical tests; Statistics; Structured data; Theory and Methods; Trees; Tumors
Summary:	We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the continuum random tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton-Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as χ² and F random variables. We illustrate our methods on an important application of detecting tumor heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients. Supplementary materials for this article are available online.
ISSN:	0162-1459; 1537-274X
DOI:	10.1080/01621459.2016.1240081