Capacitated clustering problem in computational biology: Combinatorial and statistical approach for sibling reconstruction

Bibliographic Details
Published in Computers & operations research Vol. 39; no. 3; pp. 609-619
Main Authors Chou, Chun-An, Chaovalitwongse, Wanpracha Art, Berger-Wolf, Tanya Y., DasGupta, Bhaskar, Ashley, Mary V.
Format Journal Article
Language English
Published Kidlington Elsevier Ltd 01.03.2012
Elsevier
Pergamon Press Inc
Summary: The capacitated clustering problem (CCP) has been studied in a wide range of applications. In this study, we investigate a challenging CCP in computational biology, namely the sibling reconstruction problem (SRP). The goal of the SRP is to establish the sibling relationships (i.e., groups of siblings) of a population from genetic data. The SRP has attracted growing interest from computational biologists over the past decade, as it is an important keystone for studies in genetic and population biology. We propose a large-scale mixed-integer formulation of the CCP for the SRP that is based on both combinatorial and statistical genetic concepts. The objective is not only to find the minimum number of sibling groups, but also to maximize the degree of similarity of individuals in the same sibling group, while each sibling group is subject to genetic constraints derived from Mendel's laws. We develop a new randomized greedy optimization algorithm to solve this SRP effectively and efficiently. The algorithm consists of two key phases: construction and enhancement. In the construction phase, a greedy approach with randomized perturbation is applied to construct multiple sibling groups iteratively. In the enhancement phase, a two-stage local search with a memory function is used to improve the solution quality with respect to the similarity measure. We demonstrate the effectiveness of the proposed algorithm using real biological data sets and compare it with state-of-the-art approaches in the literature. We also test it on larger simulated data sets. The experimental results show that the proposed algorithm provides the best reconstruction solutions.
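Illustration: the construction phase described in the summary can be pictured with a minimal Python sketch. This is not the authors' algorithm. It assumes diploid genotypes given as allele pairs per locus, uses the 4-allele property (full siblings carry at most four distinct alleles at any locus) as a simplified stand-in for the Mendelian constraints, and uses a shared-allele count as a hypothetical similarity measure. The names `feasible`, `similarity`, `construct`, and `reconstruct`, the parameter `k`, and the toy data are all assumptions made for this example. The enhancement (local search) phase is not shown.

```python
import random

def feasible(group, genotypes, max_alleles=4):
    """Necessary check only: at most `max_alleles` distinct alleles per locus."""
    n_loci = len(genotypes[group[0]])
    for locus in range(n_loci):
        alleles = set()
        for ind in group:
            alleles.update(genotypes[ind][locus])
        if len(alleles) > max_alleles:
            return False
    return True

def similarity(a, b, genotypes):
    """Hypothetical similarity: number of distinct alleles shared, summed over loci."""
    return sum(len(set(x) & set(y)) for x, y in zip(genotypes[a], genotypes[b]))

def construct(genotypes, rng, k=2):
    """Build groups one at a time; pick randomly among the k most similar feasible candidates."""
    unassigned = list(range(len(genotypes)))
    rng.shuffle(unassigned)
    groups = []
    while unassigned:
        group = [unassigned.pop()]
        while True:
            candidates = [i for i in unassigned if feasible(group + [i], genotypes)]
            if not candidates:
                break
            # Most similar to the current group first.
            candidates.sort(key=lambda i: -sum(similarity(i, j, genotypes) for j in group))
            pick = rng.choice(candidates[:k])  # randomized perturbation of the greedy choice
            group.append(pick)
            unassigned.remove(pick)
        groups.append(sorted(group))
    return groups

def reconstruct(genotypes, restarts=20, seed=0):
    """Repeat the randomized construction and keep the solution with the fewest groups."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        groups = construct(genotypes, rng)
        if best is None or len(groups) < len(best):
            best = groups
    return best

# Toy data (invented): six individuals, two loci, alleles as integer labels.
genotypes = [
    [(1, 3), (10, 12)], [(2, 4), (11, 13)], [(1, 4), (10, 13)],
    [(5, 7), (14, 16)], [(6, 8), (15, 17)], [(5, 8), (14, 17)],
]
groups = reconstruct(genotypes)
print(groups)
```

The sketch only minimizes the number of groups across restarts. A fuller version would also score within-group similarity and apply a local search, as the summary describes for the enhancement phase.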
ISSN: 0305-0548
1873-765X
DOI: 10.1016/j.cor.2011.04.017