Feature analysis for classification of trace fluorescent labeled protein crystallization images

Large number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers for predicting the presence of crystals or phases of the crystallization process. The excessive number of features and computationally intensive image processing methods to extract...

Full description

Saved in:
Bibliographic Details
Published inBioData mining Vol. 10; no. 1; p. 14
Main Authors Sigdel, Madhav, Dinc, Imren, Sigdel, Madhu S, Dinc, Semih, Pusey, Marc L, Aygun, Ramazan S
Format Journal Article
LanguageEnglish
Published England BioMed Central 27.04.2017
BMC
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Large number of features are extracted from protein crystallization trial images to improve the accuracy of classifiers for predicting the presence of crystals or phases of the crystallization process. The excessive number of features and computationally intensive image processing methods to extract these features make utilization of automated classification tools on stand-alone computing systems inconvenient due to the required time to complete the classification tasks. Combinations of image feature sets, feature reduction and classification techniques for crystallization images benefiting from trace fluorescence labeling are investigated. Features are categorized into intensity, graph, histogram, texture, shape adaptive, and region features (using binarized images generated by Otsu's, green percentile, and morphological thresholding). The effects of normalization, feature reduction with principle components analysis (PCA), and feature selection using random forest classifier are also analyzed. The time required to extract feature categories is computed and an estimated time of extraction is provided for feature category combinations. We have conducted around 8624 experiments (different combinations of feature categories, binarization methods, feature reduction/selection, normalization, and crystal categories). The best experimental results are obtained using combinations of intensity features, region features using Otsu's thresholding, region features using green percentile thresholding, region features using green percentile thresholding, graph features, and histogram features. Using this feature set combination, 96% accuracy (without misclassifying crystals as non-crystals) was achieved for the first level of classification to determine presence of crystals. Since missing a crystal is not desired, our algorithm is adjusted to achieve a high sensitivity rate. In the second level classification, 74.2% accuracy for (5-class) crystal sub-category classification. Best classification rates were achieved using random forest classifier. The feature extraction and classification could be completed in about 2 s per image on a stand-alone computing system, which is suitable for real time analysis. These results enable research groups to select features according to their hardware setups for real-time analysis.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1756-0381
1756-0381
DOI:10.1186/s13040-017-0133-9