Mucopolysaccharidosis type II detection by Naïve Bayes Classifier: An example of patient classification for a rare disease using electronic medical records from the Canadian Primary Care Sentinel Surveillance Network

Bibliographic Details
Published in PloS one Vol. 13; no. 12; p. e0209018
Main Authors Ehsani-Moghaddam, Behrouz; Queenan, John A; MacKenzie, Jennifer; Birtwhistle, Richard V
Format Journal Article
Language English
Published United States Public Library of Science 19.12.2018
Public Library of Science (PLoS)
Summary: Identifying patients with rare diseases associated with common symptoms is challenging. Hunter syndrome, or Mucopolysaccharidosis type II (MPS II), is a progressive rare disease caused by a deficiency in the activity of the lysosomal enzyme iduronate 2-sulphatase. It is inherited in an X-linked manner, with males being significantly affected. Expression in females varies, with the majority being unaffected, although symptoms may emerge over time. We developed a Naïve Bayes classification (NBC) algorithm that uses the clinical diagnoses and symptoms of patients contained within their de-identified, unstructured electronic medical records (EMR) extracted by the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). To do so, we created a training dataset using published results in the scientific literature and all MPS II symptoms, then applied the training dataset and its independent features to compute the conditional posterior probabilities of having MPS II, treated as a categorical dependent variable, for 506,497 male patients. The classifier identified 125 patients with the highest likelihood of having the disease, and 18 features were selected as necessary for forecasting. Next, a Recursive Backward Feature Elimination algorithm was employed to select the optimal input features for the NBC model, using k-fold cross-validation with 3 replicates. The accuracy of the final model was estimated by the Validation Set Approach and by bootstrap resampling. We also investigated whether the NBC is as accurate as three other Bayesian networks. The Naïve Bayes Classifier appears to be an efficient algorithm for assisting physicians with the diagnosis of Hunter syndrome, allowing optimal patient management.
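As a rough illustration of the kind of posterior computation the summary describes, the following minimal sketch fits a Bernoulli Naïve Bayes model to binary symptom indicators and returns P(class | symptoms). It is not the authors' implementation: the function names, the three placeholder features, the Laplace smoothing constant and the toy rows are all assumptions made up for illustration, and the record does not say which Naïve Bayes variant or software the study used.

```python
import math

def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing.

    rows   : list of binary feature vectors (1 = symptom recorded, 0 = not)
    labels : list of class labels, one per row
    alpha  : smoothing constant (an assumption; avoids zero probabilities)
    """
    classes = sorted(set(labels))
    n_features = len(rows[0])
    priors, likelihoods = {}, {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        priors[c] = len(members) / len(rows)
        likelihoods[c] = [
            (sum(r[j] for r in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
    return priors, likelihoods

def posterior(x, priors, likelihoods):
    """Return {class: P(class | x)} for one binary feature vector x."""
    log_joint = {}
    for c, prior in priors.items():
        lp = math.log(prior)
        for xj, p in zip(x, likelihoods[c]):
            # Naive assumption: features are independent given the class.
            lp += math.log(p if xj else 1.0 - p)
        log_joint[c] = lp
    # Normalise in log space for numerical stability.
    m = max(log_joint.values())
    unnorm = {c: math.exp(v - m) for c, v in log_joint.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Toy, made-up training data: columns are three hypothetical symptom flags
# (symptom_a, symptom_b, symptom_c); label 1 = case, 0 = non-case.
toy_rows = [
    [1, 1, 0], [1, 1, 1], [1, 0, 1],             # labelled cases
    [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0],  # non-cases
]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]

priors, likelihoods = train_bernoulli_nb(toy_rows, toy_labels)
p_all_symptoms = posterior([1, 1, 1], priors, likelihoods)
p_no_symptoms = posterior([0, 0, 0], priors, likelihoods)
```

In a screening setting like the one summarised above, patients would presumably be ranked by the posterior for the case class and the top-ranked records reviewed; how the study chose its 125 patients is not stated in this record.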
Competing Interests: This project was entirely funded by a grant from Shire Canada (https://www.shirecanada.com). Dr. MacKenzie has received grant funding, honoraria and travel support from Shire Canada. There are no patents, products in development or marketed products to declare. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0209018