Medical Provider Embeddings for Healthcare Fraud Detection

Bibliographic Details
Published in	SN Computer Science, Vol. 2, no. 4, p. 276
Main Authors	Johnson, Justin M.; Khoshgoftaar, Taghi M.
Format	Journal Article
Language	English
Published	Singapore: Springer Singapore, 01.07.2021; Springer Nature B.V.
Subjects	Algorithms; Artificial Intelligence for HealthCare; Big Data; Classification; Coding; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data mining; Data Structures and Information Theory; Datasets; Electronic health records; Evaluation; Fraud; Fraud prevention; Health care; Health care policy; Information Systems and Communication Service; Machine learning; Medical research; Medicare fraud; Multilayer perceptrons; Original Research; Patient satisfaction; Pattern Recognition and Graphics; Semantics; Similarity; Software Engineering/Programming and Operating Systems; Statistical analysis; Variables; Vision; United States > US; Healthcare; Semantic embeddings; Big data; Fraud detection; Machine learning
Summary:	Advances in data mining and machine learning continue to transform the healthcare industry and provide value to medical professionals and patients. In this study, we address the problem of encoding medical provider types and present four techniques for learning dense, semantic embeddings that capture provider specialty similarities. The first two methods (GloVe and Med-W2V) use pre-trained word embeddings to convert provider specialty descriptions to phrase embeddings. Next, HcpsVec and RxVec embeddings are constructed from publicly available big data using specialty-procedure and specialty-drug occurrence matrices, respectively. We evaluate the learned provider type embeddings on two real-world Medicare fraud classification problems using logistic regression (LR), random forest (RF), gradient boosted tree (GBT), and multilayer perceptron (MLP) learners. Through repetition, statistical analysis, and feature importance measures, we confirm that semantic embeddings for provider types significantly improve fraud classification results. Finally, t-SNE visualizations are used to show that the learned provider type embeddings capture meaningful specialty characteristics and provider type similarities. Our primary contributions are two novel methods for encoding medical specialties using procedure-level statistics and the evaluation of four encoding techniques on two large-scale healthcare fraud classification tasks. Since all data sources are publicly available, these encoding techniques can be readily adopted and applied in future machine learning applications in the healthcare industry.
ISSN:	2662-995X, 2661-8907
DOI:	10.1007/s42979-021-00656-y
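
Illustrative sketch (not from the article): the summary names two families of provider type embeddings, phrase embeddings built from pre-trained word vectors and embeddings built from specialty-procedure occurrence matrices. The Python below shows one plausible reading of each idea on toy data. Everything in it is an assumption made for illustration: the function names, the toy word vectors and counts, the choice to average word vectors, and the use of row normalization followed by a truncated SVD. The summary does not say how the authors combine word vectors or reduce the occurrence matrices.

```python
import numpy as np

def phrase_embedding(description, word_vectors):
    """Average the vectors of the known words in a specialty description.

    Averaging is an assumed pooling choice, not taken from the article.
    """
    tokens = description.lower().split()
    known = [np.asarray(word_vectors[t], dtype=float)
             for t in tokens if t in word_vectors]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        dim = len(next(iter(word_vectors.values())))
        return np.zeros(dim)
    return np.mean(known, axis=0)

def occurrence_embeddings(counts, dim):
    """Map each row of a specialty-by-procedure count matrix to `dim` values.

    Rows are normalized to proportions, then reduced with a truncated SVD.
    Both steps are assumptions used only to make the idea concrete.
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    proportions = counts / np.where(row_totals == 0, 1.0, row_totals)
    u, s, _ = np.linalg.svd(proportions, full_matrices=False)
    return u[:, :dim] * s[:dim]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in for pre-trained word vectors (real ones would be loaded from disk).
toy_word_vectors = {"family": [1.0, 0.0], "practice": [0.0, 1.0]}
family_practice = phrase_embedding("Family Practice", toy_word_vectors)

# Toy counts: rows are three hypothetical specialties, columns four procedures.
# The first two specialties bill similar procedures; the third does not.
toy_counts = [[90, 10, 0, 0],
              [80, 20, 0, 0],
              [0, 0, 50, 50]]
specialty_vectors = occurrence_embeddings(toy_counts, dim=2)
similar = cosine(specialty_vectors[0], specialty_vectors[1])
dissimilar = cosine(specialty_vectors[0], specialty_vectors[2])
```

On this toy input, the two specialties with overlapping procedure mixes end up closer in cosine similarity than the unrelated pair, which is the kind of specialty similarity the summary says the learned embeddings capture.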