Hierarchical embedding attention for overall survival prediction in lung cancer from unstructured EHRs

Bibliographic Details
Published in	BMC medical informatics and decision making Vol. 25; no. 1; pp. 169 - 16
Main Authors	Paolo, Domenico; Greco, Carlo; Cortellini, Alessio; Ramella, Sara; Soda, Paolo; Bria, Alessandro; Sicilia, Rosa
Format	Journal Article
Language	English
Published	London: BioMed Central, 18.04.2025
Subjects	Attention mechanism; Automation; Carcinoma, Non-Small-Cell Lung - mortality; Cell survival; Decision making; Deep learning; Effectiveness; Electronic health records; Electronic Health Records - statistics & numerical data; Electronic medical records; Electronic records; Embedding; Feature extraction; Female; Health aspects; Health Informatics; Humans; Information processing; Information Systems and Communication Service; Italy; Lung cancer; Lung cancer, Non-small cell; Lung Neoplasms - mortality; Male; Management of Computing and Information Systems; Medical history; Medical prognosis; Medical records; Medicine; Medicine & Public Health; Methods; Natural Language Processing; Natural language processing in medical informatics; NER; Non-small cell lung carcinoma; Patient outcomes; Patients; Prediction models; Prognosis; Representations; Small cell lung carcinoma; Statistical analysis; Statistical models; Subject specialists; Survival; Survival Analysis; Transformer; Unstructured data; Unstructured EHRs
ISSN	1472-6947
DOI	10.1186/s12911-025-02998-6

Summary:	The automated processing of Electronic Health Records (EHRs) poses a significant challenge because they are unstructured: rich in valuable, yet disorganized, information. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has been instrumental in extracting structured information from EHR data. However, the existing literature primarily focuses on extracting handcrafted clinical features through NLP and NER methods, without delving into the representations these models learn. In this work, we explore the untapped potential of these representations by considering their contextual richness and entity-specific information. Our proposed methodology extracts the representations generated by a transformer-based NER model on EHR data, combines them using a hierarchical attention mechanism, and uses the resulting enriched representation as input to a clinical prediction model. Specifically, this study addresses Overall Survival (OS) in Non-Small Cell Lung Cancer (NSCLC) using unstructured EHR data collected from an Italian clinical centre, comprising 838 records from 231 lung cancer patients. Although our study is applied to EHRs written in Italian, it serves as a use case to demonstrate the effectiveness of extracting and employing high-level textual representations that capture relevant information as named entities. Our methodology is interpretable because the hierarchical attention mechanism highlights the information in EHRs that the model considers most crucial during the decision-making process. We validated this interpretability with a questionnaire that measured how far domain experts agreed with the importance the hierarchical attention mechanism assigned to EHR information. Results demonstrate the effectiveness of our method, showing statistically significant improvements over traditional manually extracted clinical features.
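
The pipeline outlined in the summary (entity-level representations combined by hierarchical attention into one vector per patient) can be illustrated with a small sketch. This is not the authors' implementation: the summary does not state the scoring function or the number of levels, so the code below assumes two levels (entities within a record, then records within a patient) and dot-product scoring against a context vector. The names `attention_pool`, `hierarchical_pool`, `entity_ctx` and `record_ctx` are hypothetical, and the embeddings and context vectors are random placeholders where a real system would use NER model outputs and learned parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Score each vector against the context vector (dot product),
    normalise the scores with softmax, and return the weighted sum
    together with the attention weights."""
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(context))
    ]
    return pooled, weights


def hierarchical_pool(patient_records, entity_context, record_context):
    """Assumed two-level scheme: pool entity embeddings into one vector
    per record, then pool record vectors into one vector per patient."""
    record_vectors, entity_weights = [], []
    for entity_embeddings in patient_records:
        vec, weights = attention_pool(entity_embeddings, entity_context)
        record_vectors.append(vec)
        entity_weights.append(weights)
    patient_vector, record_weights = attention_pool(record_vectors, record_context)
    return patient_vector, record_weights, entity_weights


# Toy data: random placeholders standing in for NER embeddings and
# for context vectors that would normally be learned during training.
rng = random.Random(0)
DIM = 4


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


# One patient with two records containing 3 and 5 entities respectively.
patient = [[rand_vec() for _ in range(3)], [rand_vec() for _ in range(5)]]
entity_ctx, record_ctx = rand_vec(), rand_vec()

patient_vector, record_weights, entity_weights = hierarchical_pool(
    patient, entity_ctx, record_ctx
)
```

In a sketch like this, `patient_vector` would be the input to a survival model, while `record_weights` and `entity_weights` are the quantities one could inspect for interpretability, in the spirit of the expert questionnaire the summary describes.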