DIR - A semantic information resource for healthcare datasets
Published in | 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) pp. 805 - 810 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.11.2017 |
Summary: | It is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large number and variety of datasets, there is a lack of information resources that focus on the specific learning needs of targeted audiences. To address this gap, we have been developing a semantic Dataset Information Resource (DIR) framework that addresses the challenges entry-level data scientists face in learning to identify, understand, and analyze major datasets, with an initial focus on healthcare. The DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses. The framework leverages Semantic Web technologies and the W3C Dataset Description Standard for knowledge integration and representation, and includes natural language processing (NLP)-based methods to enable knowledge extraction and question answering. The prototype DIR implementation includes four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The DIR currently includes information on three commonly used large and complex healthcare datasets: HCUP, MarketScan, and MIMIC. Initial usage evaluation based on health informatics students is encouraging. Further development is underway. |
---|---|
DOI: | 10.1109/BIBM.2017.8217758 |