Discovery and Recommendation Systems for Scientific Data
Main Author | |
---|---|
Format | Dissertation |
Language | English |
Published | ProQuest Dissertations & Theses, 01.01.2021 |
ISBN | 9798515226589 |
Summary: | This thesis describes discovery and recommendation systems for HydroShare, a platform for scientific data sharing. We describe the process of designing, implementing, and improving each system, and we discuss the lessons learned in building them. Data discovery refers to the process of locating pre-existing data for use in new research. In the HydroShare collaboration environment for water science, there are many kinds of data that can be discovered, including data from specific sites on the globe, data corresponding to regions on the globe, and data with no geospatial meaning, such as laboratory experiment results. Supporting discovery across these kinds of data was a surprisingly difficult problem: default behaviors of software components were unacceptable, use cases suggested conflicting approaches, and crafting a geographic view of a large number of candidate resources was subject to the limits imposed by web browsers, existing software capabilities, human perception, and software performance. The resulting software was a complex melding of user needs, software capabilities, and performance requirements. The recommendation system is intended to enable scientific workflows and encourage data reuse between related projects. We discuss similarities, differences, and challenges for implementing recommendation systems for scientific water data sharing. We discuss and analyze the behaviors that scientists exhibit in using HydroShare, as documented by users' activity logs. Unlike users of entertainment systems, HydroShare users have been observed to be task-oriented: the set of tasks of interest can change over time, and older interests are sometimes no longer relevant.
By validating recommendation approaches against user behavior as expressed in activity logs, we conclude that a combination of content-based filtering and Latent Dirichlet Allocation (LDA) topic modeling of user behavior (rather than LDA classification of dataset topics) provides a workable solution for HydroShare, and we compare this approach to existing recommendation methods. |
---|---|