Scalable Scientific Interest Profiling Using Large Language Models
Published in | ArXiv.org |
---|---|
Format | Journal Article |
Language | English |
Published | Cornell University, United States, 19.08.2025 |
ISSN | 2331-8422 |
Summary: Research profiles highlight scientists' research focus, enabling talent discovery and fostering collaborations, but they are often outdated. Automated, scalable methods are urgently needed to keep these profiles current.
In this study, we design and evaluate two Large Language Model (LLM)-based methods for generating scientific interest profiles: one summarizes a researcher's PubMed abstracts, and the other generates a summary from the Medical Subject Headings (MeSH) terms of their publications. We compare these machine-generated profiles with researchers' self-summarized interests. We collected the titles, MeSH terms, and abstracts of PubMed publications for 595 faculty members affiliated with Columbia University Irving Medical Center (CUIMC); for 167 of them, we also obtained human-written online research profiles. GPT-4o-mini, a state-of-the-art LLM, was then prompted to summarize each researcher's interests. Both manual and automated evaluations were conducted to characterize the similarities and differences between the machine-generated and self-written research profiles.
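The MeSH-based prompting step might look like the following minimal sketch. The function name, prompt wording, and example inputs are all hypothetical (the study's actual prompts are not reproduced here); the assembled string would then be sent to GPT-4o-mini via the OpenAI chat completions API.

```python
def build_profile_prompt(name, mesh_terms, max_terms=50):
    """Assemble a summarization prompt from a researcher's MeSH terms.

    Hypothetical sketch: the real study's prompt wording is not public here.
    `max_terms` caps the term list to keep the prompt short.
    """
    terms = ", ".join(mesh_terms[:max_terms])
    return (
        "You are an assistant that writes concise scientific interest profiles.\n"
        f"Summarize the research interests of {name} based on the following "
        f"MeSH terms drawn from their PubMed publications:\n{terms}\n"
        "Write three to four sentences in the third person."
    )

# Example with made-up inputs; in practice the prompt would be sent to
# GPT-4o-mini (e.g., via the OpenAI chat completions endpoint).
prompt = build_profile_prompt(
    "Dr. Example",
    ["Machine Learning", "Electronic Health Records", "Natural Language Processing"],
)
print(prompt)
```

The abstract-based variant would substitute concatenated abstract text for the MeSH term list.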
The similarity study showed low ROUGE-L, BLEU, and METEOR scores, reflecting little overlap between the terminology of machine-generated and self-written profiles. BERTScore analysis revealed moderate semantic similarity between machine-generated and reference summaries (F1: 0.542 for MeSH-based, 0.555 for abstract-based) despite the low lexical overlap. For validation, paraphrased versions of the manually written summaries were scored against the originals and achieved a much higher F1 of 0.851; this comparison highlights the limitations of such metrics. The Kullback-Leibler (KL) divergence of term frequency-inverse document frequency (TF-IDF) weights (8.56 for profiles derived from MeSH terms and 8.58 for those derived from abstracts) suggests that machine-generated summaries employ different keywords than human-written ones. Manual reviews further showed that the overall impression of MeSH-based profiles was rated "good" or "excellent" in 77.78% of cases, and readability was rated favorably in 93.44%, though granularity and factual accuracy varied. In 67.86% of cases, panel reviewers preferred the machine-generated profile derived from MeSH terms over the one derived from abstracts.
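The KL-divergence-of-TF-IDF comparison can be sketched with a small stdlib-only implementation. The study's exact tokenization, vectorizer settings, and smoothing are not specified here, so this is an illustrative approximation, and the two example strings are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Simple TF-IDF vectors for a tiny corpus (illustrative only)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        for t in set(toks):
            df[t] += 1
    # Smoothed IDF, similar in spirit to common library defaults.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        vecs.append({t: (tf[t] / total) * idf[t] for t in idf})
    return vecs

def kl_divergence(p_vec, q_vec, eps=1e-9):
    """KL(P || Q) over epsilon-smoothed, normalized TF-IDF weights."""
    keys = set(p_vec) | set(q_vec)
    p = [p_vec.get(k, 0.0) + eps for k in keys]
    q = [q_vec.get(k, 0.0) + eps for k in keys]
    ps, qs = sum(p), sum(q)
    return sum((pi / ps) * math.log((pi / ps) / (qi / qs))
               for pi, qi in zip(p, q))

# Invented example texts standing in for a machine-generated and a
# human-written profile; a large divergence indicates different keyword use.
machine = "cardiovascular outcomes machine learning electronic health records"
human = "heart disease prediction informatics clinical data"
v_m, v_h = tfidf_vectors([machine, human])
print(round(kl_divergence(v_m, v_h), 3))
```

In the study, higher values (8.56 and 8.58) would correspond to machine- and human-written profiles drawing on largely different vocabularies.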
LLMs promise to automate scientific interest profiling at scale. Profiles derived from MeSH terms are more readable than those derived from abstracts. Overall, machine-generated summaries differ from human-written ones in their choice of concepts, with the human-written profiles introducing more novel ideas.