Scalable Scientific Interest Profiling Using Large Language Models
Published in | ArXiv.org |
---|---|
Format | Journal Article |
Language | English |
Published | Cornell University, United States, 19.08.2025 |
ISSN | 2331-8422 |
Summary: Research profiles highlight scientists' research focus, enabling talent discovery and fostering collaborations, but they are often outdated. Automated, scalable methods are urgently needed to keep these profiles current.
In this study, we design and evaluate two Large Language Model (LLM)-based methods for generating scientific interest profiles: one summarizes a researcher's PubMed abstracts, and the other generates a summary from the Medical Subject Headings (MeSH) terms of their publications. We compare these machine-generated profiles with researchers' self-summarized interests. We collected the titles, MeSH terms, and abstracts of PubMed publications for 595 faculty members affiliated with Columbia University Irving Medical Center (CUIMC); for 167 of them, we also obtained human-written online research profiles. GPT-4o-mini, a state-of-the-art LLM, was then prompted to summarize each researcher's interests. Both manual and automated evaluations were conducted to characterize the similarities and differences between the machine-generated and self-written research profiles.
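The MeSH-based prompting step might look like the following minimal sketch. The function name, prompt wording, and example inputs are all hypothetical (the study's actual prompts are not reproduced here); the assembled string would then be sent to GPT-4o-mini via the OpenAI chat completions API.

```python
def build_profile_prompt(name, mesh_terms, max_terms=50):
    """Assemble a summarization prompt from a researcher's MeSH terms.

    Hypothetical sketch: the real study's prompt wording is not public here.
    `max_terms` caps the term list to keep the prompt short.
    """
    terms = ", ".join(mesh_terms[:max_terms])
    return (
        "You are an assistant that writes concise scientific interest profiles.\n"
        f"Summarize the research interests of {name} based on the following "
        f"MeSH terms drawn from their PubMed publications:\n{terms}\n"
        "Write three to four sentences in the third person."
    )

# Example with made-up inputs; in practice the prompt would be sent to
# GPT-4o-mini (e.g., via the OpenAI chat completions endpoint).
prompt = build_profile_prompt(
    "Dr. Example",
    ["Machine Learning", "Electronic Health Records", "Natural Language Processing"],
)
print(prompt)
```

The abstract-based variant would substitute concatenated abstract text for the MeSH term list.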
The similarity study showed low ROUGE-L, BLEU, and METEOR scores, reflecting little overlap between the terminology of machine-generated and self-written profiles. BERTScore analysis revealed moderate semantic similarity between machine-generated and reference summaries (F1: 0.542 for MeSH-based, 0.555 for abstract-based) despite the low lexical overlap. For validation, paraphrased versions of the manually written summaries were scored against the originals and achieved a much higher F1 of 0.851; this comparison highlights the limitations of such metrics. The Kullback-Leibler (KL) divergence of term frequency-inverse document frequency (TF-IDF) weights (8.56 for profiles derived from MeSH terms and 8.58 for those derived from abstracts) suggests that machine-generated summaries employ different keywords than human-written ones. Manual reviews further showed that the overall impression of MeSH-based profiles was rated "good" or "excellent" in 77.78% of cases, and readability was rated favorably in 93.44%, though granularity and factual accuracy varied. In 67.86% of cases, panel reviewers preferred the machine-generated profile derived from MeSH terms over the one derived from abstracts.
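The KL-divergence-of-TF-IDF comparison can be sketched with a small stdlib-only implementation. The study's exact tokenization, vectorizer settings, and smoothing are not specified here, so this is an illustrative approximation, and the two example strings are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Simple TF-IDF vectors for a tiny corpus (illustrative only)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        for t in set(toks):
            df[t] += 1
    # Smoothed IDF, similar in spirit to common library defaults.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        vecs.append({t: (tf[t] / total) * idf[t] for t in idf})
    return vecs

def kl_divergence(p_vec, q_vec, eps=1e-9):
    """KL(P || Q) over epsilon-smoothed, normalized TF-IDF weights."""
    keys = set(p_vec) | set(q_vec)
    p = [p_vec.get(k, 0.0) + eps for k in keys]
    q = [q_vec.get(k, 0.0) + eps for k in keys]
    ps, qs = sum(p), sum(q)
    return sum((pi / ps) * math.log((pi / ps) / (qi / qs))
               for pi, qi in zip(p, q))

# Invented example texts standing in for a machine-generated and a
# human-written profile; a large divergence indicates different keyword use.
machine = "cardiovascular outcomes machine learning electronic health records"
human = "heart disease prediction informatics clinical data"
v_m, v_h = tfidf_vectors([machine, human])
print(round(kl_divergence(v_m, v_h), 3))
```

In the study, higher values (8.56 and 8.58) would correspond to machine- and human-written profiles drawing on largely different vocabularies.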
LLMs promise to automate scientific interest profiling at scale. Profiles derived from MeSH terms are more readable than those derived from abstracts. Overall, machine-generated summaries differ from human-written ones in their choice of concepts, with the human-written profiles introducing more novel ideas.