Using large language models to extract plant functional traits from unstructured text
Published in | Applications in Plant Sciences Vol. 13; no. 3; article e70011 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | United States: John Wiley & Sons, Inc (Wiley), 01.05.2025 |
Summary: | Premise
Functional plant ecology seeks to understand how functional traits govern species distributions, community assembly, and ecosystem functions. While global trait datasets have advanced the field, substantial gaps remain, and extracting trait information from text in books, research articles, and online sources via machine learning offers a valuable complement to costly field campaigns.
Methods
We propose a natural language processing pipeline that extracts traits from unstructured species descriptions by using classification models for categorical traits and question‐answering models for numerical traits. The pipeline's performance is evaluated on two large databases with over 50,000 species descriptions, utilizing approaches ranging from a keyword search to large language models.
Results
Our final optimized pipeline used a transformer architecture and obtained a mean precision of 90.8% (range 81.6–97%) and a mean recall of 88.6% (77.4–97%) across five categorical traits, representing a 9.83% increase in precision and 42.35% increase in recall over a regular expression‐based approach. The question‐answering model yielded a normalized mean absolute error of 10.3% averaged across three numerical traits.
Discussion
The natural language processing pipeline we propose has the potential to facilitate the digitization and extraction of large amounts of plant functional trait information residing in scattered textual descriptions. |
---|---|
ISSN: | 2168-0450 |
DOI: | 10.1002/aps3.70011 |
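
The summary compares a regular expression-based baseline against transformer models using precision and recall for categorical traits, and reports a normalized mean absolute error for numerical traits. The following is a minimal sketch of how such a baseline and these metrics could be computed; it is not the authors' pipeline. The trait values, keyword patterns, and function names are hypothetical, and normalizing the mean absolute error by the range of the observed values is an assumption, since the summary does not state which normalization was used.

```python
import re

# Hypothetical keyword patterns for one categorical trait (growth form).
# These are illustrative only and are not taken from the article.
GROWTH_FORM_PATTERNS = {
    "tree": re.compile(r"\btrees?\b", re.IGNORECASE),
    "shrub": re.compile(r"\bshrubs?\b", re.IGNORECASE),
    "herb": re.compile(r"\bherbs?\b|\bherbaceous\b", re.IGNORECASE),
}


def keyword_baseline(description):
    """Return every trait value whose keyword occurs in the description."""
    return {
        value
        for value, pattern in GROWTH_FORM_PATTERNS.items()
        if pattern.search(description)
    }


def precision_recall(predicted, expected, value):
    """Precision and recall for one trait value over paired label sets."""
    pairs = list(zip(predicted, expected))
    tp = sum(1 for p, e in pairs if value in p and value in e)
    fp = sum(1 for p, e in pairs if value in p and value not in e)
    fn = sum(1 for p, e in pairs if value not in p and value in e)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def normalized_mae(predicted, observed):
    """Mean absolute error divided by the range of the observed values.

    Assumption: range normalization; other choices (e.g. the mean) exist.
    """
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
    return mae / (max(observed) - min(observed))


# Toy species descriptions with hand-assigned labels (made up for illustration).
descriptions = [
    "A small tree up to 10 m tall.",
    "Erect shrub, rarely a small tree.",
    "Perennial herb with basal leaves.",
    "Woody climber found near streets.",
]
expected = [{"tree"}, {"shrub"}, {"herb"}, {"liana"}]

predicted = [keyword_baseline(text) for text in descriptions]
tree_precision, tree_recall = precision_recall(predicted, expected, "tree")
print(f"tree: precision={tree_precision:.2f}, recall={tree_recall:.2f}")

# Toy numerical trait (e.g. plant height in metres): extracted vs. observed.
height_nmae = normalized_mae([10.0, 2.0, 0.5], [12.0, 2.0, 1.0])
print(f"height NMAE={height_nmae:.3f}")
```

The second toy description shows why a keyword search can lose precision: "rarely a small tree" triggers the tree pattern even though the species is labelled a shrub, which is the kind of context a classification model is better placed to handle.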