Large language models and the future of soil health: Bridging knowledge gaps through scalable semantic intelligence
Published in | Soil Advances Vol. 4; p. 100065 |
---|---|
Main Author | Wu, Yu |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.12.2025 |
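Subjects | AI in ecology; Environmental informatics; GPT-4; Knowledge integration; Large language models; Policy synthesis; Semantic analysis; Soil health |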
Abstract | Soil health has become a critical lens through which global challenges in sustainability, food security, and climate resilience are addressed. However, the operationalization of this concept remains hindered by fragmented knowledge systems and unstructured textual data. This perspective article argues that large language models (LLMs), exemplified by tools like GPT-4 and domain-specific models such as GeoGalactica, offer transformative potential for soil health science. We highlight emerging applications—including automated indicator extraction, synthesis of management practices, policy analysis, and knowledge democratization—that leverage LLMs’ semantic capabilities to bridge disciplinary silos and scale qualitative insight generation. These applications are synthesized in a conceptual framework that demonstrates how LLMs integrate textual data for soil health assessment. While acknowledging limitations such as hallucinations and lack of numerical reasoning, we present a conceptual framework to guide responsible integration of LLMs into soil health research workflows. We conclude that embracing LLMs not only enhances scientific synthesis but also aligns with urgent calls for more inclusive, anticipatory, and systems-based approaches in soil and ecological governance. |
---|---|
ArticleNumber | 100065 |
Author | Wu, Yu |
Affiliation | Wu, Yu (ORCID 0009-0001-1565-8222; wuyu20@mails.ucas.ac.cn), Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China |
Cites_doi | 10.1071/SR23138 10.1111/ejss.70093 10.18653/v1/D19-1371 10.1145/3616855.3635772 10.1038/s41598-025-96216-y 10.1145/3442188.3445922 10.1038/s43247-023-01199-1 10.1038/s43247-024-01341-7 10.1038/s41598-024-53916-1 10.1038/s43017-020-0080-8 |
ContentType | Journal Article |
Copyright | 2025 |
DOI | 10.1016/j.soilad.2025.100065 |
DatabaseName | ScienceDirect Open Access Titles; CrossRef; DOAJ Directory of Open Access Journals |
EISSN | 2950-2896 |
ISSN | 2950-2896 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | AI in ecology; Policy synthesis; Large language models; Semantic analysis; Soil health; Knowledge integration; Environmental informatics; GPT-4 |
License | This is an open access article under the CC BY-NC license. |
ORCID | 0009-0001-1565-8222 |
OpenAccessLink | https://doaj.org/article/ca29741714a547d3ae1b2f2628ee13fc |
PublicationDate | December 2025 |
PublicationTitle | Soil Advances |
PublicationYear | 2025 |
Publisher | Elsevier B.V. |
References | Beltagy, I., Lo, K., Cohan, A., 2019. SciBERT: a pretrained language model for scientific text. arXiv preprint arXiv:1903.10676. doi:10.18653/v1/D19-1371
Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S., 2021. On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. doi:10.1145/3442188.3445922
Deng, C., Zhang, T., He, Z., Chen, Q., Shi, Y., Xu, Y., Fu, L., Zhang, W., Wang, X., Zhou, C., 2024. K2: a foundation language model for geoscience knowledge understanding and utilization. In: Proceedings of the 17th ACM International Conference on Web Search and Data Mining. doi:10.1145/3616855.3635772
Ibrahim, Senthilkumar, Saito, 2024. Evaluating responses by ChatGPT to farmers’ questions on irrigated lowland rice cultivation in Nigeria. Sci. Rep. 14 (1), 3407. doi:10.1038/s41598-024-53916-1
Koldunov, Jung, 2024. Local climate services for all, courtesy of large language models. Commun. Earth Environ. 5 (1), 13. doi:10.1038/s43247-023-01199-1
Lehmann, Bossio, Kögel-Knabner, Rillig, 2020. The concept and future prospects of soil health. Nat. Rev. Earth Environ. 1 (10), 544–553. doi:10.1038/s43017-020-0080-8
Lin, Z., Deng, C., Zhou, L., Zhang, T., Xu, Y., Xu, Y., He, Z., Shi, Y., Dai, B., Song, Y., 2023. GeoGalactica: a scientific large language model in geoscience. arXiv preprint arXiv:2401.00434.
Lin, Chen, Wang, Liu, Piao, 2024. Large language models reveal big disparities in current wildfire research. Commun. Earth Environ. 5 (1), 168. doi:10.1038/s43247-024-01341-7
Minasny, McBratney, 2025. Machine learning and artificial intelligence applications in soil science. Eur. J. Soil Sci. 76 (2). doi:10.1111/ejss.70093
Ng, Evangelista, Padarian, Pachon, O’Donoghue, Xue, Francos, McBratney, 2024. Estimating surrogates, utility graphs and indicator sets for soil capacity and security assessments using legacy data. Soil Res. 62 (2). doi:10.1071/SR23138
Novielli, Magarelli, Romano, Di Bitonto, Stellacci, Monaco, Amoroso, Bellotti, Tangaro, 2025. Leveraging explainable AI to predict soil respiration sensitivity and its drivers for climate change mitigation. Sci. Rep. 15 (1). doi:10.1038/s41598-025-96216-y
Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., Poulton, A., Kerkez, V., Stojnic, R., 2022. Galactica: a large language model for science. arXiv preprint arXiv:2211.09085.
Xu, Fan, Tao, Jiang, You, Houlton, Sun, Gomes, Luo, 2025. Biogeochemistry-informed neural network (BINN) for improving accuracy of model prediction and scientific understanding of soil organic carbon. arXiv preprint.
Zeng, Brown, Raymond, Byari, Hotz, Rounsevell, 2024. Exploring the opportunities and challenges of using large language models to represent institutional agency in land system modelling. EGUsphere 2024, 1–35. |
StartPage | 100065 |
URI | https://dx.doi.org/10.1016/j.soilad.2025.100065 https://doaj.org/article/ca29741714a547d3ae1b2f2628ee13fc |
Volume | 4 |