Large language models and the future of soil health: Bridging knowledge gaps through scalable semantic intelligence

Bibliographic Details
Published in Soil Advances Vol. 4; article no. 100065
Main Author Wu, Yu
Format Journal Article
Language English
Published Elsevier B.V., 1 December 2025

Abstract Soil health has become a critical lens through which global challenges in sustainability, food security, and climate resilience are addressed. However, the operationalization of this concept remains hindered by fragmented knowledge systems and unstructured textual data. This perspective article argues that large language models (LLMs), exemplified by tools like GPT-4 and domain-specific models such as GeoGalactica, offer transformative potential for soil health science. We highlight emerging applications—including automated indicator extraction, synthesis of management practices, policy analysis, and knowledge democratization—that leverage LLMs’ semantic capabilities to bridge disciplinary silos and scale qualitative insight generation. These applications are synthesized in a conceptual framework that demonstrates how LLMs integrate textual data for soil health assessment. While acknowledging limitations such as hallucinations and lack of numerical reasoning, we present a conceptual framework to guide responsible integration of LLMs into soil health research workflows. We conclude that embracing LLMs not only enhances scientific synthesis but also aligns with urgent calls for more inclusive, anticipatory, and systems-based approaches in soil and ecological governance.
ArticleNumber 100065
Author Wu, Yu (ORCID 0009-0001-1565-8222; wuyu20@mails.ucas.ac.cn), Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
Copyright 2025
DOI 10.1016/j.soilad.2025.100065
EISSN 2950-2896
ISSN 2950-2896
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords AI in ecology
Policy synthesis
Large language models
Semantic analysis
Soil health
Knowledge integration
Environmental informatics
GPT-4
License This is an open access article under the CC BY-NC license.
OpenAccessLink https://doaj.org/article/ca29741714a547d3ae1b2f2628ee13fc
References Beltagy, I., Lo, K., Cohan, A., 2019. SciBERT: a pretrained language model for scientific text. arXiv preprint arXiv:1903.10676. doi:10.18653/v1/D19-1371
Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S., 2021. On the dangers of stochastic parrots: can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. doi:10.1145/3442188.3445922
Deng, C., Zhang, T., He, Z., Chen, Q., Shi, Y., Xu, Y., Fu, L., Zhang, W., Wang, X., Zhou, C., 2024. K2: a foundation language model for geoscience knowledge understanding and utilization. In: Proceedings of the 17th ACM International Conference on Web Search and Data Mining. doi:10.1145/3616855.3635772
Ibrahim, Senthilkumar, Saito, 2024. Evaluating responses by ChatGPT to farmers’ questions on irrigated lowland rice cultivation in Nigeria. Sci. Rep. 14(1), 3407. doi:10.1038/s41598-024-53916-1
Koldunov, Jung, 2024. Local climate services for all, courtesy of large language models. Commun. Earth Environ. 5(1), 13. doi:10.1038/s43247-023-01199-1
Lehmann, Bossio, Kögel-Knabner, Rillig, 2020. The concept and future prospects of soil health. Nat. Rev. Earth Environ. 1(10), 544-553. doi:10.1038/s43017-020-0080-8
Lin, Z., Deng, C., Zhou, L., Zhang, T., Xu, Y., Xu, Y., He, Z., Shi, Y., Dai, B., Song, Y., 2023. GeoGalactica: a scientific large language model in geoscience. arXiv preprint arXiv:2401.00434.
Lin, Chen, Wang, Liu, Piao, 2024. Large language models reveal big disparities in current wildfire research. Commun. Earth Environ. 5(1), 168. doi:10.1038/s43247-024-01341-7
Minasny, McBratney, 2025. Machine learning and artificial intelligence applications in soil science. Eur. J. Soil Sci. 76(2). doi:10.1111/ejss.70093
Ng, Evangelista, Padarian, Pachon, O’Donoghue, Xue, Francos, McBratney, 2024. Estimating surrogates, utility graphs and indicator sets for soil capacity and security assessments using legacy data. Soil Res. 62(2). doi:10.1071/SR23138
Novielli, Magarelli, Romano, Di Bitonto, Stellacci, Monaco, Amoroso, Bellotti, Tangaro, 2025. Leveraging explainable AI to predict soil respiration sensitivity and its drivers for climate change mitigation. Sci. Rep. 15(1). doi:10.1038/s41598-025-96216-y
Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., Poulton, A., Kerkez, V., Stojnic, R., 2022. Galactica: a large language model for science. arXiv preprint arXiv:2211.09085.
Xu, Fan, Tao, Jiang, You, Houlton, Sun, Gomes, Luo, 2025. Biogeochemistry-informed neural network (BINN) for improving accuracy of model prediction and scientific understanding of soil organic carbon. arXiv preprint.
Zeng, Brown, Raymond, Byari, Hotz, Rounsevell, 2024. Exploring the opportunities and challenges of using large language models to represent institutional agency in land system modelling. EGUsphere 2024, 1-35.