Building an Intelligent Data Exploring Assistant for Geoscientists

Advances in natural‐language processing and large language models (LLMs) are transforming how geoscientists interact with complex data sets, enabling efficient and intuitive scientific analyses. This study introduces the Intelligent Data Exploring Assistant (IDEA), a prototype software framework tha...

Bibliographic Details
Published in Journal of geophysical research. Machine learning and computation Vol. 2; no. 3
Main Authors Widlansky, Matthew J., Komar, Nemanja
Format Journal Article
Language English
Published 01.09.2025

Abstract Advances in natural‐language processing and large language models (LLMs) are transforming how geoscientists interact with complex data sets, enabling efficient and intuitive scientific analyses. This study introduces the Intelligent Data Exploring Assistant (IDEA), a prototype software framework that integrates existing LLM technology with domain‐specific instructions, data, analytical tools, and computing resources to support geoscientific research. We demonstrate its application through the Station Explorer Assistant (SEA), a web‐based tool designed for sea level scientists. SEA empowers users to analyze and interpret coastal water level data by addressing challenges such as vertical datum conversions and assessing flooding risks. We also demonstrate the generalizability of building an IDEA, whereby we deploy a local instance of the framework to analyze atmospheric observations from Mars collected by NASA's InSight Mission. By combining LLM capabilities with robust domain‐specific customizations, SEA and the Mars IDEA generate accurate analyses, visualizations, and insights through natural‐language prompts. This study highlights the potential of IDEA frameworks to lower technical barriers, enhance educational opportunities, and transform geoscientific workflows while addressing the limitations and uncertainties of current LLM technology.

Artificial intelligence (AI) is transforming how scientists explore and understand our world. At the University of Hawaiʻi Sea Level Center (UHSLC), we are developing tools that use large language models, like what ChatGPT uses, to help scientists study sea level changes. One such tool, called the Station Explorer Assistant (SEA), allows researchers to ask questions in everyday language and receive clear explanations and data analyses in response. SEA uses AI to analyze sea level data, compare water levels to normal conditions, and predict potential flooding, drawing on the UHSLC's extensive database. It even writes and runs its own analysis software, which it shows the user to check that its results are accurate. By making sea level science easier to understand and access, SEA can support communities adapting to rising seas and other coastal challenges. SEA technology is generalizable across geoscience domains through a framework we call an Intelligent Data Exploring Assistant (IDEA), which we demonstrate by asking it to analyze wind observations from Mars. Our work highlights how AI can enhance scientific research and communication, and we envision similar tools being created to support scientists in many fields.

Large language models can assist geoscientists by generating data analyses and visualizations from natural‐language prompts
A general‐purpose Intelligent Data Exploring Assistant shows the potential of artificial intelligence to enhance geoscience research
The Station Explorer Assistant analyzes water level data from tide gauges providing insights into sea level variability and risks
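
The abstract names vertical datum conversion and flood-risk assessment as typical tasks for SEA. As an illustration only, the following minimal Python sketch shows what such an analysis might look like in its simplest form. It is not the authors' code and not SEA's output: the function names, the sample water levels, the datum offset, and the flood threshold are all hypothetical values chosen for the example.

```python
from typing import List, Sequence


def convert_datum(levels_m: Sequence[float],
                  source_offset_m: float,
                  target_offset_m: float) -> List[float]:
    """Re-reference water levels from one vertical datum to another.

    Each offset is the height of that datum above a common reference
    (here, the gauge's station zero). Values are hypothetical.
    """
    shift = source_offset_m - target_offset_m
    return [h + shift for h in levels_m]


def count_exceedances(levels_m: Sequence[float], threshold_m: float) -> int:
    """Count observations strictly above a flood threshold."""
    return sum(1 for h in levels_m if h > threshold_m)


# Hypothetical hourly water levels in metres above station zero.
levels_station_zero = [1.10, 1.45, 1.92, 2.05, 1.60]

# Made-up height of mean higher high water (MHHW) above station zero.
MHHW_ABOVE_STATION_ZERO_M = 1.50

# Express the same observations relative to MHHW.
levels_mhhw = convert_datum(levels_station_zero, 0.0, MHHW_ABOVE_STATION_ZERO_M)

# Made-up flood threshold: 0.30 m above MHHW.
flood_hours = count_exceedances(levels_mhhw, 0.30)
print(f"Hours above threshold: {flood_hours}")
```

In practice the datum offsets and flood thresholds are station-specific and would come from the tide gauge's published metadata rather than constants in the code.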
Author Widlansky, Matthew J. (ORCID 0000-0002-3765-7327)
  School of Ocean and Earth Science and Technology (SOEST), Cooperative Institute for Marine and Atmospheric Research, University of Hawaiʻi at Mānoa, Honolulu, HI, USA; Department of Oceanography, SOEST, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
Author Komar, Nemanja
  School of Ocean and Earth Science and Technology (SOEST), Cooperative Institute for Marine and Atmospheric Research, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
DOI 10.1029/2025JH000649
EISSN 2993-5210
ISSN 2993-5210
OpenAccessLink https://onlinelibrary.wiley.com/doi/pdfdirect/10.1029/2025JH000649