Quantitative Provenance Analysis of the Yangtze and Yellow River Sediments Through Detrital Zircon U‐Pb Geochronology Using an XGBoost Machine Learning Algorithm

Bibliographic Details
Published in Journal of geophysical research. Machine learning and computation Vol. 2; no. 3
Main Authors Huang, X. T., Guo, Y. L., Wang, P., Hohl, S. V., Zhao, X. L., Li, Y. L., Yang, S. Y.
Format Journal Article
Language English
Published 01.09.2025
ISSN 2993-5210
DOI 10.1029/2025JH000763

Abstract Over the past two decades, a large number of zircon U‐Pb ages from the Yangtze and Yellow River Basins have been published, yet distinguishing the sources of sediment between these regions remains challenging. Issues related to sampling, analytical methods, and biases complicate the interpretation of detrital zircon geochronology. In this study, we leveraged machine learning techniques to analyze a data set of over 33,000 zircon U‐Pb ages, refining the data to 28,082 ages for our analysis. We employed two characterization strategies, tectonic classification and kernel density estimation, and optimized our models through hyperparameter tuning. Our results demonstrated that the machine learning algorithm, eXtreme Gradient Boosting (XGBoost), significantly improved the accuracy of predicting sediment sources when compared to conventional methods (e.g., multidimensional scaling diagrams). Additionally, we found that the most informative age populations were associated with the orogenic events (e.g., Jinning, 800–1,000 Ma; Tianshan, 260–394 Ma; and Nanhua, 680–800 Ma) rather than the Lvliang (1,800–2,500 Ma) and Wutai (2,500–2,800 Ma) movements, as suggested in previous studies. Finally, we tested the optimized models on several case studies, illustrating their effectiveness in identifying provenance signals for modern and Quaternary sediments in the East China Seas and the Yangtze Delta. While this machine learning approach shows great potential for improving sediment provenance analysis in these case studies, it is still limited by the availability and quality of detrital zircon age data for more detailed provenance analysis on sub‐basin scales.

Plain Language Summary In the last 20 years, many studies have looked at the ages of zircon minerals from the Yangtze and Yellow River Basins and nearby seas, but figuring out where these zircons come from has been tricky. Scientists often debate the best methods due to problems with sampling and analysis. To tackle this issue, we used recent advancements in machine learning to analyze a large data set of more than 33,000 zircon ages from past research. After cleaning the data, we focused on about 28,000 ages and developed two models to help classify the sources of these zircons: one based on tectonic history (T model) and another using statistical methods (K model). We trained these models and found they performed well in predicting the provenance of the zircons, showing better accuracy than traditional methods. Notably, the most telling age groups for distinguishing between the two river basins are linked to specific geological events, rather than older movements previously thought to be significant. We further applied these models to real‐world examples, demonstrating that they can effectively differentiate sediment sources in various locations. This study highlights the promise of machine learning in geoscience for analyzing zircon provenance, though we still need more data to improve these methods.

Key Points
We proposed two characterization strategies to assess the ability of machine learning to distinguish the sources of detrital zircon.
The XGBoost models exhibited improved predictions of provenance compared to conventional U‐Pb comparisons in our case studies.
While machine learning shows great promise for provenance analysis, it requires further attention and development.
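
The two characterization strategies named in the abstract (tectonic classification and kernel density estimation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the age ranges are the ones quoted in the abstract, but the half-open bin convention, the function names, the 30 Myr bandwidth, the 50 Myr grid spacing, and the sample ages are all assumptions made for illustration.

```python
import math

# Age ranges in Ma as quoted in the abstract.
# Assumption: bins are half-open [lo, hi), so an age of exactly 800 Ma counts as Jinning.
TECTONIC_BINS = {
    "Tianshan": (260.0, 394.0),
    "Nanhua": (680.0, 800.0),
    "Jinning": (800.0, 1000.0),
    "Lvliang": (1800.0, 2500.0),
    "Wutai": (2500.0, 2800.0),
}


def tectonic_features(ages):
    """Return the fraction of a sample's ages that fall in each tectonic bin."""
    n = len(ages)
    if n == 0:
        raise ValueError("sample contains no ages")
    return {
        name: sum(lo <= age < hi for age in ages) / n
        for name, (lo, hi) in TECTONIC_BINS.items()
    }


def kde_features(ages, grid, bandwidth=30.0):
    """Evaluate a fixed-bandwidth Gaussian kernel density estimate at each grid age."""
    n = len(ages)
    if n == 0:
        raise ValueError("sample contains no ages")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - age) / bandwidth) ** 2) for age in ages)
        for g in grid
    ]


# Hypothetical sample of detrital zircon U-Pb ages (Ma), invented for this example.
sample = [270.0, 300.0, 750.0, 820.0, 900.0, 1850.0, 2600.0, 3100.0]

t_feats = tectonic_features(sample)                       # one value per tectonic bin
k_feats = kde_features(sample, grid=range(0, 3501, 50))   # one value per grid age
```

Either feature vector, computed per sample and paired with a basin label, could then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`. How the published T and K models actually construct and tune their features is described in the full text, not in this record.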
Author_xml – sequence: 1
  givenname: X. T.
  orcidid: 0000-0003-2485-6316
  surname: Huang
  fullname: Huang, X. T.
  organization: State Key Laboratory of Marine Geology Tongji University Shanghai People's Republic of China
– sequence: 2
  givenname: Y. L.
  orcidid: 0000-0003-0484-8642
  surname: Guo
  fullname: Guo, Y. L.
  organization: State Key Laboratory of Marine Geology Tongji University Shanghai People's Republic of China
– sequence: 3
  givenname: P.
  surname: Wang
  fullname: Wang, P.
  organization: School of Geography Nanjing Normal University Nanjing People's Republic of China
– sequence: 4
  givenname: S. V.
  orcidid: 0000-0002-0522-4973
  surname: Hohl
  fullname: Hohl, S. V.
  organization: State Key Laboratory of Marine Geology Tongji University Shanghai People's Republic of China
– sequence: 5
  givenname: X. L.
  orcidid: 0000-0002-8137-7429
  surname: Zhao
  fullname: Zhao, X. L.
  organization: Nanjing Geological Survey Center China Geological Survey Nanjing People's Republic of China
– sequence: 6
  givenname: Y. L.
  orcidid: 0000-0003-3747-7745
  surname: Li
  fullname: Li, Y. L.
  organization: School of Environmental and Geographical Sciences Shanghai Normal University Shanghai People's Republic of China
– sequence: 7
  givenname: S. Y.
  orcidid: 0000-0002-4810-6598
  surname: Yang
  fullname: Yang, S. Y.
  organization: State Key Laboratory of Marine Geology Tongji University Shanghai People's Republic of China
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
OpenAccessLink https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2025JH000763