Quantitative Provenance Analysis of the Yangtze and Yellow River Sediments Through Detrital Zircon U‐Pb Geochronology Using an XGBoost Machine Learning Algorithm
Published in | Journal of Geophysical Research: Machine Learning and Computation, Vol. 2, no. 3 |
---|---|
Main Authors | Huang, X. T.; Guo, Y. L.; Wang, P.; Hohl, S. V.; Zhao, X. L.; Li, Y. L.; Yang, S. Y. |
Format | Journal Article |
Language | English |
Published | 1 September 2025 |
ISSN | 2993-5210 |
EISSN | 2993-5210 |
DOI | 10.1029/2025JH000763 |
Abstract | Over the past two decades, a large number of zircon U‐Pb ages from the Yangtze and Yellow River Basins have been published, yet distinguishing sediment sources between these two basins remains challenging. Sampling issues, differences in analytical methods, and related biases complicate the interpretation of detrital zircon geochronology. In this study, we used machine learning techniques to analyze a data set of over 33,000 zircon U‐Pb ages, which we refined to 28,082 ages for analysis. We employed two characterization strategies, tectonic classification and kernel density estimation, and optimized our models through hyperparameter tuning. Our results demonstrated that the eXtreme Gradient Boosting (XGBoost) algorithm significantly improved the accuracy of predicting sediment sources compared with conventional methods (e.g., multidimensional scaling diagrams). Additionally, we found that the most informative age populations were associated with orogenic events (e.g., Jinning, 800–1,000 Ma; Tianshan, 260–394 Ma; and Nanhua, 680–800 Ma) rather than with the Lvliang (1,800–2,500 Ma) and Wutai (2,500–2,800 Ma) movements emphasized in previous studies. Finally, we tested the optimized models on several case studies, illustrating their effectiveness in identifying provenance signals in modern and Quaternary sediments of the East China Seas and the Yangtze Delta. Although this machine learning approach shows great potential for improving sediment provenance analysis in these case studies, it is still limited by the availability and quality of detrital zircon age data for more detailed provenance analysis at sub‐basin scales.
Plain Language Summary: In the last 20 years, many studies have measured the ages of zircon minerals from the Yangtze and Yellow River Basins and nearby seas, but figuring out where these zircons come from has been tricky. Scientists often debate the best methods because of problems with sampling and analysis. To tackle this issue, we used recent advances in machine learning to analyze a large data set of over 33,000 zircon ages compiled from past research. After cleaning the data, we focused on about 28,000 ages and developed two models to classify the sources of these zircons: one based on tectonic history (T model) and another based on statistical methods (K model). We trained these models and found that they predicted zircon provenance well, with better accuracy than traditional methods. Notably, the age groups most useful for distinguishing between the two river basins are linked to specific geological events, rather than to older tectonic movements previously thought to be significant. We further applied these models to real‐world examples, demonstrating that they can effectively differentiate sediment sources in various locations. This study highlights the promise of machine learning for analyzing zircon provenance in geoscience, though more data are needed to improve these methods.
Key Points: (1) We proposed two characterization strategies to assess the ability of machine learning to distinguish the sources of detrital zircon. (2) The XGBoost models predicted provenance better than conventional U‐Pb age comparisons in our case studies. (3) While machine learning shows great promise for provenance analysis, it requires further attention and development. |
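The tectonic-classification strategy (T model) turns each sample's list of ages into a fixed-length feature vector. The sketch below is a minimal illustration, assuming that the feature is simply the fraction of grains falling in each event age window quoted in the abstract. The window set, the half-open boundaries, and the name `tectonic_features` are hypothetical choices, not the authors' published scheme.

```python
# Hypothetical T-model feature extraction: fraction of grains per event window.
# Only the windows quoted in the abstract are used here (an assumption); the
# published model may use a different or more complete tectonic scheme.
EVENT_WINDOWS_MA = {
    "Tianshan": (260.0, 394.0),
    "Nanhua": (680.0, 800.0),
    "Jinning": (800.0, 1000.0),
    "Lvliang": (1800.0, 2500.0),
    "Wutai": (2500.0, 2800.0),
}


def tectonic_features(ages_ma, windows=EVENT_WINDOWS_MA):
    """Return the fraction of ages in each window, plus an 'other' bin.

    Windows are half-open intervals [lower, upper), so a shared boundary
    such as 800 Ma is counted once (here, in Jinning).
    """
    if not ages_ma:
        raise ValueError("sample contains no ages")
    counts = {name: 0 for name in windows}
    counts["other"] = 0
    for age in ages_ma:
        for name, (lower, upper) in windows.items():
            if lower <= age < upper:
                counts[name] += 1
                break
        else:  # no window matched this grain
            counts["other"] += 1
    n = len(ages_ma)
    return {name: count / n for name, count in counts.items()}


if __name__ == "__main__":
    print(tectonic_features([300, 350, 750, 850, 900, 1900, 2600, 120]))
```

Because every sample maps to the same keys in the same order, the resulting vectors can be stacked into a table for a classifier regardless of how many grains each sample contains.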
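The kernel density estimation strategy (K model) instead describes each sample by its smoothed age distribution. A minimal sketch, assuming a Gaussian kernel with one fixed bandwidth evaluated on a fixed 0 to 3,500 Ma grid; the grid, the 20 Myr bandwidth, and the name `kde_features` are illustrative assumptions, since the abstract does not state the authors' kernel settings (adaptive bandwidths are also common for detrital ages).

```python
import math


def kde_features(ages_ma, start_ma=0.0, stop_ma=3500.0, step_ma=10.0,
                 bandwidth_ma=20.0):
    """Gaussian KDE of U-Pb ages sampled on a fixed age grid.

    Grid limits, step and the fixed bandwidth are illustrative assumptions.
    The density is rescaled to sum to 1 over the grid so that samples with
    different grain counts give comparable feature vectors.
    """
    if not ages_ma:
        raise ValueError("sample contains no ages")
    if step_ma <= 0 or bandwidth_ma <= 0 or stop_ma <= start_ma:
        raise ValueError("invalid grid or bandwidth")
    n_points = int(round((stop_ma - start_ma) / step_ma)) + 1
    grid = [start_ma + i * step_ma for i in range(n_points)]
    # Sum of unnormalized Gaussian kernels centred on each grain age; the
    # usual 1 / (n * h * sqrt(2 * pi)) factor cancels in the rescaling below.
    density = [
        sum(math.exp(-0.5 * ((x - age) / bandwidth_ma) ** 2) for age in ages_ma)
        for x in grid
    ]
    total = sum(density)
    if total == 0.0:
        raise ValueError("all ages lie far outside the grid")
    return grid, [d / total for d in density]
```

Using one shared grid for every sample keeps the feature vectors the same length, which is what a tabular classifier such as XGBoost expects; the bandwidth controls how strongly neighbouring age peaks are merged.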
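The conventional multidimensional scaling (MDS) diagrams that the abstract uses as a baseline are commonly built from a matrix of pairwise dissimilarities between samples, often the two-sample Kolmogorov-Smirnov (K-S) statistic. The sketch below shows only that dissimilarity step, assuming K-S is the metric; the abstract does not state which dissimilarity the comparison used, the function names are hypothetical, and the MDS projection itself is omitted.

```python
from bisect import bisect_right


def ks_distance(ages_a, ages_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(ages_a)
    b = sorted(ages_b)
    if not a or not b:
        raise ValueError("both samples need at least one age")
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # ages, so checking every pooled age is enough to find the maximum gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def dissimilarity_matrix(samples):
    """Pairwise K-S distances keyed by (name, name); input to an MDS map."""
    names = list(samples)
    return {
        (p, q): ks_distance(samples[p], samples[q])
        for p in names
        for q in names
    }
```

D ranges from 0 (identical age distributions) to 1 (no overlap). An MDS map places samples so that their plotted distances approximate this matrix, which is why it can only show relative similarity rather than assign a source label directly.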
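XGBoost builds an additive ensemble of regression trees, each fitted to the gradients and second derivatives (Hessians) of the loss. The sketch below is not XGBoost: it is a deliberately tiny pure-Python gradient booster that uses depth-1 trees (stumps) on logistic loss, shown only to make that mechanism concrete. It assumes a binary label (1 = Yangtze, 0 = Yellow River), which is a simplification, and the names `fit_boosted_stumps` and `predict_proba` and all default values are hypothetical. The leaf weight -G/(H + lambda) and the split score G^2/(H + lambda) follow the regularized objective that XGBoost uses.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _stump_output(stump, row):
    j, t, w_left, w_right = stump
    return w_left if row[j] < t else w_right


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.3, reg_lambda=1.0):
    """Gradient-boosted decision stumps for binary labels y in {0, 1}.

    X is a list of equal-length feature lists (e.g. age-population fractions).
    Each round uses the logistic-loss gradient g = p - y and Hessian
    h = p * (1 - p), picks the split maximizing
    G_L^2 / (H_L + lambda) + G_R^2 / (H_R + lambda), and sets each leaf
    weight to -G / (H + lambda), shrunk by the learning rate.
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and the same length")
    margins = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        p = [_sigmoid(f) for f in margins]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1.0 - pi) for pi in p]
        best = None
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                gl = hl = gr = hr = 0.0
                for row, gi, hi in zip(X, g, h):
                    if row[j] < t:
                        gl += gi
                        hl += hi
                    else:
                        gr += gi
                        hr += hi
                score = gl * gl / (hl + reg_lambda) + gr * gr / (hr + reg_lambda)
                if best is None or score > best[0]:
                    best = (score, j, t,
                            -gl / (hl + reg_lambda), -gr / (hr + reg_lambda))
        _, j, t, w_left, w_right = best
        stump = (j, t, learning_rate * w_left, learning_rate * w_right)
        stumps.append(stump)
        margins = [f + _stump_output(stump, row) for f, row in zip(margins, X)]
    return stumps


def predict_proba(stumps, row):
    """Probability of class 1 (here, Yangtze) for one feature vector."""
    return _sigmoid(sum(_stump_output(s, row) for s in stumps))
```

In practice one would use the xgboost library itself, for example its scikit-learn-style `XGBClassifier`, whose `n_estimators`, `max_depth`, `learning_rate`, and `reg_lambda` parameters loosely correspond to `n_rounds`, the stump depth, `learning_rate`, and `reg_lambda` here, and tune such hyperparameters, for example by cross-validated search. The abstract does not specify the authors' tuning procedure.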
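The abstract ranks age populations by how informative they are for separating the two basins. XGBoost can report built-in importances (for example gain-based), and the abstract does not say which measure was used. As a model-agnostic illustration only, the sketch below computes permutation importance: the mean drop in accuracy when one feature column is shuffled. The function name, the accuracy metric, and the repeat count are assumptions.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is randomly permuted.

    predict maps one feature list to a class label; X is a list of feature
    lists and y the matching labels. Larger values mean the model relies
    more heavily on that feature (e.g. one age population).
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == label for r, label in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)  # copy so X itself is never modified
                new_row[j] = value
                permuted.append(new_row)
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances
```

A feature the model ignores scores exactly zero, while a feature it depends on scores above zero; with correlated age populations, importance can be shared between features, so rankings are best read alongside the model's own importance measures.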
---|---|
Author 1 | Huang, X. T. (ORCID 0000-0003-2485-6316); State Key Laboratory of Marine Geology, Tongji University, Shanghai, People's Republic of China |
Author 2 | Guo, Y. L. (ORCID 0000-0003-0484-8642); State Key Laboratory of Marine Geology, Tongji University, Shanghai, People's Republic of China |
Author 3 | Wang, P.; School of Geography, Nanjing Normal University, Nanjing, People's Republic of China |
Author 4 | Hohl, S. V. (ORCID 0000-0002-0522-4973); State Key Laboratory of Marine Geology, Tongji University, Shanghai, People's Republic of China |
Author 5 | Zhao, X. L. (ORCID 0000-0002-8137-7429); Nanjing Geological Survey Center, China Geological Survey, Nanjing, People's Republic of China |
Author 6 | Li, Y. L. (ORCID 0000-0003-3747-7745); School of Environmental and Geographical Sciences, Shanghai Normal University, Shanghai, People's Republic of China |
Author 7 | Yang, S. Y. (ORCID 0000-0002-4810-6598); State Key Laboratory of Marine Geology, Tongji University, Shanghai, People's Republic of China |
Open access | Yes |
Peer reviewed | Yes |
Open access full text (PDF) | https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2025JH000763 |