Large language model enhanced corpus of CO2 reduction electrocatalysts and synthesis procedures

CO2 electroreduction has garnered significant attention from both the academic and industrial communities. Extracting crucial information related to catalysts from domain literature can help scientists find new and effective electrocatalysts. Herein, we used various advanced machine learning, natur...

Bibliographic Details
Published in Scientific Data, Vol. 11, no. 1, article 347 (12 pages)
Main Authors Chen, Xueqing; Gao, Yang; Wang, Ludi; Cui, Wenjuan; Huang, Jiamin; Du, Yi; Wang, Bin
Format Journal Article
Language English
Published London Nature Publishing Group UK 06.04.2024
Nature Publishing Group
Nature Portfolio
Abstract CO2 electroreduction has garnered significant attention from both the academic and industrial communities. Extracting crucial catalyst-related information from the domain literature can help scientists find new and effective electrocatalysts. Herein, we used machine learning, natural language processing (NLP) techniques, and large language models (LLMs) to extract relevant information about the CO2 electrocatalytic reduction process from the scientific literature. By applying the extraction pipeline, we present an open-source corpus for electrocatalytic CO2 reduction. The database contains two types of corpus: (1) the benchmark corpus, a collection of 6,985 records extracted by catalysis postgraduates from 1,081 publications; and (2) the extended corpus, which consists of content extracted from 5,941 documents using traditional NLP techniques and LLM techniques. Extended Corpus I and Extended Corpus II contain 77,016 and 30,283 records, respectively. Furthermore, several LLMs fine-tuned on domain literature were developed. Overall, this work will contribute to the exploration of new and effective electrocatalysts by leveraging information from domain literature using cutting-edge computer techniques.
ArticleNumber 347
Author Chen, Xueqing
Huang, Jiamin
Wang, Ludi
Gao, Yang
Du, Yi
Cui, Wenjuan
Wang, Bin
Author_xml – sequence: 1
  givenname: Xueqing
  orcidid: 0009-0008-8926-9626
  surname: Chen
  fullname: Chen, Xueqing
  organization: Laboratory of Big Data Knowledge, Computer Network Information Center, Chinese Academy of Sciences, University of Chinese Academy of Sciences
– sequence: 2
  givenname: Yang
  orcidid: 0000-0002-3451-1904
  surname: Gao
  fullname: Gao, Yang
  organization: CAS Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology (NCNST)
– sequence: 3
  givenname: Ludi
  orcidid: 0000-0002-9346-6250
  surname: Wang
  fullname: Wang, Ludi
  organization: Laboratory of Big Data Knowledge, Computer Network Information Center, Chinese Academy of Sciences
– sequence: 4
  givenname: Wenjuan
  orcidid: 0000-0002-1858-8194
  surname: Cui
  fullname: Cui, Wenjuan
  organization: Laboratory of Big Data Knowledge, Computer Network Information Center, Chinese Academy of Sciences
– sequence: 5
  givenname: Jiamin
  surname: Huang
  fullname: Huang, Jiamin
  organization: CAS Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology (NCNST)
– sequence: 6
  givenname: Yi
  orcidid: 0000-0003-3121-8937
  surname: Du
  fullname: Du, Yi
  email: duyi@cnic.cn
  organization: Laboratory of Big Data Knowledge, Computer Network Information Center, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Hangzhou Institute for Advanced Study, UCAS
– sequence: 7
  givenname: Bin
  orcidid: 0000-0001-9576-2646
  surname: Wang
  fullname: Wang, Bin
  email: wangb@nanoctr.cn
  organization: CAS Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology (NCNST)
ContentType Journal Article
Copyright The Author(s) 2024
The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
2024. The Author(s).
DOI 10.1038/s41597-024-03180-9
Discipline Sciences (General)
EISSN 2052-4463
EndPage 12
ExternalDocumentID oai_doaj_org_article_9956e1ef7b8840b9a895372b2cbe8d8f
PMC10998834
10_1038_s41597_024_03180_9
GrantInformation_xml – fundername: National Natural Science Foundation of China (National Science Foundation of China)
  grantid: T2322027
  funderid: 501100001809
– fundername: the National Key Research and Development Plan of China under Grant No. 2022YFF0711900
– fundername: the National Key Research and Development Plan of China under Grant No. 2021YFA1202802; the CAS Pioneer Hundred Talents Program
– fundername: the National Key Research and Development Plan of China under Grant No. 2022YFF0712200; Information Science Database in National Basic Science Data Center under Grant No. NBSDC-DB-25
– fundername: Youth Innovation Promotion Association of the Chinese Academy of Sciences (Youth Innovation Promotion Association CAS)
  funderid: 501100004739
– fundername: the Young Elite Scientists Sponsorship Program by Beijing Association for Science and Technology (BYESS2023410)
ISSN 2052-4463
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
ORCID 0009-0008-8926-9626
0000-0002-1858-8194
0000-0002-9346-6250
0000-0001-9576-2646
0000-0003-3121-8937
0000-0002-3451-1904
OpenAccessLink https://www.nature.com/articles/s41597-024-03180-9
PQID 3033931671
PQPubID 2041912
PageCount 12
PublicationCentury 2000
PublicationDate 2024-04-06
PublicationDateYYYYMMDD 2024-04-06
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-04-06
  day: 06
PublicationDecade 2020
PublicationPlace London
PublicationPlace_xml – name: London
PublicationTitle Scientific data
PublicationTitleAbbrev Sci Data
PublicationYear 2024
Publisher Nature Publishing Group UK
Nature Publishing Group
Nature Portfolio
Publisher_xml – name: Nature Publishing Group UK
– name: Nature Publishing Group
– name: Nature Portfolio
StartPage 347
SubjectTerms 639/301/299/886
639/301/299/890
Algorithms
Artificial intelligence
Carbon dioxide
Catalysis
Catalysts
Chatbots
Data Descriptor
Data mining
Datasets
Humanities and Social Sciences
Information processing
Language
Large language models
Machine learning
Metadata
multidisciplinary
Natural language processing
Science
Science (multidisciplinary)
Scientists
Subject specialists
Title Large language model enhanced corpus of CO2 reduction electrocatalysts and synthesis procedures
URI https://link.springer.com/article/10.1038/s41597-024-03180-9
https://www.proquest.com/docview/3033931671
https://www.proquest.com/docview/3034246883
https://pubmed.ncbi.nlm.nih.gov/PMC10998834
https://doaj.org/article/9956e1ef7b8840b9a895372b2cbe8d8f
Volume 11
linkProvider Springer Nature