Information Extraction Method based on Dilated Convolution and Character-Enhanced Word Embedding

Bibliographic Details
Published in 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), pp. 138-143
Main Authors He, Zhaorong; Luo, Xiaonan; Zhong, Yanru; Jiang, Chaohao; Zhao, Leixian
Format Conference Proceeding
Language English
Published IEEE 01.10.2020
Abstract With the establishment and application of various kinds of knowledge graphs, information extraction has become one of the most important tasks in natural language processing. Because of the complexity of the Chinese language, traditional pipeline and joint extraction methods in most cases cannot resolve entity confusion when dealing with one-to-many entity relationships. To solve this problem, this paper presents an approach based on seq2seq decoding that directly predicts two object entities of a triple from the subject entity, and that uses dilated convolution, character-enhanced word embedding, and self-attention to optimize the encoding of Chinese words. The approach not only simplifies the extraction process but also resolves the entity conflicts that arise in one-to-many relation extraction. Experiments show that the proposed method performs better on datasets than traditional relation extraction models and has better scalability.
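
The abstract names dilated convolution as one building block of the encoder but does not give details. As a rough illustration of the general technique only (not the authors' implementation), the following sketch applies a 1D dilated convolution to a plain list of numbers and computes how the receptive field grows when such layers are stacked. The function names `dilated_conv1d` and `receptive_field`, the zero padding, and the example kernel are all assumptions made for this illustration.

```python
def dilated_conv1d(sequence, kernel, dilation=1):
    """Apply a 1D dilated convolution with zero padding.

    The output has the same length as the input. Kernel taps are spaced
    `dilation` positions apart, so a larger dilation lets the same kernel
    see a wider context without adding weights.
    """
    # Distance between the first and last kernel tap once gaps are inserted.
    span = (len(kernel) - 1) * dilation
    pad_left = span // 2
    output = []
    for i in range(len(sequence)):
        total = 0.0
        for j, weight in enumerate(kernel):
            idx = i - pad_left + j * dilation
            # Positions outside the sequence count as zero (zero padding).
            if 0 <= idx < len(sequence):
                total += weight * sequence[idx]
        output.append(total)
    return output


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


if __name__ == "__main__":
    tokens = [1, 2, 3, 4, 5]   # stand-in for one feature channel of a sentence
    kernel = [1, 1, 1]         # hypothetical weights, chosen for readability
    print(dilated_conv1d(tokens, kernel, dilation=1))
    print(dilated_conv1d(tokens, kernel, dilation=2))
    print(receptive_field(3, [1, 2, 4]))
```

With a kernel of size 3, stacking layers with dilations 1, 2, and 4 covers 15 positions, whereas three undilated layers would cover only 7. This is the usual motivation for dilation in sequence encoders; the dilation rates the paper actually uses are not stated in this record.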
Author Zhao, Leixian
He, Zhaorong
Jiang, Chaohao
Luo, Xiaonan
Zhong, Yanru
Author_xml – sequence: 1
  givenname: Zhaorong
  surname: He
  fullname: He, Zhaorong
  email: zhaorong_he@outlook.com
  organization: Guilin University of Electronic Technology,Guilin,China
– sequence: 2
  givenname: Xiaonan
  surname: Luo
  fullname: Luo, Xiaonan
  email: luoxn@guet.edu.cn
  organization: Guilin University of Electronic Technology,Guilin,China
– sequence: 3
  givenname: Yanru
  surname: Zhong
  fullname: Zhong, Yanru
  email: Rosezhong@guet.edu.cn
  organization: Guilin University of Electronic Technology,Guilin,China
– sequence: 4
  givenname: Chaohao
  surname: Jiang
  fullname: Jiang, Chaohao
  email: 2009853ZII30002@student.must.edu.mo
  organization: Macau University of Science and Technology,Faculty of Information Technology,Macau,China
– sequence: 5
  givenname: Leixian
  surname: Zhao
  fullname: Zhao, Leixian
  email: LeixianZhao@outlook.com
  organization: Guilin University of Electronic Technology,Guilin,China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CyberC49757.2020.00031
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 1728184487
9781728184487
EndPage 143
ExternalDocumentID 9329414
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61562016
  funderid: 10.13039/501100001809
IngestDate Thu Jun 29 18:38:59 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
ParticipantIDs ieee_primary_9329414
PublicationCentury 2000
PublicationDate 2020-Oct.
PublicationDateYYYYMMDD 2020-10-01
PublicationDate_xml – month: 10
  year: 2020
  text: 2020-Oct.
PublicationDecade 2020
PublicationTitle 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)
PublicationTitleAbbrev CYBERC
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 138
SubjectTerms character-enhanced word embedding
Chinese natural language processing
Computational modeling
Convolution
Data mining
dilated convolution
Information extraction
Information retrieval
Natural language processing
self-attention
Stability analysis
Task analysis
Title Information Extraction Method based on Dilated Convolution and Character-Enhanced Word Embedding
URI https://ieeexplore.ieee.org/document/9329414