An Identification Method of Question Subjects Based on Word Embedding and LSTM

Bibliographic Details
Published in Journal of physics. Conference series Vol. 1631; no. 1; pp. 12120 - 12130
Main Authors Gao, M X; Fu, Z X
Format Journal Article
Language English
Published Bristol, IOP Publishing, 01.09.2020
ISSN 1742-6588
EISSN 1742-6596
DOI 10.1088/1742-6596/1631/1/012120

Abstract Identifying the subject of a question helps locate the question's domain, narrow the scope of the query, and provide users with better answers. Question text is usually short. Given its sparse features and irregular structure, this paper proposes an identification method of question subjects based on word embedding and LSTM (IQS-WE-L), and evaluates it on a question set from the MadSci website that covers three subjects. We first train Word2vec on the Wikipedia database to generate a dictionary. Then, based on the word vectors, we propose four feature extraction methods, W2V, W2V-TFIDF, W2V-c-TFIDF and W2V-c, which formalize the text features into vectors through word embedding and other features. Finally, we build an LSTM network for classification training to identify the subject of the question and quantitatively evaluate the effect of the four proposed feature extraction methods. Experimental results show that the proposed method can effectively identify the subject of the question: when classifying the subject of the question, the F1 value reaches a maximum of 0.9339.
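The abstract names a W2V-TFIDF feature but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: each word's embedding is scaled by that word's TF-IDF weight, giving a sequence of vectors that could be fed to an LSTM. The toy vectors, the smoothed IDF formula, and the names `idf_table` and `w2v_tfidf` are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional vectors standing in for real Word2vec output.
TOY_VECTORS = {
    "why": [0.1, 0.0, 0.2],
    "is": [0.0, 0.1, 0.0],
    "the": [0.0, 0.0, 0.1],
    "sky": [0.9, 0.1, 0.3],
    "blue": [0.8, 0.2, 0.1],
    "cell": [0.1, 0.9, 0.2],
    "alive": [0.2, 0.8, 0.4],
}


def idf_table(questions):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(questions)
    # Count in how many questions each token appears (not how often).
    doc_freq = Counter(tok for q in questions for tok in set(q))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def w2v_tfidf(question, idf, vectors):
    """Return one TF-IDF-scaled vector per in-vocabulary token, in order."""
    counts = Counter(question)
    sequence = []
    for tok in question:
        if tok not in vectors:
            continue  # assumption: out-of-vocabulary tokens are skipped
        weight = (counts[tok] / len(question)) * idf.get(tok, 1.0)
        sequence.append([weight * x for x in vectors[tok]])
    return sequence


# Two tokenized toy questions (hypothetical data).
questions = [
    ["why", "is", "the", "sky", "blue"],
    ["is", "the", "cell", "alive"],
]
idf = idf_table(questions)
features = w2v_tfidf(questions[0], idf, TOY_VECTORS)
```

Here `features` holds five 3-dimensional vectors for the first question. Words that appear in both questions ("is", "the") receive a smaller weight than words specific to one question ("sky", "blue"), which is the usual motivation for combining TF-IDF with word embeddings on short, sparse text.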
Authors
Gao, M X (gaomx@bjut.edu.cn), Department of Information Science, Beijing University of Technology, China
Fu, Z X, Department of Information Science, Beijing University of Technology, China
ContentType Journal Article
Copyright Published under licence by IOP Publishing Ltd
2020. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Physics
Open Access Yes
Peer Reviewed Yes
License Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
PageCount 11
Subject Terms Classification; Embedding; Experimentation; Feature extraction; Identification methods; Physics; Questions; Websites