An Identification Method of Question Subjects Based on Word Embedding and LSTM
Published in | Journal of physics. Conference series Vol. 1631; no. 1; pp. 12120 - 12130 |
---|---|
Main Authors | Gao, M X; Fu, Z X |
Format | Journal Article |
Language | English |
Published | Bristol: IOP Publishing, 01.09.2020 |
ISSN | 1742-6588 1742-6596 |
DOI | 10.1088/1742-6596/1631/1/012120 |
Abstract | Knowing the subject of a question makes it possible to locate the question's domain, narrow the scope of the query, and provide users with better answers. Question text is usually short, with sparse features and irregular structure. To address this, this paper proposes an identification method for question subjects based on word embedding and LSTM (IQS-WE-L), and evaluates it on a question set from the MadSci website that covers three subjects. We first train Word2vec on a Wikipedia corpus to generate a word-vector dictionary. Then, based on these word vectors, we propose four feature extraction methods: W2V, W2V-TFIDF, W2V-c-TFIDF and W2V-c, which formalize the text features into vectors through word embedding and other features. Finally, we build an LSTM network and train it as a classifier to identify the subject of the question, and we quantitatively evaluate the effect of the four proposed feature extraction methods. The experimental results show that the proposed method can effectively identify the subject of a question, with the F1 value reaching a maximum of 0.9339. |
---|---|
Author | Fu, Z X; Gao, M X |
Author_xml | – sequence: 1 givenname: M X surname: Gao fullname: Gao, M X email: gaomx@bjut.edu.cn organization: Department of Information Science, Beijing University of Technology , China – sequence: 2 givenname: Z X surname: Fu fullname: Fu, Z X organization: Department of Information Science, Beijing University of Technology , China |
ContentType | Journal Article |
Copyright | Published under licence by IOP Publishing Ltd 2020. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Physics |
EISSN | 1742-6596 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. |
PageCount | 11 |
PublicationTitleAlternate | J. Phys.: Conf. Ser |
SubjectTerms | Classification; Embedding; Experimentation; Feature extraction; Identification methods; Physics; Questions; Websites |
URI | https://iopscience.iop.org/article/10.1088/1742-6596/1631/1/012120 https://www.proquest.com/docview/2570949533 |
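
The abstract names a W2V-TFIDF feature but this record does not define it. The following is a minimal sketch of one plausible reading, in which each word vector in a question is scaled by that word's TF-IDF weight before the sequence is passed to a sequence classifier such as an LSTM. Everything here is an assumption made for illustration: the toy `EMBEDDINGS` table stands in for vectors trained with Word2vec, the smoothed IDF formula is one common variant, and the helper names `idf_table` and `weighted_sequence` are hypothetical. It is not the authors' implementation.

```python
import math
from collections import Counter

# Toy 3-dimensional "word vectors". These values are made up; in the setting
# the abstract describes they would come from Word2vec trained on Wikipedia.
EMBEDDINGS = {
    "why": [0.1, 0.0, 0.2],
    "is": [0.0, 0.1, 0.0],
    "the": [0.0, 0.0, 0.1],
    "sky": [0.9, 0.1, 0.3],
    "blue": [0.8, 0.2, 0.4],
    "how": [0.1, 0.1, 0.1],
    "do": [0.0, 0.2, 0.0],
    "cells": [0.2, 0.9, 0.1],
    "divide": [0.3, 0.8, 0.2],
}
DIM = 3


def idf_table(corpus):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(corpus)
    # Count each token at most once per question.
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def weighted_sequence(tokens, idf):
    """Return one TF-IDF-scaled vector per token, in question order."""
    tf = Counter(tokens)
    seq = []
    for tok in tokens:
        vec = EMBEDDINGS.get(tok, [0.0] * DIM)  # zero vector for unknown words
        weight = (tf[tok] / len(tokens)) * idf.get(tok, 0.0)
        seq.append([weight * x for x in vec])
    return seq


# Two hypothetical questions, already tokenized.
corpus = [
    "why is the sky blue".split(),
    "how do cells divide".split(),
]
idf = idf_table(corpus)
features = weighted_sequence(corpus[0], idf)
print(len(features), len(features[0]))
```

The result is a list of per-token vectors, which keeps word order available to a recurrent model. Averaging the vectors into a single fixed-length feature would be an alternative design, but it would discard the order that an LSTM is meant to exploit.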