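The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction (통계적 문맥의존 철자오류 교정 기법의 향상을 위한 지역적 문서 정보의 활용)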

Bibliographic Details
Published in	KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지) Vol. 23; no. 7; pp. 446 - 451
Main Authors	이정훈(Jung-Hun Lee), 김민호(Minho Kim), 권혁철(Hyuk-Chul Kwon)
Format	Journal Article
Language	Korean
Published	Korean Institute of Information Scientists and Engineers 2017 한국정보과학회
Subjects	computer science (컴퓨터학); statistical language model (확률언어모형); natural language processing (자연언어처리); context-sensitive spelling error correction (문맥의존 철자오류 교정); local document frequency (지역적 문서 빈도); text mining (텍스트 마이닝)
ISSN	2383-6318 2383-6326
DOI	10.5626/KTCP.2017.23.7.446

Abstract	The statistical context-sensitive spelling error correction technique in this paper is based on Shannon's noisy channel model. The proposed improvement relies on interpolation. Conventional interpolation fills in missing probability values by taking frequencies that do not exist for an N-gram from the (N-1)-gram, the (N-2)-gram, and so on, all computed from the same statistical corpus. The proposed method instead interpolates using frequency information from both the statistical corpus and the document being corrected. Using the frequencies of the document being corrected has two advantages. First, a probability can be obtained for newly coined words that appear only in the document being corrected and not in the statistical corpus. Second, even when two correction candidates have ambiguous probability values, the ambiguity is resolved by referring to the document being corrected. The proposed method showed better precision and recall than the existing correction model.
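The interpolation idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: a bigram model stands in for the N-gram model, the two sources are mixed by simple linear interpolation with a hypothetical weight `lam`, and the function names (`build_counts`, `interpolated_prob`, `best_candidate`) and the toy channel probabilities are invented for this example.

```python
from collections import Counter


def build_counts(tokens):
    """Collect unigram and bigram counts from a token list."""
    return {
        "uni": Counter(tokens),
        "bi": Counter(zip(tokens, tokens[1:])),
    }


def cond_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    denom = counts["uni"][prev]
    return counts["bi"][(prev, word)] / denom if denom else 0.0


def interpolated_prob(prev, word, corpus, document, lam=0.7):
    """Mix corpus and local-document estimates.

    ASSUMPTION: linear interpolation with weight `lam`; the paper's
    actual weighting scheme is not stated in the abstract.
    """
    return lam * cond_prob(corpus, prev, word) + (1.0 - lam) * cond_prob(
        document, prev, word
    )


def best_candidate(prev, candidates, channel, corpus, document, lam=0.7):
    """Noisy channel choice: argmax of P(typed | w) * P(w | prev)."""
    return max(
        candidates,
        key=lambda w: channel[w] * interpolated_prob(prev, w, corpus, document, lam),
    )


if __name__ == "__main__":
    # Toy data: the corpus alone cannot separate the two candidates.
    corpus = build_counts("i like the weather i like the whether".split())
    document = build_counts(
        "the weather today the weather tomorrow the blockchain".split()
    )
    channel = {"whether": 0.5, "weather": 0.5}  # hypothetical error model
    print(best_candidate("the", ["whether", "weather"], channel, corpus, document))
    # A word seen only in the document still gets a nonzero probability.
    print(interpolated_prob("the", "blockchain", corpus, document))
```

In this toy setup the corpus assigns the same probability to both candidates after "the", so the counts from the document being corrected break the tie, and a word that occurs only in that document still receives a nonzero probability. These correspond to the two advantages listed in the abstract, under the assumptions stated above.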
KCI Citation Count	2
Author	이정훈(Jung-Hun Lee) 권혁철(Hyuk-Chul Kwon) 김민호(Minho Kim)
Author_xml	– sequence: 1 fullname: 이정훈(Jung-Hun Lee) – sequence: 2 fullname: 김민호(Minho Kim) – sequence: 3 fullname: 권혁철(Hyuk-Chul Kwon)
BackLink	https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART002245279$$DAccess content in National Research Foundation of Korea (NRF)
ContentType	Journal Article
DBID	DBRKI TDB JDI ACYCR
DEWEY	005
DOI	10.5626/KTCP.2017.23.7.446
DatabaseName	DBPIA - 디비피아 Nurimedia DBPIA Journals KoreaScience Korean Citation Index
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
DocumentTitleAlternate	The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction
DocumentTitle_FL	The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction
EISSN	2383-6326
EndPage	451
ExternalDocumentID	oai_kci_go_kr_ARTI_1402807 JAKO201722163438691 NODE07203586
GroupedDBID	.UV ALMA_UNASSIGNED_HOLDINGS DBRKI TDB JDI ACYCR M~E
ID	FETCH-LOGICAL-k946-d82e23e9365ae210c04fc28b363261ba7459865ecc45e10b3432e0b70bae3e573
ISSN	2383-6318
IngestDate	Tue Nov 21 21:43:23 EST 2023 Fri Dec 22 11:58:55 EST 2023 Thu Feb 06 13:24:09 EST 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Issue	7
Keywords	statistical language model (확률언어모형); natural language processing (자연언어처리); context-sensitive spelling error correction (문맥의존 철자오류 교정); local document frequency (지역적 문서 빈도); text mining (텍스트 마이닝)
Language	Korean
LinkModel	OpenURL
MergedId	FETCHMERGED-LOGICAL-k946-d82e23e9365ae210c04fc28b363261ba7459865ecc45e10b3432e0b70bae3e573
Notes	KISTI1.1003/JNL.JAKO201722163438691
OpenAccessLink	http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO201722163438691&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01
PageCount	6
ParticipantIDs	nrf_kci_oai_kci_go_kr_ARTI_1402807 kisti_ndsl_JAKO201722163438691 nurimedia_primary_NODE07203586
PublicationCentury	2000
PublicationDate	2017
PublicationDateYYYYMMDD	2017-01-01
PublicationDate_xml	– year: 2017 text: 2017
PublicationDecade	2010
PublicationTitle	정보과학회 컴퓨팅의 실제 논문지
PublicationTitleAlternate	KIISE transactions on computing practices
PublicationYear	2017
Publisher	Korean Institute of Information Scientists and Engineers 한국정보과학회
Publisher_xml	– name: Korean Institute of Information Scientists and Engineers – name: 한국정보과학회
SSID	ssib021824891 ssib044742771 ssib053377435 ssib019653237
Score	1.610844
SourceID	nrf kisti nurimedia
SourceType	Open Website Open Access Repository Publisher
StartPage	446
SubjectTerms	computer science (컴퓨터학)
Title	통계적 문맥의존 철자오류 교정 기법의 향상을 위한 지역적 문서 정보의 활용
URI	https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE07203586 http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO201722163438691&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01 https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART002245279
Volume	23
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
ispartofPNX	정보과학회 컴퓨팅의 실제 논문지, 2017, 23(7), , pp.446-451
linkProvider	ISSN International Centre