Personalized Speech Classification Scheme for Improving Smart Speaker Accessibility for Speech-Impaired People
Published in | 스마트미디어저널 (Smart Media Journal) Vol. 11; no. 11; pp. 17 - 24 |
---|---|
Main Authors | 이승권, 전광일, 최우진 |
Format | Journal Article |
Language | Korean |
Published | 한국스마트미디어학회 (Korean Institute of Smart Media), 31.12.2022 |
ISSN | 2287-1322 |
Abstract | With the spread of smart speakers built on speech recognition and deep learning, blind and physically disabled people, as well as non-disabled people, can easily control home appliances such as lights and TVs by voice through linked home network services, which has greatly improved their quality of life. Speech-impaired people, however, cannot use these services because articulation or speech disorders make their pronunciation inaccurate. This paper proposes a personalized speech classification technique that lets speech-impaired people use a subset of the functions a smart speaker provides. The goal is to raise the recognition rate and accuracy for sentences spoken by speech-impaired people, even with a small amount of data and a short training time, so that smart speaker services become usable in practice. The method fine-tunes a ResNet18 model and applies data augmentation together with one-cycle learning rate optimization. In an experiment in which each of 30 smart speaker commands was recorded 10 times and the model was trained for less than 3 minutes, the speech classification accuracy was about 95.2%. |
---|---|
Author | 이승권, 전광일, 최우진 |
ContentType | Journal Article |
Copyright | COPYRIGHT(C) KYOBO BOOK CENTRE ALL RIGHTS RESERVED |
DBID | P5Y SSSTE JDI |
DEWEY | 302.23 657.84 |
DatabaseName | 교보문고 스콜라 Scholar(스콜라) KoreaScience |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Anthropology Social Sciences (General) Business |
DocumentTitleAlternate | Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people |
EndPage | 24 |
ExternalDocumentID | JAKO202211552826597 4010037002610 |
GroupedDBID | P5Y SSSTE .UV JDI |
ID | FETCH-LOGICAL-k600-56163dac189171025ee729fe78af904efb4f64771ba148f03fb7d6bd88d60b573 |
ISSN | 2287-1322 |
IngestDate | Fri Dec 22 12:03:24 EST 2023 Thu Apr 10 10:28:03 EDT 2025 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Issue | 11 |
Keywords | deep learning; personalized speech classification scheme; smart speaker; speech-impaired people; disabled accessibility |
Language | Korean |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-k600-56163dac189171025ee729fe78af904efb4f64771ba148f03fb7d6bd88d60b573 |
Notes | KISTI1.1003/JNL.JAKO202211552826597 |
OpenAccessLink | http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO202211552826597&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01 |
PageCount | 8 |
ParticipantIDs | kisti_ndsl_JAKO202211552826597 kyobo_bookcenter_4010037002610 |
PublicationCentury | 2000 |
PublicationDate | 2022-12-31 |
PublicationDecade | 2020 |
PublicationTitle | 스마트미디어저널 (Smart Media Journal) |
PublicationTitleAlternate | Smart media journal |
PublicationYear | 2022 |
Publisher | 한국스마트미디어학회 (Korean Institute of Smart Media) |
SSID | ssib012146176 ssib036278714 ssib022315842 ssib051117086 ssib044760798 |
Score | 1.8169761 |
SourceID | kisti kyobo |
SourceType | Open Access Repository Publisher |
StartPage | 17 |
TableOfContents | Ⅰ. Introduction; Ⅱ. Related Work; Ⅲ. Training a Personalized Speech Model for Speech-Impaired People; Ⅳ. Experiments and Results; Ⅴ. Conclusion; References |
Title | Personalized Speech Classification Scheme for Improving Smart Speaker Accessibility for Speech-Impaired People |
URI | https://scholar.kyobobook.co.kr/article/detail/4010037002610 http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO202211552826597&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01 |
Volume | 11 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | ISSN International Centre |
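
The abstract reports fine-tuning with a one-cycle learning rate schedule on a very small dataset. As a rough illustration only, the Python sketch below computes such a schedule with the standard library. The 30 commands and 10 recordings per command come from the abstract; the batch size, epoch count, peak learning rate, warm-up fraction, and divisor values are hypothetical placeholders, not settings reported in this record, and the cosine shape is one common variant of the one-cycle policy rather than necessarily the one the authors used.

```python
import math


def _cos_interp(start, end, frac):
    """Cosine interpolation from `start` (frac=0) to `end` (frac=1)."""
    return end + (start - end) * (1 + math.cos(math.pi * frac)) / 2


def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a 0-based `step` under a cosine one-cycle policy.

    All default values are illustrative assumptions, not values from the paper.
    """
    if total_steps < 1 or not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    initial_lr = max_lr / div_factor          # rate at the first step
    final_lr = initial_lr / final_div_factor  # rate at the last step
    peak_step = max(1, int(pct_start * (total_steps - 1)))
    if step <= peak_step:
        # Warm-up phase: rise from initial_lr to max_lr.
        return _cos_interp(initial_lr, max_lr, step / peak_step)
    # Annealing phase: fall from max_lr to final_lr.
    frac = (step - peak_step) / (total_steps - 1 - peak_step)
    return _cos_interp(max_lr, final_lr, frac)


# Dataset size taken from the abstract: 30 commands, 10 recordings each.
NUM_COMMANDS = 30
RECORDINGS_PER_COMMAND = 10
num_samples = NUM_COMMANDS * RECORDINGS_PER_COMMAND

# Hypothetical training settings, chosen only to make the example concrete.
BATCH_SIZE = 32
EPOCHS = 10
steps_per_epoch = math.ceil(num_samples / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

schedule = [one_cycle_lr(s, total_steps) for s in range(total_steps)]
```

In a real training loop, each value in `schedule` would be assigned to the optimizer before the corresponding batch update; deep learning frameworks typically ship an equivalent built-in scheduler.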